
Wednesday, 27 March 2024

What is the point of these bots endlessly trying utterly random HTTP requests?

I can't be the only one seeing this kind of garbage in my server logs:

"GET /!?asdas1230ds0a=da90sue21qh HTTP/1.1"
"GET /HotelInformation/HotelInformation.aspx?asdas1230ds0a=da90sue21qh HTTP/1.1"
"GET /++?asdas1230ds0a=da90sue21qh HTTP/1.1"
"GET /.cancel?asdas1230ds0a=da90sue21qh HTTP/1.1"
"GET /.specialSubmit?asdas1230ds0a=da90sue21qh HTTP/1.1"
"GET /img.youtube.com?asdas1230ds0a=da90sue21qh HTTP/1.1"
"GET /.droppable/?asdas1230ds0a=da90sue21qh HTTP/1.1"
"POST /.droppable/ HTTP/1.1"
"GET /_isMasked/?iqi_localization_country=x27f&vst=x27f&gitlab=x27f&1111ef1ee11b=x27f&ocxlaarct7tk=x27f&gad_source=x27f&landcode=x27f&confirmPrivacyStatement=x27f&26=x27f&frm_action=x27f…"

This is just a tiny sampling of the endless junk that has been going on for at least the past two weeks. The last example is abbreviated; it goes on like that, with exactly 100 of those random query parameters, all with the same value “x27f.” Several bots are involved, which according to an IP locator service come from different countries, mostly the UK and Hong Kong. However, a WHOIS lookup on each of the IP addresses reveals that many of them are hosted by Contabo GmbH, a cheap VPS hosting service in Germany.

Something similar happened years ago, and back then the junk also came from Contabo-hosted addresses. The pattern was similar, but each request looked like the last example shown above, using a ridiculous number of query parameters with different field names but all the same value “z3re”. I filed an abuse report back then and the junk stopped for a while, but it has been sporadically returning, and now it is back in a slightly different incarnation. It still makes no sense at all. NONE.

Sometimes these bots still perform requests with a ridiculous number of parameters (usually 100), but more often they look like the above: a random string with the same damn query string appended to it. I really mean the same damn string for at least two weeks straight, which in the above case was obviously produced by someone bashing their keyboard: “asdas1230ds0a” and “da90sue21qh”. The same bot will keep doing requests with the same base path, like “.specialSubmit” or “London”, for a whole day, and then might switch to another string the next day, if I haven't kicked its ass with an iptables DROP in the meantime. The choice of these strings generally makes no sense. Lately they have also started using random characters next to city names and domains, or just random words. Most of the time, the strings don't look like anything a real web app would ever use. It is all totally random. The mind boggles.

I really don't understand what they are trying to achieve with this. It is as if they are trying to brute-force the internet in the hope of finding an exploit, but the chances of this strategy producing anything fruitful are negligibly small, especially when they don't even vary the query parameters. Also, they make only about one request every 10 minutes, maybe to try to stay under the radar of suspicious activity detectors (not mine, obviously). At such a slow rate, a Monte Carlo approach is just pointless.

I truly cannot grok what could be going on in the mind of whatever crackpot implemented this piece of junk and then decided to pump kilowatts into a server farm to unleash this nonsense across the internet. If I see this in my logs, then they probably run these requests non-stop against whole IP ranges or a list of domains obtained from wherever. All that electricity is wasted on total nonsense. They would have been better off spending the effort on mining crypto. It must take a very special kind of mental deficiency to believe this strategy will yield any return on investment.

Luckily, the incomprehensible habit of always using the same strings in the requests makes it easy to ban these bots. The set of IP addresses they work from is also pretty stable, so firing up the firewall is a good option as well.
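As a minimal sketch of the log-scanning part, the following Python script collects the client IPs that keep sending the tell-tale query string and prints matching iptables DROP commands for review. The log path, the log format (client IP as the first field), and of course the junk string itself are assumptions based on the examples above; adapt them to your own setup.

    #!/usr/bin/env python3
    """Collect the IPs of bots that keep reusing the same junk query string.

    Assumptions: a combined-format access log whose first field is the
    client IP, and the fixed query string seen in the examples above.
    """
    import re

    LOG_PATH = "/var/log/apache2/access.log"   # assumed log location
    JUNK_QUERY = "asdas1230ds0a=da90sue21qh"   # the string the bots keep reusing

    def junk_ips(log_path):
        """Return the set of client IPs that sent the junk query string."""
        ips = set()
        first_field = re.compile(r"^(\S+)")
        with open(log_path, encoding="utf-8", errors="replace") as log:
            for line in log:
                if JUNK_QUERY in line:
                    match = first_field.match(line)
                    if match:
                        ips.add(match.group(1))
        return ips

    if __name__ == "__main__":
        for ip in sorted(junk_ips(LOG_PATH)):
            # Print the commands instead of executing them, so they can be reviewed first.
            print(f"iptables -A INPUT -s {ip} -j DROP")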

Wednesday, 26 February 2020

Get rid of the bottom toolbar in Google Chrome on Android

In a recent update, Chrome on Android introduced a change that caused the previously experimental ‘duet’ feature to be enabled by default. This places an additional toolbar at the bottom of the screen with buttons for new tab, search, and share. This bar is supposed to disappear at the same time as the top toolbar (typically when scrolling down), but on my phone this was often not the case, especially on pages that are not long enough to allow scrolling. Worse, the bottom part of such pages could then become permanently obscured by the toolbar, which is pretty damn annoying.

It does make sense to place controls at the bottom because this makes them usable with one's thumbs, but then everything should be put at the bottom, not split between top and bottom! Because moving everything to the bottom is currently impossible in Chrome (as far as I know), the best you can do to avoid the nuisances of this bottom toolbar is to disable it.

This used to be controlled by a single setting called “Chrome Duet” in the Chrome flags. Now, however, there seems to be an additional one called “Duet TabStrip Integration” that also enables this feature. It is unclear how the two interact.

To completely disable the bar, enter “chrome://flags” in the address bar, and then enter “Duet” in the Search flags box. Set both “Chrome Duet” and “Duet TabStrip Integration” to “Disabled”. Then restart Chrome. If the bar is still present, toggle the flags back to “Enabled” and then “Disabled,” and restart Chrome yet again. Keep repeating this dance and eventually it will work.

I'm inclined to try another browser but unfortunately Google managed to lock me into their ecosystem because I use Chrome on other devices and it is pretty handy to have everything synchronised, like bookmarks. If they keep annoying me with unexpected changes like these however, I may just become motivated enough to migrate all devices to another browser.

Thursday, 19 December 2019

The State of Thingiverse, End of 2019

In June 2016, I bought a 3D printer on a whim; I only had one real, concrete idea for something useful to print. But no worries, because there proved to be a website containing a few million free 3D models. Granted, the vast majority of models posted on it proved to be useless junk, but some were either very cool or truly useful. That site was Thingiverse. It was, and still is, the de facto standard for sharing 3D printable models. At that time, the site worked pretty well and had a very active and mostly friendly community. Questions would most often be answered with useful advice, and if there was a problem with the website, there would at least be an acknowledgement even if the problem was not fixed soon.

The Good

Thingiverse used to be one of the main things that kept feeding my interest in 3D printing. Not just because of the new models coming in every few minutes, but also because of the community, and because it was pretty easy to share my own models, many of which started out as improvements upon someone else's. I am now the co-author of the “Flexi Rex with stronger links,” one of the models a present-day buyer of a new 3D printer seems likely to print as one of their very first attempts. I never anticipated this; I only improved upon an existing model because it broke way too soon when my grandson was playing with it. That is what I like about this community: one can easily take an existing model, improve it, and share that improvement for everyone to enjoy.


The Bad

Today, however, things have changed for the worse. The turning point was somewhere in 2017, when one of the main moderators of the website, called ‘glitchpudding’, suddenly left. After this, it became much harder to get any response from whoever was responsible for maintaining the site. On the increasingly rare occasions that there was some kind of announcement, it came each time from a different person I had never heard of before, as if the previous one had been fired. This would not have been that bad if the website had maintained the same quality level, but it did not. All kinds of annoying issues started popping up, like the website becoming very slow at times or throwing 500, 501, 502, … HTTP errors at random moments.
Complaining about this seemed to be of no use, because only rarely would there be a response, and then from what might either be a Thingiverse employee or just some random joker. There was no way to verify that whoever was replying on the discussion forums was an actual Thingiverse / Makerbot employee: it was often a different username, more often without the “Thingiverse” badge than with it.

Then it got worse: apparently the nearly invisible Oompa-Loompas now running the site were trying to make certain changes for… reasons. One can only guess when something has changed, because there is barely any communication when it happens. The most obvious sign of a change is an increase in the number of issues. Suddenly all photo previews were broken. Then suddenly they worked again, but any photo that was not in a 4:3 aspect ratio would be shown distorted. This was not the case before: photos used to be letterboxed and/or cropped in reasonable ways. What was the point of this change? It only made the user experience worse.
Now, at the end of 2019, there are finally some advance warnings or notifications that an issue is being worked on. For instance, in December there was a site-wide banner announcing that there would be “maintenance”. One would expect such maintenance to improve the state of the website, but instead, when it had ended, we had an extra HTTP error in the 500 range and random 404 errors as well, even on the main page. A website gives a very bad impression if its main page throws a 404 error.

Overall, the website in its current state gives a strong impression of a lack of professionalism, sometimes downright amateurism. It seems that whoever is maintaining it puts no more effort into testing their changes than trying something once and then assuming it will always work. I wonder if they have a testing environment at all. Often it feels as if they make changes to the production servers directly. The changes do not suggest much skill in developing a modern cloud-based website; rather, it looks like trendy hipster frameworks are just being thrown on top of a rickety, organically grown base without much vision. I am not saying the (obviously few) developers working on the site are amateurs, but the end result does give that kind of impression. This is likely simply because management does not allow the developers enough time to implement things properly, so they are forced to quickly hack things together.

These are all ‘feelings’ and ‘impressions’ due to the total lack of communication. Only from the recent 504 error pages could I see, for instance, that they had either thrown ‘openresty’ into the mix, or had already been using it but had now broken something about it. There has been no announcement of this, nor an explanation why.

I have created my own issue tracker on GitHub just to make all the most obvious problems with the website more visible, in the hopes that this would help the maintainers to decide what to fix next, but it seems to be completely ignored.

The Ugly: my guess at what is going on here

In case you didn't already know, Thingiverse is owned by Makerbot, and since 2013 Makerbot has been a subsidiary of Stratasys, a company that was already marketing 3D printers long before the big 3D printer boom started around 2010. Makerbot has changed from a small company selling affordable open-source printers into a big company selling rather expensive walled-garden machines aimed at the education market. I guess most of the original enthusiastic team who wanted to change the world (like glitchpudding) have either been fired and replaced by people only interested in milking profits from whatever looks vaguely promising, without actually caring about it, or have transformed into such people themselves.

Thingiverse was part of the original vision of making 3D printing affordable for the home user, a vision that does not really fit Stratasys, a company selling industrial machines at industrial prices. I guess that at every moment since 2013 when a decision about Thingiverse had to be made, the decision has been biased towards gradually sunsetting the website. Nobody at those companies seems to understand the value of this huge playground that encourages anyone to buy, experiment with, and get familiar with 3D printers. It doesn't matter that no beginner will immediately buy a horribly expensive Makerbot or Stratasys printer: the mere fact that there is a low threshold to gaining experience with 3D printers increases the chance that these same tinkerers will later generate profits for those companies. That, however, is probably way too much thinking ahead for the average marketeer, who has been brainwashed to always take greedy short-term decisions and lodge themselves into a cozy local optimum.


What Really Is Going On

Shortly after writing this article, someone notified me about this blog post: https://xyzdims.com/2019/11/21/misc-formnext-2019-aka-just-too-much-for-one-day/
It contains a part about Makerbot's presence at the Formnext 2019 exhibition. The author had the chance to talk to Jason Chan, who is responsible for Thingiverse. Many of my suspicions are confirmed: only two developers are assigned to the site (and I guess only part-time), and the company greatly underestimates the importance of Thingiverse. There seems to be some commitment to improve it, but again it looks as if the ones holding the bag of money do not share this commitment… There is still no concrete indication of what will actually happen to the site.


“But it's free!”

Every time someone posts a complaint about the broken state of the website on the Thingiverse Group forums, there will be replies in the vein of “but it is free, you have no right to complain.” I disagree. There is no such thing as a free lunch. Everyone who uploads content to the website somehow invests in it, some more than others, depending on how much effort they put into crafting the presentation and documentation of their Things. I have invested quite a lot, with about 120 published Things, each with pictures and an extended description. What I am now getting in return is a pile of issues that make it harder to upload and edit Things, and I have no idea where the site is heading because of the lack of communication on the part of the maintainers. This lack of communication and lack of care to properly test each change feels very disrespectful, even if only in an indirect manner. It almost makes me feel like an idiot for having put all this effort into my uploads during the years I have been on the site.

There are a few particular users on the Thingiverse groups who will react religiously against any complaint, one of whom has a pretty apt username given his writing style, which makes it seem as if he is drunk (he probably just is). Ignore them, because they are either trolls feeding on the anger, Makerbot employees paid to run a denial campaign, idiots, or all of the above. None of them upload much of anything, so they don't even have any ground to stand on with their claims that the site works perfectly fine.

The content creators are Thingiverse's only reason for existence. These creators deserve a little more respect than being ignored and handed increasingly cumbersome tools to upload their content, without any explanation of what is being changed about the website, when, and why. There is no excuse for such poor communication in an era with so many different digital communication methods (and no, Twitter is not really a good communication method). I know it can be done better, because I was pretty content with how the site was maintained and how changes were communicated when I joined it in 2016. I have the feeling that the main breaking point for the website was when the aforementioned glitchpudding left somewhere in 2017. It seems to have gone downhill ever since. I am not someone who demands infinite progress in everything, but I do expect that when something is good, people make the effort to keep it good.

Solutions, alternatives?

Obviously, any sane person witnessing such an evolution would start looking for an alternative. That, however, proves to be a big problem with Thingiverse: there is no real alternative. I have looked at some other sites, but Thingiverse's biggest trump card is its sheer library of things. No other site comes close, so it doesn't matter if Makerbot only keeps Thingiverse at a level where it is just usable enough: people will keep coming for the content.

YouMagine looks decent at first glance: its interface is the most similar to Thingiverse's I have found so far. It is owned by Ultimaker, although this is not explicitly mentioned on the site's main page. After trying it out, however, it is obvious that YouMagine suffers from the same lack of maintenance, or worse. The ‘blog’ part has not been updated in ages, the featured things remain the same for a very long time, and reporting spam is impossible because the ‘report’ link only points to a dead support e-mail address. The 3D model previews and the ‘assembled view’ feature are mostly broken. Links and bold text do not work in the description text, which is incredibly annoying (I can imagine that they disabled links due to the spam, but bold text: why?!)
The site still runs, but it looks like a ghost ship. Maybe the only good thing about this is that if nobody changes anything about it, they also cannot break anything about it…

There are other sites like MyMiniFactory, to which I have developed an aversion due to the apparent shills promoting it on the Thingiverse groups as the best thing since sliced bread. There is also Cults3D, but just as with MMF, I find there is too much emphasis on hiding content behind a paywall.

I have no concrete ideas for a solution. The best thing would be if a new website were built by a community that does care about it, not tied to one particular manufacturer, and with a strategy to keep the community and website alive in the long term. Ideally, the Thingiverse library would then somehow be migrated to this site, but that is optional. I have made backups of all my uploaded models with their descriptions and photos, and I will happily re-upload them to a new website that is worth it. Maybe it would even be better this way because, to be honest, I estimate 80% of all Thingiverse models to be junk that would better be garbage-collected. Starting from a clean slate might be better…

Wednesday, 17 July 2019

Google Chrome ignores ‘enter’ keypresses within 3 seconds after clicking the address bar

To reload a page in a web browser, I often click the address bar and then press the enter key, for two reasons. First, I prefer this over clicking the ‘reload’ button because it guarantees that I won't re-submit form data; it is the best way to cleanly reload a page. Second, the cursor is usually very near this bar anyway, my hands are much nearer to the enter key than to F5, and I'm of course way too lazy to press ctrl-R or command-R.

For several years now, however, Google Chrome has been sabotaging this behaviour in a very annoying way. I noticed that my first enter keypress was often totally ignored. A while later I noticed the second press was also being ignored. This only got worse; now I usually have to press the key four times before the page is reloaded. The more it annoyed me, the worse it got.

Finally, someone else also brought up this issue on SuperUser.com. It was still a mystery why the first presses were ignored and why the number of presses needed before Chrome finally responds varied. The only reasonable explanation is that there must be a deliberate delay programmed into Chrome that ignores enter keypresses within a certain period after clicking the address bar, unless something has been typed. Experiments have confirmed this, and the delay seems to be exactly 3 seconds in the current version of Chrome. This explains why the more one becomes aware of the issue, the worse it gets: pressing enter at a faster rate only results in more ignored keypresses. The only solution is to do absolutely nothing within 3 seconds after clicking the address bar. Of course it then becomes quicker to use one of the other reload methods, but only in a backwards kind of way, because the most efficient method has been sabotaged. And if you need this method to reload a page without re-sending any POST data, then waiting the full 3 seconds is the only option.

My question is: why on earth was this implemented? What is the motivation? I cannot think of any good reason. Some developer must have spent time implementing this, and I have no clue why. The 3-second ignore period does not exist when bashing random keys immediately after clicking the bar, so why would it need to exist when nothing has been typed? The only effect it has is to annoy people. Maybe it is yet another side effect of some change that caters to smartphones and tablets, because obviously laptops and desktop computers are totally identical to those devices [/sarcasm].
Please, Google, remove this “feature.”

Saturday, 14 May 2016

Another illustration why mandatory IP changes for DSL subscribers are a royal PITA

Originally this was a different article where I wrongly accused OpenDNS of blocking perfectly harmless websites, more specifically my favourite internet radio station, Radio Paradise. Eventually it proved to be a case of utter confusion caused by factors that only became apparent after some sleuthing.

Long story short, I was suddenly locked out of certain websites due to OpenDNS filters that some other customer of the same DSL service had previously set up. This customer may have been unaware that IP addresses on this service are forcibly changed every 36 hours, and did not take the necessary steps to keep their OpenDNS account up-to-date. When I received the same IP from the dynamic address pool, I also inherited the blocks. Eventually my only ways out were removing the OpenDNS servers from my config, waiting 36 hours, or resetting the modem to get a new IP. This again makes me wonder why ISPs still enforce this idiotic IP change on customers. It is ineffective in preventing people from running servers, because that can mostly be worked around with dynamic DNS. I guess this is simply another case of artificially degrading a product in order to sell a ‘premium’ product with a static IP at a higher price.



Here's the entire detective story for the interested.
On a certain day, I suddenly saw the following when I tried to visit the Radio Paradise site:



OpenDNS: This domain is blocked due to content filtering.

The OpenDNS servers 208.67.222.222 and 208.67.220.220 have been in my system settings for many years, and never had I seen this kind of block. Actually I had specifically switched to those servers to circumvent some idiotic blocks that were implemented in my ISP's DNS servers.


Radio Paradise being blocked made no sense at all. At first sight, there were only two plausible explanations:

  1. Somehow I was now being defaulted to one of the OpenDNS flavours that filter potentially offensive content.
  2. Some law prohibits listening to USA radio streams from within Belgium, and OpenDNS enforced this.
The second explanation was very implausible because I have never heard of any such law; it certainly would have caused a ruckus on the Web. The first explanation seemed plausible because, from a quick glance at the OpenDNS home security plans, it appeared that ‘Family Shield’ is the default when merely using the OpenDNS servers without configuring anything. Then I noticed this actually required configuring different servers, but nothing changed when I did: RP was still blocked. Hence I assumed that using OpenDNS without creating an account had become equivalent to Family Shield.

This (incorrect) assumption implied two things: first, that RP would somehow have been classified as ‘adult’ content, and second, that many visitors would suddenly be locked out. They would need to go through the effort of creating an OpenDNS account to choose what gets blocked. For DSL subscribers whose IP is forcibly changed at regular intervals, this would be particularly annoying because they also need to install a daemon that keeps their IP address up-to-date with their OpenDNS account.

Even though the apparent ‘adult’ categorisation of RP still made no sense, this angered me enough to write an article in which I accused OpenDNS of behaving like the great firewall of China. I also posted this on the Radio Paradise forum. No other OpenDNS users could reproduce this problem, however; they did not see any websites suddenly being blocked. In the meantime I had also created an OpenDNS account, which requires registering the network you're currently on. This proved impossible: the IP address was already registered! These two facts led to the ‘eureka’ moment and the conclusion I started this article with. I immediately reset the modem to force it to obtain a new IP, and indeed: everything worked as before.

This does illustrate that maybe OpenDNS could have done a better job at detecting that someone tried to register a static network with an address that belongs to a dynamic address pool, although I do realise this is not a trivial task. I have now added my home network to the OpenDNS account and installed an update client to keep the dynamic IP in sync, but I'm not sure whether this will prevent the same scenario in the future. Well, at least now I know how to fix it, and how I can use OpenDNS to block the things I want, like spam domains or annoying ads that slip through Adblock, so this whole story does have a happy ending.
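For anyone curious what such an update client boils down to, here is a minimal Python sketch. It assumes OpenDNS accepts the common DynDNS-style update endpoint at updates.opendns.com with HTTP basic authentication and a network label as the hostname; the endpoint, the label “home”, and the credentials handling are assumptions, so check the OpenDNS documentation before relying on any of it.

    #!/usr/bin/env python3
    """Minimal dynamic-IP update client sketch for an OpenDNS network label.

    Assumptions: the DynDNS-style endpoint below, HTTP basic auth with the
    OpenDNS account credentials, and a registered network labelled "home".
    Run it periodically (e.g. from cron) so the account follows the forced
    IP changes of the DSL connection.
    """
    import urllib.request

    UPDATE_URL = "https://updates.opendns.com/nic/update?hostname=home"  # assumed endpoint and label
    USERNAME = "you@example.com"   # placeholder OpenDNS account e-mail
    PASSWORD = "your-password"     # placeholder; better read this from a protected file

    def update_ip():
        """Report the current public IP of this connection and return the server's reply."""
        password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, UPDATE_URL, USERNAME, PASSWORD)
        auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
        opener = urllib.request.build_opener(auth_handler)
        with opener.open(UPDATE_URL, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace").strip()

    if __name__ == "__main__":
        # A reply starting with "good" or "nochg" normally means the update succeeded.
        print(update_ip())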

Tuesday, 18 June 2013

Movie genres in Plex media center: huh?

This post is not so much about a quirk in a particular program. Read through to the end and you'll see that it exposes a problem with crowd-sourced internet databases in general.
A few updates ago, Plex switched to a new database for movie genres. According to what I can find from a cursory scan, it is Freebase. Previously it used IMDb, which only had a very limited set of genres, so this was supposed to be an improvement. However, when I browse through movies in Plex, I now see some very odd genre assignments.
If someone were to pick REC or The Fly for a romantic evening with their girlfriend, things would not end well. And in what way is “black-and-white” a genre? And don't get me started about “airplanes and airports”. Apparently I misunderstood the whole point of Indiana Jones. They should have made it more obvious that it was all about Indy hopping from one airport terminal to another.
When browsing through the films in Plex Media Manager, it turns out they have multiple genres, and the second genre often makes a lot more sense than the first (e.g. Thriller for Sin City, Horror for REC).
I can guess how this database came into existence just by glancing at the Google results page for the “freebase” thing I did not know about until now. It probably works in a similar way to the good old Google Image Labeler, where random people are encouraged to slap as many labels onto random images as possible. The people who have the most time for this are children and teenagers, whose first impression of a film like Sin City is “it's black-and-white!” (Which is wrong, by the way; there is quite a bit of colour in it.) Therefore the labels that get the most weight are the ones these age groups consider relevant. Maybe it would be helpful if the database kept track of which labels were assigned by which age groups. Any user of the database could then re-weight the entries according to an age group of interest.
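To make that idea concrete, here is a small Python sketch of such age-aware re-weighting. The votes and the age-group weights are invented for illustration only; neither Freebase nor Plex exposes anything like this.

    """Sketch: re-weight crowd-sourced genre labels per age group.

    The votes and weights below are made up for illustration; a real
    database would have to record the age group of each labeller.
    """
    from collections import defaultdict

    # (genre label, age group of the person who assigned it)
    votes = [
        ("black-and-white", "teen"),
        ("black-and-white", "teen"),
        ("romance", "teen"),
        ("thriller", "adult"),
        ("thriller", "adult"),
        ("crime", "adult"),
    ]

    # How much a vote from each age group should count for a given user.
    age_weights = {"teen": 0.2, "adult": 1.0}

    def weighted_genres(votes, age_weights):
        """Return genre labels sorted by their age-weighted score, highest first."""
        scores = defaultdict(float)
        for label, age_group in votes:
            scores[label] += age_weights.get(age_group, 0.0)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    if __name__ == "__main__":
        for label, score in weighted_genres(votes, age_weights):
            print(f"{label}: {score:.1f}")
        # With these weights, "thriller" and "crime" outrank "black-and-white".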

Friday, 15 March 2013

Watch Phones, Smartwatches, and the iWatch

Recently, some rumours about an Apple “iWatch” have popped up, and many people thought the idea of an internet-enabled communication device on one's wrist was something novel. Not really: so-called watch phones have existed for more than ten years. Of course, the first ones were clunky and barely went beyond the prototype stage, but widely available and quite usable watch phones have existed for more than five years. Most of them originate from China and can be bought in various places for reasonable prices. My main cell phone has been a watch phone since June 2008.

The main problem with pretty much every watch phone that currently exists, however, is that it falls severely short in one way or another. And for some reason, whenever a manufacturer produces a new model that improves certain features, it gets worse in others. I dubbed this “the law of conservation of suckiness”.

In a new article on my site, I briefly discuss the history of watch phones, what is good and bad about past and current models, why the concept of smartwatches that need a smartphone makes no sense from an economic point of view, and last but not least: a concept for a watch phone which I believe could become popular, and which I would really like to see manufactured in the near future.

One of the biggest problems is that whenever I tell someone that my watch is also a phone, they will immediately ask whether it isn't uncomfortable to make calls with it and whether I need to hold my arm in silly poses. The answer is: no. Yet most current models do indeed suffer from this problem. My proposal contains a solution to get rid of this misconception. Watch phones will not become a commercial success as long as people have this Knight Rider-inspired mental image of them.


Monday, 4 March 2013

Transmission-daemon: never set peer limit to 0

In older versions of Transmission(-daemon), it was possible to set ‘peer-limit-per-torrent’ to 0 in settings.json, which would supposedly mean “unlimited number of peers”. A few versions ago, however, an industrious programmer found it necessary to take the limit literally, and suddenly “0” really meant “zero peers”. This caused Transmission to just sit there doing nothing, without giving any hint as to why. It took me quite a while to figure this out. Even a small warning in a log would have saved me a lot of precious time, dear developers…


Friday, 25 January 2013

robots.txt retroactively removes content from Wayback Machine

There is a fun experiment anyone can try: create some unique information and store it on two kinds of media: printed on a sheet of paper, and as a file on a USB pen drive. Then put both these objects on a hard, solid surface, like a concrete floor. Take a hammer and hit the paper hard once. Then hit the body of the USB pen drive with the same force. Next, try to recover the information from both media. The paper may have a hole in it, and perhaps it is impossible to read a few words. The pen drive, however, is likely to be a total loss. If the silicon chip is cracked, your only chance is to bring it to a specialised laboratory that will charge you an unimaginable fee just for a tiny chance of recovering perhaps a few words of the text.
What I am saying with this whole story is that I laugh at every advert that claims “save your old photos by scanning them with our digital photo scanner!” The easier it is to create and replicate information, the easier it also is generally to lose it. I am certain that at some point in history, there will be something like a super-sized version of that hammer, hitting our fragile digital archives. If it is bad enough, humanity will be catapulted back to medieval times and history will be a black hole starting from around the year 2000. Maybe they will believe the world really went to hell at the end of 2012. If you have something you really want to preserve, make a hard copy of it. No, make as many copies of it as possible, on all kinds of media.
Now, this whole introduction serves to illustrate how grave a certain issue is with the Internet Archive's “Wayback Machine”. The Wayback Machine is a great initiative; its goal is to create digital archives of old websites. I once believed that once a website was archived, it would stay accessible until either the whole Wayback Machine was destroyed or someone explicitly asked for the information to be deleted. Now, however, I have discovered that information can also disappear in a very trivial and dumb way.
If someone places a ‘robots.txt’ file on a domain that prohibits crawlers from retrieving it, the Internet Archive will retroactively apply this prohibition. There is logic behind this: if someone notices that a confidential website has leaked and has been archived in the past months, this system allows the archive to be removed without much fuss. The mechanism, however, is dumb as a brick: if a domain expires and is subsequently bought by someone who has no rights whatsoever to the original content, they can still use robots.txt to retroactively remove everything from the archive.
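All it takes is the standard blanket-disallow robots.txt, which any squatter can serve at the root of the expired domain:

    User-agent: *
    Disallow: /

These two lines tell every crawler to stay away from the entire site, and the Wayback Machine, at least at the time of writing, treats them as applying to its existing snapshots as well.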
It turns out that many domain name squatters buy old domains and place a prohibitive robots.txt on the empty “for sale” page because they do not want it to litter search engines, which in itself is actually a good thing. What is bad, however, is that this instantly hides the entire archive in the Wayback Machine. There is no justification for this aside from laziness on the part of the programmers and excessive prudence. The squatter has no right whatsoever to influence the information that was stored on the old website; they have only bought a domain name. Therefore I would greatly appreciate it if the people responsible for the Wayback Machine implemented a better way to balance legal concerns against the completeness of their valuable archive.