The stack powering tny.im: goodbye redundancy! The end of an era

I was casually going through my GitHub repos and came across PicoRed, a server redundancy manager I developed, with the immediate goal of managing the DNS records for the tny.im domain. The tny.im shortener used to be hosted by multiple servers in a Round-robin DNS configuration. The idea was that as servers went online and offline (or underwent maintenance, etc.), the DNS records would be automatically updated to reflect which servers are currently serving a service, in this case tny.im.
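To make the idea concrete, here is a minimal sketch of the decision such a tool has to make on every health-check round: which A records should the round-robin set contain right now, and what has to change? This is not PicoRed's actual code. The `Server` type, both function names and the "keep everything if nothing is healthy" fallback are assumptions made for illustration, and the addresses come from the documentation range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    ip: str
    healthy: bool  # result of whatever health check is in use (hypothetical)


def round_robin_records(servers, keep_last_resort=True):
    """Return the sorted list of IPs that should be published as A records."""
    healthy = sorted(s.ip for s in servers if s.healthy)
    if healthy or not keep_last_resort:
        return healthy
    # Assumption: if every server looks dead, publishing all of them is
    # better than publishing an empty record set.
    return sorted(s.ip for s in servers)


def plan_update(current, desired):
    """Return (to_add, to_remove) to turn the current records into the desired ones."""
    current_set, desired_set = set(current), set(desired)
    return sorted(desired_set - current_set), sorted(current_set - desired_set)


servers = [
    Server("alpha", "192.0.2.10", healthy=True),
    Server("beta", "192.0.2.20", healthy=False),
    Server("gamma", "192.0.2.30", healthy=True),
]
desired = round_robin_records(servers)
to_add, to_remove = plan_update(["192.0.2.10", "192.0.2.20"], desired)
```

The hard part, which this sketch ignores completely, is getting all the peers to agree on who is healthy before anyone touches the DNS zone.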

PicoRed is the successor to mersit, which served the same purpose but was written in very unidiomatic Python and was much clunkier than PicoRed (which is written in very unidiomatic Go, but used fewer resources and was somehow more stable). PicoRed and mersit were completely peer-to-peer, and this is because I couldn’t afford to have a “master” server that was stable enough and which I could be sure would have three nines of uptime.

The idea behind those tools is anything but novel; container orchestration, for example, requires similar tools to be deployed. For some reason, perhaps ignorance, back then I decided to write my own. (For an example of mersit/PicoRed done right, see Serf.) I don’t regret it, of course: I learned a lot about distributed systems, and while my terrible consensus “algorithms” (a complete joke) worked, they taught me why things like Paxos and Raft had to be invented. The main takeaway was, “it’s complicated”. So for PicoRed I decided to use a library by HashiCorp that handled the hard parts for me (and that’s how my unidiomatic Go program was “somehow more stable”).

Three paragraphs into this post, and I’m still writing the introduction… these three paragraphs about distributed systems are just warming up for what’s coming, which is me saying that none of those homemade tools are in use anymore, and it’s not even because I switched to something better: tny.im, and some other services of the TNY network, are now served by a single server.

How did we get here? Back in 2014, I was a huge proponent of distributing every single service across many cheap servers, instead of buying a proper, rock-solid, big and expensive server from a reliable company. In theory, horizontal scaling would let one handle large amounts of traffic and improve availability at the same price, or even less – sounds great, right? These strong opinions were backed by the issues I was having with my BlueVM server. But now, we’re back to zero redundancy… what changed?

Well, my opinion is still the same: I’ll take horizontal scaling over vertical scaling any day, and the more redundancy I can afford, the better. The problem is when horizontal scaling begins to hurt performance and reliability instead of helping them, and that’s exactly what was happening in our case.

tny.im, dotAccount, PrizmID and my WordPress websites (this blog and the TNY network website) are powered by an extremely uninteresting LEMP stack. A LEMP stack is one composed of Linux, Nginx, MariaDB and PHP, or in other words, a LAMP stack but with Nginx instead of Apache. Until a few weeks ago, the “M” in this stack had the peculiarity of actually being MariaDB configured in master-master replication mode. What this means is that MariaDB was running on multiple servers, managing the same databases, and whenever a change was made, it was propagated to all of the other servers in the cluster (up to a few weeks ago, two servers; at some point in the distant past, up to five servers were used).

That’s how tny.im was served by multiple servers: simply by running the same PHP code in all servers, and having that code talk to the same database, replicated across all the servers. Of course, MariaDB master-master replication has its disadvantages. For one, performance is worse, because all database writes involve communication between the different MariaDB servers. This began to show on more database-intensive applications like dotAccount.

Perhaps more surprisingly, reliability is also worse. Perhaps I didn’t have MariaDB replication properly configured (after many attempts and hours spent, trust me), but it would sometimes break in wonderful states such as “WSREP has not yet prepared node for application use” whenever there was some network hiccup. This could happen as often as once a day, or once every two months (yes, networks are unpredictable like that). Whenever it broke, it would need to be manually restarted, and it would sometimes take multiple attempts until all the servers had their MariaDB running. In other words, exactly the opposite of what you want for a reliable system that requires minimal human supervision.
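For the curious, this is roughly how one could detect that broken state automatically. It is only a sketch, and it assumes a Galera-based cluster (which is what the “WSREP” in the error message suggests), where the output of `SHOW GLOBAL STATUS LIKE 'wsrep_%'` includes the `wsrep_ready`, `wsrep_cluster_status` and `wsrep_local_state_comment` variables. The function names are made up for this example, fetching the rows from the database is left out, and none of this ever existed in PicoRed.

```python
def parse_status(rows):
    """Turn (name, value) rows from SHOW GLOBAL STATUS into a dict."""
    return {name.lower(): value for name, value in rows}


def node_problems(status):
    """List the reasons why this node should not be serving queries."""
    # Values a node is expected to report when it is a synced member of
    # the primary component (assumption: a Galera-based cluster).
    expected = {
        "wsrep_ready": "ON",
        "wsrep_cluster_status": "Primary",
        "wsrep_local_state_comment": "Synced",
    }
    problems = []
    for name, wanted in expected.items():
        actual = status.get(name, "<missing>")
        if actual != wanted:
            problems.append(f"{name} is {actual}, expected {wanted}")
    return problems


# Hand-written sample rows, not real output from any of my servers.
broken = parse_status([
    ("wsrep_ready", "OFF"),
    ("wsrep_cluster_status", "non-Primary"),
    ("wsrep_local_state_comment", "Initialized"),
])
for problem in node_problems(broken):
    print(problem)
```

Detecting the problem is the easy half, of course. Knowing what to do about it on every server, in the right order, is where I got stuck.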

Perhaps PicoRed could have expanded into taking care of restarting the cluster, but since I couldn’t even get to a sequence of commands that, when executed on all servers at the right times, would reliably restart the MariaDB cluster, I kind of gave up. Lack of time and more interesting projects to develop, like Clouttery, meant that some stuff would inevitably get left behind, and my horrible mess of code called PicoRed ended up forgotten and eternally unfinished. Moving to proper solutions like Serf also required time that I didn’t have.

A few months ago I was notified that the provider of one of my VPS was closing, and all servers would be shut down by December 4th. I bought a new server, moved the stuff that wasn’t hosted anywhere else to it, but I really didn’t feel like reconfiguring the MariaDB cluster and PicoRed for the new server. PicoRed, in fact, stopped working in one of my servers (the one that wasn’t getting shut down) with some binary incompatibility error, a year or so ago. So I kind of gave up… reconfigured MariaDB so it stopped being a cluster, got rid of PicoRed, and said goodbye to one of the servers.

The new server is PHP and MariaDB/MySQL-free, and this probably won’t change. I would really like to move on from PHP and MariaDB to better languages and DBMSs. My main conclusion from the whole replication story is that MariaDB is not really prepared to scale horizontally, at least not without a lot of effort and “baby-sitting”.

I certainly have not given up on horizontal scaling, but I think that from now on, it’s best that I manage scaling at the application level instead of the database level, or alternatively, use a DBMS that was designed with horizontal scaling in mind from the start. For the second option, it’s unfortunate that both CockroachDB and TiDB are still too immature for production use.

I would rather not give up on relational databases; while it’s true that other types of database also cover some of the use cases of relational ones, I have yet to find any problem other than document storing that can’t be effectively solved with relational databases. (And for document storing, may I interest you in a relational database coupled with this strange thing called a filesystem?) Commercial solutions are obviously out of reach for me: it’s not like Segvault is a money-making machine; tny.im isn’t even profitable, despite all the ads!

I have grown to hate the mess of PHP and SQL that is tny.im so much, that shutting down the service (or at least getting it into a “read-only” mode) was once a topic for discussion at one of the TNY network meetings – all three of them. By the way, Segvault/TNY network is “hiring”, i.e. looking for new members with exciting project ideas, and if you had the patience to read this post this far, you may be a good candidate – contact me somehow.

Ads at tny.im earn me pocket change that is used to offset the cost of the servers and domain names, and this shall be enough motivation to keep maintaining tny.im and supporting its users for some more years, updating MariaDB one version at a time.

500 days later: Windows 10, revisited

Today, 500 days have passed since the initial release of Windows 10. I quickly and unscientifically reviewed it right after it was released, in two blog posts: one mostly complaining, and another mostly praising it (as if I was seeking some sort of redemption). The former one was a huge success, if we take into account the readership numbers for this blog. That post accumulated over 30 thousand views in the few hours after its publication – and a month later, we were back to our usual readership stats of approximately zero views per hour.

But don’t get fooled by these yuge numbers; I’ve probably spent more hours of my life using Linux than Windows, which probably means my opinion on the latter actually isn’t worth shit – but don’t worry, as I’ve got this covered: studies show that this valuation falls in range with that of most people writing on popular tech news websites! The difference is that these usually spend their days looking at press releases and lesser-known tech news websites and blogs written by even-lesser-known people (totally not the case here) to repost, er, find sources for their original pieces, while I usually spend my days going through computer engineering courses, building useless shit like Clouttery and answering tny.im support requests.

Since my 500-day-old posts were published, a lot of things have changed in the way I use Windows. Most importantly, my main Windows machine is no longer a Chinese Crapstore 7-inch tablet, but a proper Surface Pro 3 which I bought at a relatively good discount in October of 2015, right as the Pro 4 was being released. This means that any problems I experience with Windows can no longer be blamed on anyone other than me (the luser) and Microsoft (since the software is theirs, and the hardware is chosen, assembled and shipped by them). Oh, and drivers. On Windows, it’s always drivers.

After getting the Surface I started to use Windows much more, and in the last few months there have been many days during which I didn’t touch my more powerful Linux desktop [1]. I mostly use the desktop for coding, compiling big code trees and running heavier programs, but I do all of my note-taking (OneNote!), light web browsing (redditting and hackernewsing, for instance) and ssh-ing into Linux servers using the SP3, which means that when there isn’t more than this to my day, I don’t even turn on the desktop. I bought the Type Cover 4 a few months after buying the SP3, which certainly contributes to how much I use it.

Still, after these major changes to the way I use Windows and how often I use it, I thought it would be interesting to go through all of the complaints in my extremely popular post from 31st July 2015, published two days after Windows 10 was released to the general public as it (supposedly) went out of beta, and check what was done about each of them. In a completely unscientific way, obviously, matching the standards this publication has accustomed you to.

As with my older posts, I will focus on the desktop edition (i.e. x86, i.e. the version that can run Win32 apps, i.e. the version that is actual “Windows” and not Windows Phone or Mobile or whatever they call it this month). Windows 10 was announced in September 2014, skipping Windows 9 for reasons that, after many theories, may be best described as “yes” [2]. After multiple “preview” releases, it was made available “for real” on July 29, 2015 (this was version 1507, build 10.0.10240). Two days later, I published my “Windows 10 is unfinished” post, completely written and produced with my Windows 10-powered 7-inch tablet, sans-physical-keyboard. For this post, however, I’m not a masochist anymore – some parts have been written on my Linux desktop and others on the Surface with the Type Cover.

On November 12, 2015, the “November Update” was released (version 1511, build 10.0.10586), and I didn’t write a blog post because… well, I had more interesting things to do than I have right now (or rather, I was not procrastinating by writing). For the purposes of this post all you need to know is that this fixed a large number of stability and UI problems, but nothing too revolutionary. More recently, on August 2, 2016, the “Anniversary Update” was pushed to the production ring (version 1607, build 10.0.14393). This one was a bit more “revolutionary”, especially in terms of UI and feature changes, but as you’ll find out later… deep down, the important issues are yet to be solved.

There’s an upcoming “Creators Update”, but we are going to pretend that doesn’t exist and instead focus on the Anniversary Update, released about four months ago. Also, I can’t compare resource usage, as I’m no longer using the same hardware (even though I still own the smaller tablet and use it from time to time, it’s much, much less frequent now). Let’s begin.

The touch experience is still possibly worse than on Windows 8.1, but by now I’ve gotten used to it (also, with the Type Cover 4, I can use the excellent trackpad, so I no longer have the touchscreen as the only input device…). Annoyingly, the touch keyboard still doesn’t dock “properly”. This means that docking it doesn’t resize the whole desktop (forcing all windows to fit above the keyboard) like it did back in Windows 8.1. In many legacy apps which haven’t been adapted to “run away” from the touch keyboard, the cursor will still often be below the keyboard. Even on many parts of the Windows UI, this happens, and on parts which clearly have been adapted to avoid getting their inputs obscured by the keyboard, it’s still buggy as hell.

Funnily enough, some apps (like OneNote) can do this “desktop resizing” trick. So some months ago, I looked up the necessary APIs and in a few hours had a .NET proof-of-concept application that could resize the desktop, too. So it’s not a matter of missing APIs or compatibility problems – it’s certainly a deliberate design decision, and one I can’t understand. Perhaps it’s to encourage moving to UWP? I don’t even know anymore. If I still had to use the touch keyboard as my only keyboard for extended periods of time, you can be pretty sure that by now I’d have written my own “desktop resizing” helper that does its magic when the keyboard is docked.

The “void” between good old Win32 “classic” apps and the “modern” UWP (and that Windows 8 and 8.1 framework they don’t like to talk about anymore, but was the pre-UWP that powered Windows 8 apps) is even more reduced, with some bugs (for example, small details in behavior and looks between the “modern” and “classic” windows) getting fixed and more parts of the OS getting refreshed with a modern look (that, don’t let your eyes fool you, are often not actually built with UWP; it’s all just a matter of design language, the tech is still the same).

One can still find remnants of the multiple design languages used by the OS throughout its decades of history. Paying off this kind of technical debt, which I call “UX debt”, would surely go a long way towards giving Windows that polish it is often criticized for not having. You can still find icons from the Windows 98 and earlier era, from the XP era, from the Vista and 7 era, and even though it’s less noticeable (since it’s all “flat design”) you can also find some stuff that was designed back in the Zune and 8/8.1 days and was forgotten since then (now that I think of it, though, that happens mostly with Microsoft software other than what’s bundled with Windows). But pictures are so much better than text for this, so let’s mimic the iconic “two control panels” screenshot:

Not much left to comment here… oh wait, this is still a thing:

Sure, you may argue this is a fairly minor thing, doesn’t hurt anyone, doesn’t decrease system stability, and realistically doesn’t hurt anyone’s productivity. But, Microsoft, if you keep pushing your main product through deadlines without ever slowing down to fix “lower priority” stuff, the “UX debt” will keep increasing… damn, at some point you’ll lose even to “unpolished” Linux environments, if nothing else because some of these tend to throw everything away every three years and so have no “UX debt” to speak of (only bad UX, but no debt – it’s like being poor, but at least you don’t owe anyone money).

Now, the part which I personally find the funniest, and is an actual annoyance which certainly hurts productivity. To their credit, this had a major improvement with the Anniversary Update:

Left out of this image is the even wider assortment of context menu styles used by Microsoft’s own apps, like Office and Visual Studio. There are still different styles but the differences are now mostly between light and dark menu themes (which makes sense: dark elements produce dark menus). There’s also that giant menus situation, which supposedly should only occur when you use your fingers to open them. Problem: they sometimes show up when you use the mouse or a pen…

Don’t get me started on the clusterfuck that are the network settings… I find myself constantly switching between the wireless networks popup in the taskbar, the “Network and Sharing Center” (old control panel page) and the Settings page to find the things I need. A particularly fun exercise I recommend to all readers is getting to the dialog where you can retrieve the password for the Wi-Fi network to which you are currently connected. It’s especially fun if you have seen it before, and have a vague idea of where it is and what it looks like… and yet I always find it hard. Do you?

So, in terms of UI, it’s still far from perfect – but getting better. At this rate, I expect everything to be migrated to the “Windows 10 look and feel” in four or five years. Not a big problem, except for the fact that Microsoft, too, likes throwing out much of their work every three or four years (except they can’t throw much away, because of backwards compatibility and user training; things never get fully updated, and then you get “UX debt”). Let’s see how this works out.

There’s also the whole privacy concerns/telemetry topic, which I have skipped in this post. Not that it’s not important, but there’s so much to talk about that it’d be better suited for a whole other post on the privacy policies of most software-as-a-service (as some friends will be keen to point out, my own privacy policies – or the lack of them – included!). And the phenomenon of software that needlessly becomes software-as-a-service is vast enough to warrant another post, too… posts which I’ll most likely never write. But hey, just use your favorite search engine to find people with skeptical opinions on these topics. If you want a head-start and eventually some laughs, you can start with this or this. Guaranteed hours of endless fun and/or fumbling!

I also masterfully skipped over the “Windows updates are still stupid and require lengthy reboots 90% of the time! Why? Because it’s 2016 and Windows locks that way since the 90’s, that’s why!” subject, as well as the “Windows ate my settings and brought back Candy Crush!” one. Phew.

Now for the praising bit. Some of the stuff Microsoft is doing with Windows is particularly exciting. The Windows Subsystem for Linux, for instance: even if it doesn’t appeal to you from an ethical/moral/religious perspective (and yes, I feel uncomfortable too), you have to agree it’s pretty interesting technically. And very recently they showed what appears to be a comeback of Windows RT, except this time it’s done right: x86 binaries running on ARM processors thanks to ISA translation. It’s not the fastest, and I suppose it only became usable (at least for “heavier” software like office suites and image editors) with the current generations of ARM processors (and perhaps more specifically, Snapdragon SoCs only). However, had they implemented this back in the Surface RT, perhaps not even as a marketing item but as a “nice to have” thing power users would find out about (just to find out it was dog slow), the fate of that line could have been slightly better.

The Creators Update will bring more “nice to have” improvements and hopefully fix more of the still present UX issues, and add major features (but I’m not sure I care that much about these anymore, I would rather have more polished versions of what we have now). I bet: getting to the dialog where you “unhide” the wireless password will still be hard.


[1] This “desktop” is actually a Toshiba A660 laptop with a 1st-gen i7, 8 GB of RAM and an SSD upgrade, but the battery’s dead – it was never good to begin with. A machine that’s at least five years old and still rocking – unlike some would put it, it’s not ‘sad’ to be using it. Fortunately, I often can’t hear these people over the sounds travelling through the headphone jacks of my machines. I’m using Arch Linux, KDE and headphones and I’m quite happy with this setup, thank you.

[2] “Yes” is also the reason that’s more in line with the official reasoning, which, to quote Terry Myerson (an important guy at Microsoft), goes like this: “based on the product that’s coming, and just how different our approach will be overall, it wouldn’t be right to call it Windows 9”. Or to put it simply, “yes”.

The more base-2 oriented readers may be wondering, why post this 500 days after the release of Windows 10, missing the opportunity to post it 512 days after? The reason is that Windows 10 will turn 512 days old on the 24th of December, and you’ll probably have more interesting things to do that day than caring about this silly post – not that anyone cares on the other days of the year, anyway…