tny.im and this website were down for about 42 hours, starting on June 29 at 03:16 UTC.
The problem? BlueVM’s S19-NY server went down, taking with it the server I have/had there (and which I paid for a full year!). Other than this outage, the service had worked fine for three months, – fast network, full resources availability – since I bought it.
S19-NY is still down, without any ETA for it to come back. There’s no information in what conditions it will come back, or *if* it will come back (with the previous contents, at least). BlueVM staff is pretty much unresponsive, other than a guy who sometimes hangs on IRC and says he can’t fix the KVM instance because he doesn’t have access to it.
Of course, I no longer recommend BlueVM and I don’t plan on renewing the server I have with them.
The “solution” to put an end to two days of downtime, was to buy the cheapest SecureDragon OpenVZ server, (OpenVZ! so hard to live without my beloved KVM, I can’t even use davfs2 because there’s no fuse module!) and restore the backups I had (from four hours before the BlueVM instance went down). This has been done, except for the HTTPS certificate of tny.im… that alone is another story:
As I tried to retrieve the existing cert from StartSSL (because, stupid me, automated backups were not copying everything SSL-related, and I didn’t save it locally), I found my authentication certificate had expired. This basically means my StartSSL account is lost, unless I create a new account and ask their staff to link it to the old one. They probably won’t do that without a payment and some ID checks so… out of question. I guess, if the BlueVM server doesn’t come back, that I’ll just create a new StartSSL account and generate a new cert for tny.im. There’s no security issue with this, as the previous certificate has not been compromised (unless BlueVM is collecting certificate private keys from inside their clients’ machines…) and so it doesn’t need to be revoked.
To conclude the HTTPS point: tny.im has the HTTPS service unavailable for now, until I can retrieve the existing certificate from the previous server, or until I get a new one.
Is all the fault in BlueVM’s side? Of course not… I could have lost my love for the money earlier, bought the SecureDragon VPS yesterday already and reduce the downtime by 24 hours. But since I had hope the problems on BlueVM support were just Sunday-related, I thought that by Monday they would have it fixed. They didn’t.
On related news, I’ll take this downtime and new server acquisition as the motivation for setting up a new advanced and redundant system, so that if one server goes down, tny.im (and possibly this blog too) will continue to operate as normal. I have two servers already (assuming the BlueVM one comes up), and I plan on developing a system where firing up new instances of tny.im on any empty server will be really easy. The system will be always prepared to lose any server at any point, without data loss, and restore full service within five minutes. That way, I can add less reliable hosts, perhaps even VPS trials, to the redundant system. This also allows for scaling the service as needed. Sounds ambitious? Of course it won’t happen in a week, but I have the full summer to develop and test the system…
Why don’t I just go with some SaaS that supports scaling? Two reasons: the price is too high, and the tny.im software is not coded in a way that’s compatible with these services. Let me remember you, that while not exactly being a CGI script, tny.im is not coded in one of those fancy modern languages, and even though PHP is not exactly outdated or unmaintained, the quality of the code can make it pretty horrible or pretty good. And the code is… not perfect – it doesn’t use any popular framework, it is based on YOURLS and has many, many
hacks feature additions, plus a… very close relationship with the database.
Let me finish by saying that downtime of this kind is something to be expected if I were still hosted by FreeVPS and the like. But believe it or not, on FreeVPS and other sponsorships I’ve never seen a customer service as bad as the one of BlueVM (and it’s hard to remember an outage as big as this one). It is definitely not adequate for a paid service. In addition to S19-NY, they have many other instances down, with similar or worse downtime. The admins don’t appear to be online or reachable in any way, even by other staff members. The latest news/excuse on the S19-NY situation is that IPMI is broken and they are waiting for the provider to fix it… now tell me, does this look like a serious company, or some poor-man’s sponsorship?
EDIT: The BlueVM server is still down. tny.im is now hosted by three servers with round-robin balancing. HTTPS service was restored with a new certificate.
EDIT 7/7/2014: I forgot to update this post in time, but the BlueVM server has been up since three days ago. But I only got to know that the service was restored thanks to a friend of mine, because they didn’t reply to my ticket to inform me about it. Anyway, I don’t plan to renew this server, and BlueVM lost me as a customer (except for some really cheap deal which I’ll use as a personal/development box, and never in production).
On related news, Mirasm – the Tiny Server Redundancy Manager – is mostly finished, only needs some more testing to be put on production servers, managing the new tny.im redundancy system.
Looks like >90% of the Internet users spend >90% of their WWW-using time on the same 0.001% of websites.
It’s no wonder why plans for a truly decentralized Web never seem to get much traction… how could them, in a time when users’ habits are truly centralized? In fact, I believe the main problem is on the way users use the Web, not the technology iself.
As for the Internet, it is already (comparatively) decentralized.
Do users always know what’s best for them? The past has proven that is not the case…
Just in case you didn’t know/notice, this website and tny.im are no longer hosted on a VPS provided by FreeVPS. Due to issues on the node where that VPS was hosted, that made it become network-unreachable, I needed to buy a new server to avoid a longer downtime.
I ended up buying a KVM-powered VPS from BlueVM because they accept Bitcoin and had a good special deal at the time. It’s my third day with the server and so far, so good. It has better specs than the server I had, and after config optimization tny.im is now able to handle more concurrent requests than it did before. I will keep readers updated and post a detailed review of the server, later.
Moving away from sponsorships is be better in the long term (less worries, proper support…), even if it means that tny.im is now unprofitable (temporarily, hopefully).
Regarding the problem of Casio Prizm calculators that hang on the power-off screen and then die when rebooted, I have the feeling that we [the Casio programming enthusiast community] have discussed this subject way more than Casio ever did – in the end, it may be easier and cheaper to fix or replace, say, 1% of the calculators they sell, than to pay lots of working hours to discuss and fix hard-to-reproduce bugs, delaying the production of newer models (in the end, nowadays people buy a calculator because it’s fancy, not because it will survive the apocalypse ).
To all the users with dead calculators… I know your situation is bad (especially if you didn’t have backups of the data in it), but just send the f***ing thing for repairs before the three year warranty expires. Make sure to explain how it broke, maybe they will fix it some day. And when the warranty expires, if you still need a graphic calculator, well, buy a new one (eventually not from Casio?).
Is it just my impression, or nowadays many people with smartphones, who also have a data-capped connection at home, conserve their list of apps to update until they go to a friend’s house, find free (or school/work) WiFi, or otherwise find means to avoid having automatic updates use data on your Internet connection?
Personally, I put all of my machines to update on the last two days of the month, to avoid that situation where my stupid ISP limits the connection speed to 128 kbit/s, after I use 15 GB of data in a single month (if I can’t avoid the situation, at least the speed will only be ridiculously low for at most two days)… yeah, that’s the state of 3G data plans in Portugal (across all ISPs).
Note to self:
By default, Linux Mint brings OpenDNS (which I hate, if not for anything else, for the NXDOMAIN response it gives) as the resolvconf fallback. Deleting /etc/resolvconf/resolv.conf.d/tail as root and then restarting the resolvconf will (re)solve it. Do not waste time trying to figure out where the DNS is being hijacked on your network: the thing is right on your machine, even if you have overridden the DHCP DNS configuration on the network configuration dialogues – out of the box, it will always use OpenDNS if the DNS servers you or DHCP specified, are not available/don’t answer fast enough.
By the way: http://myresolver.info/ helps with debugging DNS issues.
This post brings to an end months of trouble, and me thinking my ISP, to add to the fact that even though my plan is unlimited, severe speed limiting starts at 15 GB of data usage, was also hijacking DNS queries to their own system. Fortunately, looks like it is not the case. At the same time, I’m disappointed with the choice of DNS fallback by the people behind Linux Mint.