TIME for a WTF MySQL moment
Many people have been experiencing strange time perception phenomena throughout 2020, but certain database management systems have been into time shenanigans for way longer. This came to my attention when a friend received the following exception in one of his projects (his popular Discord bot, Accord), coming from the MySQL connector being used with EF Core:
MySqlException: Incorrect TIME value: '960:00:00.000000'
Not being too experienced with MySQL, as I prefer PostgreSQL for reasons that will soon become self-evident, for a brief moment I assumed the problem with this value was the hundreds of hours. One could reasonably assume that TIME values were capped at 24 hours, or that a different syntax was needed for values spanning multiple days, so that one would need to use, say, “40:00:00:00” to represent 40 days. But reality turned out to be more complex and harder to explain.
Checking the documentation was the most natural next step. The MySQL documentation goes:
MySQL retrieves and displays TIME values in 'hh:mm:ss' format (or 'hhh:mm:ss' format for large hours values).
So far so good: our problematic TIME value respects this format, but the fact that hhh is explicitly pointed out is already suspect (what about values with over 999 hours?). The next sentence in the documentation explains why, and left me with even more questions of the WTF kind:
TIME values may range from '-838:59:59' to '838:59:59'.
Oooh Kaaay… that’s an oddly specific range, but I’m sure there has to be a technical reason for it. 839 hours is 34.958(3) days, and the whole range spans exactly 6040798 seconds. The documentation also mentions the following:
MySQL recognizes TIME values in several formats, some of which can include a trailing fractional seconds part in up to microseconds (6 digits) precision.
Therefore, it also makes sense to point out that the whole interval spans 6 040 798 000 000 microseconds, but again, these seem like oddly specific numbers. They are not near any power of two, the latter being between 2^42 and 2^43, so MySQL must be using some awkward internal representation format. But before we dive into that, let me just point out how bad this type is. It is the closest MySQL has to a time interval type, and yet it can’t deal with intervals that are just a bit over a month long. How much is that “bit”? Not even a nice, rounded number of days, it seems.
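The arithmetic behind those numbers is easy to check. The following is just a quick sanity check in Python, using only the bounds quoted from the documentation; the constant names are mine, not anything MySQL defines.

```python
# Bounds of TIME as quoted from the MySQL documentation: +/- 838:59:59.
MAX_SECONDS = 838 * 3600 + 59 * 60 + 59       # seconds on each side of zero
SPAN_SECONDS = 2 * MAX_SECONDS                # from -838:59:59 to 838:59:59
SPAN_MICROSECONDS = SPAN_SECONDS * 1_000_000

print(SPAN_SECONDS)                           # 6040798
print(839 / 24)                               # roughly 34.958 days
print(2**42 < SPAN_MICROSECONDS < 2**43)      # True
```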
To make matters worse, it appears that the most popular EF Core MySQL provider maps .NET’s TimeSpan to TIME by default, despite the fact that TimeSpan can contain intervals in the dozens of millennia (it uses a 64 bit integer and has 10^-7 s precision) compared to TIME’s measly “a bit over a month”. This is an issue other people have run into, and the discussion in that issue includes a “This mimics the behavior of SQL Server” remark, which made me go check and, sure enough, SQL Server’s time is meant to encode a time of day and has a range of 00:00:00.0000000 through 23:59:59.9999999, something which overall makes more sense to me than MySQL’s odd TIME range.
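To put the mismatch in numbers, here is a rough comparison in Python. It assumes only that a TimeSpan is a signed 64 bit count of 100 ns ticks; the 365.2425-day year is an approximation I picked for the conversion, and the variable names are mine.

```python
TICKS_PER_SECOND = 10**7        # one .NET tick is 100 ns
MAX_TICKS = 2**63 - 1           # largest signed 64 bit integer
SECONDS_PER_DAY = 86_400

timespan_max_days = MAX_TICKS / TICKS_PER_SECOND / SECONDS_PER_DAY
timespan_max_years = timespan_max_days / 365.2425   # mean Gregorian year (approximation)

time_max_days = (838 * 3600 + 59 * 60 + 59) / SECONDS_PER_DAY

print(round(timespan_max_years))    # on the order of 29 thousand years
print(round(time_max_days, 2))      # a bit under 35 days
```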
So let’s go back to MySQL. What is the reasoning behind such an interesting range? The MySQL Internals Manual says that the storage for the TIME type changed in version 5.6.4, when it gained support for fractional seconds. It uses 3 bytes for the non-fractional type. Now, had they just used these 3 bytes to encode a number of seconds, they would have been able to support intervals spanning over 2330 hours, which would already be a considerable improvement over the current 838 hours maximum, even if still a bit useless when it comes to mapping a TimeSpan to it.
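The 2330 hours figure follows from treating the 3 bytes as a sign bit plus a plain count of seconds. A quick check in Python, where that hypothetical layout is my assumption and not something MySQL ever used:

```python
VALUE_BITS = 3 * 8 - 1                    # 24 bits minus one sign bit
max_plain_seconds = 2**VALUE_BITS - 1     # largest magnitude in 23 bits

print(max_plain_seconds)                  # 8388607
print(max_plain_seconds // 3600)          # 2330 whole hours
print(max_plain_seconds / 86_400)         # about 97 days
```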
This means their encoding must be wasting bits, probably so it is easier to work with… not sure in what circumstances exactly, but maybe it makes more sense if your database management system (and/or your conception of what the users will do with it) just loves strings, and you really want to speed up the hh:mm:ss representation. So, behold:
1 bit sign (1 = non-negative, 0 = negative)
1 bit unused (reserved for future extensions)
10 bits hour (0-838)
6 bits minute (0-59)
6 bits second (0-59)
---------------------
24 bits = 3 bytes
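As an illustration of that layout, here is a minimal Python sketch that packs and unpacks the fields with shifts and masks. The function names are hypothetical, and it only handles non-negative values: how MySQL actually stores negative ones is not covered by the layout above, so I am not modelling it.

```python
def pack_time(hours: int, minutes: int, seconds: int) -> int:
    """Pack a non-negative h:m:s value into the 24 bit layout shown above."""
    if not (0 <= hours <= 838 and 0 <= minutes <= 59 and 0 <= seconds <= 59):
        raise ValueError("out of range for TIME")
    sign = 1  # 1 = non-negative; negative values are deliberately not modelled
    # sign in bit 23, bit 22 left unused, then 10 + 6 + 6 bits
    return (sign << 23) | (hours << 12) | (minutes << 6) | seconds


def unpack_time(packed: int) -> tuple:
    """Extract (hours, minutes, seconds) from a packed 24 bit value."""
    hours = (packed >> 12) & 0x3FF    # 10 bits
    minutes = (packed >> 6) & 0x3F    # 6 bits
    seconds = packed & 0x3F           # 6 bits
    return hours, minutes, seconds


packed = pack_time(838, 59, 59)
print(hex(packed))             # 0xb46efb
print(unpack_time(packed))     # (838, 59, 59)
```

Extracting any field is a shift and a mask, which is presumably the convenience this encoding buys over a plain count of seconds.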
This explains everything, right? Well, look closely. 10 bits for the hour… and a range of 0 to 838. I kindly remind you that 2^10 is 1024, not 838. The plot thickens. I’m not the first person to wonder about this, of course: this was asked on StackOverflow before. The accepted answer to that question explains everything, but it almost didn’t, as it initially dismisses the odd choice of 838 as “backward compatibility with applications that were written a while ago”, and only later explains that this choice had to do with compatibility with MySQL version… 3, from the times when, you know, Windows 98 was a fresh operating system and Linux wasn’t 10 years old yet.
In MySQL 3, the TIME type used 3 bytes as well, but they were used differently. One of the bits was used for the sign as well, but the remaining 23 bits were an integer value produced like this: Hours × 10000 + Minutes × 100 + Seconds; in other words, the two least significant decimal digits of the number contained the seconds, the next two contained the minutes, and the remaining ones contained the hours. 2^23 is 8388608, i.e. 838:86:08; therefore, the maximum valid time in this format is 838:59:59. This format is even more unwieldy than the current one, requiring multiplication and division to do basically anything with it, except string formatting and parsing, once again showing that MySQL places too much value on string IO and not so much on having types that are convenient for internal operations and non-string-based protocols.
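The decimal packing described above can be sketched in a few lines of Python. The function names are hypothetical; the formula is the one given in the text, and the check at the end shows where 838 comes from.

```python
def encode_legacy(hours: int, minutes: int, seconds: int) -> int:
    """Decimal packing as described above: Hours * 10000 + Minutes * 100 + Seconds."""
    return hours * 10000 + minutes * 100 + seconds


def decode_legacy(value: int) -> tuple:
    """Split the decimal digits back into (hours, minutes, seconds)."""
    hours, rest = divmod(value, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours, minutes, seconds


LIMIT = 2**23   # values must stay below this to fit in 23 bits

print(decode_legacy(LIMIT))                   # (838, 86, 8), not a valid time
print(encode_legacy(838, 59, 59) < LIMIT)     # True: largest valid time that fits
print(encode_legacy(839, 0, 0) < LIMIT)       # False: 839 hours no longer fits
```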
MySQL developers had ample opportunities to fix this type, or at the very least introduce an alternative one that is free of this reduced range. They changed this type twice from MySQL 3 until now, but decided to retain the range every time, supposedly for compatibility reasons. I am struggling to imagine the circumstances where increasing the value range for a type can break compatibility with an application – do types in MySQL have defined overflow behaviors? Is any sane person writing applications where they are relying on a database type’s intrinsic limits for validation? If yes, who looked at this awkward 838 hours range and thought of it as an appropriate limitation to carry unchanged into their application’s data model? At this point, I don’t even want to know.
Despite having changed twice throughout MySQL’s lifetime, the TIME type is still quite an awkward and limited one. That unused, “reserved for future extensions” bit is, in my opinion, really the pièce de résistance here. Here’s hoping that one day it will be used to signify a “legacy” TIME value and that, by then, MySQL and/or MariaDB will have support for a proper type like PostgreSQL’s INTERVAL, which has a range of +/- 178000000 years and a very reasonable microsecond precision.