Since tracking down a really annoying IE 8 issue last week, I’ve become increasingly observant of the number of bytes involved in the code I’m troubleshooting. For several months now, I’ve been getting 404 errors for URLs that appeared to be scrambled strings, often containing HTML. The only thing they had in common was that Trident/4.0 appeared in the User-Agent. (Trident/4.0 is what identifies IE 8, even if the User-Agent says IE 7, which means IE 8 is running in compatibility mode.) As it turned out, IE 8’s Lookahead Downloader has a quirk that affects URLs that span any 4096th byte. The URL it requests is a concatenation of the string immediately preceding the cutoff and the string immediately following it. It appears to treat any single or double quotation marks as delimiters. Fun issue. Not really: if you use XHTML, there’s no practical way to avoid getting these 404 errors from IE 8 users.
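To see which values on a page are at risk, a rough sketch like the following can list every quoted string that straddles a multiple of 4096 bytes. It assumes "span" means the value has bytes on both sides of the boundary, and it pairs quotes naively with a regex, so stray apostrophes in body text will throw it off. The function name is made up for illustration.

```python
import re

BOUNDARY = 4096  # byte interval at which the quirk was observed

# Assumption: any run between matching single or double quotes is a candidate,
# since the quirk appears to treat both kinds of quote as delimiters.
QUOTED = re.compile(rb"""(["'])(.*?)\1""", re.DOTALL)


def quoted_values_crossing_boundary(page: bytes, boundary: int = BOUNDARY):
    """Return (start, end, value) for quoted values that straddle a boundary multiple."""
    hits = []
    for match in QUOTED.finditer(page):
        start, end = match.start(2), match.end(2)
        # The value occupies bytes [start, end). It straddles a boundary when
        # some multiple k * boundary satisfies start < k * boundary < end.
        first_multiple_after_start = (start // boundary + 1) * boundary
        if first_multiple_after_start < end:
            hits.append((start, end, match.group(2)))
    return hits
```

Run it against the raw bytes of the response as served, not the decoded string, since the offsets only mean something in bytes.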
Several days later, I was troubleshooting an issue that also made little sense. There were two errors: one was an unclosed comment, and the other was an unexpected end of file. I went to the line in question and saw nothing abnormal that would be causing the issue. The page that errored is one that gets automatically re-generated, and the only scenario I could imagine was that the page had been included while it was still being written, so only part of it existed at that moment. It made sense and seemed possible, but it was a theory with little support other than that I had no other explanation. Then, on a hunch, I checked the filesize of the page and noticed it was 69 KB, just over 64 KB, which made me excited. Sure enough, the error occurred towards the bottom of the page. I copied the source code after the line where the error occurred and measured the number of bytes. It was right around 5 KB. Once I was able to demonstrate that the error occurred 64 KB into the page, I had a case. Some numbers just aren’t coincidence.
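The arithmetic here is trivial, but writing it down makes the check repeatable. This is a minimal sketch using the rounded sizes from the story (69 KB total, about 5 KB after the error line); the helper names are hypothetical.

```python
KIB = 1024


def error_offset(file_size: int, bytes_after_error: int) -> int:
    """Byte offset of the error, measured from the start of the file."""
    return file_size - bytes_after_error


def distance_to_nearest_multiple(offset: int, unit: int) -> int:
    """How far the offset is from the closest multiple of unit."""
    remainder = offset % unit
    return min(remainder, unit - remainder)


offset = error_offset(69 * KIB, 5 * KIB)
print(offset // KIB)                                   # 64
print(distance_to_nearest_multiple(offset, 64 * KIB))  # 0
```

With real measurements the distance will rarely be exactly zero, so treat a small distance relative to the unit as the signal.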
Finally, a couple hours ago, one of my DBAs came by to tell me that a bunch of records were truncated after running my import process. I knew of no immediate reason why my script would have truncated any strings. I asked him whether his columns were able to hold the full length of the string, and he confirmed that they were. He said they were getting chopped off at “inf”, just a few words from the end. He hinted that my import script might be messed up, which I didn’t think was very likely. The first thing I did was paste the string (up to “inf”) into my editor and check the length. It was 100 characters. Just too good a number to be coincidence. At this point, I found the procs that were responsible for saving the values to his tables. Sure enough: varchar(100).
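When more than one record is affected, the same check can be done in bulk: compare each source value with what ended up stored and tally the lengths at which values were cut. This is only a sketch; the function name and the idea of having both values side by side as pairs are assumptions, not part of the import process described above.

```python
from collections import Counter


def truncation_lengths(pairs):
    """Tally stored lengths for values that are strict prefixes of their source.

    pairs is an iterable of (source_value, stored_value) strings.
    """
    lengths = Counter()
    for source, stored in pairs:
        if len(stored) < len(source) and source.startswith(stored):
            lengths[len(stored)] += 1
    return lengths
```

If nearly every truncated value lands on the same length, that length is probably a column or parameter size somewhere between the script and the table.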
Nothing terribly profound, but being aware of the number of bytes involved where issues occur can help confirm theories and give a helpful push in the right direction. In that last example, if the string had been truncated at 89 characters or 113 characters, I doubt I would have looked in the import proc first. I would have been scanning my code for string manipulation functions to see if something else could have chopped off the end. With a strange, random number of characters, I’m looking for a messed up regex; with an even, meaningful number of characters, I’m looking for something that truncates at a given number of bytes.
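That rule of thumb can be written down as a small heuristic. The sketch below flags lengths that look like a configured limit; the function name and the particular set of checks are my own guesses at what counts as "meaningful", so adjust them to taste.

```python
def looks_like_a_limit(n: int) -> list[str]:
    """Return the reasons, if any, that n looks like a configured size limit."""
    if n <= 0:
        return []
    reasons = []
    if (n & (n - 1)) == 0:
        reasons.append("power of two")
    if ((n + 1) & n) == 0:
        reasons.append("one less than a power of two")
    if n % 1024 == 0:
        reasons.append("multiple of 1024")
    # A single significant digit followed by zeros, e.g. 50, 100, 4000.
    if n >= 10 and len(str(n).rstrip("0")) == 1:
        reasons.append("round decimal number")
    return reasons


for length in (89, 100, 113, 4096):
    print(length, looks_like_a_limit(length))
```

An empty list does not rule out a size limit; it just means the number gives no hint, and the string handling code is as good a place to start as any.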