TL;DR: SpinRite 6.0 contains a recently discovered division instruction which will probably overflow when DynaStat engages on any drive larger than ~549 gigabytes. While this will never damage user data, it will halt SpinRite with a "Division Overflow" notice. A patch utility has been created to correct this issue for v6.0 until v6.1 is available.
For those who do not follow my weekly Security Now! podcast, here is the text from my full explanation of how this was discovered and resolved with a temporary fix:
It turns out that my claim that SpinRite 6.0 has had no known bugs for the past 19 years has not been correct. Though I suppose it would depend a bit upon how you define “known”. Certainly it hasn’t had any “appreciated” bugs, but it definitely has had a bug. And thanks to the work of an independent coder named Paul Farrer, GRC is now offering a pair of patch utilities which fix this bug that SpinRite has had, perhaps since the mid-’90s with SpinRite 3.1.
Since it only occurs on drives larger than 549 gigabytes, when SpinRite’s DynaStat system kicks in to perform data recovery and repair, and since this behavior has been present since at least SpinRite 3.1, what is now seen as an overflow was very likely my deliberate design decision at the time, since drives of that size were not even a dream back when 50 megabytes was a large drive. So what likely happened was that as I evolved SpinRite through the decades, I never revisited the parameters surrounding this one division operation to notice that modern drives might cause it to overflow.
Through the years, we’ve had reports of SpinRite halting with a division overflow error. Somehow, I got it into my head that the location reported by SpinRite was the segment where the error was occurring, not the offset. As a segment, “B04E” would not be in program space. It’s in the region of memory that was once set aside for the monochrome display adapter. So I assumed that this error was occurring in a chunk of code that the system’s BIOS had mapped into that unused region. And this belief was supported by the fact that GRC’s tech support guy, Greg, has developed a collection of workarounds for SpinRite’s users who encounter this error, things like “Try running SpinRite with that drive on another machine” – which he says often works. In fact, over the weekend, when I wrote to him to tell him that we had a patch for this long-standing problem, he replied: “Since we are getting closer to 6.1, I'll probably use this as the last thing to try as all the other "fixes" we have in place are much less technical.” So my point is, this hasn’t been a big deal or issue for us. But I know for certain that it has been so for some users – and that’s not okay. I also now understand why moving to a different machine may have helped, since part of the issue surrounded the BIOS’s mapping of cylinders, heads and sectors to a drive’s linear sector number, and that mapping is one of the many things that different BIOSes might do differently.
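To make the segment-versus-offset mix-up concrete, here is a minimal Python sketch of real-mode address arithmetic. It assumes the standard 8086 rule (physical address = segment × 16 + offset) and the conventional monochrome adapter window at B0000 through B7FFF. The code segment value 0x1000 is purely hypothetical, chosen for illustration; it is not where SpinRite actually loads.

```python
# Real-mode x86 addressing: physical = segment * 16 + offset, wrapped to 20 bits.
def linear_address(segment, offset):
    return ((segment << 4) + offset) & 0xFFFFF

# Conventional address window reserved for the monochrome display adapter.
MDA_START, MDA_END = 0xB0000, 0xB7FFF

def in_mda_region(address):
    return MDA_START <= address <= MDA_END

# Hypothetical code segment, for illustration only (an assumption).
HYPOTHETICAL_CODE_SEGMENT = 0x1000

# Reading "B04E" as a segment lands inside the monochrome adapter window.
as_segment = linear_address(0xB04E, 0x0000)

# Reading "B04E" as an offset lands inside the program's own code segment.
as_offset = linear_address(HYPOTHETICAL_CODE_SEGMENT, 0xB04E)

print(hex(as_segment), in_mda_region(as_segment))  # 0xb04e0 True
print(hex(as_offset), in_mda_region(as_offset))    # 0x1b04e False
```

The same four hex digits point at video memory under one reading and at ordinary program code under the other, which is why the wrong reading made the BIOS look like the culprit.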
In any event, Paul had just finished developing a different patch for those buggy AMI BIOSes which we discovered were blasting main memory when anything attempted to access sectors past 137 gigabytes on USB-connected drives. Out of an abundance of caution, which I feel is warranted, SpinRite v6.1 will refuse to go any further than the first 137 gigabytes on any USB drive. But Paul had access to our newsgroup, which was full of people who had machines with these buggy AMI BIOSes. So working with them, he’s created a tiny utility that can be run before SpinRite. If it finds an AMI BIOS that it recognizes, it will patch it. And then, with my blessing – we’ve discussed how this should be done to be stable – his utility will remain in RAM and cause SpinRite 6.1 to believe that USB devices are SCSI devices, thus lifting 6.1’s cautionary clamp on USB drive size. So that little utility will be made available for users to use at their own risk if they choose. SpinRite 7 will not use any BIOS, so all of this will be going away as soon as we move there.
After doing that work, Paul became curious about that B04E error. Without my bias of assuming that this was the segment of the problem, and therefore in the BIOS and not in SpinRite, he assumed that I had been reporting the offset – as I was. So he looked into SpinRite’s running, in-memory code and sure enough, he found a division instruction at that offset in SpinRite. He then proceeded to reverse engineer that region of SpinRite’s code to figure out what was going on and why. At one point I provided him with the relevant chunk of SpinRite’s source code so that he could be sufficiently confident that he knew what was going on. He had it exactly right. Drives had become larger, and the math that I had not revisited for decades, which decomposed a linear sector number into cylinders, heads and sectors for the BIOS, was no longer able to handle today’s larger drives.
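The general shape of this kind of failure can be sketched in Python. This is not SpinRite's actual code, which is not shown here. It is a toy that assumes a 255-head, 63-sector geometry and a single x86-style 16-bit DIV (32-bit dividend, 16-bit divisor, quotient required to fit in 16 bits). All function names are hypothetical, and the threshold of this toy depends on the assumed geometry, so it is not meant to reproduce the ~549 gigabyte figure.

```python
class DivideOverflow(ArithmeticError):
    """Stands in for the CPU's divide-error exception."""

def div_32_by_16(dividend, divisor):
    # Mimics x86 "DIV r/m16": a 32-bit dividend divided by a 16-bit divisor.
    # The CPU faults if the divisor is zero or the quotient exceeds 16 bits.
    if not 0 <= dividend <= 0xFFFFFFFF or not 0 <= divisor <= 0xFFFF:
        raise ValueError("operands do not fit the instruction")
    if divisor == 0:
        raise DivideOverflow("division by zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        raise DivideOverflow("quotient does not fit in 16 bits")
    return quotient, remainder

def lba_to_chs_narrow(lba, heads, sectors_per_track):
    # Cylinder comes from one narrow division, so it is capped at 65535.
    cylinder, rest = div_32_by_16(lba, heads * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1  # CHS sectors are 1-based

def lba_to_chs_wide(lba, heads, sectors_per_track):
    # Same decomposition with unbounded integers: no overflow possible.
    track, sector_index = divmod(lba, sectors_per_track)
    cylinder, head = divmod(track, heads)
    return cylinder, head, sector_index + 1

# First sector number whose cylinder no longer fits in 16 bits (toy geometry).
FIRST_OVERFLOWING_LBA = 65536 * 255 * 63

print(lba_to_chs_narrow(1_000_000, 255, 63))  # (62, 63, 2)
try:
    lba_to_chs_narrow(FIRST_OVERFLOWING_LBA, 255, 63)
except DivideOverflow as error:
    print("halted:", error)
```

Both versions agree on small sector numbers. Only the narrow one faults once the quotient outgrows its register, which is the kind of limit that stays invisible until drives get large enough.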
So Paul has produced another patch utility which fixes this problem for SpinRite 6.0. He created both a DOS driver that can be loaded through CONFIG.SYS and a DOS TSR that can be run before running the current SpinRite 6. After testing it thoroughly he provided me with his source code to review, and it was immaculate. So, thanks to his efforts, we have a patch for this bug that’s been in SpinRite since its very early days. Out of curiosity, I checked SpinRite’s source code for 5.0 dated February 11th, 1996 and it’s the same code that SpinRite 6.0 is still using. So I never changed it for SpinRite 6.0 since, at the time, it was not a bug. But it is today. Hitting this error, which can only occur on drives over 549 gigabytes – and only when DynaStat engages – does not endanger or damage any of a user’s data, but it does mean that SpinRite cannot proceed with its recovery and repair. So my advice to all SpinRite users listening would be to grab Paul’s patch and to add it to SpinRite’s boot media. The file is called MDFYSR60.ZIP and it’s in GRC’s freeware collection. It’s also now referred to near the top of the SpinRite FAQ, and it has an entry in GRC’s main menu under SpinRite, titled “Knowledgebase: B04E”.
As I was writing this up for today’s podcast I suddenly became curious about what the code for this looks like now, since we already know that SpinRite 6.1 doesn’t contain this bug. We’ve all been running lots of DynaStat recoveries on lots of multi-terabyte drives without any trouble. So I was pleased to see that I had completely replaced that old code with a new routine which is capable of handling a full 64 bits’ worth of sectors. That’s about 18,446,744 trillion sectors, which at 512 bytes per sector comes to billions of terabytes... so, even though releases of SpinRite tend to live for a long time, we should be good for a while.
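For a sense of scale, the size limits mentioned in this post can be worked out with a few lines of Python. The sketch assumes 512-byte sectors and decimal units, and the helper names are hypothetical. Note that 2^64 is about 18,446,744 trillion, and that 137 and 549 gigabytes happen to match 2^28 and 2^30 sectors respectively. That is an arithmetic observation only, not a statement about what causes either limit.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte sectors

def capacity_bytes(sector_count):
    # Total bytes addressable with the given number of sectors.
    return sector_count * SECTOR_BYTES

def whole_gigabytes(byte_count):
    return byte_count // 10**9   # decimal gigabytes, rounded down

def whole_terabytes(byte_count):
    return byte_count // 10**12  # decimal terabytes, rounded down

print(whole_gigabytes(capacity_bytes(2**28)))  # 137
print(whole_gigabytes(capacity_bytes(2**30)))  # 549
print(whole_terabytes(capacity_bytes(2**64)))  # 9444732965
```

Under those assumptions, a 64-bit sector count addresses roughly 9.4 billion terabytes.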
Since this bug is already long gone from SpinRite 6.1, since 6.1 is so vastly superior to 6.0, is nearly finished, will be free to all 6.0 owners, and since this is never destructive to any SpinRite user’s data, I plan to get 6.1 finished, rather than delay it in an attempt to announce this patch to v6.0’s current owners. After I put all this online Sunday night, I updated Greg, so he’ll be pointing anyone who encounters this problem to the patch. And when I do announce 6.1, I’ll also inform all 6.0 users of this patch so that they’ll have it for 6.0, even though it will largely become obsolete, as will 6.0. So, a big public thanks to Paul Farrer for his terrific work on this.
One final point: Something else I had forgotten from 19 years ago which recently came to light was that for some weeks after SpinRite’s initial public availability I was still finding and fixing some final bits of debris. And I was updating SpinRite’s downloadable code on the fly. I didn’t have the mature version-stamping system that all of my recent work carries, and which SpinRite 6.1 will, so all of those early editions just say 6.0 without any indication of any sub-version or build number. Nothing has changed in 19 years. But if you believe that you may have downloaded SpinRite 6 within weeks of its first release in 2004, and never again since then, you might want to update your copy until 6.1 is ready.