(I'm an OLD SpinRite hand. Been away too long. My apologies! I sent this to support at GRC way back in 2018... probably lost in the pile.)
SSD Health
I recently discovered that one of Steve's often-stated assertions about how drives detect and correct errors is **incorrect**, at least for SSDs.
What he has often said:
- A level one scan to read an entire drive...
- Will help the drive discover any weak areas
- And when necessary, it will map out a bad spot and move the data to a good area
Recently experienced symptoms:
- Entirely good SSD
- Suddenly block zero went bad and things got worse from there
- All diagnostics said the drive is dying
- Unfortunately, it was mSATA in an MS Surface Pro
- After making a mirror image of the entire drive (ddrescue)...
- I attempted to rebuild the GPT and partition map, since that was the only real problem
- And I got a HUGE surprise: suddenly the drive had no bad spots and was perfect according to all diagnostics!
My research showed:
- SSDs can degrade in areas that are only read and never rewritten (in fact, fascinating IBM research shows *nearby* areas can go bad! See below.)
- Current firmware does NOT necessarily detect or fix problems on read
- HOWEVER, it immediately detects and fixes-on-write!
- SO, by attempting to overwrite block 0, the drive literally healed itself!
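For anyone who wants to see what "overwrite a block with its own data" looks like, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `rewrite_in_place` is made up, the 512-byte sector size is a guess that may not match a given drive, and whether a particular SSD's firmware actually reprograms fresh cells when handed identical data is firmware-dependent. It is safest to try it on an image file (such as a ddrescue image), not a live device.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def rewrite_in_place(path, first_sector, sector_count, sector_size=SECTOR_SIZE):
    """Read a run of sectors and write the same bytes back to the same offset.

    `path` can be a disk image or (with sufficient privileges) a block device.
    Returns the number of bytes rewritten.
    """
    offset = first_sector * sector_size
    length = sector_count * sector_size
    # Unbuffered read/write mode so the data goes straight to the OS.
    with open(path, "r+b", buffering=0) as f:
        f.seek(offset)
        data = f.read(length)
        if len(data) != length:
            # Refuse to write anything if the read came up short.
            raise OSError(f"short read: wanted {length} bytes, got {len(data)}")
        f.seek(offset)
        written = f.write(data)
        if written != length:
            raise OSError(f"short write: wanted {length} bytes, wrote {written}")
        # Ask the OS to flush the write down to the device.
        os.fsync(f.fileno())
    return len(data)
```

Note the sketch only helps while the block is still readable. In my case block 0 had already gone bad, so the fix was to write freshly rebuilt data, not to read the old data back.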
I am thinking that a cheap/free little utility could be written, perhaps part of SpinRite?
- Focusing on the obvious static parts of a drive (e.g. boot sector, GPT, etc.)
- Rewrites data every N months
- NOTE: The closest thing I've seen out there is a little free Windows utility: "Disk Fresh" nicely rewrites entire SSDs, and schedules itself every N months.
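A rough sketch of the planning half of such a utility, in Python. This is only an illustration under stated assumptions: the names `gpt_static_regions` and `refresh_due` are hypothetical, and the layout assumed is the common GPT arrangement (protective MBR at sector 0, primary header at sector 1, a 32-sector partition entry array after it, and backup copies at the end of the disk). A real tool should read the GPT header to find where the entry arrays actually live instead of trusting these defaults.

```python
from datetime import date


def gpt_static_regions(total_sectors, entry_sectors=32):
    """Return (first_sector, sector_count) runs for the rarely written GPT areas.

    Assumes the common layout with 512-byte sectors and a 32-sector entry array.
    """
    return [
        (0, 1),                                              # protective MBR
        (1, 1),                                              # primary GPT header
        (2, entry_sectors),                                  # primary partition entries
        (total_sectors - 1 - entry_sectors, entry_sectors),  # backup partition entries
        (total_sectors - 1, 1),                              # backup GPT header
    ]


def refresh_due(last_refresh, today, months):
    """True once at least `months` whole calendar months have passed."""
    elapsed = (today.year - last_refresh.year) * 12 + (today.month - last_refresh.month)
    if today.day < last_refresh.day:
        elapsed -= 1  # the current month is not complete yet
    return elapsed >= months


# Example: a tiny 1000-sector disk, refreshed every 6 months.
if refresh_due(date(2024, 1, 15), date(2024, 7, 15), months=6):
    for first, count in gpt_static_regions(1000):
        print(f"rewrite sectors {first}..{first + count - 1}")
```

The region list is deliberately short. The idea from the bullets above is to target the parts of the drive that almost never get written, not to rewrite the whole SSD the way a full-drive refresh does.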
IMPORTANT evidence:
- A quite interesting (reasonably technical) research paper (note graphs pp 23-25) -- you can find this at
https://web.archive.org/web/20190622101205/http://smorgastor.drhetzler.com/library/
- Original link: http://smorgastor.drhetzler.com/wp-content/uploads/2014/08/SSD-Reliability.pdf
- (NOTE: Dr Steven Hetzler is no longer an IBM Fellow. He's now a senior guy over at Meta...)
Among other insights:
- Consumer SSDs are likely to see read bit-rot within a year
- Enterprise SSDs are worse: a few months (they are designed for many overwrites, not for long retention...)
- Thus, do NOT count on SSDs as long-term storage!
- And so far, I've not found ANY manufacturers willing to discuss this. Nobody talks about mitigating this... even though it is seemingly a simple issue that COULD be addressed.