SSD Health: Bit Rot in read-only areas is Quite Real! A SpinRite opportunity?

Papa Pete

Member
Feb 2, 2025
(I'm an OLD SpinRite hand. Been away too long. My apologies! I sent this to support at GRC way back in 2018... probably lost in the pile ;) )

SSD Health

I recently discovered that one of Steve's often stated assertions about how drives detect and correct errors is **incorrect.**

What he has often said:
- A level one scan to read an entire drive...
- Will help the drive discover any weak areas
- And when necessary, it will map out a bad spot and move the data to a good area

Recently experienced symptoms:
- Entirely good SSD
- Suddenly block zero went bad and things got worse from there
- All diagnostics said the drive is dying
- Unfortunately, it was mSATA in an MS Surface Pro
- After making a mirror image of the entire drive (ddrescue)...
- I attempted to rebuild the GPT and partition map, since that was the only real problem
- And I got a HUGE surprise: suddenly the drive had no bad spots and was perfect according to all diagnostics!

My research showed:
- SSD's can degrade in areas where they are only read and not written (in fact fascinating IBM research shows *nearby* areas can go bad! See below.)
- Current firmware does NOT necessarily detect nor fix-on-read
- HOWEVER, it immediately detects and fixes-on-write!
- SO, by attempting to overwrite block 0, the drive literally healed itself!

I am thinking that a cheap/free little utility could be written, perhaps as part of SpinRite? (A rough sketch of the idea follows below.)
- Focusing on the obvious static parts of a drive (e.g. boot sector, GPT table, etc.)
- Rewrites that data every N months
- NOTE: The closest thing I've seen out there is a little free Windows utility: "DiskFresh" nicely rewrites entire SSD's, and schedules itself every N months.
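( To make the idea concrete, here is a minimal sketch of such a refresh pass in Python, assuming raw read/write access to an unmounted drive. The Linux-style device path, region offsets, and chunk size are placeholders for illustration only - this is obviously not SpinRite code. )

Code:
# Minimal sketch of the "refresh the static areas" idea: read each region and
# write the identical bytes back in place, giving the drive's firmware a chance
# to fix-on-write. Run only against an unmounted drive you can afford to lose.
import os

DEVICE = "/dev/sdX"             # hypothetical device node (placeholder)
CHUNK = 1024 * 1024             # 1 MiB per transfer
STATIC_REGIONS = [              # (offset_bytes, length_bytes) -- illustrative only
    (0, 34 * 512),              # protective MBR + primary GPT header and entries
    # ... add boot partition ranges, backup GPT, etc.
]

def refresh_region(fd, offset, length):
    pos, end = offset, offset + length
    while pos < end:
        os.lseek(fd, pos, os.SEEK_SET)
        data = os.read(fd, min(CHUNK, end - pos))
        if not data:
            break                              # ran off the end of the device
        os.lseek(fd, pos, os.SEEK_SET)
        os.write(fd, data)                     # same bytes, freshly written cells
        pos += len(data)
    os.fsync(fd)

if __name__ == "__main__":
    fd = os.open(DEVICE, os.O_RDWR)
    try:
        for off, length in STATIC_REGIONS:
            refresh_region(fd, off, length)
    finally:
        os.close(fd)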

IMPORTANT evidence:
- A quite interesting (reasonably technical) research paper (note graphs pp 23-25) -- you can find this at
https://web.archive.org/web/20190622101205/http://smorgastor.drhetzler.com/library/
- Original link: http://smorgastor.drhetzler.com/wp-content/uploads/2014/08/SSD-Reliability.pdf
- (NOTE: Dr Steven Hetzler is no longer an IBM Fellow. He's now a senior guy over at Meta...)


Among other insights:
- Consumer SSD's are likely to see read bit-rot within a year
- Enterprise SSD's are worse: a few months (they are designed for many overwrites...)
- Thus, do NOT count on SSD's as long term storage!
- And so far, I've not found ANY manufacturers willing to discuss this. Nobody talks about mitigating this... even though it is seemingly a simple issue that COULD be addressed.
 

Papa Pete 2025-02-02 ...

- Consumer SSD's are likely to see read bit-rot within a year
- Enterprise SSD's are worse: a few months (they are designed for many overwrites...)
- Thus, do NOT count on SSD's as long term storage!
- And so far, I've not found ANY manufacturers willing to discuss this. Nobody talks about mitigating this... even though it is seemingly a simple issue that COULD be addressed.

- - - - -

Thanks for your insight.

I reviewed your experience, and I interpret my experience of similar SSD behavior this way:

- not 'bit rot' as much as sector access delays - the data is there, just not always easily or quickly readable,
- 'consumer' versus 'enterprise' is too imprecise a demarcation,
- SSDs do hold data forever, they just read it back slowly when they're 'tired',
- one SSD manufacturer offers what appears to be a SpinRite-Level-5-like self-maintenance program, a clue that at least one manufacturer 'sees' a problem, and is trying to figure out a way to deal with it.

- - - - -

I have found that each SSD is its own game.

All are different, and most are reliable data storage-wise but unreliable performance-wise.

That is, I have some SSDs that reliably read data, but eventually as slowly as 5 MB/s, not the OOB (out of box) 500 MB/s promise that I saw on day one.

ALL SSDs I have seen appear to trudge on under SpinRite 6.1 LEVEL 5 without damage to the drive or its performance, and without compromising data - they do not die, even though they feel like they are going to; I see them going slowly, or even stopping for any length of time, then resuming.

I have found that each SSD is a special case, so I have to heuristically tweak SpinRite 6.1 settings, looking for the settings that seem to best 'fit' the drive.

Some SSDs take SpinRite 6.1 LEVEL 5 with no performance compromise end-to-end for the entire SpinRite LEVEL 5 run.

Some SSDs feel sloggy under a direct SpinRite 6.1 LEVEL 5, slowing down on various sectors or sector ranges - whatever the drive is apparently internally mapping at the moment.

Some behave better with SpinRite 6.1 LEVEL 4 XFER 1024 or XFER 2048.

Perhaps I am hunting to find the best data transfer size to fit inside an SSD's cache management, or the next largest data transfer size to 'break' a drive's cache and require the cache to be continuously interrupted and re-filled, or something else?

Folks with chip-savvy might understand better the possible churn going on inside any number of SSD controllers and cache and whatever management chips, plus the storage chips and supporting amplifiers, and the dynamics of interface, BIOS, and drivers.

Maybe it's a timing thing, and we need to go slower during a full drive rewrite.

But, as I have explored, each SSD seems unique.

Maybe I'll try the SpinRite HASH command line option next time to see if it slows SpinRite down enough to allow the drive to do whatever it does internally between reads and writes.

- - - - -

I have found that SSDs don't stay 'fixed', performance-wise.

Some SSDs feel like they are 'fixed' by re-writing, and some of them then cascade back to minimum prior poor performance rather quickly.

Here's a KingDian S280 240GB SATA3 SSD before and after four differently-set SpinRite re-write passes, then decaying after only 4 read-only passes:

[attached benchmark screenshot]


- - - - -

And the SpinRite log doesn't tell the whole story of the experience of watching the progress speed up, slow down, stop, struggle, speed up, slow down, stop, struggle.

I found a better correlation of my moment-to-moment 'feel' of drive responsiveness by looking at the data transfer and sector access response curve graphs produced by such programs as the free HDD Scan, HD Tune, and others.

Here's an HDD Scan of a WD Green 2TB NVMe; I note the contrast between the responsiveness of the data transfer versus the responsiveness of the sector access - as if it's quick to get to each sector, but then slow to read that sector's data:

[attached HDD Scan screenshot]


I note that the 'average' performance on that well-used NVMe SSD is about half of its peak performance, where peak tested performance probably appears in the graph in as-yet unused areas.

Yet even the slowest performance of that NVMe SSD is faster than a SATA3 HDD.

Hence our happiness with NVMe SSDs.

An SSD at its worst may still be faster than an HDD at its best.

I compare this to HDDs, which are slower than SSDs in both sector access and data transfer, but HDDs never appear to be as variable sector-to-sector; HDDs usually are just half as responsive at the end compared to peak performance at the front, a smooth curve slowing down:

[attached HDD benchmark screenshot]

- - - - -

So, on the topic, "... SSD Health: Bit Rot in read-only areas is Quite Real! A SpinRite opportunity? ...", I'd suggest defining our terms.

Let me Google 'bit rot' for me:

- "Bit rot" refers to the gradual corruption of digital data over time, where individual bits that make up a file change from their intended state (0 or 1), leading to data becoming partially or completely inaccessible or unusable; essentially, it's the slow deterioration of data stored on physical storage media like hard drives, often caused by aging components and environmental factors, also known as "data decay" or "data degradation." Key points about bit rot:
  • How it happens: Individual bits within a digital file can flip from 0 to 1 or vice versa due to physical degradation of the storage media over time.
  • Impact: Corrupted data can become unreadable or lead to errors when accessed.
  • Factors contributing to bit rot:
    • Aging storage devices
    • Extreme temperatures
    • Environmental factors like humidity
    • Frequent read/write operations on a storage device

How to prevent bit rot:
  • Regular backups: Regularly back up important data to different storage mediums to mitigate data loss if one storage device fails.
  • File integrity checks: Use checksum tools like MD5 or SHA-256 to verify if a file has been corrupted.
  • Proper storage conditions: Store data in a cool, dry environment.
  • Consider archival storage solutions: For long-term data preservation, use specialized storage media designed for longevity.
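( As a stand-alone illustration of that "file integrity checks" bullet - just the general idea in Python, with placeholder paths, nothing SpinRite-specific - record SHA-256 hashes once, then re-verify later: )

Code:
# Record a SHA-256 per file once, then re-hash later and compare to detect
# silent corruption. Paths are placeholders for illustration only.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict:
    return {str(p): sha256_of(p) for p in root.rglob("*") if p.is_file()}

def verify(manifest: dict) -> list:
    # Return files whose current hash no longer matches the recorded one.
    return [name for name, digest in manifest.items()
            if Path(name).is_file() and sha256_of(Path(name)) != digest]

if __name__ == "__main__":
    root = Path("/data/archive")          # placeholder path
    manifest_file = Path("manifest.json")
    if not manifest_file.exists():
        manifest_file.write_text(json.dumps(build_manifest(root), indent=2))
    else:
        print("changed or corrupted:", verify(json.loads(manifest_file.read_text())))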
- - - - -

OK, given that we each got all our data back, perhaps 'bit rot' is not an appropriate term.

So:

What is going on when an SSD becomes poorly responsive, but 100% recoverable nevertheless?
What 'tool' or process, if any, confirms and/or repairs whatever is going on with slow SSDs?

- - - - -

Regarding drive 'refresh', I suggest:
- cloning to another drive, as slowly as it takes for compromised drives,
- then using the new cloned target,
- and then preparing the original source drive for reuse by SE secure erase.

For permanently surface-mounted drives, we need an alternative, maybe:
- cloning out,
- secure erasing,
- then cloning back?

- - - - -

Does that all make sense?

Is there more?

- - - - -

Thanks for the opportunity to explore this and share.

I recently discovered that one of Steve's often stated assertions about how drives detect and correct errors is **incorrect.**

What he has often said:
- A level one scan to read an entire drive...
- Will help the drive discover any weak areas
- And when necessary, it will map out a bad spot and move the data to a good area
Could you provide a reference to this???

This is contrary to my understanding of Level 1.

My understanding is that Level 1 is read only. L1 will read all areas of the drive, mark any unreadable areas with a U on the GSD display, but will take no further action. That is, Level 1 does NO writing!

Level 2 (recovery) will do its best to read all areas of the drive. Hard to read areas can be re-written (refreshed) by Level 2.

Level 3 (maintenance) will read and re-write (refresh) ALL areas of the drive thus restoring the drive to optimum performance.

- I attempted to rebuild the GPT and partition map, since that was the only real problem
- And I got a HUGE surprise: suddenly the drive had no bad spots and was perfect according to all diagnostics!
SpinRite Level 3 would have done precisely that.

Unfortunately SpinRite 6.1 will not work on UEFI boot only Surface devices. :(

- SSD's can degrade in areas where they are only read and not written
Right. This is well known. It is called Read Fatigue. This is why periodically re-writing read only areas of an SSD is considered good normal procedure (SR Level 3).

- Current firmware does NOT necessarily detect nor fix-on-read
- HOWEVER, it immediately detects and fixes-on-write!
Agreed!

I am thinking that a cheap/free little utility could be written, perhaps part of SpinRite?
SpinRite does this. It is Level 3. However, it does require some knowledgeable manual effort by the user. When combined with a benchmark (e.g. GRC's ReadSpeed), SpinRite Level 3 can be confined to just those areas of the drive that need it while not subjecting other areas of the drive to unnecessary write activity.
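( As an illustration of that "confine Level 3 to just the areas that need it" idea - not a GRC tool, just the selection logic sketched in Python with made-up benchmark numbers - one could keep only the regions whose measured read speed falls below a chosen threshold: )

Code:
# Sketch of "target only the slow regions": given per-region read-speed
# measurements from any benchmark, keep the regions worth re-writing.
# The measurements and threshold below are made up for illustration.
benchmark = [
    # (start_percent, end_percent, read_MBps) -- illustrative measurements
    (0, 10, 520.0),
    (10, 20, 498.0),
    (20, 30, 61.0),     # a slow, long-undisturbed region
    (30, 40, 505.0),
    (40, 50, 12.0),     # another slow region
]

THRESHOLD_MBPS = 250.0  # assumed: anything below half of "healthy" gets refreshed

regions_to_refresh = [(a, b) for (a, b, speed) in benchmark if speed < THRESHOLD_MBPS]
print("re-write these regions (percent ranges):", regions_to_refresh)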

do NOT count on SSD's as long term storage!
With proper care and maintenance SSD's will be fine for long term storage.

SSD's are subject to degradation from frequent reading (Read Fatigue) as well as from long term storage (Bit Rot). Judicious periodic use of Level 3 maintenance will effectively address both of these concerns.

Final Observation: My comments above apply to SSD's in general. More specifically however, they cannot be used with SpinRite 6.1 on a UEFI boot only Surface device as SR 6.1 is NOT compatible with UEFI booting. That would require SpinRite 7 Pro - perhaps 2-3 years away? :(

Thus, as the OP noted, other means would currently be needed for a Surface device.

A simple SpinRite utility, as the OP suggested, is not possible at this point as SpinRite does not yet have UEFI boot capability.
 
Sorry, I couldn't post at all if I included the links the first time. Let me try again... :)

LINKS: Here's an archived link to an informative presentation. Page 24 is quite interesting. WARNING: this is more than a little technical. Sadly, it is too big to attach. :(


There is a LOT of other info at the same archived site. Sadly it is no longer live online. I've got most if not all of it downloaded in case archive.org goes away.
  • @peterblaise it's NOT about "slow reads" -- his testing goes to an NRRE - Non Recoverable Read Error, using modified firmware to do direct Host Managed Interface (HMI) so he's in full control of the process and can see the errors developing.
  • In my case, yes I could recover because it was "only" the partition record that had completely failed. By rebuilding it, all was well.
OLD? ALREADY DISCUSSED? Perhaps... yet in searching, including the links you kindly provided Peter, I've found nothing approaching what I'm talking about.

This isn't just degraded read speed. It's actual bit destruction, ie real Bit Rot (yes, nice definition there ;) ).

SR LEVEL 3: Yep, sounds like an appropriate capability. My thought: while I used to do that (offline SR) on spinners whenever needed... I rarely if ever wanted to do "maintenance" rewrites. But now with SSD's, I absolutely 100% DO want that. Thus, this is a case where having a utility that works in the background on a live system would be Rather Nice. I have a number of drives in production that simply cannot afford to be offline regularly.
 
DanR "... contrary to my understanding of Level 1. My
understanding is that Level 1 is read only. L1 will read
all areas of the drive, mark any unreadable areas with
a U on the GSD display, but will take no further action.
That is, Level 1 does NO writing! ..."​

I'm guessing that when Papa Pete wrote "... A level one
scan to read an entire drive ... Will help the drive discover
any weak areas, And when necessary, it will map out a
bad spot and move the data to a good area
...", 'it" means
the drive itself will reallocate. Is that your understanding,
@Papa Pete? We thought this to be false, presuming that
an SSD drive will ONLY reallocate on a WRITE request,
not on a READ request, exactly as HDDs behave.

- - - - -

Regarding 'bit rot' on an SSD, I have seen one and only one sector that could not be read no matter what I did, requiring a SpinRite LEVEL 3 or 4 or 5 with DYNASTAT 0 (no need to wait for the DYNASTAT 5 minute default) - and that's after testing dozens of SSDs, new and old - so maybe I have seen one sector of SSD 'bit rot' after all, who knows? How would I know what happened?

Update 2025-04-03: I have a DEAD SSD that takes itself off line, and is no longer accessible beyond occasional recognition, then denial of access to data - an inexpensive Patriot P210 2TB SATA3 SSD, SN P210EDCB22120600073LU, FW SN12429. This is a new behavior of SSD failure for me - my first.

Otherwise, all other SSDs I have tested read ALL data, even if at only 5 MB/s, and work and survive under any LEVEL of SpinRite rewrite.

@Alice "... The great Peter Blaise ..."​

Thank you, blush.

- - - - -

Please do share your own experience testing SSDs with SpinRite.

Everyone, pitch in, and share experiences and links on the topic.

I'm sharing my experience, and I treasure the personal experiences, testimonies, and supporting evidence shared by others.

And, as Steve Gibson asked, duplicate and document.

@ColbyBouma generously has a repository of any SPINRITE LOG records we've shared.

Are there any LOG records that show SSD single sector full or partial recovery?


In my own testing, I've seen one unreadable SSD sector out of billions of sectors under SpinRite read/write testing.

I still have no recorded documentation of what went wrong, and what actually happened when it got fixed.

Yeah, it got fixed, and it has stayed fixed.

Does anyone have any clues, any tools for 'knowing' what happens when an SSD can't read a single sector at all, or can't read the entire contents of one sector, getting some of the sector's data bits, but not all data bits?

Sample SPINRITE LEVEL 5 NOREWRITE LOG:

Code:
  |--------------------------------------------------------------------------|
  | Sector 3,031,506,936 (77.5911%) Trouble was encountered writing          |
  | inverted data block.                                                     |
  |--------------------------------------------------------------------------|
  | Sector 3,031,506,936 (77.5911%) Retrying write to restore original       |
  | data.                                                                    |
  |--------------------------------------------------------------------------|
  | Sector 3,031,506,936 (77.5911%) Rewriting original sector                |
  | sector-by-sector.                                                        |
  |--------------------------------------------------------------------------|
  | Sector 3,031,506,936 (77.5911%) A media defect was located within this   |
  | sector.                                                                  |
  |--------------------------------------------------------------------------|

Resolved by not using NOREWRITE.

- - - - -

Regarding 'bit rot', nobody owns the term, and for me, it implies that data storage bits, as read back, do not match the data storage bits that were originally recorded.

https://en.wikipedia.org/wiki/Data_degradation +23 external original references
. . . +79,790 additional web search results.

These definitions of 'bit rot' are NOT my experience of the one sector that could not be read.

It's not that the sector was read successfully but had inaccurate content.

It's that the sector could not be read at all.

But the sector could be rewritten.

And rewriting the sector then made the drive able to write and read that LBA logical block address from then on.

I have no way of knowing if the successful write, and subsequent re-read, were to a reallocated location, not at the original 'sticky' sector.

Perhaps it was a bad map, a corruption where it was looking for sector 235827ýý64, or some other misstep in the go-between, for a random example.

The original data may have been sitting somewhere, but was unfindable in sector 2358277264, but there was no map to the original data anymore.

So the drive responded to SpinRite by saying "no can do".

I'm speculating.

So maybe there was 'bit rot', but in the map chip, not in the data storage chip.

And a forced write request from SpinRite was then responded to by the drive itself re-writing the map chip first.

How would anyone know?

Chip maker Crucial says publicly at https://www.crucial.com/support/articles-faq-ssd/why-does-ssd-seem-to-be-wearing-prematurely

. . . SSD [ Solid State Drive ] wear and performance are both dependent on the nature of the workload presented as IO [ input output ] activity from the host computer, on the amount of “static” data that’s stored on the computer (or the amount of free space), and on how long data has been stored. As these variables change, performance will change, and the pace of wear will change.
There are physical reasons for this. NAND [ "NOT AND", a type of flash memory, non-volatile, it can store data even when there is no power ] flash storage is organized in what SSD engineers call pages and blocks. A block of NAND flash can contain hundreds of pages, and a page contains 16kB of data, in most configurations. When a NAND block contains data, new data cannot simply be written over the present data. The block must first go through an erase step before it’s ready to receive new data. However, while NAND flash can be written a page at a time, it can only be erased a block at a time. All these complications mean that the SSD firmware is constantly managing the physical locations of stored data and rearranging data for the most efficient use of pages and blocks. This additional movement of stored data means that the amount of data physically written to the NAND flash is some multiple of the amount of data passed to the SSD from the host computer . . . [ that is, write requests always cause more write than requested ]
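( A back-of-the-envelope illustration of that last bracketed note - that write requests always cause more physical writing than was requested - using assumed page/block geometry; the numbers are for illustration only, not any particular drive's specs: )

Code:
# Write amplification from the page/block geometry described above.
# All geometry numbers are assumptions chosen only for illustration.
PAGE_BYTES = 16 * 1024                       # "a page contains 16kB of data"
PAGES_PER_BLOCK = 256                        # assumed
BLOCK_BYTES = PAGE_BYTES * PAGES_PER_BLOCK   # assumed 4 MiB erase block

def write_amplification(host_write_bytes, valid_pages_moved):
    # The host asks to write host_write_bytes; the controller must also
    # relocate valid_pages_moved still-valid pages before erasing their blocks.
    physical = host_write_bytes + valid_pages_moved * PAGE_BYTES
    return physical / host_write_bytes

# Example: host writes 64 KiB into a nearly full block; 200 still-valid pages
# must be copied elsewhere first -- roughly 51x amplification in this case.
print(write_amplification(64 * 1024, 200))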

More from commercial SSD maker Samsung:

Flash Memory Summit 2014 Steven Hetzler, IBM:


. . . and so on.

- - - - -

Tools.

Who has tools that can look at a specific SSD in hand and report what's going on inside that SSD, chip-wise, size of data transfer, cache, blocks, pages, internal self-maintenance routines, what can be interrogated, what can get toggled on and off, what can be cleared, refreshed, whatever?

. . . +699,000 additional web search results.
Watch others dig into the chips:

. . . +2,370,000 additional web search results.

Eavesdrop or participate in other threads:

. . . +10,300,000 additional web search results.

- - - - -

If I have any goals here, I'd like:

to learn what is going on in my own systems, in the systems that I test, to measure, control, purchase, implement, and support accurately and appropriately, to eliminate surprises - I test ALL media with SpinRite Level 5 immediately upon purchase,​
to inform SpinRite 7+ development, in support of the goals above,
( and to know where and why SpinRite 6.1 and prior SpinRite versions are appropriate, or not, for SSDs ).​

On the one hand, folks just want to get to their data as reliably and quickly as possible.

On the other hand, end users are curious about what's going on under the hood, so to speak, and the inner workings of future SpinRite versions may depend on enhanced internal awareness of chip architectures and behaviors.

So, here I am.

Thanks.

.
 
@peterblaise "... an SSD drive will ONLY reallocate on a WRITE request, not on a READ request ..."​
@Alice "... SSDs reallocate all .. the .. time .. Only prerequisite being they can read / recover the data. SSDs function by reallocation, it's a core feature ..."​

Good point, thank you for the review and redirect.

Make that: "... In response to external read and write requests, such as from SpinRite, an SSD will ONLY reallocate on a WRITE request, not on a READ request, but, an SSD may reallocate on its own whenever the original manufacturer's programmer's whimsical internal calls decide ..."

Is that more accurate and appropriate?

In other words, on an SSD, an external read request that may experience internal errors does not in and of itself trigger a drive to perform internal LBA reallocation.

Or, on an SSD, does an external read request possibly initiate a reallocation if the drive itself 'thinks' that reallocation is appropriate in response to the read attempt?

Thanks.

.
 
Apologies for the delay. A lot of Real World at my end (looks like heading into back surgery :( )

@peterblaise "... an SSD drive will ONLY reallocate on a WRITE request, not on a READ request ..."​
@Alice "... SSDs reallocate all .. the .. time .. Only prerequisite being they can read / recover the data. SSDs function by reallocation, it's a core feature ..."​

...
Make that: "... In response to external read and write requests, such as from SpinRite, an SSD will ONLY reallocate on a WRITE request, not on a READ request, but, an SSD may reallocate on its own whenever the original manufacturer's programmer's whimsical internal calls decide ..."

Is that more accurate and appropriate?
In other words, on an SSD, an external read request that may experience internal errors does not in and of itself trigger a drive to perform internal LBA reallocation.
Or, on an SSD, does an external read request possibly initiate a reallocation if the drive itself 'thinks' that reallocation is appropriate in response to the read attempt?
Permit me to provide a practical real world example that breaks the assertion by @peterblaise quoted above

Samsung Releases Second 840 EVO Performance Fix
...a firmware that periodically refreshes old data
In other words, in response to repeated read requests, if the drive detects sufficient internal (ECC) errors, it will refresh the data block.

Going backwards to Peter's extended discussion, here's an interesting statement:
It's not that the sector was read successfully but had inaccurate content.
It's that the sector could not be read at all.
But the sector could be rewritten.
And rewriting the sector then made the drive able to write and read that LBA logical block address from then on.
I have no way of knowing if the successful write, and subsequent re-read, were to a reallocated location, not at the original 'sticky' sector.
We've got to be very careful with our mental models of what happens when we interact with a storage device.

A traditional spinner drive, going back to my first one (the ST-506 -- all five megabytes of 5 1/4 inch full height storage LOL) works exactly as @peterblaise describes.
However, flash storage isn't like that at all.
  • As Peter noted above, pretty much every time a sector/block is rewritten, it is "reallocated." We can be quite confident about this. It's very inefficient to wait for the old block to be cleared for a rewrite!
Also
  • A hard disk read involves passing a magnetic sensor through the magnetic field flux changes stored on the surface of the disk*** -- absolutely NO voltage or charge is applied to the disk. Other than physical damage, there's nothing that ought to cause any kind of errors to accumulate due to continued reading of a hard disk.
  • A flash drive stores data via a bunch of electrons in a particular place.
    • Some electrons leak out over time, no matter what. That's one source of degradation.
    • Data is read by applying a voltage to a read-line... a voltage much lower than needed to write to the flash, yet by definition that applied voltage causes some electron leakage on every read performed.
    • It's expected that some of the bits will degrade before a data block is refreshed. Thus, a bunch of ECC error correction is always needed.
    • As long as there are sufficient ECC bits available, the sector/block can be correctly read without any retries.
  • The internal flash controller can tell how many bit errors were involved in the latest read, i.e. how close the read was to failing...
  • ...so ideally the block can be auto-refreshed before the outside world even knows there's an issue (a toy sketch of this threshold idea follows just after this list)
  • All of that is described in terms of how it ought to be.
  • Please think through the above. It will change how the reader conceptualizes flash data storage, retrieval, and error correction challenges.
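To make the auto-refresh idea in the list above concrete, here is a toy model of the decision. The ECC capability and threshold numbers are invented for illustration; real firmware policies are proprietary.

Code:
# Toy model of "refresh when a read needed too much ECC": the host still gets
# good data, but the controller quietly queues the block for a rewrite.
# The numbers below are invented for illustration only.
ECC_CORRECTABLE_BITS = 72   # assumed max correctable bit errors per codeword
REFRESH_THRESHOLD = 48      # assumed: refresh once 2/3 of the margin is gone

def handle_read(corrected_bits):
    if corrected_bits > ECC_CORRECTABLE_BITS:
        return "UNRECOVERABLE: report a read error to the host"
    if corrected_bits >= REFRESH_THRESHOLD:
        return "OK to host + queue block for background refresh (reallocation)"
    return "OK to host, no action"

for n in (3, 50, 80):
    print(n, "->", handle_read(n))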
Unfortunately, vendors don't like to talk about any of this. It makes flash memory seem less than reliable LOL. In fact, none of the vendor links provided by Peter above discuss this!

Here is another paper that goes into quite a bit of detail.
Note the focus on Raw Bit Error Rate (RBER): https://users.ece.cmu.edu/~omutlu/pub/flash-memory-data-retention_hpca15.pdf

*** For TMI on hard drives, search google for "RLL.TXT" -- a post I wrote a verrrry long time ago explaining how data was originally stored on HDD's. More modern storage is more like the Hitchhiker's Guide To The Galaxy's "Improbability Drive" :) ... literally, PRML -- Partial Response, Maximum Likelihood... instead of storing the actual data, a probability function is calculated, and the parameters are stored on the disk. Sounds crazy but it works.
 
For those who want to dig into the details of the most recent research on these subjects, here's what look like some pretty interesting current papers. My library is a member of the Inter-Library Loan (ILL) system. I'll be ordering up PDF's of each of these myself. Pretty nice -- no charge! :)

(2023) Read Disturb and Reliability: The Complete Story for 3D CT NAND Flash
(2024) LaVA: An Effective Layer Variation Aware Bad Block Management for 3D CT NAND Flash
(2025) High-Precision Error Bit Prediction for 3D QLC NAND Flash Memory: Observations, Analysis, and Modeling
 
@peterblaise "... an external read request that may
experience internal errors does not in and of itself
trigger a drive to perform internal LBA reallocation ..."​

@Papa Pete "... a practical real world example that
breaks the ... assertion by @peterblaise ... Samsung
Releases Second 840 EVO Performance Fix ... a
firmware that periodically refreshes old data In other
words, in response to repeated read requests, if the
drive detects sufficient internal (ECC) errors, it will
refresh the data block ..."​

Good for them.

I've just obtained two Phison-based SSDs - NVMe and
SATA - they probably claim similar behavior; we'll see
over time how their data access and transfer
responsiveness bears out.

And that reinforces our precision, identifying that a
failed or ecc-corrected external read request resulting
in failure to read or ecc-correction in and of itself is
not the source of an internal LBA reallocation - the
drive's own algorithm is responsible for the rewrite.

Good for them.

- - - - -

@Papa Pete "... As @peterblaise noted above, pretty
much every time a[n SSD ] sector/block is rewritten, it
is "reallocated" ..."​

No, I was not aware of that or presuming that.

I actually thought a rewrite request got at least an entire
cluster rewritten IN PLACE, and considering SSDs having
'pages' or 'blocks of pages' or whatnot, entire 'pages' or
'blocks of pages' being rewritten IN PLACE.

"... reallocating ...", not so much.

Especially when SpinRite 6.1 Level 5 is marching through
32,768 LBA logical block addresses in each fell swoop -
that's 32,768 x 512 bytes = 16,777,216 bytes, or 16,384 KB
16 MB.

How big is an SSD 'page' or 'block of pages'?

4 KB per block?

128 to 256 pages per block?

512 KB to 1,024 KB per block of pages?

So a 16 MB rewrite request, sure, why can that not be
rewritten IN PLACE?

I dunno.

How would anyone know?

It's not like a SpinRite 6.1 Level 3, 4, or 5 on an SSD is
quick!

Something must be happening during all that rewriting
and waiting.
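( Here is that rough tally - how many NAND pages and erase blocks one 32,768-sector SpinRite transfer spans - using assumed geometry purely for illustration, since real page and block sizes vary by drive and are proprietary: )

Code:
# How much NAND does one 32,768-sector transfer touch? Geometry is assumed.
SECTORS_PER_TRANSFER = 32768
SECTOR_BYTES = 512
TRANSFER_BYTES = SECTORS_PER_TRANSFER * SECTOR_BYTES   # 16,777,216 B = 16 MiB

PAGE_BYTES = 16 * 1024                                 # assumed 16 KiB page
PAGES_PER_BLOCK = 256                                  # assumed
BLOCK_BYTES = PAGE_BYTES * PAGES_PER_BLOCK             # assumed 4 MiB erase block

print("transfer size (MiB):", TRANSFER_BYTES // (1024 * 1024))   # 16
print("pages touched:", TRANSFER_BYTES // PAGE_BYTES)            # 1,024
print("erase blocks touched:", TRANSFER_BYTES // BLOCK_BYTES)    # 4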

- - - - -

@Papa Pete "... Data is read by applying a voltage to a
read-line ... a voltage much lower than needed to write
to the flash, yet by definition that applied voltage
causes some electron leakage on every read performed.
It's expected that some of the bits will degrade before a
data block is refreshed ..."​

I think what you are saying is that SSDs lose peak
resolvability of data when in use and when not in use
.

@Papa Pete "... a bunch of ECC error correction is
always needed. As long as there are sufficient ECC bits
available, the sector/block can be correctly read
without any retries. The internal flash controller can
tell how many bit errors were involved in the latest
read, ie how close the read was to failing ... so ideally
the block can be auto-refreshed before the outside
world even knows there's an issue All of that is
described in terms of how it ought to be ..."​

That's why SSD slow down - they are spending
inordinate time reconstructing missing data.


Once successfully read, a rewrite will start the cascade all
over again from the top, from peak readability, as the
SSD then degrades once more.

An endless cycle.

Hence, folks hunting for a schedule of when to run
SpinRite 6.1 Level 3 rewrite - every two years, every year,
every half year, every month ...

Considering that I see delay within a few reads after a
full rewrite - as illustrated in
https://forums.grc.com/threads/ssd-...e-real-a-spinrite-opportunity.2003/post-14601

- I'd say SSDs are a swapping game, new for old.

SSDs are probably best cloned to new drives as soon as
they feel slow
, then recycle the old one for electronic
parts.

- - - - -

Good discussion.

Excellent references.

More to read for a deeper understanding of other people's experience, especially in well-measured environments.

Thanks.
 
@Papa Pete "... a practical real world example that
breaks the ... assertion by @peterblaise ... Samsung
Releases Second 840 EVO Performance Fix ... a
firmware that periodically refreshes old data In other
words, in response to repeated read requests, if the
drive detects sufficient internal (ECC) errors, it will
refresh the data block ..."​
...
And that reinforces our precision, identifying that a
failed or ecc-corrected external read request resulting
in failure to read or ecc-correction in and of itself is
not the source of an internal LBA reallocation - the
drive's own algorithm is responsible for the rewrite.
Not sure what you are saying here.
I specifically said that with this firmware configuration in the drive, an external read request that even comes close to failing, IS directly the cause of a rewrite.

Sure, that's an "algorithm" in the drive firmware. How could it be otherwise? It's also an algorithm when an external rewrite or trim or whatever request causes a particular physical action to take place :)

100% of the time, any external request is processed by drive firmware. It's been that way ever since we placed microcontrollers in the drive electronics. (Read my old RLL.TXT paper to go back to the days when the drive "controller" was not part of the drive! We also used #2 pencils with erasers to bump the disk and get it running after stiction caused it to fail to run up on power on. Pretty unbelievable today LOL.)

@Papa Pete "... As @peterblaise noted above, pretty much every time a[n SSD ] sector/block is rewritten, it is "reallocated" ..."

No, I was not aware of that or presuming that.

I actually thought a rewrite request got at least an entire cluster rewritten IN PLACE, and considering SSDs having 'pages' or 'blocks of pages' or whatnot, entire 'pages' or 'blocks of pages' being rewritten IN PLACE.

"... reallocating ...", not so much.
Then it's high time to update your understanding.
  • There's literally no reason at all to store revised data in the same place. That would just slow down the write process, because...
  • Every write to an SSD must be to a block that has first been fully erased!
  • That's one reason TRIM is soooo important. It tells the controller which blocks can be erased during idle time.
  • Without TRIM, any given write requires an erase plus write cycle!
And the whole point of SSD "wear leveling" is that writes are spread across all available empty blocks.

How big are blocks? The simple answer: all depends, and apparently quite proprietary these days. I've seen well over 32MB on modern storage sticks. To do that requires plenty of cache. And to do THAT requires good power management, so when the lights go out, the cache can be quickly written to permanent flash storage ;)
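( A toy model of why that matters for speed - a write into a pre-erased block is just a program step, while a write with no pre-erased block available must wait for an inline erase first; the timings are invented purely for illustration: )

Code:
# Why TRIM matters: with a pool of pre-erased blocks, a write is just a
# program step; without one, an erase must happen inline first.
# Timings are invented for illustration only.
ERASE_MS = 3.0     # assumed block-erase time
PROGRAM_MS = 0.5   # assumed page-program time

def write_cost_ms(pre_erased_block_available):
    if pre_erased_block_available:       # TRIM kept the free-block pool stocked
        return PROGRAM_MS
    return ERASE_MS + PROGRAM_MS         # erase-plus-write cycle

print("with a TRIM-maintained free pool:", write_cost_ms(True), "ms")
print("without:", write_cost_ms(False), "ms")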

Especially when SpinRite 6.1 Level 5 is marching...

So a 16 MB rewrite request, sure, why can that not be rewritten IN PLACE?
Because that would be a lot slower than writing it elsewhere ;)

Something must be happening during all that rewriting and waiting.
Once the cache is full... we're stuck with a type of wait loop: blocks are being erased as quickly as possible, then written to, making room in the cache for more requests.
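( A toy model of that cache-full behavior, with invented numbers: writes burst at full speed while the fast buffer has room, then settle to whatever rate the background erase/program work allows: )

Code:
# Average write speed over a rewrite of N GiB, given a fast buffer that
# eventually fills. All numbers are invented for illustration only.
CACHE_BYTES = 8 * 1024**3    # assumed 8 GiB fast buffer
FAST_MBPS = 3000             # burst speed while the buffer has room
SUSTAINED_MBPS = 400         # speed once limited by background erase/program

def average_write_mbps(total_gib):
    total = total_gib * 1024**3
    fast_part = min(total, CACHE_BYTES)
    slow_part = total - fast_part
    seconds = fast_part / (FAST_MBPS * 1024**2) + slow_part / (SUSTAINED_MBPS * 1024**2)
    return total / seconds / 1024**2

for gib in (4, 8, 64, 512):
    print(f"{gib:4d} GiB rewrite -> ~{average_write_mbps(gib):.0f} MB/s average")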
I think what you are saying is that SSDs lose peak resolvability of data when in use and when not in use.
I'm not sure where "resolvability of data" comes from. Simply put, flash storage consists of a (leaky) pile of electrons. Various things cause the leakage, including time, read operations, etc. And some things actually improve the leakage! A surprise from the new papers I referenced.
That's why SSDs slow down - they are spending inordinate time reconstructing missing data.
Not necessarily. No reconstruction is needed if only a few errors. That's the whole idea: essentially instant correction of a small number of bit errors, and reliable detection of more errors. Of course, one doesn't ever want to get to the point where you're missing data.

Unfortunately, we do tend to go there, because we've not understood the failure modes of flash memory.
I'm reasonably familiar with this stuff, and the 2025 paper I linked (and have now skimmed) has taught me a lot... and it introduced more questions than answers! I'll try to write up a summary ASAP.

My apologies... my back is not happy and I need to go rest.

Once successfully read, a rewrite will start the cascade all over again from the top, from peak readability, as the SSD then degrades once more.
An endless cycle.
In a sense, that's 100% true. SSD data is never permanent.

However, rewriting the data will absolutely refresh it to full capacity... or whatever the capacity is now. Think of it in terms of a bunch of rechargeable batteries. Over time, the charge capacity decreases until it isn't worth using anymore.

For consumer SSD's I recommend a rewrite 3x a year to be safe.

Good discussion.

Excellent references.

More to read for a deeper understanding of other people's experience, especially in well-measured environments.

Thanks.
No prob! More to come :)

And BTW, "well-measured environments" is painfully challenging with this stuff. All of these papers crack open the SSD's and gain access to the underlying technology... a bit like the tools I helped create back in the day for HDD's. Our systems peeked directly at heads, electronics and more, bypassing the "normal" interfaces.
 
Thank you for hanging in there, exploring, and sharing.

As far as I can tell, we think as one, not that I am adding much thinking.

We're probably in a semantic spin regarding external read requests causing a rewrite - if some reads do, and some reads don't, then the read doesn't cause the rewrite, something else does, and you suggest an internal algorithm, threshold, firmware savvy, but not anything the outside world has any control over to cause intentionally, on purpose, just by reading.

I'm an end user, so my speculations are experience-based.

When an SSD is slow to access and retrieve data, I have no idea why, and apparently, others have only theories.

Theories based on intimate knowledge of SSD design, but theories and speculation nevertheless.

Hence my quandary.

I speculate as an end user based on my experience of SSD behavior as observed from an end-user's point of view.

Others speculate on SSD behavior based on theories and generalized chip design.

We're both speculating.

And we're both experiencing SSDs failing to deliver as promised.

And that is my point, no matter how we got here:

SSDs fail to deliver as promised.

- - - - -

@Papa Pete "... For consumer SSD's I recommend a
rewrite 3x a year to be safe
..."​

Oh?

By doing what?

Clone, then secure erase, then re-clone back?

Even cloning may take -f-o-r-e-v-e-r-, because we're
dealing with an SSD that has slowed -w-a-y- down, it
tasks a -l-o-n-g- time to read slow sectors:

@peterblaise "... Once successfully read, a rewrite will
start the cascade all over again from the top, from
peak readability, as the SSD then degrades once
more. An endless cycle
..."​

@Papa Pete "... In a sense, that's 100% true. SSD data
is never permanent
..."​

Or, beyond "cloning and secure erase and cloning back",
there's re-writing in place via SpinRite 6.1 Level 3, 4, or 5,
or HDD Regenerator and it's options, or HD Sentinel and
it's options.

They all may have to go through horribly painful delays
reading an SSD's data.

Then we may have to wait for the SSD to figure out how
to rewrite everything.

And, rewriting sometimes -c-r-a-w-l-s- for whatever
reason.

We can just skip the rewrite by cloning to a new drive,
and toss the old drive, never rewrite it.

SSDs are themselves consumables.​
HDDs are for data proprietors.​

- - - - -

But, hey, moving the conversation forward:

@Papa Pete "... For consumer SSD's I recommend a rewrite 3x a year to be safe ..."

Oh?

By doing what?

What's your suggestion for rewriting SSDs?

And what would be the goal?

Original performance as promised on the vendor's technical specifications?
Or ...
One-half original promised performance, as realized in actual use (that's my experience)?

Here's a sample HDD Scan of an NVMe; notice three levels of real-life performance - and this is a read graph, not a write graph, which is even slower:

2,800 MB/s as promised (in the scant as-yet unused areas)
1,400 MB/s best as delivered in real life
200 MB/s 10% or less performance as it dies

[attached HDD Scan screenshot]

NVMe SSD WD Green SN350 2TB FW 6000 SN E823_8FA6_BF53_0001 2024-08-25

Access delays in the chart at the right:

2 sectors take longer than 50 ms to resolve,
56 sectors take longer than 20 ms to resolve,
2,170 sectors take longer than 10 ms to resolve

Note that claimed performance is:

Sequential Read Speed: Up to 3,200 MB/s
Sequential Write Speed: Up to 3,000 MB/s

"Up to" covers everything from dreadfully slow to peak, I guess - it lives up to its 'promise' somewhere across the drive.

- - - - -

Three times a year forever?

Or three times a year until we can't stand the slowness whenever it manifests itself, whether in one year or even sooner?

What's your recommendation for:

the method of rewriting,
measuring performance,
and the criteria calling for the next rewrite?

Thanks.
 
No, I have not 'tested' ADATA SU800 SSDs - why do you ask? If you have one or more, can you test and tell us how they perform?
 
Thank you for hanging in there, exploring, and sharing.
No prob! Briefly back... got a shot in my spine Friday... and now prepping for offsite travel. Three significant trips coming up although I may have time while on the road to get more fun stuff done ;)

We're probably in a semantic spin regarding external read requests causing a rewrite - if some reads do, and some reads don't, then the read doesn't cause the rewrite, something else does, and you suggest an internal algorithm, threshold, firmware savvy, but not anything the outside world has any control over to cause intentionally, on purpose, just by reading.

I *think* I'm beginning to understand what you're trying to communicate here.

I think you're correct that the "outside world" can't force a rewrite solely by doing reads... at least not in a predictable (documented!) way. ;)

Given that, it's honestly not as bleak as you've described...
When an SSD is slow to access and retrieve data, I have no idea why, and apparently, others have only theories.
In a sense, YES. and completely understandable: the firmware built into today's drives (of all kinds) is highly proprietary. I don't expect any vendor to reveal exactly how/why they do what they do!

We could create a list of the factors that can cause such slowdowns. But that's not satisfying enough, because in any given case we have no idea which of the possible factors come into play for instance #N.
And that is my point, no matter how we got here:

SSDs fail to deliver as promised.
What promise do you feel is going unmet? I am guessing the performance promise...

- - - - -

@Papa Pete "... For consumer SSD's I recommend a
rewrite 3x a year to be safe ..."​

Oh?

By doing what?

Clone, then secure erase, then re-clone back?
Naaah. Much simpler. The reason for 3x a year is to generally avoid the slowdown in the first place. To the extent we can refresh charge (bits) in all stored data, without having to do a ton of multi-read attempts, we keep the data fresh, and in the realm of high-performance reading.

Or, beyond "cloning and secure erase and cloning back",
there's re-writing in place via SpinRite 6.1 Level 3, 4, or 5,
or HDD Regenerator and it's options, or HD Sentinel and
it's options.
EXACTLY. If I understand SpinRite levels correctly, the following would all be equivalent, and all should be quite fast.

The goal: to perform a full rewrite of all used sectors in a single pass. Not trying to be smart about it at all.
* SpinRite 6.1 Level 3
* HD Sentinel: Read-Write-Read surface scan
* DiskFresh on the whole drive (Windows only AFAIK. *free*)

(I don't see value in HDD Regenerator used for this purpose on flash media.)

NOTES:
* Used for this purpose, it's good to maximize the amount of data processed per data transfer command. We simply need to get through the whole drive quickly. NOT expecting any bad data.
* Note that (as discussed above and elsewhere), flash drive caching has a huge impact on performance when rewriting much of the drive. Super-quick at first, but eventually you're slowed down by size-of-cache and other factors... at least for most drives.
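( For anyone curious what that "full rewrite of all used sectors in a single pass" boils down to mechanically, here is a minimal sketch of the core loop - read a large chunk, write the identical bytes back in place. The device path and chunk size are placeholders, the drive should be offline/unmounted, and this is not SpinRite or DiskFresh code, just the shared idea: )

Code:
# Single-pass whole-device refresh: read a large chunk, write the same bytes
# back where they came from. Placeholder device path; offline drives only.
import os

DEVICE = "/dev/sdX"          # hypothetical raw device node
CHUNK = 16 * 1024 * 1024     # large transfers, per the note above

def refresh_whole_device(path):
    fd = os.open(path, os.O_RDWR)
    try:
        pos = 0
        while True:
            data = os.read(fd, CHUNK)
            if not data:                        # end of device
                break
            os.lseek(fd, pos, os.SEEK_SET)      # step back to where we read
            os.write(fd, data)                  # rewrite the identical bytes
            pos += len(data)                    # offset is now back at pos + len(data)
        os.fsync(fd)
    finally:
        os.close(fd)

if __name__ == "__main__":
    refresh_whole_device(DEVICE)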

Then we may have to wait for the SSD to figure out how to rewrite everything.
They are pretty good at this. The algorithms are NOT all that complicated!
And, rewriting sometimes -c-r-a-w-l-s- for whatever reason.
As long as we write a large enough block, this should never be an issue. Partial block rewrite CAN get painful, because the rest of the block must be preserved. This is where having some understanding of underlying hardware can help. AFAIK mfg's typically do provide this kind of info. I admit, haven't checked many spec sheets recently. ;)
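( A toy illustration of that partial-block pain - the untouched remainder of each touched block has to be carried along too; the block size is assumed for illustration: )

Code:
# Why a partial block rewrite "CAN get painful": the rest of each touched
# block must be read, preserved, and re-programmed along with the new data.
# The block size is assumed for illustration only.
BLOCK_BYTES = 4 * 1024 * 1024   # assumed erase-block size

def bytes_physically_written(request_bytes, block_aligned):
    if block_aligned and request_bytes % BLOCK_BYTES == 0:
        return request_bytes                              # whole blocks only
    blocks_touched = (request_bytes + BLOCK_BYTES - 1) // BLOCK_BYTES
    return blocks_touched * BLOCK_BYTES                   # read-modify-write

print(bytes_physically_written(16 * 1024 * 1024, True))   # 16 MiB in, 16 MiB out
print(bytes_physically_written(4096, False))              # 4 KiB in, 4 MiB out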

We can just skip the rewrite by cloning to a new drive, and toss the old drive, never rewrite it.
That would be quite sad!

Consider my friend who has a high end AI supercomputer. Started out with a petabyte of high speed flash storage (16TB at a time... that's just 64 storage modules :-D ) ... I see one such module at B&H now for $3500. So (ignoring RAID etc.) that's under $250k for 1 PB of flash storage. Not bad :-D
And what would be the goal?

Original performance as promised on the vendor's technical specifications?
That's generally MY goal.

HOWEVER: before considering that as a goal, my prior goal is to understand the performance available from the storage system in an actual working machine.

Same thing as with network throughput, etc. By analogy:
* Can I use a network tool to beat on an Ethernet link and get full performance? If not, I have some work to do. Bad cables? Connectors? Settings? Interfaces? (This was how I proved that some cheap NICs were using fake Intel network chips... ;) )
* Once that is solid, what is file I/O performance like across the network? That's an entirely different set of parameters
* Etc...

Same with storage.

Because it's ALWAYS been true: there's almost always a bottleneck to system/storage performance. Very few systems are designed so well as to provide theoretical max speed everywhere, everywhen. :)
Three times a year forever?
Yep. Just preschedule a job to run. DiskFresh has this as a built-in option, BTW. I consider it good maintenance. Keep those bits happy :)

What's your recommendation for:

the method of rewriting,
measuring performance,
and the criteria calling for the next rewrite?
Rewrite on a schedule. Don't wait for slowness or failure.
Use whatever tools you like to check performance.
Rewrite using any tool that actually rewrites.

KISS.