SSD has taken itself offline



Salient Henry

New member
Nov 27, 2023
I have a 120GB SSD and wanted to equalize the read speeds across the drive. I ran a Level 3 scan and it failed with the "drive has taken itself offline" error. Is the drive bad? Should I replace it? The read speed has also decreased since running the Level 3 scan. After I power off the machine and unplug the drive, I can restart SpinRite, but I receive the error again at the same location on the drive.
 

Attachments

  • after level 3.jpg (179.1 KB)
  • before level 3 scan.jpg (290.1 KB)
  • s drive offline error.jpg (181 KB)
Hello @Salient Henry. At this point I believe this is a remaining loose end for SpinRite's RC5. Something did go bump in that drive, but I believe SpinRite should be better at recovering from that event. Resolving this is at the top of my task list once I get to work on SpinRite's RC6. For now, just note the percentage or sector number, then restart SpinRite yourself from a bit earlier (to show that the issue was resolved). And it certainly does appear that your drive will benefit! Please feel free to post your after-Level-3 benchmark results if you choose! (y)
 
These older SSDs quite frequently degraded in speed and would quickly come back to life with a 10-second secure erase (at the expense of any files on the drive that weren't backed up in advance).

As they degrade pretty quickly and are of little to no value, I'd replace it with something more reliable.
 
What you're saying, @lcoughey, fits the model of read fatigue: A 10-second "secure" erase would not have time to perform a full physical zeroing of the SSD's NAND media, right? It would essentially be performing a mass TRIM to release all of the LBA-to-physical-media mappings. In that case, any subsequent "reads" would be virtual, returning (probably) all 0's for never-written or unmapped LBA space, and any writes would succeed by re-writing and thus refreshing the NAND. At that point, such freshly re-written NAND would, indeed, read quickly and reliably, since it would have been recently written. :)
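Steve's mass-TRIM model can be sketched with a toy mapping table. This is purely illustrative: the class and method names are invented here, and a real FTL is vastly more complex, but it shows why reads of unmapped LBAs are "virtual" and fast.

```python
# Toy model of the read path described above: after a mass TRIM the
# LBA-to-physical map is empty, so "reads" never touch the NAND at all.
# (Illustrative sketch only; names are invented, not any real firmware API.)

SECTOR = 512

class ToyFTL:
    def __init__(self):
        self.mapping = {}   # LBA -> physical page contents (bytes)

    def write(self, lba, data):
        # A write always lands on freshly programmed NAND, so it reads
        # back quickly and reliably afterwards.
        self.mapping[lba] = bytes(data)

    def read(self, lba):
        # Unmapped LBAs are answered by the controller, not the NAND:
        # the drive simply synthesizes a sector of zeros.
        return self.mapping.get(lba, b"\x00" * SECTOR)

    def fast_secure_erase(self):
        # Dropping the map alone makes every subsequent read "virtual".
        self.mapping.clear()

ftl = ToyFTL()
ftl.write(100, b"\xAA" * SECTOR)
assert ftl.read(100) == b"\xAA" * SECTOR
ftl.fast_secure_erase()
assert ftl.read(100) == b"\x00" * SECTOR   # gone, from the host's view
```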
 
GOOD!!!
Actually, my understanding is that a secure erase issues a command that results in the physical reset of all the blocks in parallel, as per the following.
I was going to add that simply wiping the FTL data would not be truly secure so I'm glad to learn that "Secure Erase" is really that. How certain are you of the timing? That's what caused me to wonder about how "secure" it could be given only 10 seconds. Perhaps Colby has some real-world feedback from his explorations?
 
I just had a chat with Roman, the SSD and flash guru (among other things) at ACE Labs, and he has corrected me.

@Steve wins this point.

[attached screenshot: 1701267052690.png]
 
I appreciate the closure on this, @lcoughey. That makes the most sense given what we know of NAND writing. And it's going to be very useful in the future to recognize that the meaning of "Secure Erase" needs to be tempered with the caveat "secure from the outside", meaning secure from someone who is only able to access the memory through the FTL (what your NAND guru friend refers to as the Translator)... but not secure from someone who's willing to go to a lot more work. As he notes, that "internal" level of security will require much more time to achieve. Thanks, again.
 
Yes. Without the translator, the data on the NAND is essentially useless anyway. Bypassing with a direct read would only result in scrambled encrypted sectors. So, pretty much impossible to recover, but easily done in Hollywood.
 
Ah, yes... that's certainly a good point, too. Depending upon the mapping block size, the physical media would be a jumble of those fragmented blocks without any clear relationship to their original mapping.
 
I just had a chat with Roman, the SSD and flash guru, among other things, at ACE Labs and he has corrected me.

@Steve wins this point.

View attachment 930

Well, I think it's still not settled; no closure. In fact, I think Roman may be wrong.

We can look at manufacturer provided documentation, for example: "Whether the sanitize operation is executed using SANITIZE BLOCK ERASE or the legacy SECURITY ERASE UNIT command, the drive-level operation is the same. Micron’s proprietary firmware instructs the SSD controller to send a BLOCK ERASE command to all NAND devices on the drive—including the NAND space reserved for overprovisioning and retired blocks, areas which are inaccessible by the host computer or the user.".

He (Roman) says IF secure erase completes in a couple of seconds... so that's an IF. Erasing a block requires approximately 500 µs (source). Assuming a 500 GB SSD and 128 KB erase blocks, that gives us 3,906,250 blocks. A full erase would take 32.5 minutes if blocks were erased sequentially, but when multiple blocks can be erased in parallel it may be a matter of minutes: "When the sanitize operation is initiated by the host computer, the SSD controller simultaneously erases the maximum number of NAND FLASH elements allowed under the SSD's maximum-rated power consumption specification. Because of this parallelism, the SANITIZE BLOCK ERASE or the SECURITY ERASE UNIT command can be completed within one minute on the majority of Micron's SSDs" (source).
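A quick sanity check of the arithmetic above. Note the block count assumes decimal kilobytes (128,000 bytes), as the post's figure of 3,906,250 implies; the 64-way parallelism is an invented round number, not a manufacturer figure.

```python
# Back-of-the-envelope check of the erase-time figures in the post.
capacity = 500 * 10**9               # bytes, per the post's example
blocks = capacity // 128_000         # 128 KB taken as decimal kilobytes
t_block = 500e-6                     # seconds per block erase, per the post
sequential_min = blocks * t_block / 60
parallel_min = sequential_min / 64   # assumed 64-way parallel erase

print(f"{blocks:,} blocks")                            # 3,906,250
print(f"sequential erase: {sequential_min:.1f} min")   # ~32.6 min
print(f"64-way parallel:  {parallel_min:.2f} min")     # ~0.51 min
```

With even modest parallelism the total lands comfortably under Micron's "within one minute" claim.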

Anyway, I personally think @lcoughey was closer to the truth than Roman when he said "Actually, my understanding is that a secure erase issues a command that results in the physical reset of all the blocks in parallel". It may be brand-specific, but it certainly does not seem to require "dark evil magic" unless Micron made a pact with the Devil.
 
Ah, yes... that's certainly a good point, too. Depending upon the mapping block size, the physical media would be a jumble of those fragmented blocks without any clear relationship to their original mapping.
But then still, that's pretty much what we have when doing chip-off recovery from SD cards and such. In itself it does not mean the file system cannot be reconstructed, or at least the raw data recovered. Of course this will be more complex with SSDs, but it would not be fundamentally impossible if it weren't for encryption and LDPC error correction. And the latter is probably only a temporary problem.
 
I agree, Joep. I suspect that the only way to really satisfy users who want 100% assurance that an SSD is truly wiped will be to deliberately overwrite the entire accessible LBA data space -- acknowledging that this might not reach regions that have been mapped out due to trouble or wear leveling. Then follow that up with whatever best secure-erasure API the device might offer. Short of using undocumented manufacturer-specific commands, that would appear to provide the best possible assurance. And, of course, if someone did not want to invest that much time, they could settle for only using the fast secure erase option.
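For a rough sense of the extra time the full-overwrite step costs, here's a back-of-the-envelope estimate. The 400 MB/s sustained-write figure is an assumption for a typical SATA SSD, not a measurement.

```python
# Rough cost comparison: "overwrite everything, then secure erase"
# versus the fast secure erase alone, for the 120 GB drive in this thread.
capacity_gb = 120
write_mb_s = 400                     # assumed sustained sequential write rate
overwrite_s = capacity_gb * 1000 / write_mb_s

print(f"full LBA overwrite: ~{overwrite_s / 60:.0f} min")  # ~5 min
print("fast secure erase alone: seconds (translator/map reset only)")
```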
 
I agree, Joep. I suspect that the only way to really satisfy users who want 100% assurance that an SSD is truly wiped will be to deliberately overwrite the entire accessible LBA data space -- acknowledging that this might not reach regions that have been mapped out due to trouble or wear leveling. Then follow that up with whatever best secure-erasure API the device might offer. Short of using undocumented manufacturer-specific commands, that would appear to provide the best possible assurance. And, of course, if someone did not want to invest that much time, they could settle for only using the fast secure erase option.
Yes, sure. But I personally wanted to understand what is happening inside the SSD. This wasn't about erasing data for the sake of erasing data; that's a different discussion, and it isn't per se important either. Whether we 'wipe' blocks by means of an erase (a) or simply reduce this to a translator-level transaction (b), reads will always seem faster: we request an LBA to be read, and the controller returns zeros if no physical NAND is mapped to that LBA. This is true in both cases.

With regards to erasing data for the sake of erasing data, it's pretty much about what the user feels comfortable with; in practice, any secure erase will do, IMO.

Anyway, we were talking about degraded performance and how to bump it back up. We should then consider read and write performance separately. As mentioned, any action that 'resets' the translator will improve read speed. Write speed is a different story: to bump write speed we need the NAND we want to write to to be in an erased state. The secure erase proposed by @lcoughey will accomplish both. Merely resetting the translator, à la TRIM, will not in itself improve write speed; that comes only after we grant the garbage collector some time to actually start erasing blocks.

It's the autist in me that likes us to get our definitions straight. Too many times I see TRIM presented as 'erase'. Even Roman, in the screenshot above, refers to TRIM as erasing stuff, which IMO is wrong. TRIM is simply the process that tells the SSD which LBAs it can erase; the drive can now consider those LBAs 'garbage'. It's then the garbage-collection process that consolidates erase blocks and does the erasing, and it does not care whether those LBAs became garbage via TRIM or via some other process: if we re-write data at LBA 100 (as in overwriting already-present data), we know it's written somewhere else, and so the NAND real estate assigned to LBA 100 before the overwrite is now garbage too. The garbage collector does not care how it became garbage.

IMO this loose usage of terms can hinder true understanding and discussion.
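The TRIM-versus-garbage-collection distinction drawn above can be sketched as a toy model. All names here are invented, and real controllers track garbage at erase-block granularity rather than per page, but the sequencing is the point: TRIM only marks, and GC erases.

```python
# Sketch of the distinction above: TRIM only *marks* LBAs as garbage;
# a separate garbage-collection pass does the actual erasing. Overwrites
# create garbage the same way. (Illustrative only.)

class ToySSD:
    def __init__(self):
        self.mapping = {}       # LBA -> NAND page id
        self.garbage = set()    # pages awaiting erase
        self.erased = set()     # pages actually erased, ready for writes
        self.next_page = 0

    def write(self, lba):
        old = self.mapping.get(lba)
        if old is not None:
            self.garbage.add(old)    # overwritten data becomes garbage too
        self.mapping[lba] = self.next_page
        self.next_page += 1

    def trim(self, lba):
        # TRIM: tell the drive this LBA's page is garbage. Nothing is erased.
        page = self.mapping.pop(lba, None)
        if page is not None:
            self.garbage.add(page)

    def garbage_collect(self):
        # Only here does NAND actually get erased, regardless of how the
        # pages became garbage (TRIM or overwrite).
        self.erased |= self.garbage
        self.garbage.clear()

ssd = ToySSD()
ssd.write(100); ssd.write(100)   # overwrite -> one garbage page
ssd.trim(100)                    # -> two garbage pages, still nothing erased
assert len(ssd.garbage) == 2 and not ssd.erased
ssd.garbage_collect()
assert len(ssd.erased) == 2      # erasing happens only during GC
```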
 
@Salient Henry : I was just writing elsewhere (answering a rhetorical question about the feasibility of running SpinRite backwards, from the back to the front of a drive) and I recalled that another difference between the lower and higher SpinRite levels is that Levels 1 and 2 deliberately use much shorter block transfers (1,024 sectors), since they are "forward only" modes that do not always return to the front of a block for a write or re-read. But Levels 3 and above, which do continuously return to the beginning of each block, deliberately use SpinRite's much longer 32,768-sector (16MB) transfers, because that allows SpinRite to proceed MUCH faster on "good" drives.

The reason Levels 1 & 2 use shorter blocks is that during SpinRite's development we encountered damaged drives whose firmware appeared not to deal well with such large transfer requests in the presence of any drive trouble. I believe this was always with "spinners", and I don't recall it happening with SSDs, though that may simply be because we didn't encounter any SSDs that had trouble.

With that bit of background... I'd LOVE to have you try starting SpinRite with "xfer 128" added to the command line. Run SpinRite at Level 3 over the end of that drive where it's been dying... and let's see whether reducing the block-transfer length allows SpinRite and the drive to get past that "sore spot." (y)
 
I have a 120GB SSD and wanted to equalize the read speeds across the drive. I ran a Level 3 scan and it failed with the "drive has taken itself offline" error. Is the drive bad? Should I replace it? The read speed has also decreased since running the Level 3 scan. After I power off the machine and unplug the drive, I can restart SpinRite, but I receive the error again at the same location on the drive.
@Salient Henry : Back to the reason for your original posting...
I've just finished the work to make SpinRite much more "patient" with drives that appear to be taking more than 10 seconds to come back online. With the latest release (pre-release 5.01) SpinRite will now wait up to 60 seconds following a drive reset after an error before it gives up on a drive. And during that waiting it will display a countdown so that the user knows what's going on. I will be very interested in learning whether this works with that SSD you have. Thanks!

(More information is here: https://forums.grc.com/threads/pre-release-5-01.1415/ )
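The "patient" waiting behavior described above amounts to a generic poll-with-countdown loop. Here's a minimal sketch of that pattern (not SpinRite's actual implementation; the function and parameter names are invented for illustration):

```python
# Poll a drive-online check for up to `timeout_s` seconds, displaying a
# countdown so the user knows what's going on, before giving up.
import time

def wait_for_drive(is_online, timeout_s=60, poll_s=1.0):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_online():
            return True                      # drive came back
        remaining = int(deadline - time.monotonic())
        print(f"\rwaiting for drive... {remaining:2d}s", end="", flush=True)
        time.sleep(poll_s)
    return False                             # give up after the timeout
```

Here `is_online` stands in for whatever status check the real code performs after issuing a drive reset.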
 