SpinRite RC5 / ZimaBoard / NVMe

slim724 · Member · Dec 8, 2023 · Minneapolis

Team,

I bought a PCIe card for my 1TB NVMe drive to run on a ZimaBoard. Running SpinRite RC5 and FreeDOS. When it gets to 23.1037%, it hangs with "After an error occurred, this drive was reset....". I've tried levels 1, 2 and 4... all the same.

I know there are issues with this drive. That's why it's been replaced. I thought RC5 would be able to handle it. Is this to be expected? I tried searching these forums for a similar case.

Advice?

J
 
@Steve is working on a change for RC6 that may address a specific issue in this area. As I mentioned in another recent post, however, the drive firmware is in control here. If the drive firmware becomes unhappy, it can react in a number of ways, including simply locking up, timing out, or asserting a "fatal error" condition in which there is nothing the software (OS/SpinRite) can do to coax it back online (a power cycle will be required). You can check for manufacturer utilities, or perhaps a S.M.A.R.T. utility, to try to learn more, but there's no guarantee that will be very helpful. Drive manufacturers don't seem to try to be heroic when it comes to recovering from "fatal" drive issues.
 
Thanks for the super prompt reply.

I'll wait for RC6. Not interested in manufacturer utilities. I want SpinRite to succeed in identifying and marking the bad areas. The drive isn't toast. Last time it was in use, I was able to boot up and use it. I didn't even know there were issues with it until I tried to clone it. CHKDSK wasn't much help except to validate Macrium. Hoping SpinRite is the solution - I want to keep that in my toolbox.

J
 
And if we consider that, then trying to get the sector reallocated may not be the best approach: we'd read the sector, it would fail and hang the firmware. We could then power-cycle and get a good read, so IOW there's no reason for the firmware to initiate reallocation. Plus there's the risk that the drive will stop responding altogether at some point; I have had this happen after, for example, having imaged 80% of the drive using the above method.
But, especially with NVMe drives, if the first read has failed, do we KNOW that the second read was actually trying the same physical page, or could the drive have silently worked out what was there and re-allocated the data to a different physical sector but using the same logical address?
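
To make the question concrete, here is a toy model of a flash translation layer, the mapping an SSD keeps between logical block addresses and physical pages. It is only a sketch: `ToyFTL`, `relocate` and `read` are hypothetical names, and whether any particular drive relocates data after a failed read is not something the host can observe.

```python
class ToyFTL:
    """Toy flash translation layer: logical block address -> physical page."""

    def __init__(self, num_lbas: int, num_pages: int):
        # Start with an identity mapping and keep the extra pages as spares.
        self.mapping = {lba: lba for lba in range(num_lbas)}
        self.spare = list(range(num_lbas, num_pages))
        self.pages = {p: f"data-{p}" for p in range(num_lbas)}

    def relocate(self, lba: int) -> None:
        """Move the data behind `lba` to a spare page; the LBA is unchanged."""
        old = self.mapping[lba]
        new = self.spare.pop(0)
        self.pages[new] = self.pages.pop(old)
        self.mapping[lba] = new

    def read(self, lba: int) -> str:
        """What the host sees: data by logical address only."""
        return self.pages[self.mapping[lba]]


ftl = ToyFTL(num_lbas=4, num_pages=6)
first = ftl.read(2)
ftl.relocate(2)          # firmware-internal; invisible to the host
second = ftl.read(2)
print(first == second, ftl.mapping[2])
```

From the host's side both reads use the same logical address and return the same data, so a successful second read says nothing about which physical page served it.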
 
Expanding a bit on what Paul wrote earlier...

After fully resetting a drive following an error, SpinRite has been waiting for up to 10 seconds for the drive's status to report that it is again ready to continue. It turns out that for some drives, 10 seconds is not sufficient. In experiments over this past weekend I've verified that if drives are given more time they often will come back online. So, starting with RC6 (release candidate #6), SpinRite will give drives up to a full 60 seconds to "get their act together" and come back online. Since that is a LONG TIME for someone to wait for SpinRite while nothing appears to be going on, SpinRite RC6 and later posts an on-screen “Waiting for drive: xx” countdown timer while it's waiting, so that the user knows what's happening and that the system hasn't died.
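
The wait-with-countdown behavior described above can be sketched roughly as follows. This is an illustration only, not SpinRite's actual code: `wait_for_drive`, `is_ready`, `sleep` and `show` are hypothetical names, and the once-per-second polling interval is an assumption.

```python
import time
from typing import Callable


def wait_for_drive(is_ready: Callable[[], bool],
                   timeout_s: int = 60,
                   sleep: Callable[[float], None] = time.sleep,
                   show: Callable[[str], None] = print) -> bool:
    """Poll a drive-status check once per second until ready or timed out.

    `is_ready` is a hypothetical stand-in for reading the drive's status.
    Returns True if the drive reported ready within `timeout_s` seconds.
    """
    remaining = timeout_s
    while remaining > 0:
        if is_ready():
            return True
        # Keep the user informed so the system does not look hung.
        show(f"Waiting for drive: {remaining:2d}")
        sleep(1.0)
        remaining -= 1
    # One last check after the countdown reaches zero.
    return is_ready()
```

With `timeout_s=10` this models the original RC5 behavior; with the default of 60 it models the longer wait. Returning as soon as the status check succeeds means a drive that recovers quickly costs no extra time.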

I'll be VERY INTERESTED to see if this actually does help with your NVMe drive. I can see a mechanical drive needing some time to do whatever it might need to do following a full reset. But I'd expect a solid-state drive to get back online sooner. (And it'll also be interesting to know, as that countdown proceeds, at which point in time the counter disappears and work resumes.)

I'm gratified to see that you feel this way (since I do too!) If RC6 does not resolve this problem I'll want to work with you to figure out whether there's anything that SpinRite can do to resolve it!

I have this written and working now, but not as thoroughly tested as I'd like. So once this week's podcast is behind me I'll verify that it's all doing what I expect and we'll all move to RC6. (y)
 
@slim724:

The latest pre-release of SpinRite (5.02) incorporates this new 60-second wait with a countdown and it fixes the mistake I made in doing that for pre-release 5.01. So, when you can, let's see how release 5.02 functions with that ZimaBoard-mounted NVMe memory? You can find the details for grabbing the latest here: https://forums.grc.com/threads/pre-release-5-02.1417/

Thanks!!
Thanks. I'll get started.

J
 
Terrific! I'm pretty sure you could start at just before the trouble you've seen since the trouble is almost certainly about a specific "sore spot" of the media. (y)
 
No luck. See screen shot.

[Attachment: error.zoomed.jpg]
 

Okay. So the question is... Before that happened did a ~60 second timer countdown appear in the upper left of the screen? And did it count down to zero before that message appeared?

It IS still entirely possible that the drive really has just gone offline. The difference between the original RC5 and the later incremental 5.02 is that it will give the drive a full 60 seconds to get back online rather than just 10 seconds.

And, as I mentioned previously, I was always skeptical that a solid-state drive might require more than 10 seconds. The instance where this behavior was observed was on spinning mechanical drives ("spinners").

Here's one question: If you hit this screen, then you exit SpinRite WITHOUT powering down, and then restart SpinRite, is the drive again ready to go? Or does it remain "offline" until the power is cycled?

Thanks!!!
 
WHOA!! I just noticed that this was a BIOS connected drive. When you said that you had obtained a PCIe card for an NVMe drive, I (wrongly) assumed that it was somehow emulating a SATA drive. But apparently that PCIe card has brought along its own BIOS.

THAT means that I have a bit more work to do. I've been working to get a 5.03 pre-release out. I'll get that posted and let you know.

Thanks! (There's still reason to believe I can fix this! (y) )
 
Hi @Steve,

OMG, you're right. Since I didn't get the pesky 137GB message, I thought I was using SATA.

No, no delay before the red screen appears.

Here is what I see when the drive scan completes. It just confirms what you're thinking.

John
[Attachment: 2023-12-27_18-23-57.jpg]
 
John (@slim724):

I just checked the SRPR-503 release source code. It does have the updated BIOS reset recovery code. So if anything's going to be able to work on that drive, the currently published release should. There's a limit to what can be done through the BIOS... but SpinRite will do what it can. And SpinRite 7 will be able to access that NVMe drive directly at the hardware level. (y)
 
@Steve,

Testing against 5.03 was different but produced the same end result. Different in that I now see the countdown timer in the upper left corner. I tried both level 1 and 3. Same.

Perhaps I should purchase a proper SATA -> NVMe enclosure? It would be interesting to see if it succeeds when BIOS is no longer a factor.

Thoughts?

John
 
It's true that if you're able to put the NVMe drive into a SATA enclosure, so that you can then plug it into a SATA port, then SpinRite will almost certainly have a much more "intimate" interaction with the drive. For example, you should then see the drive's make/model and serial number (none of which are available through the "insulation" created by the BIOS.) Also, SpinRite will then be able to run at its maximum speed on that drive, using 1024-sector transfers for level 1&2 and 32768-sector transfers for levels 3-5. The BIOS imposes a strict limit of 127 sectors for everything.
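
To give a feel for what those transfer sizes mean, here is a small arithmetic sketch. The 512-byte sector size and the reading of "1TB" as 10^12 bytes are assumptions made for illustration; the actual drive may differ on both.

```python
SECTOR_BYTES = 512      # assumption: 512-byte logical sectors
DRIVE_BYTES = 10**12    # assumption: "1TB" taken as 10^12 bytes


def transfers_needed(sectors_per_transfer: int) -> int:
    """Number of transfers needed to cover the whole drive once."""
    total_sectors = DRIVE_BYTES // SECTOR_BYTES
    # Integer ceiling division, so a final partial transfer is counted.
    return -(-total_sectors // sectors_per_transfer)


for label, sectors in [("BIOS limit", 127),
                       ("levels 1-2", 1024),
                       ("levels 3-5", 32768)]:
    kib = sectors * SECTOR_BYTES / 1024
    print(f"{label:11s} {sectors:6d} sectors = {kib:8.1f} KiB per transfer, "
          f"{transfers_needed(sectors):,} transfers per pass")
```

Under these assumptions a single pass takes roughly 15.4 million transfers at the BIOS limit of 127 sectors, about 1.9 million at 1024 sectors, and about 60 thousand at 32768 sectors.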

I'm glad to know that BIOS access is now properly being very patient and giving the drive ample time to come back online. And this issue — of SpinRite giving up on highly troubled drives when there may be some way for it to get them back online — is the issue I'm currently working to resolve. So please stick around, either way! (y)
 
Do such adapters exist? I tried searching, but I only found M.2 SATA (NGFF) --> full-size SATA, and NVMe --> U.2. I haven't been able to find anything that actually adapts NVMe to SATA.
 

From memory I believe there is a relation between the physical chip connector key and drive tech but I may be wrong.

M key and M+B key drives are SATA based and B key is NVMe based (or vice versa)

Generally you can only get an enclosure for 2/3 not 3/3 of the styles. I also think that some of the M.2 form factor sizing can change for SATA drives but the NVMe ones appear to be the standard "long" size.
 
There are lots of form factor adapters, but I haven't been able to find any protocol converters.