We have an industrial controller that sends programs to various lines in an electronics facility. A few weeks ago, the controller (which runs Win10) failed to boot. As I was diagnosing the system, I noticed the drive was visible in the UEFI BIOS, and when I ran the built-in Dell Diagnostics, the drive passed its test. It turned out the system had no legacy mode, so I was unable to run SpinRite on it. I pulled the drive (a Seagate 2.5" mechanical HDD) and connected it to my SpinRite system. The drive was in really bad shape. SMART stats were plummeting horribly as I began a level 2 scan (the Spare Sector SMART category at one point became a negative value!). It took an entire weekend to get to 3% on the drive, at which point it hit a section that kept triggering Command Timeouts, forcing me to shut the system down, reboot into SR, and continue where it left off. In the meantime, we contacted the company for support. They got us back up by using another machine in their system as a temporary server, then told us they'd replace the system under warranty. Apparently the server is configured at the factory, and they can't or won't configure the same system with a new drive/OS. They just replace the whole Dell PC.
Anyway, the failing drive had only 25,000 POH, and the system was about 3 years old, so I commend them for replacing the whole unit under warranty. The problem is they are in Europe and we are in the States, so the replacement may be several weeks, if not months, from arriving. In the meantime, we're running the surrogate server and keeping our fingers crossed. But I wasn't done yet.
After about a week of running SpinRite, it was still in the 3% range with Dynastat constantly working. SR did repair several regions between 0% and 2.9%. I then noticed that most of the failed sectors being pounded on by SR in the >3% range had no data, so I decided to cancel out of SR, hoping I had recovered enough data.
So, I then unleashed Clonezilla on the drive and instructed it to run in Rescue Mode (ignoring bad sectors). Once it hit the region with all the bad sectors, the cloning process slowed to a crawl, but I had nothing to lose, so I just let it go. Seven days later, it finally finished. So I restored that image to a new Samsung SSD, installed it in the system, and reconnected it. It booted into Win10 without issues! Since we don't want any more downtime from reconfiguring from the surrogate back to the original server, we're leaving it be and still waiting for our warranty replacement server. When it arrives, we will have a spare server waiting in the wings if the system ever goes down again.
I was hoping to provide a testimonial that said we were back up and running in hours, but hey, it still worked! And we now have a spare server ready to rock and roll.
Thanks, Steve!
Edit: the whole point of my topic title (the "just long enough" part) was to say that once I verified the image was working, I plugged the bad drive back in to let SR work on it some more, and the BIOS no longer saw the drive. So I did rescue it just in the nick of time!