Running Spinrite 6 on individual RAID drives


CSPea

Active member
Oct 14, 2020
England
This recent post by @SecretStasher
SpinRite 6.1 penetrating RAID?
nudged me into posing this related question of my own, below.

I wonder whether we can always safely 'repair' a drive that has been temporarily extracted from a RAID array, and then reintroduce the repaired drive into the same array without allowing - or expecting? - the RAID controller to trigger an array resilvering?

(Taking care in the above scenario to power-down/up the controller+drives etc. as appropriate. I'm not referring here to hot-swap drive situations.)

I know we are also not considering SCSI based RAID setups here on this Forum, but I still vividly recall (while I was maintaining Novell servers many moons ago) how absolutely unforgiving our company's Adaptec SCSI (hardware) RAID controllers were to any changes being made to the contents of their installed drives, while they - the controllers - weren't 'in control' of the drives.
If any 'outside-of-the-RAID' change was made to any part of any one of the drives then the whole RAID array would fail to mount or would fail to auto-rebuild.

So with today's SATA based controllers, I'm curious about how tolerant they actually are to changes that might be made 'off line' to one or more of the drives in an array; for example changes that Spinrite might make to a drive if we ask it (Spinrite) to go ahead and repair/refresh/recover one or more sector(s) on any single drive from an array?

Before I pose my two specific questions (labelled 1 and 2 at the end) I'll offer a hypothetical scenario with five assumptions in these paragraphs below, labelled A to E.

Please jump to the end of this post if you're in a hurry 😁

Assumption A.
My hypothetical RAID array appears to be functioning normally, and my RAID controller has successfully coped with, and has 'internally' corrected, one or more (let's say) transient disk-read errors.

Assumption B.
While running normally, my RAID controller, with its sophisticated RAID algorithms, has maintained the live array's overall integrity by (re)calculating on-the-fly, and then also storing, the parity (or the checksums) of the data that was read from and written to the array's discrete drives.

Assumption C.
I decide to run Spinrite 6 on all of the drives in the array, so I arrange a tidy shut-down of the computer and its RAID controller and extract each physical drive in turn to be installed temporarily into my dedicated Spinrite computer.

Assumption D.
While Spinrite is running on my individual RAID drives, it finds and fixes at least one bad sector. In other words, Spinrite 'nudges' that particular drive to remap the bad sector(s) to the drive's still-good reserve sector(s).

Assumption E.
I then reassemble and power-up my RAID array using those same Spinrite-scanned drives.
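
To make Assumption B concrete, here is a minimal sketch, assuming RAID 5-style XOR parity over a single stripe. Real controllers add parity rotation, caching and vendor-specific metadata that this ignores, and the names (xor_blocks, stripe_is_consistent) are made up purely for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """A stripe is consistent when the XOR of all blocks, parity included, is all zeros."""
    return not any(xor_blocks(list(data_blocks) + [parity_block]))


# One stripe across a hypothetical 4-drive set: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# If the second drive becomes unreadable, its block is rebuilt from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Unchanged contents leave the stripe consistent; an off-line change to one block does not.
assert stripe_is_consistent(data, parity)
assert not stripe_is_consistent([b"AAAA", b"BBBX", b"CCCC"], parity)
```

The point for the questions that follow: in this simplified model, what matters to the parity is the content the controller reads back from each drive, not where on the platters the drive happens to store it.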

Here are my questions:

Question 1:
Will every RAID controller on today's market tolerate and accommodate the remapping of sectors on one or more of its drives if that sector-remapping occurred while the drives were not being 'live-maintained', on-the-fly by that controller, along with its probably fine-tuned RAID/parity algorithms?

Question 2:
Wouldn't it be wise, or with some controllers even necessary, to run Spinrite on one drive at a time and then, for any drive on which Spinrite signalled an 'R' ('Repaired'), reassemble the RAID array and allow (or trigger?) the controller to run a full resilvering - i.e. recalculating the array's overall parity / checksum tables etc. - before repeating the Spinrite cycle on the other drive(s)?

I'd welcome your thoughts on this. Or perhaps some informed insights on how tolerant the current, SATA-based RAID controllers (hardware or software RAID controllers) are to 'external' changes to the sector geometry on their individual drives?

In my case, given that I don't know the answer to my own Question 1, my strong instinct would be to answer YES to Question 2.

In other words I would subject only one of my RAID drives at a time to a Spinrite scan, and I'd wait - each time I reassembled the array - to see if my RAID controller deemed it necessary to trigger an array resilvering. Just to be maximally prudent.

Colin P. (A different Colin P.)
 
Well the long and the short of the issue is that RAID arrays are special beasts. The value of the array is in the physical disks, because RAID is not a backup. So, if you pull a disk out, and ANY problem is reported, I would mark that disk for replacement. The problem would be, you might have more disks with problems than you can afford to replace at once. (If you have RAID-5, you can only afford one disk to fail.) So you'd finish running all the disks, and hope you only found one problem disk. If you found more problems than your array can withstand, now you have a very serious problem.
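
The "more problems than your array can withstand" check can be written down as a rough sketch. It covers only the standard levels 0, 1, 5 and 6, assumes the usual minimum drive counts, and uses hypothetical function names; it says nothing about how any particular controller behaves.

```python
def max_tolerated_failures(level: str, drives: int) -> int:
    """Number of simultaneous drive failures a standard RAID level survives."""
    if level == "0":
        return 0            # striping only, no redundancy
    if level == "1":
        return drives - 1   # every drive holds a full copy
    if level == "5":
        return 1            # single parity
    if level == "6":
        return 2            # double parity
    raise ValueError(f"RAID level {level!r} is not covered by this sketch")


def array_survives(level: str, drives: int, problem_drives: int) -> bool:
    """True if replacing every problem drive at once would not exceed the redundancy."""
    return problem_drives <= max_tolerated_failures(level, drives)


# A 4-drive RAID 5 set copes with one problem drive, but not with two.
assert array_survives("5", 4, 1)
assert not array_survives("5", 4, 2)
```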
 
The thing is that SR will get the drive to reallocate a failing sector using a spare on the drive, and the spare will get the same LBA as the one it replaces, so to the controller it will just appear as if that LBA takes a little longer to load, or, with caching, the controller might not even notice it. Only on an MS-DOS CHS structure, using an FM, MFM or RLL controller, will a bad sector be reallocated visibly, as there the computer handles the bad block table, not the drive. With IDE and SATA, along with the later derivatives, and SCSI, the drive itself handles bad sectors and does a transparent reallocation. So very likely your RAID controller will not notice at all that the drives have been checked, provided they go back in the exact same positions and connections to the controller.
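
A toy model of that transparent reallocation, as a sketch only: the class ToyDrive, its sector sizes and its spare pool are invented for illustration and do not describe any real drive's firmware. It also assumes the failing sector's data was recovered intact before being moved.

```python
class ToyDrive:
    """Toy drive: the host addresses sectors by LBA, the drive decides where they live."""

    def __init__(self, user_sectors, spare_sectors):
        self.user_sectors = user_sectors
        # Physical media: user area first, then a pool of spare sectors.
        self._media = [bytes([lba % 256]) * 4 for lba in range(user_sectors)]
        self._media += [b"\x00" * 4] * spare_sectors
        self._free_spares = list(range(user_sectors, user_sectors + spare_sectors))
        self._remap = {}  # LBA -> physical spare sector, private to the drive

    def read(self, lba):
        # The host only ever supplies an LBA; the remap lookup is invisible to it.
        return self._media[self._remap.get(lba, lba)]

    def reallocate(self, lba, recovered_data):
        # Move the LBA onto a spare sector, keeping the same LBA for the host.
        spare = self._free_spares.pop(0)
        self._media[spare] = recovered_data
        self._remap[lba] = spare

    def host_view(self):
        return [self.read(lba) for lba in range(self.user_sectors)]


drive = ToyDrive(user_sectors=8, spare_sectors=2)
before = drive.host_view()
drive.reallocate(5, drive.read(5))  # assumption: the data was recovered intact
after = drive.host_view()
assert before == after  # same LBAs, same contents, as far as a controller can see
```

If the recovered data were not intact, the contents at that LBA would differ, and that is the case in which a parity-checking controller could notice a mismatch.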
 
(....) RAID arrays are special beasts. (....) So, if you pull a disk out, and ANY problem is reported, I would mark that disk for replacement. (....)
Thanks PHolder for your response.

Indeed I agree ... and in a real-life situation (unlike my hypothetical one above) I reckon I would opt to replace any drive that Spinrite flagged as 'Repaired', rather than risk adding further stress to it by reinstalling it in the array and having it participate in a multiple-hour long resilvering session.

(I've had an expensive (new unused, 'spare') 2TB WD2002FFSX drive sitting idle on top of my gracefully-aging Synology DS412+ running RAID 5 for over 2 years now, ready to install if needed in this exact situation. But of course my main concern now is whether the remaining three drives in this RAID 5 array - running lightly-loaded since 2013 - would themselves withstand a multiple-hour long resilvering task!! Probably not!)

Coming back to Spinrite: Your point has been acknowledged many times, I think, over the years in Spinrite discussions, namely that Spinrite can rescue us from a crisis, but we should consider any 'repairs' it makes as important diagnostic clues, not necessarily as 'magical' permanent fixes 😁
 
The thing is that SR will reallocate a failing sector using a spare on the drive (....) So very likely your RAID controller will not notice at all that the drives have been checked, providing they go back in the exact same positions and connections to the controller.
Thanks also SeanBZA for your response.

My head agrees with those facts you've outlined, but - if the event actually arose and I needed to rely on them - I'd have to persuade my guts to shut up and let me get the job done 😁

Once upon a time a lot of us could claim a working knowledge of the innards & algorithms of all the computer hardware we relied upon, but of course that's no longer possible, and one way of coping with uncertain situations (like my hypothesis above) is to play it super-safe, and assume the worst ... and/or consult some folks like yourselves here for reassurance and clarity 😋

Thanks again.
 
(I've had an expensive (new unused, 'spare') 2TB WD2002FFSX drive sitting idle on top of my gracefully-aging Synology DS412+ running RAID 5 for over 2 years now, ready to install if needed in this exact situation. But of course my main concern now is whether the remaining three drives in this RAID 5 array - running lightly-loaded since 2013 - would themselves withstand a multiple-hour long resilvering task!! Probably not!)
If I was in that situation, I think that I would probably put the spare drive into an enclosure and use it to perform a full backup of the NAS and put it somewhere safe.
 
Thanks AlanD.
I've already got fallbacks in place 🙂

(Stepping off-topic briefly ...)
In 2020 I added a mirrored pair of 12TB drives to my already running, mirrored 4TB drives in my self-built FreeNAS (now TrueNAS) server, sitting alongside my old DS412+.

Using SyncBackPro I maintain what I call an 'overnight echo' of the non-system data on my trusty ol' Synology box, onto that 12TB TrueNAS Pool, and during the day I run ad-hoc SyncBackPro backups of changed spreadsheets and emails etc.

All complemented by selected overnight backups to Amazon S3, plus bi-weekly Macrium Reflect PC backups (2 PCs), plus weekly rotating Macrium Reflect backups to offsite 4TB drives (transported to & from the other address in padded containers).

Yes, I admit that in my 'organically evolved' backup regime I've probably got a couple of backup blindspots, but I feel adequately covered.

I've been searching for a while now for my next candidate backup server ... ideally something with a SATA extender, 6 bays at least, so that I can re-purpose my stack of well-used but far-from-dead ('enterprise grade') 2TB drives. I lament Synology's recent policy change on accepting only 'Synology approved' drives for their servers, otherwise I'd already have ordered their DS1612+ (or similar), given how remarkably well my old DS412+ has performed. But now instead I'm contemplating another self-build server, maybe running 'UNRAID' as the server OS, or of course maybe using TrueNAS again. The problem is juggling the too-rare hardware options (N-bay enclosures plus suitable top-notch motherboards).

(End of detour ...)