
Is there a point at which slow SpinRite performance means the drive is bad?

#1

brado2049

I am not new to SpinRite, but new to successful SpinRite runs — long story, hardware config challenges. I am past those challenges, and now running Level 3s on a bunch of 3TB and 4TB hard drives of mine. My first run was on a Toshiba 4TB SATA drive — it took about 12-13 hours to complete and made consistent progress throughout. But now I’m running a Level 3 on a 3TB Seagate Barracuda SATA drive, and the thing has been running over 15 hours and is only 1.2% complete, with an estimated 1,195+ hours (49+ days) remaining. Questions:
  1. Is this kind of performance normal for SpinRite, a month-and-a-half for a 3TB drive?
  2. Is there a point at which you should kill the current run of SpinRite, and try running again (perhaps with different options)?
  3. How do you know a hard drive is just bad and needs to be thrown away?
In general, I’m trying to understand when I’m within the realm of normal SpinRite behavior and should let things continue to run, versus when I’m seeing something abnormal and should intervene (and if so, how).
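For what it's worth, the arithmetic behind that on-screen estimate can be sanity-checked with a naive extrapolation. This is only a sketch that assumes a constant scan rate, which SpinRite's own estimator does not:

```python
# Rough sanity check of a scan-time estimate from progress so far.
# Assumes a constant scan rate (SpinRite's estimator is smarter).

def remaining_hours(elapsed_hours, percent_complete):
    """Extrapolate remaining time from elapsed time and percent done."""
    total = elapsed_hours / (percent_complete / 100.0)
    return total - elapsed_hours

# 15 hours elapsed at 1.2% complete:
print(round(remaining_hours(15, 1.2)))
```

15 hours at 1.2% extrapolates to roughly 1,235 hours remaining, in the same ballpark as the 1,195 the screen reported, so the estimate really is just "this drive is crawling", not a glitch.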

Thanks so much for your help!

Attachments

  • IMG_0758.jpeg

#2

ColbyBouma

That drive is definitely struggling. If you scroll through the screens (left and right arrow), it'll give you more information such as number of retries.


#3

brado2049

@ColbyBouma - I’m not sure exactly what I’m looking for, but here are screenshots of all the screens.

Attachments

  • IMG_0759.jpeg
  • IMG_0760.jpeg
  • IMG_0761.jpeg
  • IMG_0762.jpeg
  • IMG_0763.jpeg
  • IMG_0764.jpeg
  • IMG_0765.jpeg

#4

ColbyBouma

Thank you. If you have any important data on that drive, stop SpinRite and try to make a copy of the data.
Here are some things I noticed:
  1. The Real-Time Activities page has 8 counters. On healthy drives, all of those are usually 0. This drive has 4 counters above 0, the most worrying of them being "not recoverable". That means SpinRite tried to recover data from a sector, but was unable to get 100% of the data, so it gave up and overwrote that sector with whatever it was able to find.
  2. The yellow "Waiting for drive" in the top-left corner of several screenshots means SpinRite sent a command to the drive, but it hasn't responded yet. Healthy drives never do this.
  3. The empty S.M.A.R.T. System Monitor page is unusual.
  4. The ST3000DM001 is known to have exceptionally high failure rates. https://en.wikipedia.org/wiki/ST3000DM001
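The rule of thumb in point 1 above (a healthy drive shows zero in all eight Real-Time Activities counters) can be written out as a tiny triage sketch. The split between "fatal" and merely "suspect" counters here is my own illustrative assumption, not a classification SpinRite itself defines:

```python
# Toy triage of the Real-Time Activities counters: a healthy drive
# shows zero for all eight. The fatal/suspect grouping is an
# illustrative assumption, not anything SpinRite defines.

FATAL = {"not recoverable", "sect neverfound", "defective sectr"}

def triage(counters):
    """Map RTA counter readings to a rough health verdict."""
    if any(count > 0 for name, count in counters.items() if name in FATAL):
        return "failing: data-loss indicators present"
    if any(count > 0 for count in counters.values()):
        return "suspect: nonzero counters, watch closely"
    return "healthy: all counters zero"

print(triage({"command timeout": 0, "not recoverable": 4}))
# → "failing: data-loss indicators present"
```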


#5

peterblaise

Right on @ColbyBouma!

- - - - -

After backup, and especially if backup fails to get everything:

SPINRITE NORAMTEST LEVEL 5 DYNASTAT 1 NOREWRITE

... just to see if the drive itself is recoverable.

It's probably a test drive from here on out.

- - - - -

Useful command line options:

DYNASTAT 0 recovers nothing: good for blankable drives you want to 'quickly' plow through with a surface-integrity refresh.

DYNASTAT 1 prevents SpinRite from wasting 4 extra minutes recovering nothing additional.

NOREWRITE prevents SpinRite from writing zeros in unrecoverable areas, so you can revisit and try again by any other means.

NORAMTEST bypasses a quirk where drive enumeration produces ghosts; not your problem here, but who knows?

SKIPVERIFY bypasses the pre-test before the actual Level testing; not your problem here, but good to know in case the drive stops being recognized by SpinRite at all.

- - - - -

Thanks for following up, and let us know how that drive behaves next.


#6

brado2049

@ColbyBouma @peterblaise — thanks so much for your responses! Additional status — I kept it running overnight, and a day later, not much progress. I’m attaching another set of screenshots from this morning showing pretty much the same state of affairs. A note on this drive (and all of my drives) — I have no data I want to recover at all on any of them. A repair and refresh is what I need. @peterblaise ’s command line appears to be a good next step, so I’ll kick that off. But I noticed this was a Level 5. It got me to wondering — if I want to recover no data at all, but repair and refresh a drive, what are the options I should be using?

One last note — I’m using a ZimaBoard with the SATA-Y cable and two drives connected. I set SpinRite running on two SATA drives, and the first 4TB one finished fine; this drive I’m having problems with is the second one. I checked the Zima web site and the doc on the SATA-Y cable, and it is supposed to be compatible with the ZimaBoard, but I wondered if maybe I ought to revert to the single SATA cable which came with the ZimaBoard, just in case some power fluctuation was being introduced because the Y cable was powering two drives instead of one. Has anyone had any problems or behavioral differences when using the SATA-Y cable with two drives vs. just a single drive?

Ok…I’m off to restart with @peterblaise ’s options. More status as it comes…thanks so much for your help!

Attachments

  • IMG_0766.jpeg
  • IMG_0767.jpeg
  • IMG_0768.jpeg
  • IMG_0769.jpeg
  • IMG_0770.jpeg
  • IMG_0771.jpeg
  • IMG_0772.jpeg

#7

PHolder

You're potentially investing more personal time in a drive than the drive is worth. If you value your time at even $10/hr, you could reinvest that money in a new, reliable drive. Unless this drive has emotional or sentimental value to you, it's time for it to permanently retire.


#8

Tazz

if I want to recover no data at all, but repair and refresh a drive, what are the options I should be using?
Personally I would run a
Code:
spinrite /dynastat 0
Level 4. Level 5 if time does not matter.
If time does matter then a Level 3 would give a good idea of how bad it is.

Also, if it's testing bad at the beginning of the disk, start SpinRite at the 50% mark and then go back to get the first part later.


#9

brado2049

@PHolder — you make a good point — the value of time. That’s actually what’s behind these questions, which aren’t so much about this one apparently faulty drive (it just provides the context to ask them), but rather about understanding SpinRite and its behavior and output. Mastered once, that’ll carry me indefinitely.

The bigger picture is this — I rarely (pretty much never) have data recovery needs due to drive failure. I run direct-attached RAID arrays which do continuous backups to a NAS RAID array. The only time I’ve ever lost data was inadvertently when migrating to a new drive array (my stupidity). But data loss from drive failure is a very low risk for me — I’d need to lose multiple drives on multiple arrays simultaneously for that to happen. Never had it happen in over three decades.

But what I do have are hard drives that fail occasionally. I have a stack of 13 SATA 3TB and 4TB hard drives (none are specifically for NAS) that I’ve pulled over the years when the array software indicated these drives were failing. So I’m running them through SpinRite now.

@Tazz — hey, great minds! After taking guidance from @peterblaise ’s post, I decided that I needed a point of reference for this stubborn 3TB Seagate drive — I have a stack of others of the same drive model, so I spun up another one with a Level 3 Dynastat 0 run, and it immediately started speeding through faster than my Toshiba 4TB — an estimated 8-hour run (I’ve attached a screenshot).

So this takes me back to one of my original questions — how do you know when a drive is no good anymore? What are the specific criteria (or are there any)? I have a stack of these to plow through now, a little wiser than when I started (thanks to the folks here on the forums — thanks everyone!). Hopefully, most of these will be just start / set watch alarm / return and start the next. But it would help to know exactly when to consider a drive dead.

Thanks!

Attachments

  • IMG_0774.jpeg

#10

PHolder

how do you know when a drive is no good anymore?
That's a statistics problem hiding in sheep's clothing ;) In theory you could get a brand new drive, plug it in, and your system could get fried by a lightning strike 5 seconds after you wrote some important file to the drive for the first time. Statistically that probably almost never happens, but my point... if I even have one... is that a drive is worth exactly as much trust as you have in it retaining your important data for a lengthy enough time to meet your expectations. I gather that's not very helpful to you, but I think it's all I've got. If a drive is acting wonky, or unreliable, you're setting yourself up for hurt if you put too much trust in it.


#11

brado2049

@PHolder - thanks for the reply. Forgive me for not being more clear. The context of the question is relative to SpinRite use. Question restated: Under what conditions is it worthwhile to continue trying alternative options and/or continuing to run SpinRite on a drive vs. just canceling / exiting out of SpinRite and tossing a drive in the trash? I imagine there must be some statistic, count of errors, timeouts, length of time running, ...something... which makes it a reasonable conclusion that a drive is usable, and/or a reasonable conclusion that a drive is not usable. If there is no threshold or standard, and it is all governed by The Force, then I am asking any SpinRite Jedis out there for your own personal sense of it when you run SpinRite: what criteria do you use to determine a drive can be saved and continued to be used vs. it is dead?


#12

DanR

If there is no threshold or standard, and it is all governed by The Force, then I am asking any SpinRite Jedis out there for your own personal sense of it when you run SpinRite: what criteria do you use to determine a drive can be saved and continued to be used vs. it is dead?
Unfortunately, there is no good/bad black/white yes/no answer. It comes down to a combination of judgement, common sense, and experience.

If SpinRite gets stuck and unable to move on -- that is bad.

If there are any U's or B's in the Graphics Display Screen - that is bad. If there are only some R's, that could be OK, as it indicates SR recovered the data.

If there are errors at the bottom of the Real Time Activities screen - that is bad! Good drives will have zero errors; bad drives lots of errors. And then there is the gray area in between . . . :)

If the Detailed Technical Log screen has a very small scroll bar over on the RH side, that is bad! It means there is a LOT of data (likely error data) in the DTL to scroll thru - bad.

If the S.M.A.R.T. data screen is blank - that is bad.

The drive you initially posted about is bad. It is beyond SpinRite's ability to "fix" it. SpinRite can do things that seem like magic. But SR does have its limits.

In regard to the drive in your OP, there are a couple of things you might try for info and experience - if you wish to spend the time.

1) Try restarting at 2%, or 5%, etc., to get past the initial trouble spot and see what happens

2) Try a DynaStat 0 run. Command line: "spinrite level 2 dynastat 0" and see what happens.

Bottom line: The drive in your OP is toast. It cannot be fixed by SpinRite. It is not to be used ever again. But as a learning tool for playing around . . .


#13

brado2049

@DanR -- great post, thanks! That gives me a good guideline. A few of those were the mental notes I had gleaned from comments throughout the thread. I indeed had concluded the same about the original drive in question, especially after being able to compare it to the SpinRite performance and output on other drives of the same model.

Dare I ask the logical follow-up -- to what lengths is everyone going to destroy bad hard drives, just the trash can, sledgehammer, drilling holes, or thermite? LOL....can't wait to hear...Thanks!


#14

Tazz

how do you know when a drive is no good anymore?
Adding to what the others have said, if the SMART info is accessible - Reallocated Sectors and Pending Reallocated Sectors are usually the beginning of something that doesn't get better. The rest of the SMART stuff is vendor specific on whether or not the numbers mean what it looks like they mean - large numbers don't necessarily mean bad things.
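Those two attributes can be read outside SpinRite too, e.g. from the text that `smartctl -A` (smartmontools) prints. A minimal parsing sketch, assuming the common attribute-table layout; attribute IDs 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector) are the standard IDs, and the sample text below is illustrative, not from the drive in this thread:

```python
# Sketch: pull the raw values of SMART attributes 5 and 197 out of
# `smartctl -A` text. Handles only the common 10-column table layout;
# some vendors append extra text to the raw column.

def smart_raw_values(smartctl_output, ids=(5, 197)):
    """Collect raw values for the given attribute IDs."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in ids:
            values[int(fields[0])] = int(fields[-1])
    return values

sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       76
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       84
"""
print(smart_raw_values(sample))  # {5: 76, 197: 84}
```

Nonzero raw values for either attribute are the usual early warning Tazz describes: pending sectors in particular tend to grow, not shrink.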

Dare I ask the logical follow-up -- to what lengths is everyone going to destroy bad hard drives, just the trash can, sledgehammer, drilling holes, or thermite? LOL....can't wait to hear...Thanks!
I like to take them apart and pull the magnets out but I'm not allowed to do that anymore because "You have too many that's taking up too much space and they're all stuck in a jumbled ball." - my wife.

Breaking the PCB, cutting the ribbon cable, then pounding a nail down through the top should be good enough, unless you think someone is trying to gather intelligence on you.

Just do something so that the average person who may find it and plug it in can't get access to it. Again, unless you have a stalker.


#15

brado2049

@Tazz -- brilliant, thanks!


#16

brado2049

Ok @DanR (or anyone), I have another good example to get your read on things. I am including another set of screenshots on a different drive, same model (Seagate Barracuda 3TB) as the first one, and here’s what I see:
  • The Graphic Display Screen shows SpinRite has been running 3.5 hours, and estimates 95 more hours.
  • No errors shown on the Real-Time Activities screen.
  • On the S.M.A.R.T. System Monitor screen, there appear to be some errors (uncorrectable).
So, very slow SpinRite processing, and it appears there are some uncorrectable errors (if I’m reading that right). @DanR (or anyone), what’s your take on this drive? If the estimates are right, 95 hours is ~4 days — if this is your hard drive, what do you conclude and do? Is this drive likely salvageable and worth completing, or does this drive belong alongside the lost Atari 2600 E.T. cartridges in a New Mexico landfill? What say you? (yes, channeling Aragorn from the LOTR movies… :-D )

Attachments

  • IMG_0776.jpeg
  • IMG_0777.jpeg
  • IMG_0778.jpeg
  • IMG_0779.jpeg
  • IMG_0780.jpeg
  • IMG_0781.jpeg
  • IMG_0782.jpeg

#17

PHolder

Seagate Barracuda 3TB
From the model # on your screen cap: https://en.wikipedia.org/wiki/ST3000DM001
Backblaze, a remote backup service company, observed that its ST3000DM001 drives have failed at rates far higher than the average of other hard drives. Only 251 of the 4,190 ST3000DM001 hard drives placed in service in 2012 were still in service as of 31 March 2015.


#18

brado2049

@PHolder — yep, @ColbyBouma mentioned that earlier in this thread. I’ve still got some working ones though, two of the four I’ve run SpinRite on so far have completed fine. One we know is bad, the other is still running and the subject of my most recent set of screenshots above. The general question on that drive still holds — does the data indicate a drive that isn’t worth pursuing, or one that still may be corrected?


#19

peterblaise

Sadly, Backblaze has published no standards by which they replace a drive or call a drive 'failed': that is, whether they replace a drive because of S.M.A.R.T. or other reports, WITHOUT an actual failure, or only once the drive actually becomes unable to read ( and write ) data, and then and only then replace it <-- not likely.

I presume Backblaze replaces drives BEFORE actual failure, and may never retest after that, and as such, they may never know if the drive would have eventually failed, or not.

As such, Backblaze's recommendations are most appropriate for large, multiple-installation, ROI-driven organizations looking for assurance that they are avoiding LIKELY risks, but they may not be reporting individual REALIZED risks.

I'd love to test Backblaze's deprecated drives to see if some are perfectly fine by my standards, or if they really, really, really are a waste of time and energy because critical failure is imminent or has already happened.

Who knows?

- - - - -

So for those of us out here working with onesies and twosies, not a server room of thousands of drives, we gotta deal with maybe the x% of the drives that prove reliable versus the y% that have failed.

There's no way to correlate our drive to Backblaze's report.

So, what do we do?

I ardently and continuously WATCH the

SpinRite 6.1 [ Real-Time Activities ]

... screen to see if any region or sector is slow to read or write.
And, of course, watch the events enumerated on that screen:

command timeout: 0 command aborted: 0
comm/cable errs: 0 not recoverable: 0
minor troubles: 0 sect neverfound: 0
dynastat recovr: 0 defective sectr: 0

... and toggling over to the [ Detailed Technical Log ] screen.

And through those views, circumnavigate the drive's read and write
performance.

Plus, listen to the drive, watch its temperature, and so on, to become intimately familiar with 'normal' and acceptable behavior, so I know what unacceptable behavior looks like, sounds like, and feels like.

And compare to other drives, other makers' drives, and other models.

Eventually, we become familiar with 'good' versus 'bad' regardless of the success of write-read tests themselves. I have bad drives that have no failures, they pass all tests, but some of these supposedly 'good' drives are so sloggy that I can't tolerate waiting for them anymore.

SpinRite is not a pass / fail test for whether or not we will be happy
with a drive.

That's our responsibility to develop the skill to know the difference.

So test those drives, end-to-end, mark 'em [ TEST ONLY ] for the crappy ones, and press the winners into service.

Let us know what you do.

Thanks.


#20

DanR

@DanR (or anyone), what’s your take on this drive? If the estimates are right, 95 hours is ~4 days — if this is your hard drive, what do you conclude and do? Is this drive likely salvageable and worth completing, or does this drive belong alongside the lost Atari 2600 E.T. cartridges in a New Mexico landfill? What say you?
I would say the drive may be bad.
- SpinRite is clearly stuck
- The DTL screen shows a very small scroll bar on the right, indicating lots of data (errors?) in the log - NOT good
- The drive is a Seagate; hence some SMART errors are not unusual for a good drive
- However, the RTA screen shows NO errors at the bottom - Good!!!

I note that the RTA screen shows a sector size of 32768, which is the maximum, and quite possibly too aggressive for this no-longer-optimum and likely now-sensitive drive. You might consider starting SpinRite with the XFER command-line token, with progressively smaller values for progressively gentler touches, to see if something might work. Another learning experience! :)

Examples:
spinrite xfer 16384
spinrite xfer 8192
spinrite xfer 4096
Etc.

The suggestions in my previous post, for skipping the bad spot or trying DynaStat 0, for learning and experience would apply here to this drive.
But try the XFER options first!
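To see why smaller XFER values trade speed for gentleness, one can count transfer commands: whatever unit XFER is expressed in, halving it doubles the number of commands a full pass needs, and per-command overhead is what slows things down. A back-of-the-envelope sketch (the ratios are arithmetic; real timing also depends on the drive itself):

```python
# Back-of-the-envelope: halving XFER doubles the number of transfer
# commands needed for a full pass; per-command overhead is why smaller
# transfers run slower. The payoff is finer-grained error isolation.

BASELINE = 32768  # the maximum transfer size shown on the RTA screen

for xfer in (32768, 16384, 8192, 4096):
    ratio = BASELINE // xfer
    print(f"XFER {xfer:>5}: {ratio}x the commands of XFER {BASELINE}")
```

So the `xfer 4096` example above issues 8x as many commands as the default maximum, which is consistent with a noticeably slower but gentler pass.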


#21

brado2049

I'm not going to post every drive's results, as I have a stack of them I'm working through. But I thought those who contributed guidance on this latest drive we were discussing might be interested in "the rest of the story". It also gives a good basis for asking a few final questions.

While I was tempted to kill SpinRite processing of this drive, I let it run. It took 4+ days, but it completed. I've attached the relevant screens showing status after completion. Points of interest:
  • Graphics Status Display: one Unrecovered area of storage.
  • Real-Time Activities: 14 command timeouts, 14 minor troubles
  • SMART System Monitor: ECC corrected 112/114, relocated sect 76/77, uncorrectable 84/100, pending sectors 84/100.
I've read the doc on all of these, and the SMART doc too. Based on the above, what would your verdict on this drive be? Would running SpinRite again on it potentially clean it up further to a reliable state?

One last question about SpinRite usage: is it possible to rerun SpinRite on only the specific problem areas indicated by these results, and ignore the rest of the drive which tested fine?

Thanks so much for your help!

Attachments

  • IMG_0804.jpeg
  • IMG_0805.jpeg
  • IMG_0808.jpeg

#22

PHolder

is it possible to rerun SpinRite on only the specific problem areas indicated by these results
Sure, in a manner of speaking. You should have the logs, which should give you LBA #'s (or more likely a suitable range), and you can specify the range on the command line or in the UI. (On the screen just before final confirmation, there should be a note on what to press to edit the range... it used to be Shift-Enter, but it changed in 6.1 toward the end of the development, and I forget the new sequence... might be TAB.) See the FAQ on the site for the command line options if you want to go that route. https://www.grc.com/sr/faq.htm Search for "What are SpinRite’s command line options?"


#23

DanR

One last question about SpinRite usage: is it possible to rerun SpinRite on only the specific problem areas indicated by these results, and ignore the rest of the drive which tested fine?
Yes! One way is via the command line as PHolder noted.

Another way: When you get to the "Before Beginning" screen do not press Enter. Press TAB instead. This takes you to a screen where you may specify starting and stopping points for a more surgical SpinRite run.

Of course, either way you need to know where to start and where to stop. This can be either by a percentage or by sector number. This information is typically found in the SRLOG file. It is also specified on screen by SpinRite when a run is aborted in progress.
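Converting between the two forms of start/stop point is simple arithmetic. A sketch, using 5,860,533,168 as the LBA count, which is the usual figure for a 3 TB drive with 512-byte logical sectors (an assumption here; read the real count for your drive from SpinRite or the SRLOG):

```python
# Sketch: converting between a percentage and an absolute sector (LBA)
# for a targeted re-run. TOTAL_SECTORS is the usual LBA count for a
# 3 TB drive (an assumption; use your drive's actual count).

TOTAL_SECTORS = 5_860_533_168

def percent_to_sector(pct):
    """First LBA at the given percentage of the drive."""
    return int(TOTAL_SECTORS * pct / 100)

def sector_to_percent(lba):
    """Percentage position of a given LBA."""
    return 100 * lba / TOTAL_SECTORS

# Restarting at the 2% mark, as suggested earlier in the thread:
print(percent_to_sector(2))  # 117210663
```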


#24

brado2049

Update -- I have made my way through all 13 of my drives, and here are the first-pass totals:
  • 7 good (mostly; only minor stuff).
  • 3 bad: either outright unreadable, or estimating months to process, where letting them run 30-60 min kept that high estimate (I exited SpinRite).
  • 3 iffy: multi-day processing times, moderate errors.
Thank you everyone for your guidance on running SpinRite and reading the results. 7 of those drives are going back into circulation in a new QNAP DAS, I'll hold a funeral for the 3 bad ones, and the 3 iffy ones are being experimented on with @DanR 's XFER guidance (4096). Thanks to the helpful pointers, I'm starting to get the feel of all this. These last XFER tests will probably take a couple weeks, but hopefully in the near future I can get a blog post out. If XFER and using these drives in the QNAP array produce any interesting nuggets, I'll post here.

This has been a great exercise on a great tool -- thanks again to all who have contributed.


#25

peterblaise

Re: "... [ SpinRite ] estimating months to process ..."

2 things:

For Data Recovery, try DYNASTAT 1 to reduce SpinRite spending 5 minutes on every unreadable sector:

SPINRITE NORAMTEST DYNASTAT 1

For Drive Maintenance, try DYNASTAT 0 to eliminate data recovery altogether:

SPINRITE NORAMTEST DYNASTAT 0

Level 2, 3, 4, or 5: you decide how thorough a test you want to throw at anything.

Those might speed through drives for reuse.

That being said, I have let SpinRite have at it for 2 weeks, and I was pleased to discover that it recovered everything, even the drive itself.

Let us know if that helps make the worst drives usable again.


#26

brado2049

@peterblaise — thanks for the reply. I have no need for data recovery at all. For the 13 drives I mentioned that I ran SpinRite on, I used this command:

SPINRITE NORAMTEST LEVEL 3 DYNASTAT 0 NOREWRITE

For the 3 iffy drives out of those 13 after one run (moderate issues), I’m now running the following on those:

SPINRITE NORAMTEST LEVEL 3 DYNASTAT 0 NOREWRITE XFER 4096

Though this does bring up another question I have. Why, when specifying “LEVEL 3” on the command line, does SpinRite still ask you to select the Level you want to run? It appears specifying the Level on the command line has no effect...


#27

brado2049

I hadn’t thought to make this thread longer, but as a question from the new circumstances aligns with the original thread topic, I’ll put it here. I have been running SpinRite on one of those “iffy” drives (moderate issues) for a few days: about 50 hours so far, with an estimated remaining time of 350 hours, using this command:

SPINRITE NORAMTEST LEVEL 3 DYNASTAT 0 NOREWRITE XFER 4096

I got to thinking about it — what does it mean when you have a drive that’s processing very slowly (relatively speaking), but you aren’t getting any (or many) errors? Is it possible that such a drive will be cleaned up and returned to normal use? Or is it garbage regardless of lack of errors?


#28

Tazz

I got to thinking about it — what does it mean when you have a drive that’s processing very slowly (relatively speaking), but you aren’t getting any (or many) errors? Is it possible that such a drive will be cleaned up and returned to normal use? Or is it garbage regardless of lack of errors?
Personally I would take note of the slow area then re-run SR over the same area a few times on level 3-5 to see if it's still slow or if working the platters helped in cleaning it up. Then decide.


#29

DanR

Though this does bring up another question I have. Why, when specifying “LEVEL 3” on the command line, does SpinRite still ask you to select the Level you want to run? It appears specifying the Level on the command line has no effect...
In this case, Level 3 has been selected (via the command line) and will show at the very top center of the Level Selection screen.
SpinRite is merely giving you the option of changing your mind if a different level is desired.

BTW: You could also try XFER 2048 or XFER 1024, for example. However, progress could be slower as SpinRite's "bites" now involve fewer bytes. :)


#30

brado2049

Personally I would take note of the slow area then re-run SR over the same area a few times on level 3-5 to see if it's still slow or if working the platters helped in cleaning it up. Then decide.
Best I can tell, there’s no area that’s been slower or faster than the others. All processing of the drive has been slow. The original estimate for drive processing time was around 400 hours, and the rate of processing hasn’t changed substantially after 50 hours. That’s really the question, what does it mean if processing of the entire drive is slow, but there isn’t much in the way of errors?
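One way to put a number on "slow" is to turn that projected total time into an average throughput and compare it against what a healthy 3.5" drive sustains sequentially. The sketch below uses the thread's own numbers (~3 TB, ~400 hours projected); the 100-200 MB/s healthy-drive figure is my own rough assumption for comparison:

```python
# Effective throughput implied by the numbers above: a ~3 TB drive
# projected to take ~400 hours end to end. Healthy 3.5" drives sustain
# very roughly 100-200 MB/s sequentially (an assumed comparison point).

DRIVE_BYTES = 3_000_000_000_000  # nominal 3 TB
total_hours = 400

mb_per_s = DRIVE_BYTES / (total_hours * 3600) / 1_000_000
print(f"~{mb_per_s:.1f} MB/s average")  # ~2.1 MB/s
```

Roughly 2 MB/s is two orders of magnitude below a healthy drive, even allowing for a Level 3's extra write pass, which suggests the drive is internally retrying constantly even when no error is ultimately reported.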


#31

AlanD

Best I can tell, there’s no area that’s been slower or faster than the others. All processing of the drive has been slow. The original estimate for drive processing time was around 400 hours, and the rate of processing hasn’t changed substantially after 50 hours. That’s really the question, what does it mean if processing of the entire drive is slow, but there isn’t much in the way of errors?
If this is the drive in the pictures above, which shows a number of cabling errors, and it is consistently slow in SR, perhaps there are REAL cabling errors ( bad contacts or damaged traces). Have the SMART stats changed significantly during the run?


#32

brado2049

If this is the drive in the pictures above, which shows a number of cabling errors, and it is consistently slow in SR, perhaps there are REAL cabling errors ( bad contacts or damaged traces). Have the SMART stats changed significantly during the run?
This is not that same drive. This is one of the other “iffy” drives that either had moderate errors or were processing extremely slowly in my first pass through the 13 drives I had, so I exited SpinRite processing and set it aside for a second pass with different parameters. This is where my question is arising from — basically it is a drive that is processing extremely slowly — original estimate ~400 hours, and several days in, that rate has proven consistently true. However, there are almost no errors that have arisen. I am attaching the screenshots of the current status, and you can see:

  • Graphic Status Display: no problems
  • Real-Time Activities: no errors
  • SMART System Monitor: ECC corrected - one red square (but notice that has decreased from three as SpinRite processing has proceeded)
Hence, the question — what should be concluded when the drive processing is agonizingly slow, but comes up with no (or few) errors? What does that mean? Does it mean there are no errors but actual read/write performance will make the drive pragmatically unusable? Or does it mean the drive is fine for normal use, but the SpinRite exercises were very slow for some reason that won’t be material to normal usage?

Thanks for your guidance!

Attachments


  • IMG_0832.jpeg
    IMG_0832.jpeg
    155 KB · Views: 90
  • IMG_0833.jpeg
    IMG_0833.jpeg
    169.4 KB · Views: 83
  • IMG_0834.jpeg
    IMG_0834.jpeg
    154.4 KB · Views: 79
  • IMG_0835.jpeg
    IMG_0835.jpeg
    151.5 KB · Views: 78
  • IMG_0836.jpeg
    IMG_0836.jpeg
    90.8 KB · Views: 80

#33

AlanD

This is not that same drive.

  • Graphic Status Display: no problems
  • Real-Time Activities: no errors
  • SMART System Monitor: ECC corrected - one red square (but notice that has decreased from three as SpinRite processing has proceeded)
Hence, the question — what should be concluded when the drive processing is agonizingly slow, but comes up with no (or few) errors?
As you say, no errors are showing, although I do notice that you are only using 4096-byte blocks. That will slow SR down; the AHCI driver should be able to process 32k blocks, which should be 8 times faster.


#34

brado2049

As you say, no errors are showing, although I do notice that you are only using 4096-byte blocks. That will slow SR down; the AHCI driver should be able to process 32k blocks, which should be 8 times faster.
My original run on all of my hard drives used 32k blocks. After that first pass, I had 7 good drives, 3 bad drives, and 3 “iffy” drives which either had moderate errors or were processing so slowly they were either bad or needed different parameters. So taking the guidance of another (@DanR) earlier in this thread, I switched to 4k blocks for a second run only on the 3 “iffy” drives.

This makes for an opportunity for a worthwhile clarification on what that 32k -> 4k change actually does. My understanding is that this determines the size of the blocks SpinRite performs read/write I/O with: larger blocks mean faster drive processing, but issues will also be reported against that larger amount of drive space. However, if there are errors or slow I/O (I don’t know all the myriad reasons which can contribute) within the area a larger block addresses, switching to a smaller block size can help isolate the problem to a smaller region of the drive, perhaps revealing that only a small area is actually having problems.

So for example, let’s say I use 32k block size, and am returned errors. If I reprocess that 32k area of the disk using a 4k block size, that addresses that 32k block disk area in 8 different 4k blocks, and SpinRite may now be able to determine that, for example, the error exists only in 4k block 5, not the entire 32k block area. I would assume this would favorably change the final statistics on the drive as a whole, and ideally show the drive to have issues on a lesser area of the drive. What it also does is further isolate issues to smaller areas and more specific locations on the drive, making it possible to focus just on those areas for additional SpinRite runs if desired.
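That 32k-to-4k isolation idea can be sketched as simple LBA-range arithmetic. This is purely illustrative Python (the function and the starting LBA are mine, not SpinRite's internals):

```python
# Illustrative sketch only: how one 32 KB transfer region maps onto
# smaller 4 KB transfer regions. Not SpinRite's actual code.

SECTOR = 512  # bytes per LBA

def sub_blocks(start_lba: int, big_xfer: int, small_xfer: int):
    """Return the (start_lba, sector_count) ranges that a smaller
    transfer size would use to cover one larger transfer's region."""
    big_sectors = big_xfer // SECTOR      # 32768 / 512 = 64 sectors
    small_sectors = small_xfer // SECTOR  # 4096 / 512 = 8 sectors
    return [(start_lba + i, small_sectors)
            for i in range(0, big_sectors, small_sectors)]

ranges = sub_blocks(start_lba=1000, big_xfer=32768, small_xfer=4096)
print(len(ranges))  # 8 sub-blocks cover the one 32 KB block
print(ranges[4])    # the 5th sub-block: (1032, 8)
```

So an error that a 32k run could only pin to a 64-sector region, a 4k run can narrow to one of eight 8-sector sub-regions.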

Have I got that right? (If not, please someone set me straight! :) ) Anyway, returning to the drive in question: on this second run using 4k blocks, there doesn’t seem to be any significant difference in processing speed (though I wasn’t directly comparing 32k vs 4k estimated times); both were just extremely slow, so I’m letting this current 4k run complete, if it can. But the lack of much in the way of errors has got me thinking about the proper conclusion — what happens if this thing finally completes with no (or very few) errors, but the drive processing was agonizingly slow (which, if stopped right now, would be the case)? Is that drive good to put back into use? Or is it bad, and should it be destroyed with the other dead drives? Part of what I’m asking is whether there’s a scenario where the drive is otherwise fine and will perform fine during normal use, but something about the nature of SpinRite operations on some drives just manifests as extremely slow processing without necessarily meaning drive dysfunction. In other words: dog-slow SpinRite processing, but the drive is still good. Can anyone speak to this?


#35

PHolder

what that 32k -> 4k change actually does.
Your understanding may be a little off. The request size is always a multiple of 512 bytes, as that is the size of an LBA. So really what is changing is the number of LBAs being requested from the drive at once. Assuming all of them succeed, this just means SpinRite can go faster because the overhead per LBA is lower. (The drive's internal code presumably handles optimizations like recognizing that it needs more sequential blocks, so it can optimize reading and communicating with the PC.)
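The overhead point can be put in back-of-envelope terms: a bigger XFER means fewer requests, not bigger sectors. A hedged sketch (the sector counts are hypothetical, not measured SpinRite behavior):

```python
# Rough arithmetic: larger transfers reduce the number of requests,
# and thus the fixed per-request overhead. Illustrative only.

SECTOR = 512  # bytes per LBA

def requests_needed(total_sectors: int, xfer_bytes: int) -> int:
    """Number of read requests to cover a region at a given XFER size."""
    sectors_per_request = xfer_bytes // SECTOR
    # ceiling division: a partial final request still costs one request
    return -(-total_sectors // sectors_per_request)

print(requests_needed(64, 4096))   # 8 requests of 8 sectors each
print(requests_needed(64, 32768))  # 1 request of 64 sectors
```

Every request carries fixed command and completion overhead, so the 32k case amortizes that cost over eight times as many sectors per request.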

However, once a problem occurs, SpinRite has to abandon all of the big block work and get down to work on that specific LBA. It's going to keep retrying it. At this point, it's all up to the drive and its internal processes, and speed is no longer the main concern. So, while it may affect drive behaviour, because many things can, and we don't know what the firmware is coded like, changing the size of reads shouldn't affect drive reliability... but during my beta testing of SpinRite, I found a drive that did vary its behaviour based on block size, and I think that was the genesis of Steve providing the command line option to override SpinRite's automatic logic.


#36

DanR

dog-slow SpinRite processing, but the drive is still good
Some questions:

Is the current 4K run the first pass Level 3 run? That is, you have not re-started SpinRite from the beginning for a second pass? And the Graphic Status Display (GSD) screen shows no blocks with R's, U's, or B's?

If the drive has not been used for a long time, the bit patterns on the platter surfaces will weaken over time and become progressively harder to read. SpinRite can be very patient and persistent in trying to read the sectors, taking lots of time if necessary.

A clean GSD screen suggests that the blocks were read successfully (no U's), DynaStat likely not needed (no R's), and the data was rewritten successfully (no B's), refreshing the bit patterns.

A second pass Level 2 run should then proceed at normal speed on a now refreshed, now good drive. If the second pass is still dog-slow, however, then the drive is bad.


#37

brado2049

Your understanding may be a little off. The request size is always a multiple of 512 bytes, as that is the size of an LBA. So really what is changing is the number of LBAs being requested from the drive at once. Assuming all of them succeed, this just means SpinRite can go faster because the overhead per LBA is lower. (The drive's internal code presumably handles optimizations like recognizing that it needs more sequential blocks, so it can optimize reading and communicating with the PC.)

However, once a problem occurs, SpinRite has to abandon all of the big block work and get down to work on that specific LBA. It's going to keep retrying it. At this point, it's all up to the drive and its internal processes, and speed is no longer the main concern. So, while it may affect drive behaviour, because many things can, and we don't know what the firmware is coded like, changing the size of reads shouldn't affect drive reliability... but during my beta testing of SpinRite, I found a drive that did vary its behaviour based on block size, and I think that was the genesis of Steve providing the command line option to override SpinRite's automatic logic.
Thanks for the great reply! That is really interesting. If the XFER block size parameter was added to address the behavior which varied based on block size, then I suppose the actual net effect of that parameter value (beyond # of LBAs requested) lies in what the actual varied behavior you observed was. Do you mind expounding on that a little more, I’m really curious as to what it was. From a career doing software development, my blind guess would be something to do with optimizing I/O buffering and optimizations and possibly managing caching. I don’t know the internals, but those things absolutely can cause behavioral variances in systems. Fascinating stuff, if you can share more, would really welcome it.


#38

brado2049

Some questions:

Is the current 4K run the first pass Level 3 run? That is, you have not re-started SpinRite from the beginning for a second pass? And the Graphic Status Display (GSD) screen shows no blocks with R's, U's, or B's?

If the drive has not been used for a long time, the bit patterns on the platter surfaces will weaken over time and become progressively harder to read. SpinRite can be very patient and persistent in trying to read the sectors, taking lots of time if necessary.

A clean GSD screen suggests that the blocks were read successfully (no U's), DynaStat likely not needed (no R's), and the data was rewritten successfully (no B's).

A second pass Level 2 run should then proceed at normal speed on a now refreshed, now good drive. If the second pass is still dog-slow, however, then the drive is bad.
Thanks for the reply! Yeah, the most recent screenshots I posted apply. Every run I have done has been Level 3. The first run on all the drives restarted SpinRite fresh for each drive using this command:

SPINRITE NORAMTEST LEVEL 3 DYNASTAT 0 NOREWRITE

This second run on just the 3 “iffy” drives is using this command:

SPINRITE NORAMTEST LEVEL 3 DYNASTAT 0 NOREWRITE XFER 4096

On this current drive in question (the one currently running, the same one associated with the most recent screenshots I posted), the GSD screen is totally clean. Also, the drive has not been used for a long time (years). The run is not even half-way done yet; it will take several more days to complete. But I gather from your post above that once it does complete, if the GSD screen finishes clean and SpinRite is run again on the drive (using the first command above), it should be much faster (hours for the entire drive to be processed) and the drive should be good?


#39

DanR

Every run I have done has been Level 3. The first run on all the drives restarted SpinRite fresh for each drive using this command:

SPINRITE NORAMTEST LEVEL 3 DYNASTAT 0 NOREWRITE

This second run on just the 3 “iffy” drives is using this command:

SPINRITE NORAMTEST LEVEL 3 DYNASTAT 0 NOREWRITE XFER 4096

On this current drive in question (the one currently running, the same one associated with the most recent screenshots I posted), the GSD screen is totally clean. Also, the drive has not been used for a long time (years). The run is not even half-way done yet; it will take several more days to complete. But I gather from your post above that once it does complete, if the GSD screen finishes clean and SpinRite is run again on the drive (using the first command above), it should be much faster (hours for the entire drive to be processed) and the drive should be good?
NOREWRITE is intended for the situation where data recovery is paramount. Only a 100% successfully read sector will be rewritten. Partially read sectors, with read errors, will NOT be rewritten, thus preserving the data. If data is of no concern here then NOREWRITE is pointless.

DYNASTAT 0 does no data recovery (DynaStat is disabled). Just one normal read is done. If the read is successful the sector is rewritten. Partially read sectors would be rewritten with zeros for the unreadable data, thus losing data.

Level 3 rewrites every sector.

I do not understand why either of the above command lines would be so slow, unless the drive is inherently slow. If Level 3 is not speeding up the drive then the drive is suspect.

I would suggest starting a normal scan at Level 2 on this drive. No need for Level 3 to rewrite everything yet again. No need for DynaStat 0 since the entire drive has been successfully rewritten. What speed does an unfettered Level 2 run at?


#40

brado2049

NOREWRITE is intended for the situation where data recovery is paramount. Only a 100% successfully read sector will be rewritten. Partially read sectors, with read errors, will NOT be rewritten, thus preserving the data. If data is of no concern here then NOREWRITE is pointless.

DYNASTAT 0 does no data recovery (DynaStat is disabled). Just one normal read is done. If the read is successful the sector is rewritten. Partially read sectors would be rewritten with zeros for the unreadable data, thus losing data.

Level 3 rewrites every sector.

I do not understand why either of the above command lines would be so slow, unless the drive is inherently slow. If Level 3 is not speeding up the drive then the drive is suspect.

I would suggest starting a normal scan at Level 2 on this drive. No need for Level 3 to rewrite everything yet again. No need for DynaStat 0 since the entire drive has been successfully rewritten. What speed does an unfettered Level 2 run at?
@DanR — thanks for a great reply — good to know about NOREWRITE. Somewhere along the line I gathered that option was part of eliminating data recovery if not necessary. I also had gathered from some comments, and it appears I misunderstood, that the way to do a complete refresh of drives which required no data recovery (I have zero need for any data recovery on any drive) was to do a Level 3 with Dynastat 0. It appears I was mistaken.

Considering what I am trying to accomplish, looking at the level explanations in the SpinRite FAQ, and the comments above, is a Level 2 what I need, or should I do a Level 1? Given the FAQ’s description of Level 1 essentially being a Level 2 but without data recovery, that sounds like what I need. @DanR can you confirm? If that is indeed the case, then that is worthy of killing this current SpinRite run that’s been going on for days.

I look forward to your response!


#41

DanR

Level 1 is read-only: one normal pass, no data recovery.

Level 2 is data recovery. If a sector reads OK with a normal read, move on. If a sector is hard to read, try to recover the data. SpinRite can do a lot of things here, including using DynaStat. Recovered data is written back to the drive, refreshing it.

Level 3 is drive maintenance: read and rewrite every sector. Level 3 will also do data recovery for hard-to-read sectors, just like Level 2.

DynaStat 0 does no data recovery and is a way to speed up SpinRite when data recovery is not a concern by eliminating all data recovery attempts and time.

The drive in question has had a complete L3 run and a second L3 run on the first part of the drive. There were no errors. The drive should be fully refreshed and operating at normal speed.

SO . . . Yes! A normal level 1 run should tell you very quickly if this is the case.

For drives like this with no data concerns, a normal level 3 DynaStat 0 run should be the fastest way to check them out.

XFER would be used to check problematic drives that may benefit from a gentler touch, not for routine drive checking. I previously suggested XFER thinking the drive may not have been happy with the 32k mode.


#42

PHolder

Fascinating stuff, if you can share more, would really welcome it.
It's been a long time, and my memory is about as reliable as one of those HDDs that's been well used, but as I recall, the issue was the drive would hit certain problem sectors, and if the XFER size was large (the default initially, because Steve was trying to make SpinRite as fast as possible) the drive would basically just crash (as in firmware, not the heads impacting the surface.) This meant the drive became partially non-responsive, and the only recovery was a full power cycle. (A simple system reset did not recover the HDD function.) I tried all sorts of different XFER sizes, and in the end, pretty much anything at or over 10Kbytes caused this bad behaviour. Anything under that size and it never happened at all. I presume it's a firmware design defect in that specific drive... something to do with how it doesn't properly complete a large block request when it needs to attempt internal error recovery. Presumably its state machine gets out of whack, and it doesn't have any sort of a software watch dog to catch and reset it.


#43

peterblaise

@Steve Gibson left the XFER option for edge cases, but changed SpinRite 6.1 in development (up to Release 4) to automatically adjust the XFER value depending on the situation: if a read and write of 32,768-byte blocks reveals bad sectors to be worked on, the XFER value will automatically drop down to whatever is appropriate, even the equivalent of XFER 1.



That being said, I have found useful flexibility using various XFER values, especially on SSDs.

- - - - -

Note also that SpinRite 6.1 WRITES to a drive during enumeration, so even LEVEL 1, which is a read-only task, happens AFTER a WRITE test.

To completely prevent writing, use the SKIPVERIFY command line option, such as:

SPINRITE SKIPVERIFY LEVEL 1


#44

brado2049

Again, I’m not going to go through every drive, as I’m almost done with this set of 13 drives. But since we’ve had a bunch of conversation on this particular drive over the last several posts: it completed, and it appears to make for a great example of the ultimate question on this thread. I’m guessing there might be different interpretations. The overall question is: when you have a drive with no errors or minimal errors, but it processes very slowly, what conclusion should be drawn? The final results on this drive, which was processed using the following command line:

SPINRITE NORAMTEST LEVEL 3 DYNASTAT 0
  • Graphic Status Display: runtime ~56 hours, one unrecoverable sector (or do GSD squares indicate multiple sectors?)
  • Real-Time Activities: 2,823 not recoverable, no other errors
  • SMART: ECC corrected 113/114, relocated sect 74/76
Given what I’ve gleaned so far (admittedly could be wrong), this seems like a usable drive. This seems like one of the many scenarios that SpinRite is built for — taking a problem drive and cleaning it up. Maybe that’s a wrong conclusion, but for you guys that have been processing drives with SpinRite for years, is this a drive you’d use or instead put through a wood-chipper?

Another thought occurred to me as well when looking at that one unrecoverable indicator on the GSD screen, and the RTA screen not recoverable count. Why are we getting stats on sectors that cannot be recovered when we are running a Level 3 with Dynastat 0, which I thought did no data recovery?

Thanks so much for your help everyone — this has been a real journey to understand all of this.

Attachments


  • IMG_0869.jpeg
    IMG_0869.jpeg
    99 KB · Views: 119
  • IMG_0870.jpeg
    IMG_0870.jpeg
    53.5 KB · Views: 100
  • IMG_0871.jpeg
    IMG_0871.jpeg
    73.3 KB · Views: 101
  • IMG_0872.jpeg
    IMG_0872.jpeg
    66.9 KB · Views: 102

#45

peterblaise

"... LEVEL 3 DYNASTAT 0 ... ~56 hours, one unrecoverable [ block ] ... Real-Time Activities: 2,823 not recoverable, no other errors ..."

When you have a chance, now run a LEVEL 5 DYNASTAT 0 and see if it's 'healed'.

- - - - -

The Graphic Status Display is 72 blocks across by 14 blocks down, so its blocks represent 1,008 equally proportioned parts of the total sector count. For example:



In my example, the total sector count is 3,907,029,167.

Dividing that across 1,008 blocks means each block represents whatever is happening in approximately 3,876,020 sectors!

Yeah, 3 billion sectors, each block represents 3 million sectors.

Wow. SpinRite has a lot to manage!

So any code displayed on a block could be from any one or more of 3 million sectors here.

Your drive has roughly 5.9 billion sectors, so each block shows approximately 5,814,021 sectors, of which 2,823 were a mess: about 0.049% of the displayed block is actually [ U ], which is only about 0.000048% of the drive's total sectors.
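The percentages above can be double-checked with a few lines of Python. The total sector count below is chosen to match the ~5,814,021-sectors-per-square figure in the post; treat all the numbers as illustrative:

```python
# Verifying the Graphic Status Display arithmetic from the post.
# Figures come from the example above, not from SpinRite documentation.

GRID_SQUARES = 72 * 14             # 1,008 squares on the GSD
total_sectors = 5_860_533_168      # hypothetical drive (5,814,021 x 1,008)
bad_sectors = 2_823

sectors_per_square = total_sectors / GRID_SQUARES
pct_of_square = 100 * bad_sectors / sectors_per_square
pct_of_drive = 100 * bad_sectors / total_sectors

print(f"{sectors_per_square:,.0f} sectors per square")  # 5,814,021
print(f"{pct_of_square:.3f}% of that square")           # 0.049%
print(f"{pct_of_drive:.6f}% of the whole drive")        # 0.000048%
```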

Though one would think that 0.000048% is trivial, it brings everything to a halt when it's bad, doesn't it?

SpinRite is a wonderful thing.

;-)