Time "remaining" overly optimistic


Dec 8, 2023
I've just (last week or so) started running Spinrite 6.1 prerelease 5, on several of my 4 TB to 8 TB spinning rust drives.

Presently, for example, I have less than 3 hours remaining on a Level 4 scan of an 8TB WDC WD80EZZX. All my drives have been clean, both according to their SMART data and Spinrite.

The only "glitch" I notice in this 6.1 pre-release is that Spinrite's estimate of how long a scan will take is 10 to 20% too optimistic, and that it adjusts only slowly, and rather predictably.

This scan originally predicted a finish time that is now 4 hours in the past, and it has gradually pushed that out so that 3 hours still remain. In other words, the initial estimate for the whole scan was about (I forget exactly) 42 hours, and it now looks set to finish in a total of 49 hours.

It's understandable to me that the very first estimate would easily be off by 10 or 20%, but I would expect it to get a more accurate estimate quicker, as it got some more data on the drive in question (assuming that nothing difficult happened that required more work on Spinrite's part).

My (wild) speculation is that either:
1) estimates are not taking into account the lower data density (bytes transferred per second) on the inner tracks, or
2) the early estimates of how many blocks/second can be scanned, for that drive, at that level, are not refined as the scan proceeds.

I've long used a simple filter to blend such changing estimates smoothly. For example, once every so often (as is convenient for the
code in question) update the "New Estimate" to be 15/16-th of the "Old Estimate" plus 1/16 of the latest observation. I've been using
this method since I first read of it, in Richard Hamming's excellent "Digital Filters (1977)" book, long ago. With a little math, the
15/16 fraction can be adjusted to provide a more suitable half-life to the contribution of each sample in the running average, or one
can just wing it.
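
For anyone who wants to see that filter spelled out, here is a minimal sketch in C. The names `ema_update` and `half_life_updates` are made up for illustration and come from neither Hamming's book nor Spinrite. The second function simply counts how many updates it takes for one sample's weight to fall below one half, which is the "half-life" mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: blend one new observation into a running estimate.
 * With shift = 4 this is new = (15/16)*old + (1/16)*sample, written as
 * old += (sample - old) / 16 so that only one division is needed. */
static int32_t ema_update(int32_t old_estimate, int32_t sample, unsigned shift)
{
    assert(shift < 31);
    return old_estimate + (sample - old_estimate) / ((int32_t)1 << shift);
}

/* Hypothetical helper: number of updates until a sample's weight in the
 * running average has dropped below one half, for a 1/2^shift filter. */
static unsigned half_life_updates(unsigned shift)
{
    assert(shift >= 1 && shift < 31);
    double keep = 1.0 - 1.0 / (double)((int32_t)1 << shift);
    double weight = 1.0;
    unsigned n = 0;
    while (weight > 0.5) {
        weight *= keep;   /* each update multiplies old samples by 'keep' */
        n++;
    }
    return n;
}
```

With a shift of 4 (the 15/16 case) a sample's weight falls below half after 11 updates; with a shift of 3 (7/8) it takes 6. Plain integer division stops moving the estimate once it is within 15 of the incoming samples, so real code would probably carry a few extra fraction bits.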

This is certainly not a show stopper for 6.1 ... so it's totally fine by me if this observation is tabled until Spinrite 7.0
 
1) estimates are not taking into account the lower data density (bytes transferred per second) on the inner tracks, or
You nailed it on the first try! You're exactly correct.

SpinRite (wrongly for spinners) assumes that the transfer rate will be uniform across the drive's surface. We know that's not true for spinning drives. So it is continually (re)calculating the time remaining based upon how long it took to get as far as it has.
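
To make that concrete, here is a rough sketch in C of what a uniform-rate extrapolation of this kind looks like. It is an assumption about the general shape of the calculation, not SpinRite's actual code (which is assembly and is not shown in this thread); the function name and units are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical uniform-rate estimate: assume the rest of the drive will
 * take as long per megabyte as the part already scanned did. */
static uint64_t remaining_uniform(uint64_t elapsed_sec,
                                  uint64_t done_mb,
                                  uint64_t total_mb)
{
    assert(done_mb <= total_mb);
    if (done_mb == 0) {
        return 0;   /* no progress yet, nothing to extrapolate from */
    }
    return elapsed_sec * (total_mb - done_mb) / done_mb;
}
```

For example, if the first 1,000,000 MB of an 8,000,000 MB drive took 18,000 seconds (5 hours), this predicts 126,000 seconds (35 hours) for the rest. On a spinner the later megabytes take longer than the earlier ones, so the prediction comes out too short and keeps creeping upward as the scan proceeds.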

I'm unsure what to do about that, since different drives "slow down" to varying degrees depending upon how far “in” they run. Newer drives tend to slow down more since their more advanced technology allows them to accommodate much smaller track circumferences, and since they've been pushed to squeeze out any possible storage space.

I really didn't give much (actually any) thought to that aspect of SpinRite beyond making sure that SpinRite's original estimation logic was working correctly after expanding it for today's much larger drives. I suppose that using a "rule of thumb" that the end of the drive would be half as fast as the beginning would be useful.
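
As a rough check on what such a rule of thumb would imply, the sketch below adds up the time for each slice of a drive whose transfer rate is ASSUMED to fall in a straight line with position, from its starting rate down to some fraction of it at the end. The straight-line model and the name `slowdown_ratio` are illustrative assumptions only; real drives step down in zones and vary from model to model.

```c
#include <assert.h>

/* Hypothetical model: ratio of the actual scan time to the time a
 * uniform-rate estimate made at the very start would predict, assuming
 * the rate falls linearly from 1.0 at the start to end_fraction at the end. */
static double slowdown_ratio(double end_fraction, int slices)
{
    assert(end_fraction > 0.0 && slices > 0);
    double total = 0.0;
    for (int i = 0; i < slices; i++) {
        double pos  = (i + 0.5) / slices;               /* middle of this slice, 0..1 */
        double rate = 1.0 - (1.0 - end_fraction) * pos; /* relative to the start rate */
        total += (1.0 / slices) / rate;                 /* time spent in this slice */
    }
    return total;
}
```

Under that model, `slowdown_ratio(0.5, 1000)` comes out at about 1.386: a drive that ends up half as fast as it started takes roughly 39% longer than an estimate based only on its starting speed. A drive with no falloff (`end_fraction` of 1.0) gives a ratio of 1.0.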

I've made a note in the RC5 dev notes. I'll give some consideration to improving SpinRite's estimation algorithm, which I would do by noting the rate at which its transfer rate is decreasing over time (if it is; it would not be for SSDs), and then incorporating that into the estimation.

While I am VERY ANXIOUS to get this out the door... improving that estimation would be a HUGE convenience for anyone using SpinRite, especially at a higher level on a large drive. So, thank you for bringing this up! (y)
 
Please don't. We all know it is impossible to predict the future when unexpected events arise. Stay focused on the tangible.
But what's tangible (and incontrovertible) is the fact that SpinRite's "estimate" is always going to be wrong in the “too short” direction on any spinning drives. I never gave this the attention it deserved. My estimation algorithm expressly assumes uniform drive performance. Remember that SpinRite was born back when all drive tracks had the same number of sectors, thus there was no difference in outer and inner cylinder performance. (Remember the old "pre-comp cylinder" setting? That was the cylinder at which the controller would begin using additional write pre-compensation. Write pre-compensation was the deliberate separation of adjacent flux reversals because at increasing densities they "read back" as being closer together.)

My point is that, on those drives, SpinRite's original "linear interpolation" was not introducing any errors. And for a time, when the difference between outer and inner tracks was not 2:1 as it often is these days, the error was more tolerable. So, yeah, thinking about this a bit to see whether I can easily fix this seems worthwhile.
 
Thanks, Steve, Greg and Colby for the fine and sensible considerations.

Steve wrote:
<< My estimation algorithm expressly assumes uniform drive performance. >>

My mathematically inclined brain immediately wonders if that means that a better estimation would involve estimating the first or even second time derivative of the drive performance, once one has enough data points to estimate this. If, say, the 2nd derivative was nearly constant for a given test run, as I would guess it would be, assuming no flood of unexpected "bad" spots requires more intensive work, then one would have a solid basis for predicting the remaining time.
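
One hypothetical way to act on that idea: if the second derivative is constant, then the cumulative elapsed time is a quadratic in the fraction of the drive scanned, and two progress readings are enough to pin it down. The sketch below ASSUMES the form T(x) = a*x*x + b*x with T(0) = 0; the function name is made up, and it uses floating point only to keep the illustration short.

```c
#include <assert.h>

/* Hypothetical sketch: predict the total scan time from two readings
 * (x1, t1) and (x2, t2), where x is the fraction scanned (0..1) and t is
 * the elapsed time at that point, assuming T(x) = a*x*x + b*x. */
static double quadratic_total_time(double x1, double t1, double x2, double t2)
{
    assert(x1 > 0.0 && x2 > x1 && x2 <= 1.0);
    double avg1 = t1 / x1;                 /* equals a*x1 + b */
    double avg2 = t2 / x2;                 /* equals a*x2 + b */
    double a = (avg2 - avg1) / (x2 - x1);  /* how fast the average cost is rising */
    double b = avg1 - a * x1;
    return a + b;                          /* T(1.0), the whole drive */
}
```

For readings of 6.4 hours at 20% and 13.6 hours at 40%, this predicts 40 hours in total, where a straight-line extrapolation from the 40% reading alone would say 34. With only two raw readings the fit is sensitive to noise, so in practice the inputs would want some smoothing first.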

As to the trade offs of when to get serious about improving this, whether 6.1 or 7.0 ... I trust Steve will either make the right call, or the "too fussy" (as often is his style) call ... that will be as that will be ... and I have no doubt but that I will remain a happy Spinrite customer, either way.
 
If you have already benchmarked the drive, why not base the initial time estimate on the performance of the 50% mark. If drives are faster at the beginning, and slower at the end, that should give a reasonable average.
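
A small sketch of that suggestion in C, assuming a benchmark has supplied transfer rates in MB/s. The first function is the 50%-mark idea as stated. The second is a swapped-in refinement, Simpson's rule applied to 1/rate, which needs rates at the start and end as well. Both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical: initial estimate (seconds) from the rate at the 50% mark. */
static double estimate_from_midpoint(double total_mb, double mid_mb_per_sec)
{
    assert(mid_mb_per_sec > 0.0);
    return total_mb / mid_mb_per_sec;
}

/* Hypothetical refinement: Simpson's rule on seconds-per-MB, using the
 * benchmarked rates at the start, middle, and end of the drive. */
static double estimate_from_three_points(double total_mb,
                                         double start_mb_per_sec,
                                         double mid_mb_per_sec,
                                         double end_mb_per_sec)
{
    assert(start_mb_per_sec > 0.0 && mid_mb_per_sec > 0.0 && end_mb_per_sec > 0.0);
    return total_mb / 6.0 * (1.0 / start_mb_per_sec
                             + 4.0 / mid_mb_per_sec
                             + 1.0 / end_mb_per_sec);
}
```

For a made-up 6000 MB region benchmarked at 200, 150, and 100 MB/s, the midpoint estimate is 40 seconds and the three-point estimate is about 41.7 seconds. The midpoint figure runs slightly short because the slow end of the drive costs more time than the fast end saves.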
 
If you have already benchmarked the drive, why not base the initial time estimate on the performance of the 50% mark. If drives are faster at the beginning, and slower at the end, that should give a reasonable average.
I agree, Alan. But your initial phrase of “If you have already benchmarked the drive” disqualifies that approach for SpinRite's use since, more often than not, it would not be true. So I'll need to design a solution that always works. (y)
 
So I'll need to design a solution that always works. (y)

Perhaps not "always".

Perhaps rather "when people expect it to work", which in this case I would speculate:

* would be when the scan is proceeding uneventfully over a large swath of data, for a prolonged period of time,
* would be for "people" (Spinrite users) who tend to be the "resident geek" at the dinner table, and
* where "work" means close enough for government work, without a rather obvious and consistent error bias.

(and perhaps don't show HH:MM:SS or multiple decimal point fraction precision in estimates that can be off by hours <grin> ?)
 
Or just include a prefix "estimated time", with a caveat that times may be off by some margin, but the time taken to this point is always correct.

After all, if MS can say "under a minute remaining" for transfers that complete approximately an hour later on..............
 
I agree, Alan. But your initial phrase of “If you have already benchmarked the drive” disqualifies that approach for SpinRite's use since, more often than not, it would not be true. So I'll need to design a solution that always works. (y)
I would suggest, for 6.1, just add, say 10% to the current estimate. Worry about getting a better estimate in 7.0. It is a "nice to have", not a showstopper.
 
I will note that fewer people are annoyed when something finishes (or arrives) earlier than expected than those who are annoyed in the reverse where something takes too long (compared to expectations.) The problem, as ever, is how to come up with an over-estimate which is not insanely out of whack with reality.
 
I will note that fewer people are annoyed when something finishes (or arrives) earlier than expected than those who are annoyed in the reverse where something takes too long (compared to expectations.) The problem, as ever, is how to come up with an over-estimate which is not insanely out of whack with reality.
I have no doubt that we'll have a lot of fun solving this particular problem once I'm able to return my attention to SpinRite itself. (Should not be much longer, now!)
 
Hi Steve - as the first idea towards solving this problem...does it make sense to filter by drive type?...in other words, SSD's probably don't need any adjustment to the current estimating algorithm, at least in terms of "organic" speed differences between the beginning and end of the drive...whereas, spinners need some compensation figured in... (obviously). To that point, I've attached a SR log from a recent run on my MacBook Pro (which I documented in another post here on the forums), which has the benchmark data from the (2) installed HD's --> a 250GB SSD and a 500GB HDD (spinner - the original OEM Apple HDD installed circa 2012).
You can see the SSD is actually *faster* at the "end" of the drive -- whereas the HDD drops off by more than 50%.

Also, I was trying to think through if some kind of "confidence" in the estimation made any sense or some kind of +/- tolerance...I know that updating the field sizes for the data on SR6.1 is probably not going to be possible...but just spitballing a little here possibly for SR7. ;-)

Regards,
Thomas

PS: The macbook hung up at the very end of the run - due to overheating - not the drive itself, but the CPU and/or logic board - and had to be hard reset - I believe that's what caused the log to be cut off - the final SPINRITE HAS CONCLUDED ALL OPERATIONS message never came up.
 

Attachment: 0-LOG.TXT (11 KB)
You can see the SSD is actually *faster* at the "end" of the drive -- whereas the HDD drops off by more than 50%.
Given the variability of drive condition, different disk bus speeds, and diverse ever changing kinds of drives, as well as what you note - that even "linear" SSD's can speed up (perhaps in less worn out portions of the drive), trying to condition the "remaining time" estimates based on these various factors would be a never ending task of adding special different cases to the logic.

On the other hand, Spinrite, for the actual test case in progress, can see how fast it's proceeding, and fine tune the estimate as it goes, whether the scan speed is going faster or slower or remaining steady, regardless of scan level, drive type or condition, or the phase of the moon.

I'll see if I can code up a C routine that does this. I don't write Steve Gibson's expert level assembly, but I write low level C quite well, and it's a suitable Lingua Franca these days.
 
On the other hand, Spinrite, for the actual test case in progress, can see how fast it's proceeding, and fine tune the estimate as it goes, whether the scan speed is going faster or slower or remaining steady, regardless of scan level, drive type or condition, or the phase of the moon.

I think SR already does what you describe... (See Steve's 1st response to you...2nd paragraph...)

"SpinRite (wrongly for spinners) assumes that the transfer rate will be uniform across the drive's surface. We know that's not true for spinning drives. So it is continually (re)calculating the time remaining based upon how long it took to get as far as it has."

I think the issue is the *initial* estimate, (as _you_ noted) and the fact that actual scan time can exceed the estimate by 10-20%...which (as someone noted) plays into the psychology of expectation - it takes "*longer* than expected", and, as such, could leave the user with a kind of "negative impression" as opposed to it finishing "sooner than expected",......or maybe even better...on-time! ;-p

Regards,
Thomas
 
We know that's not true for spinning drives. So it is continually (re)calculating the time remaining based upon how long it took to get as far as it has.
Does the current Spinrite adjust for the continuous, rather predictable rate at which a spinning-rust drive's data transfer rate falls as the scan moves toward the inner cylinders, or does it presume, at each moment it checks, that (despite what's already happened in the scan) the remainder of the disk will transfer data at the aggregate rate so far?

If you're in a commercial passenger airplane that has started slowing down, from say 400 MPH to now 300 MPH, as it sets up to land, can you estimate the time you will enter final approach to landing by dividing [[ the remaining distance to the destination airport ]] by [[ (the average speed so far in the descent of 350 MPH) MINUS (final approach speed) ]], or should you take into account both that the plane has already slowed to 300 MPH, and that the plane continues to decelerate further, still losing (wild guess here, I'm no pilot) say 30 MPH per minute?

If done properly, the "remaining time" code should handle the cases where the speed remains constant, as on a "typical" SSD, or even gets faster, as on the SSD that you noted above, with NO special cases in the code. Changes, mid-scan, back and forth in the scan "operating level" between level 1 (examine only) to level 5 (exercise media) and any levels between, should also be smoothly handled, with NO special cases in the code to estimate the remaining time.
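
A minimal sketch of what "no special cases" could look like, assuming the scan is measured in equal-sized chunks and each chunk's cost in milliseconds is fed in as it completes. The struct, the function names, and the 1/8 smoothing weights are all hypothetical choices for illustration; floating point is used only for brevity.

```c
#include <assert.h>

/* Hypothetical running state: smoothed cost per chunk and its smoothed trend. */
typedef struct {
    double cost;    /* smoothed milliseconds per chunk */
    double slope;   /* smoothed change in cost from one chunk to the next */
    int    primed;  /* 0 until the first sample has been seen */
} trend_t;

static void trend_update(trend_t *t, double cost_this_chunk)
{
    if (!t->primed) {
        t->cost = cost_this_chunk;
        t->slope = 0.0;
        t->primed = 1;
        return;
    }
    double prev = t->cost;
    t->cost  += (cost_this_chunk - t->cost) / 8.0;    /* smooth the cost */
    t->slope += ((t->cost - prev) - t->slope) / 8.0;  /* smooth its rate of change */
}

/* If cost keeps changing by 'slope' per chunk, the average cost over the
 * remaining chunks is the current cost plus half of the total future drift. */
static double trend_remaining_ms(const trend_t *t, double chunks_left)
{
    assert(chunks_left >= 0.0);
    return chunks_left * (t->cost + t->slope * chunks_left / 2.0);
}
```

On a drive with steady performance the slope settles at zero and this reduces to the plain smoothed-rate estimate. On a drive that is slowing down the slope goes positive and lengthens the estimate, and on one that is speeding up it goes negative and shortens it, all through the same two lines of arithmetic.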

This is why I referred to the "second derivative" in one of my early replies above. The actual code to do this is a little bit of simple arithmetic, of the sort that was fast even back on the 1 or 2 MHz, 64 KWord (12 bits/word) PDP 8 that I first started doing low level coding on. The hard part is organizing the code and writing the comments, so that yourself, myself, or anyone else, can quickly understand what's going on.

(Granted, I didn't start coding these digital filters until working with the awesomely faster 8 MHz Intel 8080's, and after I had read Hamming's fine "Digital Filters" book.)
 
Ok - I coded up what I had in mind with the above comments.

The end result would be Steve's assembly-language equivalent of the following four lines of 'C' code:

C:
        smoothed_msec_per_mbyte += (msec_per_mbyte_this_loop - smoothed_msec_per_mbyte)/8;

        timing_delta_this_loop = smoothed_msec_per_mbyte - smoothed_timing_delta;
        smoothed_timing_delta += (timing_delta_this_loop - smoothed_timing_delta)/8;

        est_remaining_time = remaining_mbytes * (smoothed_msec_per_mbyte + smoothed_timing_delta/2);

By about 1/3 of the way through my test case, the estimated "remaining time" had converged
to within a few percent of the correct result, and remained so for the rest of the test case.

The test case presumed that the time per track decreased linearly, for the entire test.

The entire test harness that I used to test and document this code is in a 142 line 'C' file.

Attached is that 'C' file (with a *.txt extension, as requested by this forum software).

I hereby donate that file to Spinrite, for Steve to do with as he wishes.
 

Attachment: spinrite_remaining_time_estimate.txt (5.7 KB)
The above code made no effort to be "smart" about handling the initial estimates, as a scan begins or after a scan's level or coverage is changed.

So far as I know, the current Spinrite code does that better than anything I could suggest.

The above code focuses on refining those initial estimates of total time and time remaining, based on feedback obtained from the scan as it proceeds.
 
As I stare some more at the data from one of my test cases, I see that my refined variant is better, but still not great.

1) A simple linear estimation of the total time to scan (without the "+ smoothed_timing_delta/2" term) starts off underestimating the total time by over 30%, then climbs slowly toward the correct value, underestimating less and less, until the scan is finished.

2) My refined variant starts off underestimating by less than 20% (better), gets right on the mark at about 1/3 of the way through the scan (great), but then continues to increase its estimate to about 5% over the mark (eh) around 2/3 of the way through the scan, before declining smoothly to the correct value when the scan is finished (and the total time is totally known).

So my refined variant is better, but still not great. Essentially, my intuition is telling me that a second order "ax^2 + bx + c" curve would fit this data much closer than the first order "ax + b" curve that I'm essentially using above, allowing a quite close fit between the estimated and actual time remaining, almost the entire way through the scan, for all "normal" SSD and spinning rust cases.

However my intuition has not crystalized into code yet. I presume that whatever code I propose would ideally and easily translate into Steve style assembly language, without the use of any fancy floating point libraries ... just a few lines of good old x86 instructions (or 'C' or whatever he's doing Spinrite 7.0 mostly in).

In short, I continue to agree that Steve should leave this issue until sometime later, when he gets around to it, sometime after the public release of Spinrite 6.1.

If I get any second order code worthy insights, I'll post them here. Don't hold your breath however.