SMR drives being coy, and what happens when SpinRite falls for it?

Peter P

This is a deepish dive so feel free to give the convo a miss (^-^)
But there is one VERY important aspect SMR users need to know IMO.

I understood the basic principles behind SMR (Shingled Magnetic Recording) drives, but hadn't given much thought to the actual implementation. I recently watched this VERY informative presentation that IMO is well suited to audiences like us.


Despite recommendations not to run Level 3+ on SMR drives, many users might not even know they have them. Even the 34-page manual for my Seagate ST8000AS0002 simply states "TGMR recording technology provides the drives with increased areal density" ...indeed

The inner workings of SMR drives may be beyond the scope of SR 6.1, but I think it's still helpful to know (or speculate about) what's happening.
Unless otherwise informed, I'm assuming SR 6.1 is 'SMR unaware'.

[Screenshot from the video "SMR Drives explained and use cases" at 00:06:59]


For Level 2, how does the recovery differ between "band data" and "cache data"? What happens after the recovered data is written to the persistent cache? Does SpinRite verify what it just wrote from the cache or from the band? What are the implications of bands having to be rewritten in their entirety when you change even 1 bit? How are bad sectors mapped in the cache? Are entire bands mapped out if there is a defect anywhere within?

**The really important part**
I have experimented with Levels 3-5, and observed behaviour consistent with the persistent-cache mechanism explained in the video above. In short, ALL data is written to the persistent cache first. Only then is it transferred to the appropriate band: the drive waits for a few seconds of idle time before transferring data from cache to band, or, if the cache is full, it stalls further input until it can clear some space.
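
As a mental model of that write path, here's a toy sketch in Python. It's purely my own illustration of the behaviour described above, nothing vendor-confirmed, and the cache size is scaled down absurdly:

```python
# Toy model of a drive-managed SMR write path: every write lands in
# the persistent cache first; destaging to the shingled bands only
# happens during idle time; a full cache stalls new writes.
class SmrToyDrive:
    CACHE_SLOTS = 4                 # real persistent caches are tens of GB

    def __init__(self):
        self.cache = {}             # LBA -> data, persistent media cache
        self.bands = {}             # LBA -> data, shingled bands

    def write(self, lba, data):
        while len(self.cache) >= self.CACHE_SLOTS:
            self.idle()             # host is stalled until space frees up
        self.cache[lba] = data

    def read(self, lba):
        # The cache wins over the band, so the host may never actually
        # touch the band copy during a write/read/verify cycle.
        return self.cache.get(lba, self.bands.get(lba))

    def idle(self):
        # Idle time: destage one cached write to its band. (A real drive
        # must rewrite an entire band to change anything inside it.)
        if self.cache:
            lba, data = self.cache.popitem()
            self.bands[lba] = data
```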

The bottom line is that while SpinRite's initial read is from a "band track", I think ALL subsequent write/read/verify is done ONLY on the persistent cache. The only way to verify the 'permanent data' is to somehow flush the cache before reading back what was just written, or to work in timed intervals, giving the cache time to write everything out before verifying (see the sketch below).
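
To illustrate the timed-interval idea, here's a rough Python sketch. This is nothing SpinRite actually does; the settle time is a pure guess, and even after the wait nothing guarantees the drive has destaged its cache:

```python
import os
import time

def write_then_verify(path, offset, data, settle_seconds=30):
    """Write a chunk, give the drive an idle window to (hopefully)
    destage its persistent cache to the bands, then read back.
    Path, offset, and settle time are all illustrative. Linux-only
    due to posix_fadvise."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, data, offset)
        os.fsync(fd)                    # push the write past the OS
        time.sleep(settle_seconds)      # idle window for cache -> band
        # Drop the OS page cache for this range so the read-back at
        # least reaches the drive. It may still answer from its own
        # persistent cache -- which is exactly the problem.
        os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
        return os.pread(fd, len(data), offset) == data
    finally:
        os.close(fd)
```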

I did not record any figures or make calculations. But I did observe that after pausing SpinRite, I could feel the heads continue sweeping back and forth for several minutes, a clear indication the drive was busy transferring data from the persistent cache to the data bands. When the heads went still, I'd unpause SpinRite and let it get on with its work. The time remaining would drop, then start climbing again when the cache filled up.

Given that the persistent cache is tens of gigabytes, I imagine even SR Level 5 does all of its flipping, writing and verifying there entirely, completely unaware it's not touching the data's permanent location on the bands.

Level 3 can be useful for refreshing the bits on the media, but as I understand it, Levels 4 & 5 should absolutely be avoided. They work the drive so hard that there's a real possibility of some data corruption, and Level 5's final verification would give a false sense of security, as it's more likely reading from the media cache than from the band.

Whew, this went longer than I expected :). Well anyway, I'm just happy to share my findings with anyone who might be interested. And if anything here can inform SpinRite 7 development, so much the better.
 
Yes, I think this is correct. It has been discussed in the development newsgroup. It's pretty much like running SpinRite on SSDs, in the sense that the LBA-to-PBA mapping is dynamic and volatile. Reading LBA n and writing back some inverted pattern to LBA n isn't at all useful, in the sense that it tells you nothing about the condition of the sector at LBA n, because there is no LBA sector n: LBA sector n is a virtual address that gets mapped to some physical location, and the latter you don't know.

But as with SSDs, pumping high-entropy data before putting the original data back 'forces' a drive to utilize all its available capacity, and as such can tell you something about the overall condition of the drive. Simply put, you try to flood a black box, and the caches it maintains, with data that it has to write somewhere. You don't truly know where; you sort of hope that at some point it needs to utilize all available space to shuffle data around, because at some point SSDs and SMR drives cannot simply rely on their caches. With SSDs we can actually observe the effect of full read/write scans by running pre- and post-benchmarks. It's somewhat like detecting dark matter: you can measure its effect, but you cannot directly observe it.
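
If you want to watch that effect yourself, a rough sketch along these lines will do (file name and sizes are examples only); on a drive-managed SMR disk the reported rate should fall off a cliff once the persistent cache fills:

```python
import os
import time

def flood(path, total_mb, chunk_mb=8):
    """Stream incompressible data and print throughput as we go."""
    chunk = chunk_mb * 1024 * 1024
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    t0, written = time.time(), 0
    try:
        for _ in range(total_mb // chunk_mb):
            os.write(fd, os.urandom(chunk))   # high-entropy, uncompressible
            os.fsync(fd)                      # keep the OS page cache honest
            written += chunk
            rate = written / (time.time() - t0) / 1e6
            print(f"{written // 2**20} MiB written, {rate:.0f} MB/s")
    finally:
        os.close(fd)
```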

The problem you describe applies to both SMR drives and SSDs, since both decouple LBA from PBA addresses and use some form of 'translator'.

BTW, SpinRite (6.1 at least) has some tricks to detect whether it's dealing with an SMR drive. I think one is the drive supporting TRIM while its rotational speed is > 0, and I think there was another that I've forgotten and would have to look up again.
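
As a sketch of that first trick — my own guess at the logic in Python, definitely not SpinRite's actual code — a check over the 256 words returned by ATA IDENTIFY DEVICE might look like this. The word numbers are from my reading of ACS-3/ACS-4 (verify against the spec), and the zoned-capabilities field is pure speculation as the 'other' trick:

```python
def looks_like_smr(identify_words):
    """Heuristic SMR sniff over the 256 16-bit words from ATA
    IDENTIFY DEVICE. Word numbers per ACS-3/ACS-4; double-check
    them against the spec before trusting this."""
    rpm = identify_words[217]             # nominal media rotation rate
    trim = identify_words[169] & 0x0001   # DATA SET MANAGEMENT (TRIM) supported
    zoned = identify_words[69] & 0x0003   # zoned capabilities (speculative)
    rotating = rpm not in (0x0000, 0x0001)  # 0001h = non-rotating (SSD)
    # A spinning drive that advertises TRIM is a strong SMR tell; a
    # nonzero zoned field would be an outright admission.
    return (rotating and trim != 0) or zoned != 0
```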
 
Unless otherwise informed, I'm assuming SR 6.1 is 'SMR unaware'.
Thanks for your posting. You should consider yourself “otherwise informed” since SpinRite v6.1 IS SMR Aware! <g> SpinRite uses every available indicator (several are available) to detect SMR drives and to caution its user against running any "wholesale rewriting" level on such drives. It =IS= possible for a drive to be SMR and to be deliberately hiding that fact. (Some manufacturers have gotten themselves in hot water by doing this in the past.) So SpinRite does everything IT can to catch and inform users. And it also makes this very clear throughout its documentation. (y)
 
Yes, I think this is correct. It has been discussed in the development newsgroup. It's pretty much like running SpinRite on SSDs, in the sense that the LBA-to-PBA mapping is dynamic and volatile. Reading LBA n and writing back some inverted pattern to LBA n isn't at all useful, in the sense that it tells you nothing about the condition of the sector at LBA n, because there is no LBA sector n: LBA sector n is a virtual address that gets mapped to some physical location, and the latter you don't know.

The first place I looked was spinrite.dev, but I couldn't find any significant discussion; it's quite possible I missed it. I can't seem to retrieve old headers in Gravity, and the search function on the forum web page doesn't work for me. But it's good to know this has all been discussed!

The problem you describe applies to both SMR drives and SSDs, since both decouple LBA from PBA addresses and use some form of 'translator'.
Is that in fact how it works for SMR drives? I hadn't been able to find that out definitively.
Granted, I have limited knowledge, but I'm not sure how far comparisons of SMR with SSD can be taken. SMR drives have no need for wear levelling, and PBA fragmentation imposes a performance hit, unlike on an SSD. Just an educated guess, but it seemed to me the LBA-vs-PBA picture presented to the OS was reversed, i.e. the OS is aware of the physical location but entirely blind to the persistent cache and its mapping.

BTW, SpinRite (6.1 at least) has some tricks to detect whether it's dealing with an SMR drive. I think one is the drive supporting TRIM while its rotational speed is > 0, and I think there was another that I've forgotten and would have to look up again.
I only have that one model of SMR drive to test with. SpinRite RC2 did not take any particular notice that I could see, hence my long and slightly misguided posting 😅
 
Is that in fact how it works for SMR drives? I hadn't been able to find that out definitively.
Yes.
I'm not sure how far comparisons of SMR with SSD can be taken. SMR drives have no need for wear levelling,
This is true. I said their need for an LBA-to-PBA translation layer is similar. Both, for different reasons, cannot write to a specific LBA address without penalty unless it's prepared to be written to. The solution: let the PC/OS think they're writing to the same LBA.
and PBA fragmentation imposes a performance hit, unlike on an SSD.
Yes, and? The performance hit on an SSD may be barely noticeable, but I'm sure it takes the hit too: any additional step you have to go through costs time. I think it's safe to assume that both SSD and SMR-drive firmware re-shuffle data (because they have to), and while doing so there's no good reason not to address PBA fragmentation while they're at it. From the firmware's perspective, PBA fragmentation is an I/O-overhead issue. This goes for in-use data as well as stale data: GC has to consolidate erase blocks, and fragmented data means an increased need to re-shuffle to accomplish that, which at the same time increases wear.
Just an educated guess, but it seemed to me the LBA-vs-PBA picture presented to the OS was reversed, i.e. the OS is aware of the physical location but entirely blind to the persistent cache and its mapping.
The OS is unaware of the actual physical location of data on the SSDs and SMR drives we use in our PCs, although this is a design choice. Host-aware addressing is certainly possible, but not with the drives you and I both have installed. In most SSDs and SMR drives, address mapping is done by the drive, and the drive simply presents itself as an LBA block device. The drive is a black box.
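
To make the black-box point concrete, here's a toy sketch of such a translation layer. Real firmware is vastly more involved, but the principle is the same: the host's LBA never pins down a physical location.

```python
# Toy 'translator': illustrative only, not any real drive's firmware.
class BlackBoxDrive:
    def __init__(self):
        self.l2p = {}        # hidden map: LBA -> physical location
        self.next_phys = 0   # naive stand-in for cache/band allocation

    def write(self, lba, data):
        # Never overwrites in place: a fresh physical spot is chosen
        # and the map is updated. The old physical sector keeps its
        # stale bits until internal housekeeping reclaims it.
        self.l2p[lba] = (self.next_phys, data)
        self.next_phys += 1

    def read(self, lba):
        return self.l2p[lba][1]

d = BlackBoxDrive()
d.write(42, b"old")
d.write(42, b"new")          # same LBA, new physical location
assert d.read(42) == b"new"  # host still sees LBA 42; the drive moved it
```
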
I only have that one model of SMR drive to test with. SpinRite RC2 did not take any particular notice that I could see, hence my long and slightly misguided posting 😅
And what model would that be?
 
Thanks for your posting. You should consider yourself “otherwise informed” since SpinRite v6.1 IS SMR Aware! <g> SpinRite uses every available indicator (several are available) to detect SMR drives and to caution its user against running any "wholesale rewriting" level on such drives. It =IS= possible for a drive to be SMR and to be deliberately hiding that fact. (Some manufacturers have gotten themselves in hot water by doing this in the past.) So SpinRite does everything IT can to catch and inform users. And it also makes this very clear throughout its documentation. (y)
Done and thank you! I should have made some inquiries before jumping to conclusions. ☺️
I'm having trouble accessing older spinrite.dev headers and couldn't find any relevant discussion. The SMR drive model I have seems to be one of those that slips past the test; RC2 gave no indication it was aware this was an SMR drive.

I had only read the option-menu descriptions and hadn't looked at the new documentation. Indeed, FAQ section B says "...any level above 2 should be used sparingly on any solid-state or SMR drives". How often have I told users to read the manual? 😅
 
What drive do you have, Peter? I'd be glad to grab one to see whether SpinRite might be able to detect it. (y)
Sorry for originally stating the model number so obtusely 😅. It's an 8 TB Seagate Archive, model ST8000AS0002. I inherited a few survivors for helping out with a fire investigation. I've been using them, but now with SR 6.1 I can test them properly. I have attached the manual and a log file from testing for your convenience.
 

Attachments

  • ST8000AS0002 test.txt
    10.1 KB
  • Seagate Archive HDD 6 & 8 TB product manual.zip
    495.7 KB
Thanks Peter. I wasn't paying attention since Colby knew which drive you were using. I've just ordered one from Amazon for delivery tomorrow. I'll see whether there might be any way for SpinRite to detect that this drive is SMR. It's clearly spelled out that it's for "Archive Use" only.
 
Thanks Peter. I wasn't paying attention since Colby knew which drive you were using. I've just ordered one from Amazon for delivery tomorrow. I'll see whether there might be any way for SpinRite to detect that this drive is SMR. It's clearly spelled out that it's for "Archive Use" only.
Despite the name "Archive", I've been beating them up as JBOD hot storage. Other than needing to occasionally catch their breath, they're proving to be quite reliable. For their intended purpose, I think they'd be excellent performers.

I managed to download all the spinrite.dev headers, and now see all the SMR conversations I missed out on. I'll definitely be looking at those and will contribute there if I have anything of value to add.
 
Despite the name "Archive", I've been beating them up as JBOD hot storage. Other than needing to occasionally catch their breath, they're proving to be quite reliable. For their intended purpose, I think they'd be excellent performers.

I managed to download all the spinrite.dev headers, and now see all the SMR conversations I missed out on. I'll definitely be looking at those and will contribute there if I have anything of value to add.
I use SMR drives (a bunch of Toshibas) all the time, including for non-archive stuff. It won't hurt the drives, and 'archive use' is basically their way of telling you "don't complain if writes are slow".
 
I've been perusing old newsgroup postings and found a reference to the ST8000AS0002. For convenience in case anyone wants to review it, here is the beginning of the relevant thread. I did not recognize anything in the documentation indicating this drive is host aware, but given my lack of expertise I might have missed it. I will make any further comments in that thread to provide some degree of continuity.

Message-ID: <tkpce6$168e$1@GRC>

Subject: Re: Detection of CMR vs SMR drives
From: Scott F <scott200g@notreally.gmail.com>
Date: Sun, 13 Nov 2022 00:06:30 -0000 (UTC)
Newsgroups: grc.spinrite.dev

Steve Gibson <news007_@_grc.com> wrote:
> Following up on what Scott F wrote...
>
>> I would suggest that SpinRite should issue a Report Zones
>> command; if SR gets a response back, indicating the drive is
>> either Host Managed or Host Aware SMR, SR should refuse to
>> operate on those drives.
>
> Yeah. I agree, Scott. SpinRite should make sure that it won't
> run on any of those. That's annoying.
>
I couldn’t find any of those HGST HM-SMR drives for sale, but the Seagate
ST8000AS0002 seems readily available on eBay for about $80. That drive,
based on the documentation, is Host Aware SMR, so it could work as either
Host Managed or Drive Managed, but it should respond to the Zone ATA
commands so you can test that logic.
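
A footnote for anyone wanting to check what their own drive admits to: on Linux, the kernel exposes the zoned model the drive reports (via those same zone commands) in sysfs. A small sketch, with the device name as a placeholder:

```python
from pathlib import Path

def zoned_model(dev="sda"):
    """Returns 'none', 'host-aware' or 'host-managed'. A drive-managed
    SMR drive that hides its zones still reports 'none' -- exactly the
    detection gap discussed in this thread."""
    return Path(f"/sys/block/{dev}/queue/zoned").read_text().strip()

print(zoned_model("sda"))  # device name is an example
```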