Issues running SpinRite 6.1 Rel 2 on Zimaboard

#1

the_physio

the_physio

I’ve set up a Zimaboard with FreeDOS & SpinRite 6.1 Release 2 to start testing my HDDs. I followed this YouTube video from Tech323_YT which looked like it was exactly what I wanted:

SpinRite Level 5 was running well until it reported several defective (“B”) sectors with “This drive FAILS to report WRITE ERRORS!” warnings. Oh no! However, if I run SpinRite on a PC the HDD passes with flying colours & another HDD (which previously passed L5 on the PC) then starts reporting multiple defective sector warnings on the Zimaboard. Every HDD I test works fine on the PC & has multiple warnings on the Zimaboard.

I swapped the bootable USB thumb drives between the machines but with no change to the above symptoms. Tried different size HDDs from 150GB to 4TB but again no change.

Is there a rookie error I’m making here?

Thanks in advance.


#2

D

DarkwinX

Welcome @the_physio

If you re-run the drive on the Zimaboard after the PC does it run into the same trouble?

Are you running the FreeDOS that comes built into the bootable USB or are you using the FreeDOS that you installed on the Zimaboard? They are different; the one that SpinRite preinstalls is customised.


#3

the_physio

the_physio

Re-running the drive on the Zimaboard after a clear run on the PC still produced the same errors - & vice versa - the problem stays with the Zimaboard.

The FreeDOS is that installed on the Zimaboard - I'll try with the bootable USB & see how that goes & then report back.


#4

the_physio

the_physio

Used the Zimaboard with a bootable USB from SpinRite 6.1 Rel 2 & it still threw up a huge amount of these critical errors - although when I restart the process they aren't always coming up in the same sectors (see attached)
SpinRite-02a.jpg

SpinRite-03a.jpg

SpinRite-03b.jpg


#5

D

DarkwinX

Interesting that everything is repeatable.

Could you run the Diags command while launching SpinRite from both devices?

Code:
SPINRITE.EXE /DIAGS

It should produce some dbg files indicating the process SpinRite uses to detect the drives and controllers. If you can supply them it might help get to the bottom of this one.


#6

P

PHolder

Seems likely to be a cabling error of some sort if it's that random. Or a power issue. Did you get extra power for the ZimaBoard... I don't believe the supplied power supply is enough for spinning media.


#7

the_physio

the_physio

As suggested, I’ve tested (SpinRite 6.1 R2 on Level 5) on a 1TB WD Green HDD using the Zimaboard 216 with DIAGS (refer to 3.DBG & 3.LOG) – plenty of early issues after only 0.3% completed.

Then repeated (same HDD & bootable USB) on a PC with DIAGS (refer to 4.DBG & 4.LOG) – clear of issues after 11% completed. I could test both to completion, but the difference is already very stark.

For the Zimaboard 216 I’m using the supplied 12V/3A Power Adaptor with only the one HDD attached. I do have the SATA Y-Cable but am using the single cable – but regardless the errors occur no matter which SATA cable I use.

If I rerun SpinRite on the Zimaboard a second time it doesn’t produce errors in the same sectors.

Attachments


  • 3.DBG.txt
    5.4 KB · Views: 418
  • 3.LOG.txt
    13.7 KB · Views: 901
  • 4.DBG.txt
    2.3 KB · Views: 398
  • 4.LOG.txt
    8.1 KB · Views: 407

#8

S

SeanBZA

You probably need an upgraded power supply, 3A is likely marginal for the setup, as the drive will be drawing a lot of power from the 12V rail to do all the head movements for the test, and you probably want at least a 5A supply to provide the needed power that is stable. 3A is fine for the base board with only a SD card or SSD as drive, but spinning drives do need more current to handle the peaks of starting up the motors and to do head positioning correctly. Upgrade to a 5A or higher 12V supply, or add in a second small supply that provides 12V and 5V to power the drive only.


#9

the_physio

the_physio

Tested (same HDD & same USB bootable SpinRite 6.1 R2) on the Zimaboard 216 but with the HDD powered from the PC (700W power unit with no other HDDs attached) – the same PC which runs flawlessly.

Sadly, there was no difference (refer to 6.DBG & 6.LOG attached). I really expected this to work.

Attachments


  • 6.DBG.txt
    5.4 KB · Views: 389
  • 6.LOG.txt
    39.8 KB · Views: 407

#10

D

DarkwinX

@the_physio I've created a bug report on GitLab referencing this.

If Steve needs to dig into it further would you be willing to try a few more tests? (Assuming nobody else can find something we've missed)


#11

ColbyBouma

ColbyBouma

This is a long shot, but have you run a memory test on the Zimaboard?

https://www.memtest86.com/download.htm


#12

the_physio

the_physio

Happy to try a few more tests if able. No, haven't done a memory test on the Zimaboard.


#13

Steve

Steve

I've read through the entire thread and looked at all of the logs and I agree with everyone that this is a mystery. SpinRite was developed primarily on the ZimaBoard (216) and it's been more heavily tested there than anywhere else. And we're also no stranger to the Western Digital WD10EARS family of drives. It appears that something about THAT ZimaBoard (we don't know about any others) and THAT drive are unhappy with each other.

Tried different size HDDs from 150GB to 4TB but again no change.
And that answers the question about whether the trouble is tied to the drive. It's not.

The logs show that the apparent errors being detected by SpinRite at Level 5 are random. From one run to the next there is no repetition.

What happens at Level 3?

Also, does the ZimaBoard's other SATA connector behave similarly?

And, finally... for diagnostic curiosity, if you had a spare PCIe SATA adapter lying around, you might try plugging it into the ZimaBoard's PCIe connector to see whether that drive attached through a different controller/adapter works as it does on your other PC.

NOTE TO EVERYONE: SpinRite produces the complaint “The drive did not report that it failed to properly write data to this sector.” when a read verification, following an error-free write, fails data verification. That error report has not been sitting well with me. It's generated when a multi-sector block read verification, following an inverted or re-inverted write, has the read command succeed but finds a data miscompare. Since the block read succeeded without ANY complaint, not even a needed ECC error correction, but the data does not compare, SpinRite faults the previous write, which may have mis-written the testing data without complaint (though write operations do not generally complain since they do not inherently verify).
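
For readers following along, the check described in the note above can be sketched roughly as follows. This is only an illustration in Python, not SpinRite's actual code: `FakeDrive`, `verify_after_write`, and the `ram_fault` hook are hypothetical names invented here, and the "drive" is just an in-memory byte array. The point it tries to show is that a miscompare after a clean write and a clean read can come from a corrupted RAM buffer just as easily as from the drive.

```python
import os

SECTOR = 512  # bytes per sector (assumed for this sketch)

def invert(block: bytes) -> bytes:
    """Flip every bit in the block."""
    return bytes(b ^ 0xFF for b in block)

def flip_bit(block: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of block with one bit flipped (simulated RAM fault)."""
    out = bytearray(block)
    out[byte_index] ^= 1 << bit
    return bytes(out)

class FakeDrive:
    """Hypothetical in-memory stand-in for a drive that never reports errors."""
    def __init__(self, sectors: int):
        self.data = bytearray(sectors * SECTOR)

    def write(self, lba: int, block: bytes) -> None:
        start = lba * SECTOR
        self.data[start:start + len(block)] = block

    def read(self, lba: int, count: int) -> bytes:
        return bytes(self.data[lba * SECTOR:(lba + count) * SECTOR])

def miscompared_sectors(expected: bytes, actual: bytes) -> list[int]:
    """List the sector offsets whose contents differ."""
    return [i // SECTOR for i in range(0, len(expected), SECTOR)
            if expected[i:i + SECTOR] != actual[i:i + SECTOR]]

def verify_after_write(drive, lba, count, ram_fault=None):
    """Invert a block, write it, read it back, compare, then restore it."""
    original = drive.read(lba, count)
    test_data = invert(original)
    drive.write(lba, test_data)        # the write completes without complaint
    readback = drive.read(lba, count)  # the read completes without complaint
    if ram_fault is not None:
        readback = ram_fault(readback)  # corruption happens in the buffer
    bad = miscompared_sectors(test_data, readback)
    drive.write(lba, original)         # re-invert: put the original data back
    return bad

drive = FakeDrive(sectors=8)
drive.write(0, os.urandom(8 * SECTOR))
clean = verify_after_write(drive, 0, 8)
faulty = verify_after_write(drive, 0, 8,
                            ram_fault=lambda b: flip_bit(b, 1030, 3))
```

In this toy run the drive itself is perfect, yet the second call still flags a sector, because the single flipped bit lands in the read-back buffer. From the comparison alone the two causes look identical.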

At this point, I agree that the ZimaBoard itself is suspect.

Once upon a time, the earliest SpinRites used to perform some quick RAM testing. That fell by the wayside through the years since RAM is so rarely troublesome... but this makes me wonder whether perhaps SpinRite itself ought to be verifying the integrity of the multiple very large 16MB buffers that it's using.


#14

the_physio

the_physio

Thanks Steve – I feel like I’m addressing royalty – I’ll test Level 3 & have a look about for a PCIe SATA adapter to try your other suggestions & report back.

I am slowly coming to the conclusion that the Zimaboard itself is suspect. One interesting issue was that when I first fired it up & attempted to update the CasaOS it steadfastly refused to do so. I wasn’t overly concerned as I’d only ever bought it to be a SpinRite machine so went through the process of installing FreeDOS as previously mentioned.

I’ll let you know how the Level 3 & possible PCIe SATA goes.


#15

Steve

Steve

I am slowly coming to the conclusion that the Zimaboard itself is suspect.
Right. These ARE complex systems and it's certainly very possible for some subtle manufacturing error to escape their post-manufacturing testing. If anything, I'm surprised that it doesn't happen more often! My concern here is that the trouble appears to be affecting SpinRite's operation, and in a way that SpinRite MAY be able to detect and protect from. And if so, I would want to do so.

If you can run https://www.memtest86.com/ on that board and find a problem there, THAT would be extremely useful! (y)


#16

the_physio

the_physio

Okay, an update on today before I hit the sack.

I did run Level 3 on a 500GB HDD which finished completely error free.

I also started a Level 4 test & aborted after 29% completed after 2 errors both “The drive did not report that it failed to properly write data to this sector…”.

Just for completeness I also started a Level 5 test & aborted after 8% completed after 9 errors – same as above but in different sectors to the Level 4 test.

I haven’t been able to source a PCIe SATA adapter, so no test in that regard.

I’ve kicked off a MemTest86 which is still running – so far 1 error reported. FYI, I've just run the first optioned test so let me know if there's something in particular I need to specify. While I probably know more about this stuff than my health care workmates I'm definitely not a geek/guru/nerd - so bear with me. ;)

I’ll update the full MemTest86 results tomorrow – probably after I get home from work (down-under time). Thanks for your input & assistance.


#17

Steve

Steve

I’ve kicked off a MemTest86 which is still running – so far 1 error reported.
HOLY CRAP!! THAT SHOULD NEVER HAPPEN!! I have NEVER seen MemTest86 generate a single error.
Wow. This changes my planned next project... SpinRite needs to verify, as best it's able, the safety of the machine's RAM for holding data.
I've never explored MemTest86 deeply, but it appears able to generate a report of its findings. That would be good to have.

UPDATES:
  1. I have MemTest86 running on one of my ZimaBoards.
  2. The documentation indicates that when the test is completed you will be given the opportunity to save the results to an HTML file. PLEASE do save the results, and share them here. THANK YOU!
  3. (7:20am) The full default 1st pass on my ZimaBoard 216 required ~48 minutes. (No errors found.)
  4. (8:55am) In pass #3, no errors found.
  5. (9:36am) Stopped before the end of pass #3 (since I want to use the ZimaBoard). I saved the MemTest86 HTML report then shut down the ZimaBoard. The HTML file is written into the /EFI/BOOT/ directory. Mine was 22K and is a nicely formatted standalone HTML file. PLEASE do save one and share it! Thanks!
@the_physio: I imagine you would like to have a functioning ZimaBoard. But your apparently defective ZimaBoard is valuable to this project. So rather than having you return it to ZimaLand, I would be glad to purchase a replacement ZimaBoard and have it sent to you if you'll send your defective ZimaBoard to me. There's no hurry here and I expect that I may be able to arrange to have SpinRite run on your ZimaBoard by having it intelligently use a different region of RAM after testing a region and finding trouble there. I'll have a new release of SpinRite for you to test. I just wanted to put this out there so that this defective board was not returned. (y)
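
As a rough illustration of the "test a region, and move to a different one if it shows trouble" idea mentioned above, here is a minimal Python sketch. Everything in it is hypothetical: `SimulatedRam` models a single bit stuck at zero, and `first_good_region` is an invented helper, not anything from SpinRite. It also only covers a hard, repeatable fault; the intermittent errors reported in this thread would be much harder to pin to one region.

```python
class SimulatedRam:
    """Hypothetical RAM model: one byte address has one bit stuck at zero."""
    def __init__(self, size: int, stuck_addr: int, stuck_bit: int):
        self.cells = bytearray(size)
        self.stuck_addr = stuck_addr
        self.keep_mask = ~(1 << stuck_bit) & 0xFF

    def write(self, addr: int, data: bytes) -> None:
        self.cells[addr:addr + len(data)] = data
        if addr <= self.stuck_addr < addr + len(data):
            # The faulty cell silently loses its stuck bit.
            self.cells[self.stuck_addr] &= self.keep_mask

    def read(self, addr: int, length: int) -> bytes:
        return bytes(self.cells[addr:addr + length])

def region_passes(ram, start: int, length: int) -> bool:
    """Write a few simple patterns and confirm each reads back unchanged."""
    for pattern in (0x55, 0xAA, 0xFF, 0x00):
        data = bytes([pattern]) * length
        ram.write(start, data)
        if ram.read(start, length) != data:
            return False
    return True

def first_good_region(ram, total: int, length: int):
    """Return the start of the first region that passes, or None."""
    for start in range(0, total - length + 1, length):
        if region_passes(ram, start, length):
            return start
    return None

ram = SimulatedRam(size=4096, stuck_addr=100, stuck_bit=0)
buffer_start = first_good_region(ram, total=4096, length=1024)
```

Here the first 1024-byte region contains the stuck bit and is skipped, so the buffer ends up in the second region.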


#18

ColbyBouma

ColbyBouma

I've had MemTest86 generate more than 1 error before :)

1710943472988.jpeg


#19

Steve

Steve

Wow Colby! What's the backstory there?


#20

ColbyBouma

ColbyBouma

It's my previous PC. I originally built it with 32 GB of RAM, and it worked great. After a few years I decided to upgrade to 64 GB by buying the same kit of RAM, 2 16 GB sticks. I don't remember how long it took, but I eventually started noticing strange behavior, and I think I even had a corrupt file. I think I ran it like that for a few months before I decided to run MemTest86. I was actually able to get the full 64 GB to pass MemTest86 by lowering the RAM speed from 3200 to 3000 in the BIOS.

Due to this experience, I will ALWAYS test every computer before I use it. My new computer passed with flying colors :)


#21

Tazz

Tazz

This is prompting me to check my two machines. It's been a couple of years since the last memtest86 run.


#22

A

AlanD

I have not seen a problem recently, but when a Windows machine fails multiple times with different error codes, Memtest is always my first thought. Thinking back, it was probably DDR2 and earlier RAM that was most susceptible to these problems.


#23

the_physio

the_physio

Hi Team, I awoke to a large "FAIL" on the monitor - see attached screen shots. I've also attached the LOG file (renamed to a TXT file) but happy to upload the HTML if that's better. I'll review again when I get home from work & please feel free to suggest any checks or changes to the tests - it does recommend running one in Parallel Mode so I'll probably do that at least.

Happy to not return the Zimaboard if it's of use to the project - like you said no hurry.

I'll look forward to kicking off a Parallel Mode test tonight - unless for any reason it's recommended not to.

Attachments


  • MemTest-a.jpg
    MemTest-a.jpg
    208.7 KB · Views: 354
  • MemTest-b.jpg
    MemTest-b.jpg
    148.5 KB · Views: 382
  • MemTest-c.jpg
    MemTest-c.jpg
    73.9 KB · Views: 377
  • MemTest86-20240320-174349.log.txt
    94.1 KB · Views: 465

#24

Tazz

Tazz

I'll look forward to kicking off a Parallel Mode test tonight - unless for any reason it's recommended not to.
IMO, there's no need to. The RAM has problems.
You could do it just to see how bad it is.

In the ZimaBoard BIOS is there a way to change the RAM settings? Maybe lower the speed or bump the voltage up a .01v? If so you *may* squeeze a pass out of it as @ColbyBouma did.


#25

S

SeanBZA

Ran Memtest on my new to me computer for 2 days, got to around pass 13 with no errors. I actually had a free bad stick of DDR memory years ago, with a single stuck bit. Experimented passing a custom kernel parameter to exclude that 4k block of memory, and the system was perfectly fine with it. That memory was from a computer in a house that got struck by lightning, which fried the motherboard, power supply and DSL modem. Not the Intel Pentium CPU, those were hardy beasts, I tried killing one with an electric fence energiser, and after 50 jolts of 8J of energy, it was still perfectly good. Still a 200MHz part, but still fine. Finally killed it using a set of jumper cables and a car battery to pass around 100A through the substrate, burning off half the pins.


#26

Steve

Steve

Hi Team, I awoke to a large "FAIL" on the monitor - see attached screen shots. I've also attached the LOG file (renamed to a TXT file) but happy to upload the HTML if that's better. I'll review again when I get home from work & please feel free to suggest any checks or changes to the tests - it does recommend running one in Parallel Mode so I'll probably do that at least.

Happy to not return the Zimaboard if it's of use to the project - like you said no hurry.

I'll look forward to kicking off a Parallel Mode test tonight - unless for any reason it's recommended not to.
Nice!! I wanted to be sure that you saw my earlier posting, here:

https://forums.grc.com/threads/issues-running-spinrite-6-1-rel-2-on-zimaboard.1536/post-11562

I'll be working on adding built-in RAM integrity testing to SpinRite once I get the image downloading sub-system finished. THANKS!


#27

Happenstrance

Happenstrance

Nice!! I wanted to be sure that you saw my earlier posting, here:

https://forums.grc.com/threads/issues-running-spinrite-6-1-rel-2-on-zimaboard.1536/post-11562

I'll be working on adding built-in RAM integrity testing to SpinRite once I get the image downloading sub-system finished. THANKS!
Perhaps given this happened with a Zimaboard which has memory soldered in, you could include a mention of the $64 ZimaBlade in your sitewide announcement of the ZimaBoard, one of which I just received, that you can plug memory into separately. The cheaper price so far for me doesn't diminish performance but I understand you'll need to test it for yourself beforehand. There are discount coupons for March: $10 off with code "starbird10" or 15% off with code "HOMER15" that brings the price of the ZimaBlade down to $55.

The main difference is only one USB port and one ethernet port, one less of each than the ZimaBoard. Also, the ZimaBlade is powered through a separate USB Type-C port; if you don't already have at least a 3 amp, 12 volt supply for it, one will need to be purchased separately. And if you don't already have spare RAM to plug in then buying RAM is an additional purchase.


#28

Tazz

Tazz

the ZimaBoard, one of which I just received
Hmmm, still waiting for mine. The expected shipping date jumped from Jan. 31 to Mar. 29 😩


#29

Steve

Steve

I'll just note that it's not really a $64 ZimaBlade since that purchases a RAM-less board. Add the $40 16GB of RAM and we're up to $104. Still less expensive than the $119 ZimaBoard and yes, certainly, pluggable RAM (though I would argue that the reduced reliability of an additional connector might offset the benefit of exchangeable RAM).

I DO like the fact that it appears to have power and reset headers included. I've had to add those to my ZimaBoards. And having an outboard means for resetting and power-controlling my ZimaBoards has been a "must have". So I, too, have one on order to see for myself. (y)


#30

the_physio

the_physio

I ran (what I believe is the same test as previous) MemTest86 last night - it took about 5 hours - & this came up with 101 errors (up from 6). See attached. Is this unusual or unexpected?

Attachments


  • MemTest86-20240321-165831.log.txt
    103.3 KB · Views: 465

#31

Steve

Steve

Wow! That's quite something! I'm glad you ran this again.

I've just finished implementing the non-Windows (Linux) support for directly downloading bootable images. So my next project that I'll be working on now is adding RAM testing to SpinRite so it will hopefully be able to catch this sort of problem before it begins.

So... please stay tuned!


#32

Steve

Steve

@the_physio:

I've added an upfront RAM test to SpinRite and it would be EXTREMELY valuable to see whether it is able to detect that something is amiss with your ZimaBoard: https://www.grc.com/dev/SpinRite/SR61-RT0.EXE. This is an unlicensed directly downloadable release of SpinRite v6.1 which cannot actually do anything with drives since that code has been removed... but this makes it easier to obtain and test.

Note that it would be great to know not only whether or not the testing detects any problem with your ZimaBoard, but also, if it does, how quickly it does. (And if it does, could you try running it a few times to see how quickly, since that might vary.) Thanks very much!


#33

the_physio

the_physio

I downloaded the above RAM test version & then on the bootable USB, renamed SPINRITE.EXE to SR61R2.EXE, copied the SR61-RT0.EXE onto it & renamed it SPINRITE.EXE. The AUTOEXEC.BAT has SPINRITE.EXE as the last line so kicks it off immediately.

I’ve run the RAM Test SpinRite 3 times without any apparent errors.

Test #1 was for about 3 hours with about 201,000 patterns tested.

Test #2 was for about 1 hour & I did use the command “SPINRITE.EXE /DIAGS” with about 92,000 patterns tested without any apparent errors. But I couldn’t see any log produced by the RAM Test SpinRite.

Interestingly when I ended Test #2 I noticed that SpinRite stated there was a BIOS it couldn’t access – I’m guessing this was due to the BIOS Boot Option Filter being set to “UEFI & Legacy” as this matter disappeared when I set it to “Legacy Only” & ran the test a third time.

Test #3 is now still running.

I've got some screen shots of some of the above if they help.

Any suggestions or questions?


#34

Steve

Steve

I'm disappointed but I'm not surprised. One of the things I noted from your tests with MemTest86 was that the region where those two bits were being found uncertain was higher in memory than the region SpinRite uses. But, at the same time, the nature of those bit uncertainties, always being the same two bits widely scattered, doesn't feel like specific problems with RAM locations, but rather a problem with the "bit lines" of the memory. And of those 101 errors in your second test run, only one appeared down in the first 50 megabyte region that SpinRite uses.
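
The kind of reasoning described above (the same bits failing at widely scattered addresses, and few failures in low memory) can be sketched in a few lines of Python. The error records below are invented purely for illustration and are NOT taken from the attached MemTest86 logs; the `(address, expected, actual)` tuple layout and the helper names are assumptions of this sketch as well.

```python
from collections import Counter

FIFTY_MB = 50 * 1024 * 1024

def failing_bits(expected: int, actual: int) -> list[int]:
    """Bit positions (0 = least significant) where the two 32-bit words differ."""
    diff = expected ^ actual
    return [bit for bit in range(32) if (diff >> bit) & 1]

def bit_histogram(errors) -> Counter:
    """Count how often each bit position fails across all error records."""
    counts = Counter()
    for _address, expected, actual in errors:
        counts.update(failing_bits(expected, actual))
    return counts

def count_below(errors, limit: int = FIFTY_MB) -> int:
    """Number of errors whose address lies below the given limit."""
    return sum(1 for address, _exp, _act in errors if address < limit)

# Invented sample records: (address, expected, actual).
sample_errors = [
    (0x12345678, 0xFFFFFFFF, 0xFFFFFFFB),  # bit 2 dropped
    (0x0ABC0000, 0x00000000, 0x00020000),  # bit 17 picked up
    (0x70000010, 0xA5A5A5A5, 0xA5A5A5A1),  # bit 2 dropped again
    (0x00100000, 0xFFFFFFFF, 0xFFFDFFFF),  # bit 17 dropped, low memory
]

histogram = bit_histogram(sample_errors)
low_memory_errors = count_below(sample_errors)
```

If a histogram like this shows only one or two bit positions while the addresses are spread all over memory, that points away from individual bad cells and toward something shared by every access, which is the "bit lines" suspicion voiced above.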

It's also interesting that one time you ran MemTest86 you got 1 error on the 1st pass, 5 errors on the 2nd pass, and then no errors on the 3rd or 4th passes. Then, when you ran it again, you got a total of 101 errors.

But of course, what we care about is that SpinRite's operation appears to be affected by something about your ZimaBoard, and when you ran MemTest86 on it, it produced errors.

I'm going to spend some more time with the RAM testing to see whether I might be able to do something more. I'll be back in touch! (y)


#35

the_physio

the_physio

Okay, I've left the RAM Test SpinRite running overnight - still no apparent errors after 913,592 patterns testing. Let me know if you want: (a) this test to stay running; (b) keep starting additional tests if there are no errors after an hour or so; (c) run MemTest86 again; (d) try SpinRite 6.1 Rel 2 again; or (e) turn Zimaboard off & await further instructions. :)

Attachments


  • RAMTest-a.jpg
    RAMTest-a.jpg
    381.1 KB · Views: 379

#36

Steve

Steve

@the_physio:

Thanks for your continued testing! I have another test release for you to try: https://www.grc.com/dev/SpinRite/SR61-RT1.EXE. Note that it's "RT1" instead of "RT0". (You set up everything correctly last time... Though you could also just place the SR61-RT1.EXE on the boot drive then exit SpinRite to the DOS prompt after it starts, then manually start SR61-RT1 by entering its name on the command line. My point being... there's no need to rename things, if that might be easier for you, since you can run whatever you choose from the DOS prompt after exiting SpinRite.) (y)

This RT1 test takes a different approach. The first one was filling all of the various SpinRite buffers with the same repeating 32-bit pseudo-random pattern of data. But if the trouble is with those two data bit lines not moving up or down quickly enough, it might be that successive reads and writes need to have different data. So this -RT1 test starts with a random 32-bit value, but rotates it every time. This has the effect of causing different data to be written and read from successive transfers.
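
To make the difference between the two fill strategies concrete, here is a minimal Python sketch. It is not the SR61-RT0 or SR61-RT1 code; the function names are invented, and rotating by one bit per 32-bit word is an assumption of this sketch, since the post does not say at what granularity or by how many bits the value is rotated.

```python
import random

MASK32 = 0xFFFFFFFF

def rotl32(value: int, count: int = 1) -> int:
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32

def fixed_fill(seed: int, words: int) -> list[int]:
    """RT0-style idea: the same 32-bit pattern repeated in every word."""
    return [seed] * words

def rotating_fill(seed: int, words: int) -> list[int]:
    """RT1-style idea: rotate the pattern so successive words differ."""
    out = []
    value = seed
    for _ in range(words):
        out.append(value)
        value = rotl32(value)
    return out

def count_mismatches(written: list[int], read_back: list[int]) -> int:
    """Number of words that did not read back as written."""
    return sum(1 for w, r in zip(written, read_back) if w != r)

seed = random.getrandbits(32)
same_every_time = fixed_fill(seed, 8)
changes_every_time = rotating_fill(seed, 8)
```

With the fixed fill, each data line carries the same level on every transfer, so a line that is slow to change state is never asked to change. With the rotating fill, each bit of the pattern passes through every data line in turn, so the lines are exercised with changing levels (apart from degenerate seeds such as all zeros or all ones, which rotate to themselves).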

Anyway... Let's see how this one does? If it doesn't produce an error in an hour then it doesn't appear to be any more able to generate trouble for your ZimaBoard than the first one.

And at that point, yes, it might be useful to have you re-run MemTest86 just to verify that it's still able to make your ZimaBoard misbehave.

Thanks VERY much!!


#37

the_physio

the_physio

Ran SR61-RT1.EXE & all looked good for the first 15 minutes so I set an alarm & came back after it had been running for an hour or so to find this warning (see attachment RAMTest-RT1-a.jpg).

Used ESC to return to DOS & ran it a second time. Interestingly after an hour or so it was running error free (see attachment RAMTest-RT1-b.jpg) – actually it still is after nearly 1.5 hours. I’ll see how long it takes for an error on this second run. It can run just off to the side of my TV while I watch the Australian F1 GP. ;)

Attachments


  • RAMTest-RT1-a.jpg
    RAMTest-RT1-a.jpg
    391.2 KB · Views: 403
  • RAMTest-RT1-b.jpg
    RAMTest-RT1-b.jpg
    404.5 KB · Views: 380

#38

Steve

Steve

It's great that the second version found the trouble... and it seems like it's about as sensitive as MemTest86, since it also never found any problems during the 3rd and 4th passes the first time you ran it.

But... it doesn't look like we'll be able to count on a =brief= RAM test, which is run at the start of SpinRite, to detect this sort of very marginal RAM.
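
A back-of-the-envelope way to see why a brief test is unlikely to catch this: if errors are treated as arriving at random with some average spacing (a simplifying assumption of this sketch, not something the thread establishes), the chance of seeing at least one error within a test of length t is 1 - exp(-t / mean). The 60-minute mean below is only a round figure in the range of the run times reported here.

```python
import math

def detection_probability(test_minutes: float,
                          mean_minutes_between_errors: float) -> float:
    """Chance of at least one error during the test, assuming errors
    arrive independently at a constant average rate (a crude model)."""
    return 1.0 - math.exp(-test_minutes / mean_minutes_between_errors)

# Assumed figure: roughly one error per hour on average.
p_one_minute = detection_probability(1, 60)
p_one_hour = detection_probability(60, 60)
```

Under that assumption a one-minute startup test would catch the problem less than 2% of the time, and even a full hour only a little under two times in three. Real marginal RAM need not behave this evenly, so treat the numbers as an order-of-magnitude guide only.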

THANK YOU for the testing. I'll be interested in any further experiences you have with this. And I may have more thoughts later!


#39

the_physio

the_physio

Have continued testing the SR61-RT1.EXE with the following (confusing?) results:
2 hours 30 minutes – 2nd error occurred
0 hours 1 minute – 3rd error occurred
0 hours 0 minutes 10 seconds for 4th error
0 hours 0 minutes 14 seconds for 5th error
Immediate production of 6th error
0 hours 0 minutes 10 seconds for 7th error
0 hours 4 minutes for 8th error
0 hours 3 minutes for 9th error

Have taken screen shots of the errors if these are required.

I think I'll run the MemTest86 again overnight.


#40

S

SeanBZA

I would say that your particular board has a memory chip that is on the edge of passing. It passed the factory test and speed grading but was actually marginal, likely because inevitable process variations left it with a response delay right on the edge of being faulty, hence the random failures in memory bits. Probably best to return the board, or try to underclock it by setting the RAM to a slower speed, so the memory has more time to respond to the CPU. Otherwise you could try adding extra cooling, with a fan keeping the memory temperature down, or with a cooler for the entire board.


#41

Steve

Steve

Have continued testing the SR61-RT1.EXE with the following (confusing?) results:
2 hours 30 minutes – 2nd error occurred
0 hours 1 minute – 3rd error occurred
0 hours 0 minutes 10 seconds for 4th error
0 hours 0 minutes 14 seconds for 5th error
Immediate production of 6th error
0 hours 0 minutes 10 seconds for 7th error
0 hours 4 minutes for 8th error
0 hours 3 minutes for 9th error
Thanks VERY MUCH for sharing these encouraging new results. It appears that the second edition of SpinRite's new built-in RAM test is successfully detecting that ZimaBoard's problems -- and more successfully than your first try suggested! I'm working on a 3rd edition which will be optimized for speed, and this will wind up being permanently incorporated into SpinRite. I'll let you know once I have the performance optimized release ready. Thanks!


#42

Steve

Steve

@the_physio:

After seeing that the 2nd try was successfully detecting RAM trouble on your ZimaBoard, I spent some time working to make it faster. The idea is that testing faster means that marginal reads and writes would be more likely to be found more quickly. So, this -RT2 release is now running about 6.5 times faster than -RT1: https://www.grc.com/dev/SpinRite/SR61-RT2.EXE while doing as good a job with testing for any trouble.

I would love to have you see, as you did previously, whether this one appears to be better at catching your ZimaBoard's misbehavior... as in faster to find trouble. Thanks!!


#43

the_physio

the_physio

Hmmm...4.5 hours running SR61-RT2 without an error (1,346,000 patterns tested). I'm going to kick start it for a couple of 15 minute periods & then try running it with a HDD connected to the Zimaboard just in case this somehow affects it - possibly leave this last configuration running over night. I'll post an update in the morning.


#44

Steve

Steve

@the_physio:

If you have both SR61-RT1 and -RT2 on the same boot drive, it would be very interesting to have you try -RT1 — that was finding trouble — after -RT2 fails to do so to see whether -RT1 finds what -RT2 does not, within the same time environment.

I'm doing different things with -RT2 in the interest of performance -- things like cache-line alignment -- so it might be that what -RT1 was doing wound up being better than -RT2, despite -RT2's much improved rate of testing.
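
For anyone unfamiliar with the term, cache-line alignment simply means starting a buffer on an address that is a multiple of the CPU's cache-line size. The Python sketch below shows the arithmetic only; the 64-byte line size is an assumption (it is common on x86 processors, but the post does not state it), and nothing here reflects how -RT2 actually lays out its buffers.

```python
CACHE_LINE = 64  # bytes; assumed line size, must be a power of two

def align_up(address: int, alignment: int = CACHE_LINE) -> int:
    """Round address up to the next multiple of alignment."""
    return (address + alignment - 1) & ~(alignment - 1)

def lines_touched(address: int, length: int,
                  alignment: int = CACHE_LINE) -> int:
    """Number of cache lines covered by [address, address + length)."""
    first = address // alignment
    last = (address + length - 1) // alignment
    return last - first + 1

unaligned_start = 0x1001
aligned_start = align_up(unaligned_start)
```

A 64-byte block starting at the unaligned address straddles two cache lines, while the same block at the aligned address fits in exactly one. That changes the pattern of memory traffic, which may be one reason the two test versions behave differently on marginal RAM.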


#45

the_physio

the_physio

An update of testing over the last 24 hours:
  1. Ran MemTest86 again overnight & it produced errors (see attached).
  2. Turned off the Zimaboard while at work.
  3. Came home & downloaded SR61-RT2.
  4. SR61-RT2 run for 4:30 (4 hours & 30 minutes) – no errors.
  5. SR61-RT2 run for 0:15 – no errors.
  6. SR61-RT2 run for 0:15 – no errors.
  7. SR61-RT1 run for 0:15 – no errors.
  8. Performed SR61 benchmark on USB drive & then SR61-RT1 run for 0:15 – no errors.
  9. Warm reboot, performed SR61 benchmark on USB drive, & then SR61-RT1 run for 0:15 – no errors.
  10. Attached 500GB HDD to Zimaboard & then SR61-RT2 run for 0:15 – no errors.
  11. Started SR61 Level 5 on HDD & after 10 minutes 3 “B” defective indicators with the associated dire warnings.
  12. SR61-RT2 run for 8:15 (overnight) – no errors.
  13. SR61-RT1 run – after 1:30 REPORTED ERROR.
  14. SR61-RT1 run – after 1:00 REPORTED ERROR.
  15. Currently just initiated another SR61-RT2 test – we will see how it goes.

Bottom line:
  • The Zimaboard still fails the MemTest86 test.
  • SR61 (Rel 2) on Level 5 still produces inconsistent errors (ie the affected sectors are never the same from test to test) stating the HDD is unreliable (although it passed Level 5 on a PC).
  • SR61-RT1 still produced 2 errors but it took between 1:00 and 1:30 of run time for each.
  • SR61-RT2 to date has produced no errors after 13:30 of testing over 5 different tests & 1 test currently in progress.
  • Trialling some factors to see if they would speed up an error report has proved inconclusive (such as allowing the Zimaboard time to “cool down”; allowing the Zimaboard time to “warm up”; performing a benchmark on a drive first; connecting an HDD to the Zimaboard; performing a partial Level 5 scan on the HDD first).
Food for thought.
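
As a quick cross-check of the 13:30 figure in the summary above, the five completed SR61-RT2 runs from the numbered list (items 4, 5, 6, 10 and 12) can be added up. The helper names in this small Python sketch are made up for the purpose.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' duration into minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def to_hhmm(total_minutes: int) -> str:
    """Convert minutes back into 'H:MM'."""
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"

# Completed SR61-RT2 runs, as listed in steps 4, 5, 6, 10 and 12.
rt2_runs = ["4:30", "0:15", "0:15", "0:15", "8:15"]
rt2_total = to_hhmm(sum(to_minutes(run) for run in rt2_runs))
```

The total comes to 13:30, matching the error-free SR61-RT2 run time quoted in the bottom line.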

Attachments


  • MemTest86-20240324-185737.log.txt
    93.3 KB · Views: 419

#46

Steve

Steve

FANTASTIC work, @the_physio. Someone in the NNTP newsgroups also has a machine where -RT1 will detect an error but -RT2 does not. So on that basis I'm going to scrap that route of "optimization" and return to the code that runs more slowly but DOES find problems. Then I'm going to work to speed it up a bit more. So I'll have an -RT3 to test, after this week's podcast, so probably Wednesday. Thanks, again!


#47

D

DarkwinX

so probably Wednesday.

I'm fairly sure this translates to Aus Thursday :)


#48

the_physio

the_physio

Sounds like a good strategy Steve as I haven’t been able to generate an error from -RT2. The latest run went for 5:30 (5 hours & 30 minutes).

I did do further -RT1 testing & although not fast it has been reliably generating errors. The latest batch of 5 tests all generated an error within 0:35 to 2:07.

Look forward to -RT3 & the latest podcast – which if I’m not mistaken hits the east coast of Australia late Wednesday afternoons. :)


#49

the_physio

the_physio

Just an update that the -RT2 did finally produce an error – I left it running when I went to work & while I can’t say how long it took it was definitely within 9:12. Better late than never. ;)


#50

Steve

Steve

Just an update that the -RT2 did finally produce an error – I left it running when I went to work & while I can’t say how long it took it was definitely within 9:12. Better late than never. ;)
Thanks for the report. At least we know it's not completely non-functional! <g> I'll be producing -RT3 shortly. I spent some time on the code yesterday.


#51

Steve

Steve


This SR61-RT3 RAM test release restores the testing methodology which was successfully used by RT2 and it adds the final screen embellishments of a total running time (upper left) and total errors found (upper right). I'll be merging this into SpinRite for its next release.

@the_physio: Naturally, I'd love to see (confirm) whether this proposed final RAM testing system performs as well as RT2 did. (It certainly should.)


#52

T

TProbst67

Hey Steve -- Illegal Opcode running -RT3 after a couple of hours.

I mentioned it in the newsgroups, and providing screenshot here.

Regards,
Thomas

Attachments


  • IMG_0588.JPEG
    223.3 KB · Views: 370

#53

the_physio

the_physio

Steve, did you mean RT1 instead of RT2 in your references above?

SR61-RT3.EXE downloaded & results so far:
  • Within 0:02 (0 hours 2 minutes) it had detected an error.
  • As I was about to go out I didn’t photograph the screen (sorry), just kicked off another test which likewise appears to have reported an error before I could leave the room.
  • “Looks like there will be plenty more of these in short order” I thought as I went to do my shopping. On my return (about 2:00 later) the screen was still predominantly red but showing what I’d swear was different information to what was on it when I left – refer to attachment SR61-RT3a.jpg of an “Attempt to Execute Illegal Opcode!” warning.
  • I was unable to remove this warning (using ESC, Enter, Ctrl-Alt-Del) & had to do a cold reboot.
  • Started another test which ran without error until approximately the 1:00 (1 hour) mark when it again reported an “Attempt to Execute Illegal Opcode!” warning – with some slightly different data – refer to attachment SR61-RT3b.jpg.
  • Again, I needed to do a cold reboot to remove this warning.
  • Initiated another test & it ran without error – but at EXACTLY 1:00 (1 hour or 1:00:00 on the displayed run time) it generated a third “Attempt to Execute Illegal Opcode!” warning – refer to attachment SR61-RT3c.jpg.

I’ll keep running -RT3 until I hear otherwise.


#54

the_physio

the_physio

I think the files I attempted to attach to my previous post were too large, so these smaller images should still do.

Attachments


  • SR61-RT3a.jpg
    426.8 KB · Views: 446
  • SR61-RT3b.jpg
    452.7 KB · Views: 441
  • SR61-RT3c.jpg
    454.7 KB · Views: 471

#55

Steve

Steve

https://www.grc.com/dev/SpinRite/SR61-RT4.EXE

EDIT!! RT4 is also NOT CORRECT. IT SHOULD BE OKAY FOR UP TO 60 MINUTES. I'LL HAVE RT5 SHORTLY!

@the_physio and @TProbst67 : I believe that I found and fixed the cause of the illegal opcode. I'm not certain of that, yet... But I definitely found and fixed something that COULD have been blasting the testing code when an error was found. This SR61-RT4 should be used in place of any earlier versions. Let's see whether IT ever generates the illegal opcode.

Thank you!


#56

ColbyBouma

ColbyBouma

Multiple people have reported that it crashes at exactly 1 hour. Doesn't that point to an overflow?
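
For anyone wondering why a failure at almost exactly one hour hints at an overflow, here is a small arithmetic sketch. It assumes, purely as an illustration, that elapsed time is counted in standard PC timer ticks (about 18.2 per second) held in a 16-bit counter. Nothing in this thread confirms that this is how the RT3/RT4 code works; the variable names are made up for the example.

```python
# Hypothetical illustration only: NOT taken from SpinRite's source.
# The standard PC timer input clock and its default divisor are well-known values.
PIT_CLOCK_HZ = 1_193_182   # timer input clock in Hz
PIT_DIVISOR = 65_536       # default divisor, giving roughly 18.2 ticks per second

tick_hz = PIT_CLOCK_HZ / PIT_DIVISOR

# Assumption: ticks are accumulated in a 16-bit counter, which wraps after 65,536 ticks.
wrap_seconds = 2**16 / tick_hz
minutes, seconds = divmod(wrap_seconds, 60)

print(f"{tick_hz:.4f} ticks per second")
print(f"a 16-bit tick counter wraps after {int(minutes)} min {seconds:.2f} s")
```

Under that assumption the counter wraps just short of the 60-minute mark, which is the kind of coincidence that makes an overflow worth checking first.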


#57

Steve

Steve

Multiple people have reported that it crashes at exactly 1 hour. Doesn't that point to an overflow?
Ah. I saw one report of that, Colby. Thanks. I'll recreate that and see what's up. Thanks!


#58

Steve

Steve

RT4 is also NOT CORRECT. IT SHOULD BE OKAY FOR UP TO 60 MINUTES. I'LL HAVE SR61-RT5 SHORTLY!


#59

Steve

Steve

If at first you don't succeed...

I apologize for wasting everyone's time with the rushed out RT3 and RT4 releases. They were insufficiently tested. I've done a better job with RT5. It's the only one worth using, and it should run for (many) days without trouble. Thanks!!


#60

the_physio

the_physio

Update for SR61-RT5.EXE:
  • First test produced an error within 43 minutes (refer SR61-RT5a).
  • I needed to go out for an hour so I left it running just to see what happened – came home to see nothing had changed other than the timer on the left advancing.
  • Second test has been running for over an hour without any errors. I did watch for the 1 hour mark but as with the first test there were no “Attempt to Execute Illegal Opcode!” warnings. This test is still running without error with 1 hour & 5 minutes on the timer.

I’ll update when something changes.

Attachments


  • SR61-RT5a.jpg
    225.3 KB · Views: 390

#61

the_physio

the_physio

Hold the phone - second error kicks in just after the above post (refer SR61-RT5b).

Third test started.

Attachments


  • SR61-RT5b.jpg
    215.7 KB · Views: 410

#62

S

SeanBZA

I think Steve is writing a good utility to find flaky memory, which Memtest does not find.........

Might find a place in the free software library as a memory test utility, and could even be integrated into the ISO Steve is offering as well, as part of the Freedos suite, and included in the menu.


#63

Steve

Steve

@SeanBZA: Thanks, Sean, but MemTest =does= also find these errors. It was MemTest86's discovery of these problems that led me to invest in building a similar testing facility into SpinRite.


#64

Steve

Steve

@the_physio: Just a note that with SR61-RT5 it's no longer necessary to stop and restart it since RT5 now has an error counter (like MemTest86) which will allow you to just let it keep going. (y)
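
To illustrate the difference an error counter makes, here is a minimal sketch of a pattern test that counts mismatches and keeps going instead of stopping at the first one. It is not SpinRite's or MemTest86's code: the function names, the buffer size, and the simulated faulty cell are all assumptions made up for the example.

```python
def flaky_read(memory, index):
    """Simulated read: one hypothetical cell flips a bit for one pattern."""
    value = memory[index]
    if index == 5 and value == 0xAA:
        return value ^ 0x01
    return value


def run_test(size, patterns, read_back):
    """Write each pattern, read it back, and return the total mismatch count."""
    memory = bytearray(size)
    total_errors = 0
    for pattern in patterns:
        for i in range(size):
            memory[i] = pattern
        for i in range(size):
            if read_back(memory, i) != pattern:
                total_errors += 1  # count the error and keep testing
    return total_errors


total = run_test(16, [0x00, 0xFF, 0xAA, 0x55], flaky_read)
print("total errors:", total)
```

Because the count accumulates across passes, a single long run gives the same information as many stop-and-restart runs.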


#65

the_physio

the_physio

New functionality of -RT5 now noted.

The third test was left running overnight & produced 2 errors (refer to SR61-RT5c.jpg) after running for a period of between 10:30:00 & 12:58:02.

I have restarted a fourth test (sorry I didn’t read the above post before doing so) & as a new testing strategy I was intending to restart it every hour. @Steve do you think there is value in that or would you rather I just leave it running uninterrupted?

Attachments


  • SR61-RT5c.jpg
    221.7 KB · Views: 416

#66

Steve

Steve

@Steve do you think there is value in that or would you rather I just leave it running uninterrupted?
It's definitely "odd" that one time MemTest86 showed 6 errors and the next time 101. But "in theory" there should be no reason to stop and restart. Nothing should be different.


#67

the_physio

the_physio

Okay, I'll leave the Zimaboard running -RT5 & do an hourly check to see how many errors have accumulated - after 4 hours there are currently 2.

Is there any value in photographing the errors (eg showing which "Erroneous bits found")?


#68

D

DarkwinX

@the_physio conscious that you may be running out of a return window depending on how recently you got the Zimaboard. You may have to jump through a few more hoops. Might be worth at least emailing their support and kicking off an RMA case which might buy you some time.


#69

Steve

Steve

Agreed. We've learned what we need to from that board. THANK YOU for all the terrific testing. But if you can get it replaced, there's no reason to wait any longer!


#70

Steve

Steve

@the_physio: If you have the time and inclination, I've created another test to see whether this one might be more adept at inducing and detecting RAM errors on your ZimaBoard. Since we don't have any theoretical model for what's going on, we're reduced to trial and error. But before you lose access to your ZimaBoard, if you are planning to return and replace it, I'd love to know whether this -RT6 might be a superior problem detector. Thanks!


#71

the_physio

the_physio

Thanks for the advice about returning the Zimaboard, but I did purchase it back in 2022 as I wanted to have it so I could hit the ground running when 6.1 was released. So I think the ship SS Warranty has well & truly sailed – correction – it’s sailed, arrived, disembarked, been towed out to sea, & scuttled to become an artificial reef for the sea life. ;)

Let me know if you’re still interested in doing a swap with it; otherwise, I’ll try & think of some use that isn’t “mission critical”.

I’ve aborted the -RT5 test – after 9 hours it still hadn’t come up with more than the 2 errors it had after 3 hours.

-RT6 is now running & I’ll keep an eye on it every 15 minutes or so & let you know how it goes. :)


#72

S

SeanBZA

I have a feeling that as a first thing, you might want to use a different PSU on that board, before doing the sailing off via post. Use a new 12V 5A power supply, probably best to buy Meanwell, as they are cheap, quite good units, and 12V 5A is a very common one. you will need to sacrifice the existing cable to connect it, and also an IEC lead, plus look into a box to enclose it, as it is not exactly meant to be used open, but it does often get specced in industrial use. Then run the test again, and see if there is a difference, plus a 5A unit will work well for doing 2 spinners at once, and you do get dual output ones that provide a 5V supply and a 12V supply, so the drives can have their own 5V supply as well.


is an example, and there is US stock of it.

Otherwise


12V 6A, well capable of powering a few HDD units, though only a single 12V output, but most of the draw of a hard drive comes off the 12V rail.


#73

the_physio

the_physio

Okay -RT6 has been running for 3 hours now. It picked up 1 error in the first 15 minutes & then no more thereafter.

I feel like making an executive decision & starting another test to see if it picks up something early in the cycle - I know it shouldn't but there is almost a pattern of this happening. If it's zero after an hour or two then I'll turn it off & let it cool down before doing a rinse & repeat to see if it picks up something early from a cold start. Again, let me know if there is value in posting images of the error screen otherwise I'll just update with text.


#74

Steve

Steve

@the_physio: Text only is fine. Just the number of errors found. And anything you feel like doing along the lines of “executive decisions” will be terrific. We're all just feeling our way along here. If RT6 doesn't perform obviously better than RT5, and certainly if RT5 seems to do better, which it might, I'll stick with RT5 for SpinRite's final use. Thanks! (y)


#75

the_physio

the_physio

An interesting update on the SR61-RT6.EXE to share.

Executed a new “warm” run (ie the Zimaboard had been running for the better part of the day & was warm to touch) of -RT6 & produced 1 error around the 2:20 (2 hour 20 minutes) mark. I left it running until 3:00 without any new errors.

As promised, I turned off the Zimaboard for an hour & executed a “cold” run (ie the Zimaboard was now cold to touch) of -RT6. 4 errors were reported within 10 seconds. I left it running until 3:00 (3 hours) without any new errors.

Repeated the turn off, cool down, & execute -RT6 cycle. 3 errors reported within 15 seconds. Left it running for another hour but no new errors.

Repeated the turn off, cool down, & execute -RT6 cycle for the third time. 3 errors reported within 2 minutes. Left it running for another hour but no new errors.

It’s a small sample set, but anecdotally for THIS Zimaboard at least the -RT6 appears to be much more sensitive when it’s cold. Indeed the “return on investment” of running -RT6 on THIS Zimaboard for longer periods once it’s “warm” is questionable.
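
The four -RT6 runs above can be put side by side with a few lines of Python. The figures are the ones posted here ("within N seconds" is treated as an upper bound, and "around 2:20" as 2 hours 20 minutes); the variable names are just for illustration, and four runs is far too small a sample to prove anything.

```python
# (state of the board, seconds until the first reported error) for the -RT6 runs above
rt6_runs = [
    ("warm", 2 * 3600 + 20 * 60),  # first error around the 2:20 mark
    ("cold", 10),                  # 4 errors within 10 seconds
    ("cold", 15),                  # 3 errors within 15 seconds
    ("cold", 120),                 # 3 errors within 2 minutes
]

cold_times = [secs for state, secs in rt6_runs if state == "cold"]
warm_times = [secs for state, secs in rt6_runs if state == "warm"]

slowest_cold = max(cold_times)
fastest_warm = min(warm_times)
ratio = fastest_warm / slowest_cold

print(f"slowest cold run: first error within {slowest_cold} s")
print(f"warm run: first error after about {fastest_warm} s")
print(f"the warm run took about {ratio:.0f}x longer than the slowest cold run")
```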

Whether this is indicative of all memory issues I have no idea but I’d certainly recommend running -RT6 from a cold state. I’m wondering if the large variations in MemTest86 results were affected by the temperature of the Zimaboard – to be honest I can’t remember if the board was warm or cold when I ran those tests.

I'll repeat the cold test with -RT5 a couple of times to see how it goes & then I’m inclined to run the MemTest86 again from cold & warm to see how they compare too.


#76

Steve

Steve

I'll repeat the cold test with -RT5 a couple of times to see how it goes
Yes! Thank you! That's what I was hoping you could do. (y)


#77

the_physio

the_physio

Here are the results from running three cold tests of SR61-RT5.EXE:
  • Test#1: within 60 seconds – 6 errors; no further errors for next 30 minutes (when test was stopped).
  • Test#2: within 20 seconds – 1 error; within 3 minutes – 12 errors (in total); no further errors for next 30 minutes.
  • Test#3: within 5 seconds – 1 error; within 3 minutes – 2 errors; within 7 minutes – 3 errors; within 13 minutes – 4 errors; no further errors for next 30 minutes.
I’ll run MemTest86 from a cold start overnight & see how it goes.


#78

Steve

Steve

This is perfect, @the_physio! It appears that what we had with RT5 is catching plenty of problems, at least as well as RT6. And RT5 is much cleaner and simpler.

You’ve also clearly shown that on your machine this is a cold machine issue.


#79

C

CSPea

I have a feeling that as a first thing, you might want to use a different PSU on that board, (snip)
I'd agree that RE-testing with an alternative power-supply would be wise before counting our metaphorical RAM error chickens.

Even with the now evident temperature sensitivity that you've discovered in your excellent tests @the_physio, it's equally probable IMO that if the power-supply block (the 'power puck' etc.) is somehow below par, its voltage stability or current capacity could vary with changes in temperature ... given that it too would cool down and warm up in tandem with the ZimaBoard itself.

Colin P.


#80

Steve

Steve

Hey Colin (@CSPea):

I'm pretty certain that this was one of the first things that was suggested and that @the_physio did try using a beefier power supply. (y) But otherwise, definitely a good thing to rule out!


#81

the_physio

the_physio

Results of running MemTest86 again:
  • Cold Test – failed with 2 errors.
  • Warm Test – passed (ie no errors).
  • Warm Test using parallel CPUs – failed with 1 error.

@CSPea I'll see if I can source another 12V power supply (preferably 5A but even if a different 3A unit is available it will be interesting) & will post an update of the outcome.


#82

D

DarkwinX

@the_physio Jaycar are usually a good source of beefier self contained power supplies.

I'd avoid the meanwell unit unless you're experienced with wiring up a unit or have a sparky mate.


#83

C

CSPea

Hey Colin (@CSPea):

I'm pretty certain that this was one of the first things that was suggested and that @the_physio did try using a beefier power supply. (y) But otherwise, definitely a good thing to rule out!
Ah!
Sorry I missed (or forgot!) that earlier reference!
Thank you Steve.


#84

the_physio

the_physio

Well in the immortal words of Bill & Ted..."Bogus". 12V 5A power supply & still hits MemTest86 errors. :(


#85

the_physio

the_physio

Hi guys,

Just thought I’d drop an update on my Zimaboard & SpinRite 6.1 saga.

I bought a new Zimaboard (232):
  • Ran MemTest86 in normal & parallel modes with no memory errors.
  • Ran the latest SpinRite v6.1 Release 3 for 24 hours with no memory errors reported.

Hooking up the old problematic Zimaboard (216):
  • As expected the MemTest86 in normal mode threw up errors as it had done previously.
  • Ran the latest SpinRite v6.1 Release 3 & it picked up memory errors within the hour.

FYI, I tried reloading the CasaOS onto the Zimaboard but even this was unsuccessful – not overly surprising I suppose as it wouldn’t update the OS when I first got it. Maybe I’ll start using it as a techy paper weight.

Cheers.


#86

HoboFist

HoboFist

It's my previous PC. I originally built it with 32 GB of RAM, and it worked great. After a few years I decided to upgrade to 64 GB by buying the same kit of RAM, 2 16 GB sticks. I don't remember how long it took, but I eventually started noticing strange behavior, and I think I even had a corrupt file. I think I ran it like that for a few months before I decided to run MemTest86. I was actually able to get the full 64 GB to pass MemTest86 by lowering the RAM speed from 3200 to 3000 in the BIOS.

Due to this experience, I will ALWAYS test every computer before I use it. My new computer passed with flying colors :)
I am having a very similar issue with my 32 to 64 RAM upgrade...roughly 3k errors....I was thinking I might have to swap back but for some reason completely overlooked trying again w/ lowered speeds, definitely going to try this now! Just wanted to say thanks for the bump to the thoughtnoggin!


#87

ShadowMeow

ShadowMeow

Set RAM speed to 1600. No more errors. :)


#88

peterblaise

peterblaise

... SpinRite 6.1 ... threw up a huge amount of
these critical errors - although when I restart
the process they aren't always coming up in
the same sectors (see attached) ...

So, the_physio, have you ever gotten the
chance to revisit the 1.0T WDC
WD10EARS-22Y5B1 SN
WD-WCAV5H721905 HDD?

I suggest removing and reinstalling the
circuit card on the drive and cleaning the
contacts to ensure clean connections.



The drive may have worn out the track 0
area, so SpinRite may confirm that,
regardless, and if the data is NOT
important anymore, you can tell SpinRite
to barge on through:

SPINRITE NORAMTEST LEVEL 2/3/4/5 DYNASTAT 0

Data recovery
( in place ) and drive
maintenance
are what SpinRite is all about,
so once that memory distraction is over,
how about getting back to the drive and its
data?