From: invalid on 8 Sep 2009 19:44

>>> The error message strongly suggests a problem with one block on disk.
>>> The problem may be more serious, more complicated, or both.

Hi guys,

I opened a few threads on this same symptom a long time ago on both the
OpenSolaris forums and the Sun forums. I traced my problem down to a faulty
SATA cable on my DVD drive! But I was getting ZFS errors on my root mirror
pool on the hard drives! Something is *very* wrong with Solaris 10's SATA
error handling. I also go through fits every time I install a new copy of
Solaris; 95% of the time grub will not install on one or the other drive.

I don't have any answers for you except to reinforce a couple of points.
Check your cables on all SATA devices, even devices that aren't hard drives.
Don't be fooled into thinking a bad cable can't be intermittent; it can be.
Copying multi-gigabyte DVDs from one filesystem to another reproduced the
ZFS error every time, but in ordinary use I didn't see any errors.

Solaris 10 has a long way to go with SATA support, including error handling
and device identification.
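A minimal sketch of how the pool-level errors being described can be
checked and re-tested, assuming a mirrored root pool named rpool (the pool
name is a placeholder, not from the thread):

    # Show per-device read/write/checksum error counters and, with -v,
    # any files ZFS already knows to be damaged.
    zpool status -v rpool

    # After replacing a suspect cable, reset the counters so any new
    # errors stand out, then re-run the copy workload that triggered them.
    zpool clear rpool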
From: Michael Laajanen on 8 Sep 2009 20:09

Hi,

Thomas Tornblom wrote:
> Michael Laajanen <michael_laajanen(a)yahoo.com> writes:
>
> ...
>>> The error message strongly suggests a problem with one block on
>>> disk. The problem may be more serious, more complicated, or both.
>>> Solaris is not in the business of diagnosing defective hardware.
>>> Sun, and other manufacturers, write diagnostic software to exercise
>>> and test hardware. Sun Service should have access to such software.
>>> You may have to write your own. Doing so requires both a good
>>> knowledge of the hardware and a good knowledge of the software.
>>> It's easier and cheaper (if your time has value) to replace the
>>> drive.
>> Replacing the drive is not what this is about; if it is defective, yes,
>> but so far I will not rule out the other parts, see below.
>
> It is the *disk* that is telling the system that it has a problem, not
> the system detecting a problem with the disk.
>
Is that so? Then it should be a failing disk, right. The strange thing is
that I have had three similar crashes out of ten possible, which makes me
believe in something other than the disk, especially since I built the
backplane and cables myself :) But it could of course be just the disks; I
will replace them and continue testing.

>> I think you have not understood what I have done. I have designed a
>> hard disk chassis with 17 disks which is connected to a number of
>> servers, and during testing of this chassis I have received these errors.
>
> So?
>
>> /michael
>
> Thomas

/michael
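One way to see whether the disks themselves are reporting the errors is the
per-device error summary kept by the driver; a sketch (the device name is a
placeholder and will differ on the chassis described):

    # Soft/hard/transport error counts plus vendor, model and serial for
    # every disk, which also helps map a chassis slot to a physical drive.
    iostat -En

    # Watch a single suspect device while the test workload runs
    # (c0t1d0 is a placeholder device name).
    iostat -xne 5 | grep c0t1d0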
From: Chris Ridd on 9 Sep 2009 02:44

On 2009-09-08 19:28:51 +0100, Richard B. Gilbert said:
> Chris Ridd wrote:
>> On 2009-09-08 14:38:30 +0100, "Richard B. Gilbert"
>> <rgilbert88(a)comcast.net> said:
>>
>>> Solaris is not in the business of diagnosing defective hardware.
>>
>> Doesn't <http://opensolaris.org/os/community/fm/> suggest otherwise?
>>
>
> Never seen it! I'm running S8, 9, and 10 on various machines.

It is part of S10 as well.
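The fault management framework referred to here can be queried from the
command line; a brief sketch of the usual starting points on Solaris 10:

    # List any faults the Fault Manager has diagnosed (disks, controllers,
    # memory, ...), each with a message ID that can be looked up.
    fmadm faulty

    # Show the log of diagnosed faults, and the raw error telemetry
    # (e.g. disk/ZFS ereports) that led to them.
    fmdump
    fmdump -eV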
From: Thomas Tornblom on 13 Sep 2009 14:51

Michael Laajanen <michael_laajanen(a)yahoo.com> writes:

> Just back from the office, and the test with 8 UFS filesystems has been
> going on for some 20 hours without any problem, so one way or the other
> it seems to be ZFS related. Maybe it is the way the ZFS driver handles
> read/write errors compared to UFS? Or could it be that ZFS checks the
> files on the disk in a better way?
>
> /michael

ZFS does end-to-end data integrity checking, including checksumming all
data, so ZFS detects data corruption even if the hardware says everything
is OK. UFS does no such checking.

But the errors you've shown before indicate that the driver has detected a
problem, and that happens below both ZFS and UFS.

One thing that is different is that if ZFS is given a whole disk it will
enable the disk cache, while the cache is disabled (if possible) when the
disk is used with normal partitioning, which is how UFS uses it. Some disks
have caches that can't be disabled.

Have you run the tests in "format"?

Thomas
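The "tests in format" are the surface analysis routines in the format(1M)
utility; a rough sketch of the menu path (format is interactive, so these
are menu steps rather than a script, and which disk to select depends on
the system):

    # Run as root.
    format
        # pick the suspect disk from the list it prints
        analyze          # enter the surface analysis menu
        read             # non-destructive read test of the whole disk
        # 'write', 'compare' and 'purge' also exist but overwrite data -
        # do not use them on a disk that holds live filesystems.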
From: Michael Laajanen on 13 Sep 2009 17:11
Hi,

Thomas Tornblom wrote:
> Michael Laajanen <michael_laajanen(a)yahoo.com> writes:
>> Just back from the office, and the test with 8 UFS filesystems has been
>> going on for some 20 hours without any problem, so one way or the other
>> it seems to be ZFS related. Maybe it is the way the ZFS driver handles
>> read/write errors compared to UFS? Or could it be that ZFS checks the
>> files on the disk in a better way?
>>
>> /michael
>
> ZFS does end-to-end data integrity checking, including checksumming
> all data, so ZFS detects data corruption even if the hardware says
> everything is OK. UFS does no such checking.
>
Right, that is how I have understood ZFS, and it is also why I like it :)

> But the errors you've shown before indicate that the driver has
> detected a problem, and that happens below both ZFS and UFS.
>
> One thing that is different is that if ZFS is given a whole disk it
> will enable the disk cache, while the cache is disabled (if possible)
> when the disk is used with normal partitioning, which is how UFS uses
> it. Some disks have caches that can't be disabled.

Is there some easy way to enable the cache under UFS? I assume that this
is the write cache.

>
> Have you run the tests in "format"?
>
> Thomas

Nope, the UFS test was chosen and done in a hurry over the weekend, but I
will try that tomorrow.

/michael
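The write cache can be inspected, and on many drives toggled, per disk from
format's expert mode; a sketch of the interactive menu path (note that UFS
does not manage the cache the way ZFS does, so an enabled write cache can
lose buffered writes on power failure, and some drives ignore the setting
or reset it at power cycle):

    # Run as root; -e enables the expert menu that contains the cache
    # commands.
    format -e
        # select the disk the UFS filesystem lives on
        cache            # expert-mode cache menu
        write_cache
        display          # show whether the write cache is enabled
        enable           # turn it on for testing purposes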