From: invalid on
>>> The error message strongly suggests a problem with one block on disk.
>>> The problem may be more serious, more complicated, or both.

Hi guys,

I opened a few threads on this same symptom a long time ago on both
opensolarisforums and sunforums. I traced my problem down to a faulty SATA
cable on my DVD drive! But I was getting ZFS errors on my root mirror pool
on the hard drives! Something is *very* wrong with Solaris 10's SATA error
handling. I also go through fits every time I install a new copy of Solaris;
95% of the time grub will not install on one or the other drive.

I don't have any answers for you except to reinforce a couple of
points. Check your cables on all SATA devices, even devices that aren't hard
drives. Don't be fooled into thinking a bad cable can't be intermittent;
it can be. Copying multi-gigabyte DVDs from one filesystem to another would
trigger the ZFS error every time, but most of the time I didn't see
any errors.
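
One quick way to see whether a drive or its link is racking up errors is
the per-device error counters. These are standard Solaris commands; the
device name below is only a placeholder, so substitute your own:

  # List error counters and identification for every disk the system sees
  iostat -En
  # Or narrow it to a single device (c1t0d0 is just an example name)
  iostat -En c1t0d0
  # A climbing "Transport Errors" count tends to implicate the cable or
  # backplane, while "Hard Errors" / "Media Error" point at the drive itself.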

Solaris 10 has a long way to go with SATA support, including error handling
and device identification.
From: Michael Laajanen on
Hi,

Thomas Tornblom wrote:
> Michael Laajanen <michael_laajanen(a)yahoo.com> writes:
>
> ...
>>> The error message strongly suggests a problem with one block on
>>> disk. The problem may be more serious, more complicated, or both.
>>> Solaris is not in the business of diagnosing defective hardware.
>>> Sun, and other manufacturers write diagnostic software to exercise
>>> and test hardware. Sun Service should have access to such software.
>>> You may have write your own. Doing so requires both a good
>>> knowledge of the hardware and good knowledge of the software.
>>> It's easier and cheaper (if your time has value) to replace the
>>> drive.
>> Replacing the drive is not what this is about; if it is defective then
>> yes, but so far I will not rule out the other parts, see below.
>
> It is the *disk* that is telling the system that it has a problem, not
> the system detecting a problem with the disk.
>
Is that so? Then it should be a failing disk, right?

The strange thing is that I have had three similar crashes out of 10
possible, which makes me believe in something other than the disk,
especially since I built the backplane and cables myself :)

But it could of course be just the disks; I will replace them and
continue testing.


>> I think you have not understood what I have done. I have designed a
>> hard disk chassis with 17 disks which is connected to a number of
>> servers; during testing of this chassis I have received these errors.
>
> So?
>
>> /michael
>
> Thomas


/michael
From: Chris Ridd on
On 2009-09-08 19:28:51 +0100, Richard B. Gilbert said:

> Chris Ridd wrote:
>> On 2009-09-08 14:38:30 +0100, "Richard B.
>> Gilbert"<rgilbert88(a)comcast.net> said:
>>
>>> Solaris is not in the business of diagnosing defective hardware.
>>
>> Doesn't <http://opensolaris.org/os/community/fm/> suggest otherwise?
>>
>
> Never seen it! I'm running S8, 9, and 10 on various machines.

It is part of S10 as well.
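
The fault manager logs exactly this kind of disk and transport telemetry,
so it is worth asking it what it has seen. These are standard Solaris 10
commands, run as root:

  # List resources FMA has actually diagnosed as faulty, if any
  fmadm faulty
  # Summarise the fault/diagnosis log
  fmdump
  # Dump the raw error telemetry (ereports) verbosely; disk and SATA
  # transport errors from the drivers show up here
  fmdump -eV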

From: Thomas Tornblom on
Michael Laajanen <michael_laajanen(a)yahoo.com> writes:
> Just back from office and the test with 8 UFS filesystems has been
> going on for some 20 hours without any problem, so one way or the
> other it seems to be ZFS related, and maybe the way the ZFS driver
> handles read/write errors compared to UFS? Or could it be that ZFS
> checks the files on the disk in a better way perhaps?
>
> /michael

ZFS does end to end data integrity checking, including checksumming
all data, so ZFS detects data corruption even if the hardware says
everything is OK. UFS does no such checking.
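
If you want ZFS to verify every block it holds and tell you which device
is returning bad data, a scrub will do that. Standard ZFS commands; "tank"
below is only a placeholder pool name:

  # Walk every block in the pool and verify its checksum
  zpool scrub tank
  # Check the result; the CKSUM column and the "errors:" section show
  # which device, if any, returned corrupt data
  zpool status -v tank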

But the errors you've shown before indicate that the driver has
detected a problem, and that happens below both ZFS and UFS.

One thing that is different is that if ZFS is given a whole disk it
will enable the disk cache, while it is disabled (if possible) if the
disk is used with normal partitioning, which is how UFS uses it. Some
disks have caches that can't be disabled.
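
For what it's worth, the write cache can be inspected and toggled from
format's expert mode on most SATA/SCSI disks. A rough sketch, with the
disk chosen interactively from the menu:

  # format -e
  (select the disk)
  format> cache
  cache> write_cache
  write_cache> display    # show the current state
  write_cache> enable     # or: disable
  write_cache> quit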

Have you run the tests in "format"?
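
The surface test is under format's analyze menu; the read test is
non-destructive, but double-check which test you pick before starting:

  # format
  (select the suspect disk)
  format> analyze
  analyze> read    # read-only surface scan of the whole disk
  analyze> quit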

Thomas
From: Michael Laajanen on
Hi,

Thomas Tornblom wrote:
> Michael Laajanen <michael_laajanen(a)yahoo.com> writes:
>> Just back from office and the test with 8 UFS filesystems has been
>> going on for some 20 hours without any problem, so one way or the
>> other it seems to be ZFS related, and maybe the way the ZFS driver
>> handles read/write errors compared to UFS? Or could it be that ZFS
>> checks the files on the disk in a better way perhaps?
>>
>> /michael
>
> ZFS does end to end data integrity checking, including checksumming
> all data, so ZFS detects data corruption even if the hardware says
> everything is OK. UFS does no such checking.
>
Right, that is how I have understood ZFS and also liked :)

> But the errors you've shown before indicate that the driver has
> detected a problem, and that happens below both ZFS and UFS.
>
> One thing that is different is that if ZFS is given a whole disk it
> will enable the disk cache, while it is disabled (if possible) if the
> disk is used with normal partitioning, which is how UFS uses it. Some
> disks have caches that can't be disabled.
Is there some easy way to enable the cache under UFS? I assume this is
the write cache.
>
> Have you run the tests in "format"?
>
> Thomas
Nope, the UFS test was chosen and done in a hurry over the weekend, but I
will try that tomorrow.

/michael