From: Michael Laajanen on 7 Sep 2009 17:45

Hi,

andre.boegelsack wrote:
> Usually this means you have a bad disk and you should replace it. Does
> zpool status -v report any error or any bad disk?

Yes, that the pool is degraded/faulty. But it still works, partly!

/michael

From: Richard B. Gilbert on 7 Sep 2009 18:06

Michael Laajanen wrote:
> Hi all,
>
> Zfs.. wrote:
>> On Sep 7, 5:15 pm, Michael Laajanen <michael_laaja...(a)yahoo.com>
>> wrote:
>>> Hi,
>>>
>>> I am running Solaris 10 on a number of x86 nodes and have root and a
>>> second filesystem mounted over SATA to two hard disks with ZFS.
>>>
>>> I am currently testing this SATA link to the hard disks using a simple
>>> script which
>>>
>>> - creates a 40GB file1
>>> - copies file1 to file2
>>> - removes file1
>>> - copies file2 to file1
>>>
>>> This is repeated over and over. One of the nodes reports the errors
>>> below; does anyone know what they actually mean?
>>>
>>> Sep 5 09:39:13 siu5 gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
>>> Sep 5 09:39:13 siu5 Error for command 'read sector' Error Level: Fatal
>>> Sep 5 09:39:13 siu5 gda: [ID 107833 kern.notice] Requested Block 123893504, Error Block: 123893541
>>> Sep 5 09:39:13 siu5 gda: [ID 107833 kern.notice] Sense Key: uncorrectable data error
>>> Sep 5 09:39:13 siu5 gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x7
>>> Sep 5 09:39:17 siu5 gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
>>> Sep 5 09:39:17 siu5 Error for command 'read sector' Error Level: Fatal
>>> Sep 5 09:39:17 siu5 gda: [ID 107833 kern.notice] Requested Block 123893504, Error Block: 123893541
>>> Sep 5 09:39:17 siu5 gda: [ID 107833 kern.notice] Sense Key: uncorrectable data error
>>> Sep 5 09:39:17 siu5 gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x7
>>> Sep 5 09:39:21 siu5 gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
>>> Sep 5 09:39:21 siu5 Error for command 'read sector' Error Level: Fatal
>>> Sep 5 09:39:21 siu5 gda: [ID 107833 kern.notice] Requested Block 123893504, Error Block: 123893541
>>> Sep 5 09:39:21 siu5 gda: [ID 107833 kern.notice] Sense Key: uncorrectable data error
>>> Sep 5 09:39:21 siu5 gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x7
>>> Sep 5 09:39:25 siu5 gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
>>> Sep 5 09:39:25 siu5 Error for command 'read sector' Error Level: Fatal
>>> S
>>>
>>> /michael
>>
>> Scrub your zpool
>>
>> zpool scrub mypool
>>
>> And see if it reports any dodgy data on one of the disks. If it does,
>> replace the disk.
>
> Could it also mean that I have a bad connection to the drives, such as
> bad cables? I am asking because all 11 nodes/hosts in the system are
> connected to an in-house designed SATA chassis backplane, with one or
> two drives per node driven via standard SAS cables from Sun (LSI) HBAs.
>
> Does anyone know of some way to enhance the error reports on these SATA
> interfaces by setting some "debug variables" in the drivers in order to
> get more detailed info?
>
> /michael

What more do you think you need to know? The drive has made multiple attempts to read block 123893541 and has encountered an error on each attempt!

The very first thing to do is to try to back up your data. With a little bit of luck you may be able to make a good backup.

You could try replacing cables but I think it would be a complete waste of time. The reported error suggests a problem with the disk. A bad cable or a poorly seated connector would almost certainly show different error messages.

At the very least your disk has a corrupted block. It's conceivable that rewriting that block might fix it. I would not want to trust such a disk with my valuable data.

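The log excerpt quoted in this post repeats the same failure every four seconds. A minimal Python sketch, assuming the wrapped syslog lines have been rejoined onto single lines, tallies how often each error block appears. The names `count_error_blocks` and `SAMPLE_LOG` are illustrative only and are not part of any Solaris tool.

```python
import re
from collections import Counter

# Excerpt of the messages quoted in the thread (wrapped lines rejoined).
SAMPLE_LOG = """\
Sep  5 09:39:13 siu5 gda: [ID 107833 kern.notice] Requested Block 123893504, Error Block: 123893541
Sep  5 09:39:13 siu5 gda: [ID 107833 kern.notice] Sense Key: uncorrectable data error
Sep  5 09:39:17 siu5 gda: [ID 107833 kern.notice] Requested Block 123893504, Error Block: 123893541
Sep  5 09:39:17 siu5 gda: [ID 107833 kern.notice] Sense Key: uncorrectable data error
Sep  5 09:39:21 siu5 gda: [ID 107833 kern.notice] Requested Block 123893504, Error Block: 123893541
Sep  5 09:39:21 siu5 gda: [ID 107833 kern.notice] Sense Key: uncorrectable data error
"""

# Matches the "Requested Block N, Error Block: M" part of a gda message.
BLOCK_RE = re.compile(r"Requested Block\s+(\d+),\s*Error Block:\s*(\d+)")


def count_error_blocks(log_text):
    """Return a Counter mapping each reported error block to its hit count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BLOCK_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    for block, hits in count_error_blocks(SAMPLE_LOG).most_common():
        print(f"error block {block}: {hits} failed read(s)")
```

For the excerpt in the post this finds a single error block, 123893541, hit on three consecutive attempts, which matches the observation that every retry fails at the same place.
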
From: Michael Laajanen on 7 Sep 2009 18:19

Hi,

Richard B. Gilbert wrote:
> What more do you think you need to know? The drive has made multiple
> attempts to read block 123893541 and has encountered an error on each
> attempt!
>
> The very first thing to do is to try to back up your data. With a
> little bit of luck you may be able to make a good backup.

I do not have any data on the drives; I am testing a hard disk chassis.

> You could try replacing cables but I think it would be a complete waste
> of time. The reported error suggests a problem with the disk. A bad
> cable or a poorly seated connector would almost certainly show different
> error messages.

That is exactly what I would like to know: what error messages can be seen on a standard S10, and if it is possible to make the driver more verbose in order to get as much detailed info as possible!

> At the very least your disk has a corrupted block. It's conceivable
> that rewriting that block might fix it. I would not want to trust such
> a disk with my valuable data.

/michael

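The copy test from the original post could be sketched in Python roughly as follows. The SHA-256 comparison after each pass is an addition that is not mentioned in the thread, and the function names, file names, and sizes are placeholders. A real run would point `directory` at the filesystem under test and use a file size closer to the 40 GB in the post.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_file(path, size_bytes, chunk_size=1 << 20):
    """Write size_bytes of random data to path."""
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            handle.write(chunk)
            remaining -= len(chunk)


def run_copy_test(directory, size_bytes, passes):
    """Run the create/copy/remove/copy cycle; return the mismatch count."""
    file1 = os.path.join(directory, "file1")
    file2 = os.path.join(directory, "file2")
    create_file(file1, size_bytes)
    expected = sha256_of(file1)
    mismatches = 0
    for _ in range(passes):
        shutil.copyfile(file1, file2)   # copy file1 to file2
        os.remove(file1)                # remove file1
        shutil.copyfile(file2, file1)   # copy file2 back to file1
        # Assumption: verify both copies, which the original script may not do.
        if sha256_of(file1) != expected or sha256_of(file2) != expected:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    # Small size and a temporary directory, purely for demonstration.
    with tempfile.TemporaryDirectory() as scratch:
        print("mismatches:", run_copy_test(scratch, 4 * 1024 * 1024, passes=3))
```

A read that the driver reports as fatal would typically surface as an `OSError` from the copy or checksum step rather than as a checksum mismatch, so both outcomes are worth logging per node.
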
From: Richard B. Gilbert on 7 Sep 2009 19:48

Michael Laajanen wrote:
>> You could try replacing cables but I think it would be a complete
>> waste of time. The reported error suggests a problem with the disk.
>> A bad cable or a poorly seated connector would almost certainly show
>> different error messages.
>
> That is exactly what I would like to know: what error messages can be
> seen on a standard S10, and if it is possible to make the driver more
> verbose in order to get as much detailed info as possible!

I don't think that a vanilla Solaris system is able to do diagnosis at that level!

IF you have a Sun service contract, Sun might be able to diagnose the fault in a little more detail.

The basic problem is clear and I think you are just wasting time trying to find a workaround. A new disk is simply not that expensive, especially if you don't buy it from Sun! If you want to pinch pennies, you could probably find a replacement on eBay.

From: Michael Laajanen on 8 Sep 2009 00:36

Hi,

Richard B. Gilbert wrote:
> I don't think that a vanilla Solaris system is able to do diagnosis at
> that level!

Could be so, but quite often there are flags that can be set to enhance the report level even on production systems.

> IF you have a Sun service contract, Sun might be able to diagnose the
> fault in a little more detail.

The chassis is an in-house design, and the chassis is what I am trying to verify.

> The basic problem is clear and I think you are just wasting time trying
> to find a work around. A new disk is simply not that expensive,
> especially if you don't buy it from Sun! If you want to pinch pennies,
> you could probably find a replacement on e-Bay.

I have no problem getting new drives, but that assumes that there is a drive fault and not a fault in the electrical connection! I would just like to pinpoint the problem in as much detail as possible before I decide what to do!

/michael