From: Ralph Katz on 2 Jun 2010 21:30

Lenny install on newly acquired used Dell hangs and throws errors to syslog. Do I have two bad disks or a more serious hardware problem? Short of buying a new disk, how would I know? What would you recommend? Or do I have a simple BIOS setting problem?

(My last post to debian-user was in 2008. Etch has continued to be rock solid on two desktops. Now I felt it was time to upgrade.)

First, an old Dell GX240 was obtained and Lenny/xfce installed; P4, 1 GB, 120 GB WDC disk. Syslog showed all kinds of errors while the system would hang at times:

May 24 21:53:39 spike kernel: [ 5034.952013] hda: status timeout: status=0x80 { Busy }
May 24 21:53:39 spike kernel: [ 5034.952021] ide: failed opcode was: unknown
May 24 21:53:39 spike kernel: [ 5034.952030] hda: DMA disabled
May 24 21:53:39 spike kernel: [ 5034.952066] hda: drive not ready for command
May 24 21:54:14 spike kernel: [ 5064.952021] ide0: reset timed-out, status=0x80
May 24 21:54:14 spike kernel: [ 5065.393331] hda: status timeout: status=0x80 { Busy }
May 24 21:54:14 spike kernel: [ 5065.393331] ide: failed opcode was: unknown
May 24 21:54:14 spike kernel: [ 5065.393331] hda: drive not ready for command
May 24 21:54:14 spike kernel: [ 5065.393331] Clocksource tsc unstable (delta = 4686898152 ns)
May 24 21:54:44 spike kernel: [ 5099.964023] ide0: reset timed-out, status=0x80
May 24 21:54:44 spike kernel: [ 5099.964040] end_request: I/O error, dev hda, sector 10867375
May 24 21:54:44 spike kernel: [ 5099.964104] end_request: I/O error, dev hda, sector 13826839
May 24 21:54:44 spike kernel: [ 5099.964115] Buffer I/O error on device dm-2, logical block 360455
[snipped 20 KB of I/O errors]
May 24 21:54:44 spike kernel: [ 5099.967007] end_request: I/O error, dev hda, sector 208223535
May 24 21:54:44 spike kernel: [ 5099.967024] EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=5792911, block=23167050
May 24 21:54:44 spike kernel: [ 5099.967128] Aborting journal on device dm-5.
May 24 21:54:44 spike kernel: [ 5099.968575] ext3_abort called.
May 24 21:54:44 spike kernel: [ 5099.968587] EXT3-fs error (device dm-5): ext3_journal_start_sb: Detected aborted journal
May 24 21:54:44 spike kernel: [ 5099.968594] Remounting filesystem read-only

I concluded the disk was dead (although the SMART tests PASSED) and replaced it with another used 120 GB WDC. I re-installed Lenny, and soon the system would hang again, typically at startup.

Syslog entries of note with the second disk installed:

/var/log/syslog:Jun 2 08:52:40 spike smartd[2346]: Device: /dev/hda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 198
/var/log/syslog.1:Jun 1 08:13:56 spike kernel: [ 936.000023] hda: dma_timer_expiry: dma status == 0x21
/var/log/syslog.1:Jun 1 08:28:44 spike smartd[2357]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 196 to 195
May 31 09:54:09 spike kernel: [ 620.084022] hda: dma_timer_expiry: dma status == 0x20
May 31 09:54:09 spike kernel: [ 620.084031] hda: DMA timeout retry
May 31 09:54:09 spike kernel: [ 620.084034] hda: timeout waiting for DMA
May 31 09:54:09 spike kernel: [ 624.232267] Clocksource tsc unstable (delta = 4686697657 ns)
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Prefailure Attribute: 5 Reallocated_Sector_Ct changed from 200 to 199
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 200 to 196

Meanwhile, the short and long SMART self-tests passed. No errors were reported by smartctl -a /dev/hda.

This morning I had to reboot a hung system with Alt-SysRq-b because X, an ssh connection, VT1 and Ctrl-Alt-Del all failed.

Searching the net for "Clocksource tsc unstable" suggested disabling ACPI in the BIOS. Hey, I'm just a desktop user, and this is beginning to get beyond my seven years' capability of understanding the magic.

Suggestions welcomed, thanks!

Ralph

--
To UNSUBSCRIBE, email to debian-user-REQUEST(a)lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmaster(a)lists.debian.org
Archive: http://lists.debian.org/4C070380.700(a)rcn.com

From: Mark on 2 Jun 2010 23:30

On Wed, Jun 2, 2010 at 6:21 PM, Ralph Katz <ralph.katz(a)rcn.com> wrote:
> Lenny install on newly acquired used Dell hangs and throws errors to
> syslog. Do I have two bad disks or a more serious hardware problem?
> Short of buying a new disk, how would I know? What would you recommend?
> Or do I have a simple BIOS setting problem?
> [snip]

If you boot to an Ubuntu Live CD, it will automatically let you know of any bad hard disk sectors via a pop up GUI upon booting to the desktop environment. I inherited a decommissioned hard drive from a server room and used Ubuntu Live CD to confirm it had bad sectors, hence the reason for its decommissioning. Once you confirm it's not the hdd, then you can troubleshoot other possibilities.

HTH.
Mark

From: Jochen Schulz on 3 Jun 2010 02:10

Ralph Katz:
>
> Lenny install on newly acquired used Dell hangs and throws errors to
> syslog. Do I have two bad disks or a more serious hardware problem?

Another option: it might be a kernel problem. I don't remember the specifics anymore, but on one of my systems I had similar errors. After replacing the disk and still getting these errors, I found hints that the kernel might be at fault. I then installed a newer kernel from backports.org and the problems went away.

> May 24 21:54:14 spike kernel: [ 5065.393331] Clocksource tsc unstable
> (delta = 4686898152 ns)

This line is irrelevant for the hard disk problem.

> /var/log/syslog:Jun 2 08:52:40 spike smartd[2346]: Device: /dev/hda,
> SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 198
> /var/log/syslog.1:Jun 1 08:13:56 spike kernel: [ 936.000023] hda:
> dma_timer_expiry: dma status == 0x21
> /var/log/syslog.1:Jun 1 08:28:44 spike smartd[2357]: Device: /dev/hda,
> SMART Usage Attribute: 196 Reallocated_Event_Count changed from 196 to 195

That's a real hard disk error, but unless it happens regularly, you don't need to worry. These happen sometimes and the disk is usually able to handle it.

> Meanwhile, SMART self-tests short and long passed. No errors were
> reported by smartctl -a /dev/hda.

Well, at least the reallocation events should have been counted. It doesn't hurt to post smartctl's output.

J.
--
In an ideal world I would cure poverty and go to the gym at least three days a week.
[Agree] [Disagree]
<http://www.slowlydownward.com/NODATA/data_enter2.html>
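
[Editorial sketch] Whether the SMART attribute changes quoted above "happen regularly" can be judged more easily once they are grouped per attribute. The following is a minimal Python sketch, not something posted in the thread. The names `SMARTD_CHANGE`, `summarize_smart_changes` and `SAMPLE_LINES` are hypothetical, the regular expression is an assumption based only on the four smartd lines quoted in this thread, and the numbers smartd logs in these lines are presumably normalized attribute values rather than raw sector counts.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the smartd entries quoted in this thread:
#   "... smartd[PID]: Device: /dev/hda, SMART Usage Attribute: 196 NAME changed from A to B"
SMARTD_CHANGE = re.compile(
    r"Device: (?P<dev>\S+), SMART (?P<kind>Prefailure|Usage) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def summarize_smart_changes(lines):
    """Group (old, new) value changes by (device, attribute id, attribute name)."""
    changes = defaultdict(list)
    for line in lines:
        match = SMARTD_CHANGE.search(line)
        if match is None:
            continue  # not a smartd attribute-change line
        key = (match["dev"], int(match["id"]), match["name"])
        changes[key].append((int(match["old"]), int(match["new"])))
    return dict(changes)


# The smartd lines from the original post, in the order they were quoted.
SAMPLE_LINES = [
    "/var/log/syslog:Jun 2 08:52:40 spike smartd[2346]: Device: /dev/hda, "
    "SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 198",
    "/var/log/syslog.1:Jun 1 08:28:44 spike smartd[2357]: Device: /dev/hda, "
    "SMART Usage Attribute: 196 Reallocated_Event_Count changed from 196 to 195",
    "May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, "
    "SMART Prefailure Attribute: 5 Reallocated_Sector_Ct changed from 200 to 199",
    "May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, "
    "SMART Usage Attribute: 196 Reallocated_Event_Count changed from 200 to 196",
]

if __name__ == "__main__":
    for (dev, attr_id, name), steps in summarize_smart_changes(SAMPLE_LINES).items():
        print(dev, attr_id, name, steps)
```

To run it against a real log, the lines of /var/log/syslog could be passed in place of `SAMPLE_LINES`; an attribute whose list keeps growing from day to day would be the "happens regularly" case.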

From: David Baron on 3 Jun 2010 09:50

I sometimes get this: the disks click-clack and those messages appear. Usually rebooting after jiggling the cables fixes it. Maybe replace the cables. Also check the power supply: is it working, and is it adequate?

From: Daniel Barclay on 3 Jun 2010 12:00

Ralph,

Jochen Schulz wrote:
> Ralph Katz:
>> Lenny install on newly acquired used Dell hangs and throws errors to
>> syslog. Do I have two bad disks or a more serious hardware problem?
>
> Another option: it might be a kernel problem. I don't remember the
> specifics anymore, but on one of my systems I had similar errors. After
> replacing the disk and still getting these errors, I found hints that
> the kernel might be at fault. I then installed a newer kernel from
> backports.org and the problems went away.

What processor and chipset does your motherboard use?

Do you get

Does changing your IDE/ATA controllers from DMA mode to PIO mode stop the messages?

(I had similar problems (and got similar log messages) with a dual-processor AMD Athlon MP board. Apparently the AMD chipset had some bug, Linux didn't work around that particular bug, and the kernel's IDE DMA code (or maybe the filesystem code) wasn't very robust: it didn't retry an operation that failed because of a detected DMA timeout, and it didn't even detect that the operation had failed and stop (panic or something) before things (disk and filesystem state) became inconsistent.)

Daniel
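
[Editorial sketch] To tell whether a change such as switching from DMA to PIO mode actually makes the messages stop, it may help to count the IDE-related kernel messages per day before and after the change. The following is a minimal Python sketch under stated assumptions, not something posted in the thread: `KERNEL_LINE`, `PATTERNS`, `count_ide_errors` and `SAMPLE_LINES` are hypothetical names, and the patterns cover only the message texts quoted in the original post.

```python
import re
from collections import Counter

# Assumed syslog shape, based on the kernel lines quoted in this thread:
#   "May 31 09:54:09 spike kernel: [ 620.084022] hda: dma_timer_expiry: ..."
KERNEL_LINE = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d\d:\d\d:\d\d \S+ kernel: "
    r"\[\s*[\d.]+\] (?P<msg>.*)$"
)

# Rough categories for the messages seen in the post; the first match wins.
PATTERNS = {
    "dma_timeout": re.compile(r"dma_timer_expiry|DMA timeout retry|timeout waiting for DMA"),
    "status_timeout": re.compile(r"status timeout|reset timed-out"),
    "io_error": re.compile(r"I/O error"),
}


def count_ide_errors(lines):
    """Count kernel messages per (day, category) for the categories above."""
    counts = Counter()
    for line in lines:
        match = KERNEL_LINE.match(line)
        if match is None:
            continue  # not a kernel line in the assumed format
        for label, pattern in PATTERNS.items():
            if pattern.search(match["msg"]):
                counts[(match["day"], label)] += 1
                break
    return counts


# A few kernel lines copied from the original post.
SAMPLE_LINES = [
    "May 24 21:53:39 spike kernel: [ 5034.952013] hda: status timeout: status=0x80 { Busy }",
    "May 24 21:54:44 spike kernel: [ 5099.964023] ide0: reset timed-out, status=0x80",
    "May 24 21:54:44 spike kernel: [ 5099.964040] end_request: I/O error, dev hda, sector 10867375",
    "May 31 09:54:09 spike kernel: [ 620.084022] hda: dma_timer_expiry: dma status == 0x20",
    "May 31 09:54:09 spike kernel: [ 620.084031] hda: DMA timeout retry",
    "May 31 09:54:09 spike kernel: [ 620.084034] hda: timeout waiting for DMA",
]

if __name__ == "__main__":
    for (day, label), n in sorted(count_ide_errors(SAMPLE_LINES).items()):
        print(day, label, n)
```

If the "dma_timeout" count drops to zero on the days after the mode change while the disk is still in use, that would point toward the DMA path (controller, cable or driver) rather than the disk itself; this is only a heuristic, not a diagnosis.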