From: Stephan Diestelhorst on 9 Jul 2010 12:00

Hi,

I have an issue with suspend to RAM and I/O load on a disk. Symptoms
are that the disk does not respond to requests when woken up, producing
only I/O errors on all tested kernels (newest 2.6.35-rc4 (Ubuntu
mainline PPA build)):

[ 1719.580169] sd 0:0:0:0: [sda] Unhandled error code
[ 1719.580174] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1719.580178] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 0f 51 e7 88 00 00 b0 00
[ 1719.580186] end_request: I/O error, dev sda, sector 257025928
[ 1719.580798] Aborting journal on device dm-1-8.
[ 1719.580912] EXT4-fs error (device dm-1) in ext4_reserve_inode_write: Journal has aborted
[ 1719.580959] EXT4-fs (dm-1): Remounting filesystem read-only
[ 1719.581004] sd 0:0:0:0: [sda] Unhandled error code
[ 1719.581007] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1719.581010] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 0f 51 a1 88 00 00 08 00
[ 1719.581016] end_request: I/O error, dev sda, sector 257008008
[ 1719.581026] Buffer I/O error on device dm-1, logical block 2129920
[ 1719.581027] lost page write due to I/O error on dm-1
[ 1719.581149]
[ 1719.581214] sd 0:0:0:0: [sda] Unhandled error code
[ 1719.581217] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1719.581220] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 0e 4d a1 88 00 00 08 00
[ 1719.581227] end_request: I/O error, dev sda, sector 239968648
[ 1719.581254] JBD2: I/O error detected when updating journal superblock for dm-1-8.
[ 1719.581268] journal commit I/O error

This can be triggered most reliably with multiple "direct" writes to
disk; I create the load with the attached script. If the issue is
triggered, suspend (through pm-suspend) takes very long.
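As a cross-check on the error log, the logical block address in each failing Write(10) command can be decoded from its CDB bytes and compared with the sector that end_request reports. This is a minimal Python sketch, assuming the standard 10-byte WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The helper name decode_write10 is hypothetical and not part of any tool mentioned in the thread.

```python
from typing import Tuple


def decode_write10(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, block_count) for a SCSI WRITE(10) CDB given as hex bytes.

    Hypothetical helper for illustration; assumes the standard 10-byte layout.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


# The three CDBs copied from the error log above.
for cdb in (
    "2a 00 0f 51 e7 88 00 00 b0 00",
    "2a 00 0f 51 a1 88 00 00 08 00",
    "2a 00 0e 4d a1 88 00 00 08 00",
):
    lba, count = decode_write10(cdb)
    print(f"LBA {lba}, {count} blocks")
```

For the three CDBs in the log this gives LBAs 257025928, 257008008 and 239968648, which match the sectors in the end_request lines, with transfer lengths of 176, 8 and 8 blocks. The commands themselves look well-formed, which fits the picture of a disk that has simply stopped answering.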
IMHO the interesting log output during suspend is:

[ 1668.150125] Suspending console(s) (use no_console_suspend to debug)
[ 1668.150460] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1668.174958] sd 0:0:0:0: [sda] Stopping disk
[ 1668.198045] ACPI handle has no context!
[ 1668.199302] ohci_hcd 0000:00:14.5: PCI INT C disabled
[ 1668.199468] ohci_hcd 0000:00:13.1: PCI INT A disabled
[ 1668.199477] ohci_hcd 0000:00:13.0: PCI INT A disabled
[ 1668.199520] ehci_hcd 0000:00:12.2: PCI INT B disabled
[ 1668.199525] ohci_hcd 0000:00:12.1: PCI INT A disabled
[ 1668.199562] ohci_hcd 0000:00:12.0: PCI INT A disabled
[ 1668.210138] ehci_hcd 0000:00:13.2: PCI INT B disabled
[ 1668.300295] HDA Intel 0000:00:14.2: PCI INT A disabled
[ 1668.300301] HDA Intel 0000:01:00.1: PCI INT B disabled
[ 1668.300349] ACPI handle has no context!
[ 1669.700139] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1674.700125] ata1.00: qc timeout (cmd 0xec)
[ 1674.700136] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 1674.700139] ata1.00: revalidation failed (errno=-5)
[ 1675.230136] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1685.230125] ata1.00: qc timeout (cmd 0xec)
[ 1685.230137] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 1685.230140] ata1.00: revalidation failed (errno=-5)
[ 1685.230144] ata1: limiting SATA link speed to 1.5 Gbps
[ 1685.760137] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1715.760126] ata1.00: qc timeout (cmd 0xec)
[ 1715.760137] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 1715.760139] ata1.00: revalidation failed (errno=-5)
[ 1715.760142] ata1.00: disabled
[ 1715.810216] ahci 0000:00:11.0: PCI INT A disabled
[ 1715.830154] PM: suspend of devices complete after 47679.847 msecs

I've also attached the full dmesg, lspci -vv and smartctl -a
information.

Do you guys have any ideas here?

Many thanks,
Stephan

--
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst(a)amd.com, Tel. +49 (0)351 448 356 719

Advanced Micro Devices GmbH
Einsteinring 24
85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
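The timestamps in the suspend log suggest where most of the 47.7 seconds went. The Python sketch below measures the gap between each "SATA link up" message and the "qc timeout" that follows it, using the relevant lines copied from the log above. The function name identify_waits is hypothetical, and the pairing rule (one timeout per link-up) is an assumption that happens to fit this log.

```python
import re

# Relevant lines copied from the suspend log in the first message.
LOG = """\
[ 1669.700139] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1674.700125] ata1.00: qc timeout (cmd 0xec)
[ 1675.230136] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1685.230125] ata1.00: qc timeout (cmd 0xec)
[ 1685.760137] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1715.760126] ata1.00: qc timeout (cmd 0xec)
"""

# Matches "[ <seconds>.<micros>] <message>" as printed by dmesg.
LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def identify_waits(log):
    """Return seconds between each 'SATA link up' and the next 'qc timeout'."""
    waits = []
    link_up = None
    for line in log.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        stamp, message = float(match.group(1)), match.group(2)
        if "SATA link up" in message:
            link_up = stamp
        elif "qc timeout" in message and link_up is not None:
            waits.append(round(stamp - link_up, 1))
            link_up = None  # pair each timeout with exactly one link-up
    return waits


waits = identify_waits(LOG)
print(waits, "total:", sum(waits))
```

This yields waits of about 5, 10 and 30 seconds, so roughly 45 of the 47.68 seconds reported by "PM: suspend of devices complete" were spent waiting for IDENTIFY (cmd 0xec) to time out three times before ata1.00 was disabled.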

From: Stephan Diestelhorst on 9 Jul 2010 17:50

I wrote:
> I have an issue with suspend to RAM and I/O load on a disk. Symptoms
> are that the disk does not respond to requests when woken up, producing
> only I/O errors on all tested kernels (newest 2.6.35-rc4 (Ubuntu
> mainline PPA build)):

> <snip>

> This can be triggered most reliably with multiple "direct" writes to
> disk, I create the load with the attached script. If the issue is
> triggered, suspend (through pm-suspend) takes very long.

Attached now...

> IMHO the interesting log output during suspend is:
> [ 1674.700125] ata1.00: qc timeout (cmd 0xec)

Almighty google suggested to try "pci=nomsi", which seems to have
cured the issue for me for now. Is that plausible? I'll keep this
under observation.

Thanks,
Stephan

From: Rafael J. Wysocki on 9 Jul 2010 18:00

On Friday, July 09, 2010, Stephan Diestelhorst wrote:
> I wrote:
> > I have an issue with suspend to RAM and I/O load on a disk. Symptoms
> > are that the disk does not respond to requests when woken up, producing
> > only I/O errors on all tested kernels (newest 2.6.35-rc4 (Ubuntu
> > mainline PPA build)):
>
> > <snip>
>
> > This can be triggered most reliably with multiple "direct" writes to
> > disk, I create the load with the attached script. If the issue is
> > triggered, suspend (through pm-suspend) takes very long.
>
> Attached now...
>
> > IMHO the interesting log output during suspend is:
> > [ 1674.700125] ata1.00: qc timeout (cmd 0xec)
>
> Almighty google suggested to try "pci=nomsi", which seems to have
> cured the issue for me for now. Is that plausible? I'll keep this
> under observation.

Hmm. How does your /proc/interrupts look like?

Also, do you have a link to this "Google suggestion"?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Stephan Diestelhorst on 9 Jul 2010 19:10

Rafael J. Wysocki wrote:
> On Friday, July 09, 2010, Stephan Diestelhorst wrote:
> > I wrote:
> > > I have an issue with suspend to RAM and I/O load on a disk. Symptoms
> > > are that the disk does not respond to requests when woken up, producing
> > > only I/O errors on all tested kernels (newest 2.6.35-rc4 (Ubuntu
> > > mainline PPA build)):
> >
> > > <snip>
> >
> > > This can be triggered most reliably with multiple "direct" writes to
> > > disk, I create the load with the attached script. If the issue is
> > > triggered, suspend (through pm-suspend) takes very long.
> >
> > > IMHO the interesting log output during suspend is:
> > > [ 1674.700125] ata1.00: qc timeout (cmd 0xec)
> >
> > Almighty google suggested to try "pci=nomsi", which seems to have
> > cured the issue for me for now. Is that plausible? I'll keep this
> > under observation.
>
> Hmm. How does your /proc/interrupts look like?

This has been yet another red herring. After trying out the kernel
option three times with two different kernels, it failed yet again
with the same symptoms.

I have attached /proc/interrupts for 2.6.35-rc4, once with pci=nomsi
and once without, but again, I do not think this makes a difference :-/

> Also, do you have a link to this "Google suggestion"?

It was some German forum, a guy with completely different HW, but the
same symptom. I thought trying out the option wouldn't hurt.

Maybe it came for example from http://lkml.org/lkml/2008/12/20/3
originally.

Stephan

From: Rafael J. Wysocki on 9 Jul 2010 20:10

On Saturday, July 10, 2010, Stephan Diestelhorst wrote:
> Rafael J. Wysocki wrote:
> > On Friday, July 09, 2010, Stephan Diestelhorst wrote:
> > > I wrote:
> > > > I have an issue with suspend to RAM and I/O load on a disk. Symptoms
> > > > are that the disk does not respond to requests when woken up, producing
> > > > only I/O errors on all tested kernels (newest 2.6.35-rc4 (Ubuntu
> > > > mainline PPA build)):
> > >
> > > > <snip>
> > >
> > > > This can be triggered most reliably with multiple "direct" writes to
> > > > disk, I create the load with the attached script. If the issue is
> > > > triggered, suspend (through pm-suspend) takes very long.
> > >
> > > > IMHO the interesting log output during suspend is:
> > > > [ 1674.700125] ata1.00: qc timeout (cmd 0xec)
> > >
> > > Almighty google suggested to try "pci=nomsi", which seems to have
> > > cured the issue for me for now. Is that plausible? I'll keep this
> > > under observation.
> >
> > Hmm. How does your /proc/interrupts look like?
>
> This has been yet another red herring. After trying out the kernel
> option three times with two different kernels, it failed yet again
> with the same symptoms.

I thought it would be like that.

> I have attached /proc/interrupts for 2.6.35-rc4, once with pci=nomsi
> and once without, but again, I do not think this makes a difference :-/
>
> > Also, do you have a link to this "Google suggestion"?
>
> It was some German forum, a guy with completely different HW, but the
> same symptom. I thought trying out the option wouldn't hurt.
>
> Maybe it came for example from http://lkml.org/lkml/2008/12/20/3
> originally.

I have a box where this problem is kind of reproducible, but it happens
_very_ rarely. Also I can't reproduce it on demand running suspend-resume
in a tight loop. Are you able to reproduce it more regularly?

Also, what kind of disk do you use?

Rafael