Prev: MIPS: Add base support for Ingenic JZ4740 System-on-a-Chip
Next: [PATCH v4] block: avoid unconditionally freeing previously allocated request_queue
From: Stephen Hemminger on 3 Jun 2010 13:20 The ancient and crufty floppy driver has some serious problems if used during the boot process. It will miss data, and cause OOPS because of callback routines called in unexpected contexts. All this goes away if INITIAL_JIFFIES is 0 as it was when this driver was written. The root cause of all this is lots of places that assume the initial jiffies value is 0. Examples: floppy_open: if (!(mode & FMODE_NDELAY)) { if (mode & (FMODE_READ|FMODE_WRITE)) { UDRS->last_checked = 0; start_motor: if (!(raw_cmd->flags & FD_RAW_NO_MOTOR)) { if (!(FDCS->dor & (0x10 << UNIT(current_drive)))) { set_debugt(); /* no read since this drive is running */ DRS->first_read_date = 0; and the relationship between select_date and select_delay. The driver is a complex mess of spaghetti state. Does one of the original authors have more background to fix this? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 3 Jun 2010 13:40 On Thu, 3 Jun 2010, Stephen Hemminger wrote: > > The driver is a complex mess of spaghetti state. Does one of the original > authors have more background to fix this? Doubtful. There are no "original authors" left really. Yes, I technically wrote the original driver, but it has gotten seriously rewritten a couple of times - sadly always just expanding on the cruftyness. I do wonder a bit why you even care. You can't even buy non-USB floppy drives any more (and even the USB ones are really really hard to find), and judging by my experiences, they don't last long enough for there to be much of a legacy base. But please post the oops. I don't think it should ever be valid to oops even if it gets confused about first_read_date or last_checked. Some hacking around for them being "in the future" would be trivial enough. But most of them shouldn't matter. The whole 'first_read_date', for example, is purely used for some randomness - it tries to use the 'revolutions per second' to decide when to stop the drive, but that's just some voodoo programming and has no real meaning, afaik. The oops would be interesting just to see _which_ case it is that we care about the time so much (clearly _too_ much). Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Stephen Hemminger on 3 Jun 2010 15:10 On Thu, 3 Jun 2010 10:28:17 -0700 (PDT) Linus Torvalds <torvalds(a)linux-foundation.org> wrote: > > > On Thu, 3 Jun 2010, Stephen Hemminger wrote: > > > > The driver is a complex mess of spaghetti state. Does one of the original > > authors have more background to fix this? > > Doubtful. There are no "original authors" left really. Yes, I technically > wrote the original driver, but it has gotten seriously rewritten a couple > of times - sadly always just expanding on the cruftyness. > > I do wonder a bit why you even care. You can't even buy non-USB floppy > drives any more (and even the USB ones are really really hard to find), > and judging by my experiences, they don't last long enough for there to be > much of a legacy base. > > But please post the oops. I don't think it should ever be valid to oops > even if it gets confused about first_read_date or last_checked. Some > hacking around for them being "in the future" would be trivial enough. > > But most of them shouldn't matter. The whole 'first_read_date', for > example, is purely used for some randomness - it tries to use the > 'revolutions per second' to decide when to stop the drive, but that's just > some voodoo programming and has no real meaning, afaik. > > The oops would be interesting just to see _which_ case it is that we care > about the time so much (clearly _too_ much). > > Linus Example warning from 2.6.32 (VMware based regression system) [ 195.061209] floppy0: disk absent or changed during operation [ 195.061655] end_request: I/O error, dev fd0, sector 2 [ 195.063860] Buffer I/O error on device fd0, logical block 1 [ 195.064650] lost page write due to I/O error on fd0 [ 195.067456] floppy0: disk absent or changed during operation [ 195.067468] end_request: I/O error, dev fd0, sector 2 [ 195.068518] Buffer I/O error on device fd0, logical block 1 [ 195.069618] lost page write due to I/O error on fd0 [ 195.077148] ------------[ cut here ]------------ [ 195.077647] WARNING: at fs/buffer.c:1159 mark_buffer_dirty+0x20/0x7a() [ 195.077699] Hardware name: VMware Virtual Platform [ 195.077743] Modules linked in: iptable_nat iptable_filter ip6table_filter ip6table_raw ip6_tables xt_NOTRACK iptable_ raw ip_tables x_tables nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_nat_sip nf_ conntrack_sip nf_nat_proto_gre nf_nat_tftp nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_tftp nf_connt rack_ftp nf_conntrack ipv6 container ac button processor parport_pc parport psmouse serio_raw intel_agp shpchp i2c_piix4 pci_hotplug agpgart i2c_core pcspkr evdev vfat fat ext2 mbcache battery squashfs loop unionfs nls_utf8 isofs nls_base a sync_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx md_mod sg sr_mod cdrom ata_piix sd_mod crc_t10dif pata_acpi ata_generic mptspi mptscsih libata mptbase scsi_transport_spi floppy e1000 scsi_mod thermal fan thermal_sys [ last unloaded: raid10] [ 195.078800] Pid: 3118, comm: umount Not tainted 2.6.32-1-586-vyatta #1 [ 195.078825] Call Trace: [ 195.079041] [<c102bfdf>] ? warn_slowpath_common+0x5e/0x8a [ 195.079107] [<c102c015>] ? warn_slowpath_null+0xa/0xc [ 195.079112] [<c10bd987>] ? mark_buffer_dirty+0x20/0x7a [ 195.079296] [<d085d37d>] ? ext2_sync_fs+0x5e/0x70 [ext2] [ 195.079302] [<c10bc15f>] ? __sync_filesystem+0x51/0x60 [ 195.079335] [<c10a49cb>] ? generic_shutdown_super+0x19/0xc7 [ 195.079339] [<c10a4a93>] ? kill_block_super+0x1a/0x2c [ 195.079382] [<c10d2e82>] ? vfs_quota_off+0x0/0xd [ 195.079386] [<c10a4ff2>] ? deactivate_super+0x4b/0x61 [ 195.079391] [<c10b4f2a>] ? sys_umount+0x28c/0x2b2 [ 195.079394] [<c10b4f5b>] ? sys_oldumount+0xb/0xe [ 195.079398] [<c10029b5>] ? syscall_call+0x7/0xb [ 195.079469] ---[ end trace 89a1db2950506e4b ]--- [ 195.079594] floppy0: disk absent or changed during operation [ 195.079598] end_request: I/O error, dev fd0, sector 2 [ 195.080630] Buffer I/O error on device fd0, logical block 1 [ 195.080997] lost page write due to I/O error on fd0 [ 195.084480] floppy0: disk absent or changed during operation [ 195.084487] end_request: I/O error, dev fd0, sector 2 [ 195.085761] Buffer I/O error on device fd0, logical block 1 [ 195.086906] lost page write due to I/O error on fd0 [ 317.428247] Netfilter messages via NETLINK v0.30. [ 317.480612] ctnetlink v0.93: registering with nfnetlink. [ 506.734183] NET: Registered protocol family 15 On 2.6.32 example (KVM with livecd + empty floppy image) and debugging forced on: [ 6.795067] floppy0: reschedule timeout lock fdc [ 6.796628] floppy: poll_drive [ 6.797602] floppy0: setting NEWCHANGE in poll_drive [ 6.798949] floppy0: calling disk change from floppy_ready [ 6.800538] floppy0: checking disk change line for drive 0 [ 6.802171] floppy0: jiffies=4294893996 [ 6.803439] floppy0: disk change line=0 [ 6.804595] floppy0: flags=24 [ 6.805579] rw_command: dtime=720 [ 6.806633] floppy0: reschedule timeout do wakeup [ 6.808198] FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive! [ 6.810797] floppy: do_fd_request [ 6.811832] floppy0: do_fd_request [ 6.812917] floppy0: reschedule timeout redo fd request [ 6.814656] floppy0: checking disk change line for drive 0 [ 6.818855] floppy0: jiffies=4294894000 [ 6.820509] floppy0: disk change line=0 [ 6.822154] floppy0: flags=20 [ 6.823463] floppy0: reschedule timeout request done 1 [ 6.825272] unlock_fdc [ 7.038143] udevd version 125 started [ 7.841449] input: PC Speaker as /class/input/input2 [ 7.850997] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0 [ 7.895128] alloc irq_desc for 24 on node -1 [ 7.896068] alloc kstat_irqs on node -1 [ 7.897640] virtio-pci 0000:00:03.0: irq 24 for MSI/MSI-X [ 7.898998] alloc irq_desc for 25 on node -1 [ 7.900213] alloc kstat_irqs on node -1 [ 7.901382] virtio-pci 0000:00:03.0: irq 25 for MSI/MSI-X [ 7.903096] alloc irq_desc for 26 on node -1 [ 7.904351] alloc kstat_irqs on node -1 [ 7.905550] virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X [ 8.126101] input: Power Button as /class/input/input3 [ 8.127620] ACPI: Power Button [PWRF] [ 8.160496] processor LNXCPU:00: registered as cooling_device0 [ 8.237244] input: ImExPS/2 Generic Explorer Mouse as /class/input/input4 [ 9.724066] floppy0: calling disk change from set_dor [ 9.726342] floppy0: checking disk change line for drive 0 [ 9.728740] floppy0: jiffies=4294894728 [ 9.730566] floppy0: disk change line=0 [ 9.732371] floppy0: flags=20 [ 14.040103] end_request: I/O error, dev fd0, sector 2176 [ 14.041418] Buffer I/O error on device fd0, logical block 272 [ 14.042737] BUG: unable to handle kernel NULL pointer dereference at 00000040 [ 14.044052] IP: [<e08c2903>] floppy_ready+0x46f/0x4e3 [floppy] [ 14.044052] *pde = 00000000 [ 14.044052] Oops: 0000 [#1] SMP [ 14.044052] last sysfs file: /sys/class/net/lo/operstate [ 14.044052] Modules linked in: ipv6 processor button psmouse virtio_net i2c_piix4 serio_raw pcspkr i2c_core evdev joydev vfat fat ext2 mbcache battery squashfs loop unionfs nls_utf8 isofs usbhid hid raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear md_mod sg sr_mod cdrom ata_piix virtio_blk pata_acpi ata_generic uhci_hcd libata virtio_pci virtio_ring usbcore nls_base floppy scsi_mod virtio thermal fan thermal_sys [last unloaded: scsi_wait_scan] [ 14.044052] [ 14.044052] Pid: 6, comm: events/0 Not tainted (2.6.32-1-586-vyatta #1) Bochs [ 14.044052] EIP: 0060:[<e08c2903>] EFLAGS: 00010246 CPU: 0 [ 14.044052] EIP is at floppy_ready+0x46f/0x4e3 [floppy] [ 14.044052] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000 [ 14.044052] ESI: 00000202 EDI: df03ffb0 EBP: e08c626c ESP: df03ff68 [ 14.044052] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 14.044052] Process events/0 (pid: 6, ti=df03e000 task=df0254a0 task.ti=df03e000) [ 14.044052] Stack: [ 14.044052] df03ff00 e08bf4bd e08c46d5 e08c5715 c1c07f80 e08c6268 c103c31f df0256b4 [ 14.044052] <0> df0254a0 c1c07f8c c1c07f84 e08c2d1c df0254a0 00000000 df0254a0 c103f00e [ 14.044052] <0> df03ffa8 df03ffa8 df023f58 c1c07f80 c103c1e1 00000000 c103eddb 00000000 [ 14.044052] Call Trace: [ 14.044052] [<e08bf4bd>] ? debugt+0x2c/0x30 [floppy] [ 14.044052] [<c103c31f>] ? worker_thread+0x13e/0x1b6 [ 14.044052] [<e08c2d1c>] ? seek_interrupt+0x0/0xf8 [floppy] [ 14.044052] [<c103f00e>] ? autoremove_wake_function+0x0/0x2d [ 14.044052] [<c103c1e1>] ? worker_thread+0x0/0x1b6 [ 14.044052] [<c103eddb>] ? kthread+0x61/0x66 [ 14.044052] [<c103ed7a>] ? kthread+0x0/0x66 [ 14.044052] [<c10034a7>] ? kernel_thread_helper+0x7/0x10 [ 14.044052] Code: 20 eb 18 0f b6 05 c8 7b 8c e0 6b c0 34 8d 90 e0 6d 8c e0 3e 80 a0 e0 6d 8c e0 df 0f b6 05 c8 7b 8c e0 8b 15 00 72 8c e0 6b c0 34 <8b> 5a 40 0f bf 80 f2 6d 8c e0 39 c3 74 56 80 3a 00 79 51 b8 0f [ 14.044052] EIP: [<e08c2903>] floppy_ready+0x46f/0x4e3 [floppy] SS:ESP 0068:df03ff68 [ 14.044052] CR2: 0000000000000040 [ 14.109388] ---[ end trace 0efe46cb49bca248 ]--- [ 14.110462] Kernel panic - not syncing: Fatal exception [ 14.111639] Pid: 6, comm: events/0 Tainted: G D 2.6.32-1-586-vyatta #1 [ 14.113449] Call Trace: [ 14.114235] [<c1226541>] ? panic+0x38/0xd8 [ 14.115342] [<c1005764>] ? oops_end+0x6c/0x76 [ 14.116420] [<c10190a9>] ? no_context+0x105/0x10e [ 14.120248] [<c10191f3>] ? __bad_area_nosemaphore+0x141/0x14c [ 14.121656] [<c1062823>] ? __rcu_process_callbacks+0x67/0x1fa [ 14.123439] [<c10629e9>] ? rcu_process_callbacks+0x33/0x39 [ 14.125750] [<c1030f02>] ? __do_softirq+0x110/0x147 [ 14.127760] [<c103103b>] ? irq_exit+0x36/0x58 [ 14.129666] [<c101381f>] ? smp_apic_timer_interrupt+0x6c/0x76 [ 14.131942] [<c100329a>] ? apic_timer_interrupt+0x2a/0x30 [ 14.133878] [<c1019378>] ? do_page_fault+0x0/0x270 [ 14.135045] [<c1019208>] ? bad_area_nosemaphore+0xa/0xc [ 14.136327] [<c12280d6>] ? error_code+0x66/0x70 [ 14.137427] [<e08c007b>] ? raw_cmd_done+0xc2/0x118 [floppy] [ 14.138679] [<c1019378>] ? do_page_fault+0x0/0x270 [ 14.139815] [<e08c2903>] ? floppy_ready+0x46f/0x4e3 [floppy] [ 14.141151] [<e08bf4bd>] ? debugt+0x2c/0x30 [floppy] [ 14.142318] [<c103c31f>] ? worker_thread+0x13e/0x1b6 [ 14.143520] [<e08c2d1c>] ? seek_interrupt+0x0/0xf8 [floppy] [ 14.144819] [<c103f00e>] ? autoremove_wake_function+0x0/0x2d [ 14.146202] [<c103c1e1>] ? worker_thread+0x0/0x1b6 [ 14.147327] [<c103eddb>] ? kthread+0x61/0x66 [ 14.148495] [<c103ed7a>] ? kthread+0x0/0x66 [ 14.149549] [<c10034a7>] ? kernel_thread_helper+0x7/0x10 On 2.6.31 another example: [ 2.034647] Uniform CD-ROM driver Revision: 3.20 [ 2.036041] sr 1:0:0:0: Attached scsi CD-ROM sr0 [ 2.043056] sr 1:0:0:0: Attached scsi generic sg0 type 5 [ 2.083934] md: linear personality registered for level -1 [ 2.089263] md: multipath personality registered for level -4 [ 2.094470] md: raid0 personality registered for level 0 [ 2.100858] md: raid1 personality registered for level 1 [ 2.102420] usb 1-1: new full speed USB device using uhci_hcd and address 2 [ 2.107393] xor: automatically using best checksumming function: pIII_sse [ 2.128069] pIII_sse : 9200.000 MB/sec [ 2.129289] xor: using function: pIII_sse (9200.000 MB/sec) [ 2.131952] async_tx: api initialized (sync-only) [ 2.200106] raid6: int32x1 1001 MB/s [ 2.268093] raid6: int32x2 1017 MB/s [ 2.336128] raid6: int32x4 698 MB/s [ 2.404073] raid6: int32x8 669 MB/s [ 2.472041] raid6: mmxx1 3167 MB/s [ 2.540072] raid6: mmxx2 3686 MB/s [ 2.608068] raid6: sse1x1 2875 MB/s [ 2.676077] raid6: sse1x2 3303 MB/s [ 2.744063] raid6: sse2x1 5762 MB/s [ 2.812037] raid6: sse2x2 6508 MB/s [ 2.813162] raid6: using algorithm sse2x2 (6508 MB/s) [ 2.819556] md: raid6 personality registered for level 6 [ 2.821279] md: raid5 personality registered for level 5 [ 2.822816] md: raid4 personality registered for level 4 [ 2.847176] md: raid10 personality registered for level 10 [ 2.910253] ISO 9660 Extensions: Microsoft Joliet Level 3 [ 2.914298] ISO 9660 Extensions: RRIP_1991A [ 2.923609] Registering unionfs 2.5.3 (for 2.6.31) [ 2.937957] loop: module loaded [ 2.949895] usb 1-1: New USB device found, idVendor=0627, idProduct=0001 [ 2.951870] usb 1-1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 2.956341] usb 1-1: Product: QEMU USB Tablet [ 2.957564] usb 1-1: Manufacturer: QEMU 0.12.3 [ 2.959162] usb 1-1: SerialNumber: 1 [ 2.960399] usb 1-1: configuration #1 chosen from 1 choice [ 3.011128] usbcore: registered new interface driver hiddev [ 3.038588] input: QEMU 0.12.3 QEMU USB Tablet as /class/input/input1 [ 3.041509] generic-usb 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Pointer [QEMU 0.12.3 QEMU USB Tablet] on usb-0000:00:01.2-1/input0 [ 3.046497] usbcore: registered new interface driver usbhid [ 3.048891] usbhid: v2.6:USB HID core driver [ 3.113265] squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 4.592572] floppy0: long rw: 1e instead of f [ 4.594671] rs=1 s=1 [ 4.595999] rh=0 h=0 [ 4.597317] rt=1 t=0 [ 4.598514] heads=2 eoc=0 [ 4.599891] spt=15 st=0 ss=1 [ 4.601472] in_sector_offset=0 [ 7.600014] [ 7.600677] floppy driver state [ 7.601581] ------------------- [ 7.602438] now=4294894196 last interrupt=4294894195 diff=1 last called handler=e085c053 [ 7.604380] timeout_message=floppy start [ 7.605365] last output bytes: [ 7.606212] 0 80 4294894195 [ 7.607022] 8 80 4294894195 [ 7.607832] 48 80 4294894195 [ 7.608651] f 80 4294894195 [ 7.609460] 0 80 4294894195 [ 7.610275] 8 80 4294894195 [ 7.611102] 48 80 4294894195 [ 7.611971] f 80 4294894195 [ 7.612791] 0 80 4294894195 [ 7.613600] 8 80 4294894195 [ 7.614408] 48 80 4294894195 [ 7.615224] f 80 4294894195 [ 7.616047] 0 80 4294894195 [ 7.616848] 8 80 4294894195 [ 7.617649] 48 80 4294894195 [ 7.618456] f 80 4294894195 [ 7.619267] 0 80 4294894195 [ 7.620092] 8 80 4294894195 [ 7.620903] 48 80 4294894195 [ 7.621769] f 80 4294894195 [ 7.622581] last result at 4294894195 [ 7.623581] last redo_fd_request at 4294893446 [ 7.624731] 20 1 [ 7.625595] status=80 [ 7.626305] fdc_busy=1 [ 7.627026] do_floppy=e085c053 [ 7.627856] cont=e08615dc [ 7.628637] current_req=df813d48 [ 7.629567] command_status=-1 [ 7.630464] [ 7.631041] floppy0: floppy timeout called [ 7.632137] end_request: I/O error, dev fd0, sector 2176 [ 7.634932] Buffer I/O error on device fd0, logical block 272 [ 7.636493] BUG: unable to handle kernel NULL pointer dereference at 00000004 [ 7.638225] IP: [<e0858d46>] reset_interrupt+0x41/0x45 [floppy] [ 7.639660] *pde = 00000000 [ 7.640243] Oops: 0000 [#1] SMP [ 7.640243] last sysfs file: /sys/block/loop0/size [ 7.640243] Modules linked in: vfat fat ext2 mbcache battery squashfs usbhid hid loop unionfs nls_utf8 isofs raid10 raid456 raid6_pq async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod sg sr_mod cdrom virtio_blk ata_piix pata_acpi ata_generic uhci_hcd libata virtio_pci virtio_ring virtio usbcore nls_base scsi_mod floppy thermal fan thermal_sys [last unloaded: scsi_wait_scan] [ 7.640243] [ 7.640243] Pid: 6, comm: events/0 Not tainted (2.6.31-1-586-vyatta #1) Bochs [ 7.640243] EIP: 0060:[<e0858d46>] EFLAGS: 00010246 CPU: 0 [ 7.640243] EIP is at reset_interrupt+0x41/0x45 [floppy] [ 7.640243] EAX: 00000000 EBX: c183be80 ECX: c183be84 EDX: 00000080 [ 7.640243] ESI: e08615c4 EDI: df03ffb0 EBP: e08615c8 ESP: df03ff88 [ 7.640243] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 7.640243] Process events/0 (pid: 6, ti=df03e000 task=df02d450 task.ti=df03e000) [ 7.640243] Stack: [ 7.640243] c10377cb c183be8c c183be84 e0858d05 df02d450 00000000 df02d450 c103ac36 [ 7.640243] <0> df03ffa8 df03ffa8 df02bf58 c183be80 c1037692 00000000 c103a995 00000000 [ 7.640243] <0> 00000000 00000000 df03ffd0 df03ffd0 c103a927 00000000 00000000 c10034c7 [ 7.640243] Call Trace: [ 7.640243] [<c10377cb>] ? worker_thread+0x139/0x1b6 [ 7.640243] [<e0858d05>] ? reset_interrupt+0x0/0x45 [floppy] [ 7.640243] [<c103ac36>] ? autoremove_wake_function+0x0/0x2d [ 7.640243] [<c1037692>] ? worker_thread+0x0/0x1b6 [ 7.640243] [<c103a995>] ? kthread+0x6e/0x73 [ 7.640243] [<c103a927>] ? kthread+0x0/0x73 [ 7.640243] [<c10034c7>] ? kernel_thread_helper+0x7/0x10 [ 7.640243] Code: 05 f6 80 20 2f 86 e0 04 74 1c a1 04 2f 86 e0 ff 70 08 68 d5 dc 85 e0 e8 44 32 9b e0 a1 04 2f 86 e0 ff 50 08 58 5a a1 04 2f 86 e0 <ff> 50 04 c3 e8 92 fe ff ff 83 ca ff 85 c0 78 14 25 e0 00 00 00 [ 7.640243] EIP: [<e0858d46>] reset_interrupt+0x41/0x45 [floppy] SS:ESP 0068:df03ff88 [ 7.640243] CR2: 0000000000000004 [ 7.696611] ---[ end trace 3fb89013086fe1da ]--- -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 3 Jun 2010 16:10 On Thu, 3 Jun 2010, Stephen Hemminger wrote: > > Example warning from 2.6.32 (VMware based regression system) > > [ 195.061209] floppy0: disk absent or changed during operation > [ 195.061655] end_request: I/O error, dev fd0, sector 2 > [ 195.063860] Buffer I/O error on device fd0, logical block 1 > [ 195.064650] lost page write due to I/O error on fd0 > [ 195.067456] floppy0: disk absent or changed during operation > [ 195.067468] end_request: I/O error, dev fd0, sector 2 > [ 195.068518] Buffer I/O error on device fd0, logical block 1 > [ 195.069618] lost page write due to I/O error on fd0 > [ 195.077148] ------------[ cut here ]------------ > [ 195.077647] WARNING: at fs/buffer.c:1159 mark_buffer_dirty+0x20/0x7a() So this warning doesn't say anything, just that the floppy driver failed. Whether that is due to emulation problems or not, who knows. Does it work reliably if INITIAL_JIFFIES == 0? > On 2.6.32 example (KVM with livecd + empty floppy image) and debugging > forced on: > [..] > [ 9.724066] floppy0: calling disk change from set_dor > [ 9.726342] floppy0: checking disk change line for drive 0 > [ 9.728740] floppy0: jiffies=4294894728 > [ 9.730566] floppy0: disk change line=0 > [ 9.732371] floppy0: flags=20 > [ 14.040103] end_request: I/O error, dev fd0, sector 2176 > [ 14.041418] Buffer I/O error on device fd0, logical block 272 > [ 14.042737] BUG: unable to handle kernel NULL pointer dereference at 00000040 > [ 14.044052] IP: [<e08c2903>] floppy_ready+0x46f/0x4e3 [floppy] Now _this_ is a kernel bug regardless. Oopsing is always bad. It disassembles to 0: 0f b6 05 c8 7b 8c e0 movzbl 0xe08c7bc8,%eax 7: 6b c0 34 imul $0x34,%eax,%eax a: 8d 90 e0 6d 8c e0 lea -0x1f739220(%eax),%edx 10: 3e 80 a0 e0 6d 8c e0 andb $0xdf,%ds:-0x1f739220(%eax) 17: df 18: 0f b6 05 c8 7b 8c e0 movzbl 0xe08c7bc8,%eax 1f: 8b 15 00 72 8c e0 mov 0xe08c7200,%edx 25: 6b c0 34 imul $0x34,%eax,%eax 28:* 8b 5a 40 mov 0x40(%edx),%ebx <-- trapping instruction 2b: 0f bf 80 f2 6d 8c e0 movswl -0x1f73920e(%eax),%eax 32: 39 c3 cmp %eax,%ebx 34: 74 56 je 0x8c 36: 80 3a 00 cmpb $0x0,(%edx) 39: 79 51 jns 0x8c and I think it matches up with this code movzbl current_drive, %eax # current_drive, current_drive imull $52, %eax, %eax #, current_drive, tmp253 lock; andb $-33,drive_state(%eax) #, movl raw_cmd, %eax # raw_cmd, raw_cmd.2563 movzbl current_drive, %edx # current_drive, current_drive movl 64(%eax), %ebx # <variable>.track, track imull $52, %edx, %edx #, current_drive, tmp256 movswl drive_state+18(%edx), %edx # <variable>.track, <variable>.track cmpl %edx, %ebx # <variable>.track, track [ side note that odd '%ds:' override is because it's a locked instruction that was turned into a non-locked one by turning the 'lock' prefix into a 'ds' prefix instead - it looks odd, but it's just from running a SMP kernel on a UP virtual machine ] and that in turn seems to be from check_wp() (that "lock; andb" is the "clear_bit(FD_DISK_WRITABLE_BIT, &DRS->flags);" thing). So I think we have floppy_ready() inlining seek_floppy(), and that in turn inlining check_wp(). And the actual oops seems to be just after the check_wp() function: if (raw_cmd->track != DRS->track && (raw_cmd->flags & FD_RAW_NEED_SEEK)) track = raw_cmd->track; else { setup_rw_floppy(); return; } because 'raw_cmd' is NULL. Which looks nonsensical, because we used "raw_cmd" earlier in floppy_ready(). But it does get cleared by interrupts. Possibly due to a much too early timeout, which would explain why that INITIAL_JIFFIES matters. Except > On 2.6.31 another example: [...] > [ 7.623581] last redo_fd_request at 4294893446 > [ 7.624731] 20 1 > [ 7.625595] status=80 > [ 7.626305] fdc_busy=1 > [ 7.627026] do_floppy=e085c053 > [ 7.627856] cont=e08615dc > [ 7.628637] current_req=df813d48 > [ 7.629567] command_status=-1 > [ 7.630464] > [ 7.631041] floppy0: floppy timeout called > [ 7.632137] end_request: I/O error, dev fd0, sector 2176 > [ 7.634932] Buffer I/O error on device fd0, logical block 272 > [ 7.636493] BUG: unable to handle kernel NULL pointer dereference at 00000004 > [ 7.638225] IP: [<e0858d46>] reset_interrupt+0x41/0x45 [floppy] This seems to be "cont->redo();", it decodes to f: ff 70 08 pushl 0x8(%eax) 12: 68 d5 dc 85 e0 push $0xe085dcd5 17: e8 44 32 9b e0 call 0xe09b3260 1c: a1 04 2f 86 e0 mov 0xe0862f04,%eax 21: ff 50 08 call *0x8(%eax) 24: 58 pop %eax 25: 5a pop %edx 26: a1 04 2f 86 e0 mov 0xe0862f04,%eax 2b: ff 50 04 call *0x4(%eax) 2e: c3 ret and %eax is NULL. So "cont" is NULL. And both 'cont' and 'raw_cmd' are set to NULL in the same function: unlock_fdc(). So it does seem to be that we do that too early for some reason. But it _should_ happen only if there are no more requests in redo_fd_request(). But I don't see why. I'll dig around the timer stuff a bit more. If it only happens with that INITIAL_JIFFIES thing, it must be clearly somehow related to a timer happening (or not happening) too early. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on 3 Jun 2010 17:10
On Thu, 3 Jun 2010, Linus Torvalds wrote: > > But I don't see why. I'll dig around the timer stuff a bit more. If it > only happens with that INITIAL_JIFFIES thing, it must be clearly somehow > related to a timer happening (or not happening) too early. Hmm. I don't see anything. In most cases, having whatever variable pointing to jiffies in the future (which 0 would be for the INITIAL_JIFFIES case) should be fine - at most it would make the timeout way too long, not break anything. They tend to be bases for timeouts, ie you have things like fd_wait_for_completion(DRS->select_date + DP->select_delay, ... where we set a timeout based on when we did something. The one case that seems iffy for is "last_checked", which contains when we last checked the disk change state. It being in the future could cause problems, since it would make us think that a disk change state is valid when it really shouldn't be. Anyway, does this patch make any difference? It just makes floppy initialization set up some of those fields too. Untested. I don't have the hardware. Linus --- drivers/block/floppy.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 90c4038..c7e83de 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -4180,6 +4180,8 @@ static int __init floppy_init(void) return -ENODEV; #endif + set_debugt(); + raw_cmd = NULL; for (dr = 0; dr < N_DRIVE; dr++) { @@ -4267,6 +4269,9 @@ static int __init floppy_init(void) set_bit(FD_DISK_CHANGED_BIT, &UDRS->flags); set_bit(FD_VERIFY_BIT, &UDRS->flags); UDRS->fd_device = -1; + UDRS->last_checked = jiffies; + UDRS->select_date = jiffies; + UDRS->spinup_date = jiffies; floppy_track_buffer = NULL; max_buffer_sectors = 0; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |