From: Ing. Daniel Rozsnyó on 24 Jan 2010 14:00

Hello,
  I am having troubles with nested RAID - when one array is added to the other, the "bio too big device md0" messages are appearing:

bio too big device md0 (144 > 8)
bio too big device md0 (248 > 8)
bio too big device md0 (32 > 8)

From internet searches I've found no solution or error like mine, just a note about data corruption when this is happening.

Description: My setup is the following - one 2TB and four 500GB drives. The goal is to have a mirror of the 2TB drive to a linear array of the other four drives. So, the state without the error above is this:

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active linear sdb1[0] sde1[3] sdd1[2] sdc1[1]
      1953535988 blocks super 1.1 0k rounding

md0 : active raid1 sda2[0]
      1953447680 blocks [2/1] [U_]
      bitmap: 233/233 pages [932KB], 4096KB chunk

unused devices: <none>

With these block request sizes:

# cat /sys/block/md{0,1}/queue/max_{,hw_}sectors_kb
127
127
127
127

Now, I add the four-drive array to the mirror - and the system starts showing the bio error at any significant disk activity (probably writes only). The reboot/shutdown process is full of these errors. The step which messes up the system (ignore the "re-added", it happened the very first time I constructed the 4-drive array an hour ago):

# mdadm /dev/md0 --add /dev/md1
mdadm: re-added /dev/md1
# cat /sys/block/md{0,1}/queue/max_{,hw_}sectors_kb
4
4
127
127

The dmesg is just showing this:

md: bind<md1>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda2
 disk 1, wo:1, o:1, dev:md1
md: recovery of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
md: using 128k window, over a total of 1953447680 blocks.

And as soon as a write occurs to the array:

bio too big device md0 (40 > 8)

The removal of md1 from md0 does not help the situation, I need to reboot the machine. The md0 array bears LVM, and inside it the root / swap / portage / distfiles and home logical volumes.

My system is:

# uname -a
Linux desktop 2.6.32-gentoo-r1 #2 SMP PREEMPT Sun Jan 24 12:06:13 CET 2010 i686 Intel(R) Xeon(R) CPU X3220 @ 2.40GHz GenuineIntel GNU/Linux

Thanks for any help,
  Daniel
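For reference, both numbers in the "bio too big" message are counts of 512-byte sectors: "(144 > 8)" means a 72 KB bio arrived at a queue whose limit had dropped to 8 sectors, i.e. the 4 KB seen in max_sectors_kb above. The check lives in the block layer's generic_make_request() path; the following is only a rough sketch of the 2.6.32-era logic, reconstructed from memory rather than copied verbatim:

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Sketch of the check that produces "bio too big device ... (N > M)":
 * N is the bio size and M the queue limit, both in 512-byte sectors.
 */
static int check_bio_fits(struct request_queue *q, struct bio *bio)
{
	char b[BDEVNAME_SIZE];

	if (bio_sectors(bio) > queue_max_hw_sectors(q)) {
		printk(KERN_ERR "bio too big device %s (%u > %u)\n",
		       bdevname(bio->bi_bdev, b),
		       bio_sectors(bio),
		       queue_max_hw_sectors(q));
		return -EIO;	/* the real code ends the bio with an error */
	}
	return 0;
}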
From: Marti Raudsepp on 25 Jan 2010 10:30

2010/1/24 "Ing. Daniel Rozsnyó" <daniel(a)rozsnyo.com>:
> Hello,
>   I am having troubles with nested RAID - when one array is added to the
> other, the "bio too big device md0" messages are appearing:
>
> bio too big device md0 (144 > 8)
> bio too big device md0 (248 > 8)
> bio too big device md0 (32 > 8)

I *think* this is the same bug that I hit years ago when mixing
different disks and 'pvmove'

It's a design flaw in the DM/MD frameworks; see comment #3 from Milan Broz:
http://bugzilla.kernel.org/show_bug.cgi?id=9401#c3

Regards,
Marti
From: Milan Broz on 25 Jan 2010 13:30

On 01/25/2010 04:25 PM, Marti Raudsepp wrote:
> 2010/1/24 "Ing. Daniel Rozsnyó" <daniel(a)rozsnyo.com>:
>> Hello,
>> I am having troubles with nested RAID - when one array is added to the
>> other, the "bio too big device md0" messages are appearing:
>>
>> bio too big device md0 (144 > 8)
>> bio too big device md0 (248 > 8)
>> bio too big device md0 (32 > 8)
>
> I *think* this is the same bug that I hit years ago when mixing
> different disks and 'pvmove'
>
> It's a design flaw in the DM/MD frameworks; see comment #3 from Milan Broz:
> http://bugzilla.kernel.org/show_bug.cgi?id=9401#c3

Hm. I don't think it is the same problem, you are only adding device to md array...
(adding cc: Neil, this seems to me like MD bug).

(original report for reference is here http://lkml.org/lkml/2010/1/24/60 )

Milan
--
mbroz(a)redhat.com
From: Neil Brown on 27 Jan 2010 21:30

On Mon, 25 Jan 2010 19:27:53 +0100
Milan Broz <mbroz(a)redhat.com> wrote:

> On 01/25/2010 04:25 PM, Marti Raudsepp wrote:
> > 2010/1/24 "Ing. Daniel Rozsnyó" <daniel(a)rozsnyo.com>:
> >> Hello,
> >> I am having troubles with nested RAID - when one array is added to the
> >> other, the "bio too big device md0" messages are appearing:
> >>
> >> bio too big device md0 (144 > 8)
> >> bio too big device md0 (248 > 8)
> >> bio too big device md0 (32 > 8)
> >
> > I *think* this is the same bug that I hit years ago when mixing
> > different disks and 'pvmove'
> >
> > It's a design flaw in the DM/MD frameworks; see comment #3 from Milan Broz:
> > http://bugzilla.kernel.org/show_bug.cgi?id=9401#c3
>
> Hm. I don't think it is the same problem, you are only adding device to md array...
> (adding cc: Neil, this seems to me like MD bug).
>
> (original report for reference is here http://lkml.org/lkml/2010/1/24/60 )

No, I think it is the same problem.

When you have a stack of devices, the top level client needs to know the
maximum restrictions imposed by lower level devices to ensure it doesn't
violate them.
However there is no mechanism for a device to report that its restrictions
have changed.
So when md0 gains a linear leg and so needs to reduce the max size for
requests, there is no way to tell DM, so DM doesn't know. And as the
filesystem only asks DM for restrictions, it never finds out about the
new restrictions.

This should be fixed by having the filesystem not care about restrictions,
and the lower levels just split requests as needed, but that just hasn't
happened....

If you completely assemble md0 before activating the LVM stuff on top of it,
this should work.

NeilBrown
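The reduction Neil mentions ("md0 gains a linear leg and so needs to reduce the max size for requests") comes from raid1 itself: linear and raid0 expose a merge_bvec_fn on their queues, and because raid1 does not honour a member's merge_bvec_fn it protects itself by clamping its own maximum request size to a single page whenever such a member is added. On i686 one page is 8 sectors, which matches both the 4 in max_sectors_kb and the "8" in the error messages. A rough sketch of that clamp, modelled from memory on drivers/md/raid1.c of the 2.6.32 era (names approximate, not a verbatim copy):

#include <linux/blkdev.h>

/*
 * Sketch: when a member device (here, md1 as linear/raid0) registers a
 * merge_bvec_fn, the raid1 queue is limited to one page per request,
 * because a one-page bio can never violate the member's restrictions.
 */
static void clamp_for_merge_bvec(struct request_queue *raid1_q,
				 struct request_queue *member_q)
{
	if (member_q->merge_bvec_fn &&
	    queue_max_sectors(raid1_q) > (PAGE_SIZE >> 9))
		blk_queue_max_sectors(raid1_q, PAGE_SIZE >> 9);
}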
From: Ing. Daniel Rozsnyó on 28 Jan 2010 04:30
Neil Brown wrote:
> On Mon, 25 Jan 2010 19:27:53 +0100
> Milan Broz <mbroz(a)redhat.com> wrote:
>
>> On 01/25/2010 04:25 PM, Marti Raudsepp wrote:
>>> 2010/1/24 "Ing. Daniel Rozsnyó" <daniel(a)rozsnyo.com>:
>>>> Hello,
>>>> I am having troubles with nested RAID - when one array is added to the
>>>> other, the "bio too big device md0" messages are appearing:
>>>>
>>>> bio too big device md0 (144 > 8)
>>>> bio too big device md0 (248 > 8)
>>>> bio too big device md0 (32 > 8)
>>> I *think* this is the same bug that I hit years ago when mixing
>>> different disks and 'pvmove'
>>>
>>> It's a design flaw in the DM/MD frameworks; see comment #3 from Milan Broz:
>>> http://bugzilla.kernel.org/show_bug.cgi?id=9401#c3
>> Hm. I don't think it is the same problem, you are only adding device to md array...
>> (adding cc: Neil, this seems to me like MD bug).
>>
>> (original report for reference is here http://lkml.org/lkml/2010/1/24/60 )
>
> No, I think it is the same problem.
>
> When you have a stack of devices, the top level client needs to know the
> maximum restrictions imposed by lower level devices to ensure it doesn't
> violate them.
> However there is no mechanism for a device to report that its restrictions
> have changed.
> So when md0 gains a linear leg and so needs to reduce the max size for
> requests, there is no way to tell DM, so DM doesn't know. And as the
> filesystem only asks DM for restrictions, it never finds out about the
> new restrictions.

Neil, why does it even reduce its block size? I've tried with both "linear" and "raid0" (as they are the only way to get 2T from 4x500G) and both behave the same (sda has 512, md0 127, linear 127 and raid0 has 512 kB block size).

I do not see the mechanism by which 512:127 or 512:512 leads to a 4 kB limit. Is it because:
 - of rebuilding the array?
 - of a non-multiplicative max block size?
 - of a non-multiplicative total device size?
 - of nesting?
 - of some other fallback to 1 page?

I ask because I cannot believe that a pre-assembled nested stack would result in a 4 kB max limit. But I haven't tried yet (e.g. from a live CD).

The block device should not do this kind of "magic", unless the higher layers support it. Which one has proper support then?
 - standard partition table?
 - LVM?
 - filesystem drivers?

> This should be fixed by having the filesystem not care about restrictions,
> and the lower levels just split requests as needed, but that just hasn't
> happened....
>
> If you completely assemble md0 before activating the LVM stuff on top of it,
> this should work.
>
> NeilBrown
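On the question of which layer has "proper support": in 2.6.32 none of them re-checks after the fact. A stacking driver (MD or DM) folds its members' queue limits into its own once, when the array is assembled or the device-mapper table is loaded, and nothing re-runs that computation if a member's limits shrink later. That is also why Neil's suggestion works: if md0 already carries the one-page limit before the LVM volumes are activated, DM inherits it and the filesystem never builds an oversized bio. A hedged sketch of that one-shot folding; blk_stack_limits() is the real helper of that era, while combine_member_limits() is a made-up wrapper for illustration:

#include <linux/blkdev.h>
#include <linux/kernel.h>

/*
 * Sketch: limits flow upward only when the upper device is put together.
 * A later shrink of the member's limits (e.g. md0 gaining a linear leg)
 * is never propagated, so the upper queue keeps the stale, larger limit.
 */
static void combine_member_limits(struct request_queue *top,
				  struct request_queue *member)
{
	/* Take the stricter of the two sets of limits (max_sectors,
	 * alignment, etc.); offset 0 assumes the member starts at the
	 * beginning of the top-level device, purely for illustration. */
	if (blk_stack_limits(&top->limits, &member->limits, 0) < 0)
		printk(KERN_WARNING "stacked device is misaligned\n");
}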