From: Karel Zak on 9 Mar 2010 07:30

On Tue, Mar 09, 2010 at 01:16:01PM +0300, Michael Tokarev wrote:
> Karel Zak wrote:
> > # mdadm --create /dev/md8 --level=5 --raid-devices=4 /dev/sdb{1,2,3,4}
>
> That's 3-disk stripe size with default 64Kb chunk size, which makes
> 3x64=192KiB - the number to which everything should be aligned.
>
> > # fdisk -lcu /dev/md8
> >
> > Disk /dev/md8: 1572 MB, 1572667392 bytes
> > 2 heads, 4 sectors/track, 383952 cylinders, total 3071616 sectors
> > Units = sectors of 1 * 512 = 512 bytes
> > Sector size (logical/physical): 512 bytes / 4096 bytes
> > I/O size (minimum/optimal): 65536 bytes / 65536 bytes
>
> And here we go: fdisk does not see the right number: nothing
> is dividable by 3.

Well, the same setup with 2.6.34-0.9.rc0.git13.fc14.x86_64:

# fdisk -luc /dev/sdb

Disk /dev/sdb: 2621 MB, 2621440000 bytes
255 heads, 63 sectors/track, 318 cylinders, total 5120000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 32768 bytes
Disk identifier: 0x77fbab55

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048     1026047      512000   83  Linux
/dev/sdb2         1026048     2050047      512000   83  Linux
/dev/sdb3         2050048     3074047      512000   83  Linux
/dev/sdb4         3074048     4098047      512000   83  Linux

# mdadm --create /dev/md8 --level=5 --raid-devices=4 /dev/sdb{1,2,3,4}

# fdisk -luc /dev/md8

Disk /dev/md8: 1572 MB, 1572667392 bytes
2 heads, 4 sectors/track, 383952 cylinders, total 3071616 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes

# cat /sys/block/md8/queue/{minimum,optimal}_io_size
65536
65536

> > # cat /sys/block/md8/md8p{1,2}/alignment_offset
> > 0
> > 0
>
> And that's where the issue is. md does not {sup,re}port all
> this stuff yet.

Hmm...
    Karel

--
 Karel Zak <kzak(a)redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
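
[Editorial illustration, not part of the original message.] The stripe arithmetic in the exchange above can be checked with a short sketch. It assumes a RAID5 layout with one chunk of parity per stripe, so 4 devices give 3 data chunks, and the 512-byte logical sectors that fdisk reports. The helper names full_stripe_bytes and is_aligned are hypothetical and do not come from mdadm, fdisk, or the kernel.

```python
SECTOR_SIZE = 512  # logical sector size, as reported by fdisk in the message


def full_stripe_bytes(raid_devices, chunk_kib, parity_devices=1):
    """Data bytes in one full stripe (hypothetical helper, RAID5 by default)."""
    data_devices = raid_devices - parity_devices
    return data_devices * chunk_kib * 1024


def is_aligned(start_sector, alignment_bytes, sector_size=SECTOR_SIZE):
    """True if a partition starting at start_sector begins on an alignment boundary."""
    return (start_sector * sector_size) % alignment_bytes == 0


stripe = full_stripe_bytes(raid_devices=4, chunk_kib=64)
print(stripe // 1024)                 # full stripe size in KiB
print(is_aligned(2048, 64 * 1024))    # sector 2048 (1 MiB) against the 64 KiB chunk
print(is_aligned(2048, stripe))       # sector 2048 against the full stripe
```

Under these assumptions, a 1 MiB start offset is a multiple of the 64 KiB chunk but not of the 192 KiB full stripe, which is the mismatch Michael points out when he notes that nothing reported is divisible by 3.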
From: Dave Chinner on 9 Mar 2010 07:30

On Tue, Mar 09, 2010 at 02:38:57PM +0300, Michael Tokarev wrote:
> Dave Chinner wrote:
> > On Tue, Mar 09, 2010 at 01:16:01PM +0300, Michael Tokarev wrote:
> >> Karel Zak wrote:
> >>> I did almost all my tests with scsi_debug or MD RAID0 on scsi_debug.
> >>> It works as expected.
> >> Actually, for raid0, the alignment is questionable. Should it be a
> >> multiple of chunk size or whole stripe size? I'm not sure, both ways
> >> has bad and good sides.. But if it is the latter, the same issues
> >> pops up again: do a 3-disk raid0 and you'll have to align to 3*2^N.
> >
> > Yes, alignment is still needed, especially for filesystems that can
> > do stripe unit aligned allocation like XFS. If you don't align the
> > filesystem properly, all the data IO will be mis-aligned to the
> > underlying disks and stripe unit sized IO will hit multiple disks
> > rather than just one....
>
> I understand alignment is needed, the question is if the alignment
> should be to chunk size or full-stripe size. In neither case it
> will be bad for underlying disks.

Depends on the RAID implementation. High end RAID arrays often have
cache bypass features that are triggered by stripe width aligned and
sized IOs. When receiving well formed IO they can more than double
write performance because they are not limited by internal cache
mirroring bandwidth (e.g. the controller magically switches to
write-through for those well formed IOs instead of writeback). So
from that perspective, alignment needs to be to stripe width, not
stripe unit.

Similarly for RAID5/6 alignment needs to be to stripe width, so that
a well formed IO issued by the filesystem only hits one RAID5/6
stripe.

FWIW, XFS takes great care to ensure that it doesn't place all its
allocation group headers on the same stripe unit. Failing to
distribute the AG headers across all the stripe units evenly loads
the disks/LUNs in the stripe unevenly. As soon as you have uneven
load on a stripe the performance tanks, as a stripe is only as fast
as its slowest member.

Also, while XFS prefers to align to stripe unit, there are mount
options to change the default allocation alignment to be stripe
width based. Hence if you have large files and applications that are
doing well formed IO, stripe width alignment of the filesystem to
the underlying block device is critical to achieving deterministic
throughput close to the maximum the hardware can support.....

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
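
[Editorial illustration, not part of the original message.] The difference between stripe unit and stripe width alignment can be sketched with a simplified striping model. It assumes a plain RAID0-style layout in which consecutive chunks rotate across member disks and there is no parity; real arrays and md layouts differ. The function names disks_touched and stripes_touched are hypothetical, and length is assumed to be greater than zero.

```python
def disks_touched(offset, length, chunk_size, n_disks):
    """Member-disk indices hit by an IO, assuming chunks rotate across disks."""
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return sorted({chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)})


def stripes_touched(offset, length, chunk_size, n_disks):
    """Number of full-width stripes an IO overlaps."""
    width = chunk_size * n_disks
    return (offset + length - 1) // width - offset // width + 1


CHUNK = 64 * 1024  # stripe unit
DISKS = 3          # data disks, so the stripe width is 192 KiB

# A stripe-unit-sized IO: aligned, then shifted by 32 KiB.
print(disks_touched(0, CHUNK, CHUNK, DISKS))
print(disks_touched(32 * 1024, CHUNK, CHUNK, DISKS))

# A stripe-width-sized IO: aligned to the width, then aligned only to the unit.
print(stripes_touched(0, DISKS * CHUNK, CHUNK, DISKS))
print(stripes_touched(CHUNK, DISKS * CHUNK, CHUNK, DISKS))
```

In this model the shifted stripe-unit IO lands on two disks instead of one, and a stripe-width IO that is aligned only to the stripe unit straddles two stripes, which is the case Dave describes for RAID5/6 and for arrays with cache bypass.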
From: Mark Lord on 9 Mar 2010 09:00

On 03/07/10 22:48, Tejun Heo wrote:
...
> Please note that hdparm is misreporting the alignment offset. It
> should be reporting 512 instead of 256 for offset-by-one drives.
...

That issue was fixed quite a while ago.
Upgrade your elderly copy of hdparm. :)
From: Daniel Taylor on 9 Mar 2010 17:40

hpa> I would very much like a reference for a platform which has
hpa> firmware which can successfully boot from 4K-logical media. It
hpa> would be very useful for bootloader testing.

I am told that the Mac UEFI platform will boot from 4K logical/physical
drives. Now I have to scrounge one of the old drives to test it.
From: Greg Freemyer on 9 Mar 2010 17:50

<snip>
>
> As far as partitioning... I believe we should be using GPT partition tables
> where possible. Even on non-EFI systems, it's simply a much better
> partition table format.
>
>        -hpa

GPT can not be used for boot disks in non-EFI systems, right?

Greg