From: n17ikh on 13 Dec 2006 19:03

Yes, it does seem a bit slow, doesn't it? Now that I think about it, your math may be more correct than mine:

dd bs=1024 count=1000000 if=/dev/zero of=/mnt/md0/test.img
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 138.542 s, 7.4 MB/s

Ran it twice; got 9 MB/sec the first time, but then realized I had left VMware running under heavy load (it accesses a disk image on the array) and stopped it. Got slower results. Go figure. During the write, the md0_raid6 thread averages about 5 percent CPU usage, dd averages around 15 percent, pdflush averages 7 percent, and kjournald averages 10 percent. That's according to top, and since this is a dual-CPU system, those percentages are actually out of 200.

The hardware is an MSI K7D motherboard with 1 GB of (rather generic) RAM and dual Athlon XP 1700+s running at stock speed. The HDDs are el cheapo Western Digitals of varying model numbers, all 250 GB with 2 MB cache. (That's why I have so many: they're cheap.) However, all the drives will do around 55 MB/sec reading and writing individually, IIRC.

dd if=/mnt/md0/test.img of=/dev/null
2000000+0 records in
2000000+0 records out
1024000000 bytes (1.0 GB) copied, 16.2642 s, 63.0 MB/s

During the read, something else is going on: dd is pulling 54 percent CPU usage and md0_raid6 is pulling 12 percent! Not what I wanted to see; however, 63 MB/sec is a good deal faster than what I remember it testing at last time. Here are some lines from dstat (equivalent to iostat plus some other stuff) for you from while I was doing the read (not the one above, but a second one):

dstat
----total-cpu-usage---- -disk/total -net/total- ---paging-- ---system--
usr sys idl wai hiq siq|_read write|_recv _send|__in_ _out_|_int_ _csw_
  1   8  85   5   0   0| 808k  726k|   0     0 | 16B  5.6B| 753  3534
  5  76   0   8   8   2|  85M    0 |  23k  644k|   0     0 |3021  5158
  4  81   0   8   5   2|  87M    0 |  25k  645k|   0     0 |3067  5224
  3  63   3  25   2   3|  66M  208k|  22k  541k|   0     0 |2944  5669
  3  74   0  18   4   1|  73M   80k|  25k  661k|   0     0 |2993  5438
  2  83   0   9   4   1|  82M    0 |  24k  661k|   0     0 |3001  5170
  2  76   0  11   5   5|  75M   16k|  23k  574k|   0     0 |2926  5285
  5  73   0  18   3   1|  57M    0 |  19k  560k|   0     0 |2816  5278

(That's with one of the CPUs fully taken up by VMware; I ran the read test again while doing some other things once I realized I'd forgotten to include this.)

Just so I'm not getting false results, I ran another test, this time with an actual file:

dd if=VTS_01_1.VOB of=/dev/null
2097148+0 records in
2097148+0 records out
1073739776 bytes (1.1 GB) copied, 23.8662 s, 45.0 MB/s

Once again, 50 percent CPU usage by dd and 20 percent by md0. Nothing it can't handle, though. Also, these read speeds are acceptable, as opposed to what I was getting when I tested this a long time ago. Not so much the write speed, though.

The load the system is typically under is mostly a bunch of small nonsequential writes (think BitTorrent with 700 files open - probably analogous to a decent database workload), with the other reads and writes being the sort that max out network bandwidth (just uploading/accessing big honking files), or streaming files over the network - maybe 3 Mbit/sec for videos and such. It also acts as an FTP server with maybe 3 MB/sec worth of users downloading or uploading at any one time.

Anyway, I have tried putting my IDE cards in the PCI-X slots - however, the results were less than promising. I got nothing but DMA timeouts and the disks not working at all.
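One quick sanity check is whether the cards even advertise 66 MHz capability on the bus: lspci -vv should show a 66MHz+ or 66MHz- flag on each device's Status line. A rough sketch (the 02:04.0 address is just a placeholder; use whatever addresses lspci actually reports for the Highpoint controllers):

lspci                                # find the Highpoint controllers and note their bus addresses
lspci -s 02:04.0 -vv | grep Status   # 66MHz+ means the card claims 66 MHz capability, 66MHz- means it doesn't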
Supposedly, these Highpoints (using the HPT302 chipset and driver) can run in 32-bit/66 MHz mode in a PCI-X slot, but as far as I can tell, they just take a massive dump when in that mode. Maybe something's wrong with my system configuration.

Last time I looked in SMART (not that long ago), the drives were all more or less healthy. The filesystem itself may have some problems, though; it's nearly full and hasn't had any chance to clear up any fragmentation in a long time (it hasn't been unmounted in months). That may not explain much, though. All the drives are in UDMA mode too, by the way.

Here is my /proc/interrupts:

cat /proc/interrupts
           CPU0       CPU1
  0:  706952388       1079    IO-APIC-edge  timer
  4:   16495247          1    IO-APIC-edge  serial
  7:          0          0    IO-APIC-edge  parport0
  8: 2943013666          1    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 14:   52370981         12    IO-APIC-edge  ide0
 15:    3111546         12    IO-APIC-edge  ide1
 16: 1470092348         17   IO-APIC-level  ide4, ide5, ohci_hcd:usb3, eth0
 17:  135039554         16   IO-APIC-level  ide2, ide3, ohci_hcd:usb2
 18:        504          1   IO-APIC-level  ehci_hcd:usb1
 19:      42474         78   IO-APIC-level  ohci_hcd:usb4
NMI:          0          0
LOC:  706911378  706911377
ERR:          0
MIS:          0

I don't really know how to interpret this, but I assume it means IDE2 and 3 (on the same controller card) share an interrupt with a port of the USB 2.0 card I have in there, and IDE4 and 5 share an interrupt with another port of the USB card and the built-in 100 Mbit Ethernet. There's no gigabit Ethernet in this server (yet). Hope that helps you help me. Heh.

Also, it appears I spoke too soon about the 3ware PCI-X cards: a ton of 7500-8s and 7506-8s have suddenly shown up on eBay; the stock seems to fluctuate wildly at times. What people are willing to pay for the things is pretty amazing, too. I'm still fairly convinced one of the main bottlenecks (the only one?) is the PCI bus, though, along with the fact that I'm running these 8 drives on 4 IDE channels (!!).

Thanks, and sorry you have to deal with this schizophrenic post that I wrote in bits and pieces.

-n17ikh

Frantisek.Rysanek(a)post.cz wrote in news:1165519537.727958.14110@80g2000cwy.googlegroups.com:

> 14 Megs per second? That seems too slow, even for plain old PCI (32 bits @ 33 MHz).
>
> My experience is that this plain PCI bus on reasonably modern chipsets (i845 and above) throttles at about 100 MBps. I've measured that with sequential transfers to various external RAID units, capable of 150-220 MBps, via an U320 HBA.
>
> If you have 9 drives on a single PCI bus, their total bandwidth should still be around 100 MBps. In RAID 6, two drives are parity overhead. That's some 20 per cent off your total bandwidth. Thus, I'd expect about 70-80 MBps under easy sequential load, such as
> cp /dev/zero /dev/md0
> or
> cp /dev/md0 /dev/null
>
> You're right that if this RAID machine serves files via a Gb Eth NIC, you'll get half of the bandwidth eaten by the NIC.
>
> What CPU do you have? You seem to say that you can choose between PCI and PCI-X in your system - that doesn't look like completely feeble hardware. Makes me think that your PC has a Pentium 4 in an entry-level server chipset, such as the i875 or i7210, combined with a 6300ESB (PCI and PCI-X). Or it could be an older Pentium 3 server with a ServerWorks chipset... Either way, you shouldn't be starved of CPU horsepower for the RAID operations (XOR, Reed Solomon) - how much estimated throughput does "md" report at boot, for the algorithm that it selects? (see dmesg)
>
> Have you tried putting one of your PCI IDE controllers into a PCI-X slot?
> You *can* put a 32bit PCI board into a 64bit slot and it should work, provided that the 5V/3.3V compatibility keys in the slot and on the board are in a mutually permitting constellation... (You can even put a 64bit board into a 32bit slot, for that matter - not your case, though.)
>
> If you can put each IDE controller into a separate PCI segment, that should double your bandwidth to the disk drives. Or you could keep the two IDE controllers together on one bus, and use the other bus for the NIC, if you don't have an Ethernet PHY integrated in your south-bridge, or attached via some proprietary Ethernet-only link from the south-bridge...
>
> Also note that if you have a 6300ESB, the "Hub Link" between s.b. and n.b. is only capable of 266 MBps, I don't remember whether this is full or half duplex. So the segment of PCI64 @ 66 off the south bridge, nominally capable of 533 MBps (half duplex), can be throttled by the HubLink.
>
> Back to your weird symptoms. 14 MBps is *really* slow, regardless of your chipset and CPU, unless it's a Pentium-class machine.
>
> What sort of load do you have? Large files? A myriad small files? A database? Sequential transfers? Small random rewrites? Note that RAID 5 and RAID 6 act like "poisoned" when asked to do a lot of tiny write transactions that trash the cache. On every such write, the RAID first has to read the whole corresponding stripe set, calculate the parity stripes (two of them for RAID 6) and finally write the payload stripe and the parity stripes.
>
> None of this happens when reading (from a healthy array). No parity calculation. Close to RAID0 performance.
>
> Let me suggest an easy test of sequential reading performance:
> cp /dev/md0 /dev/null
> and, on another console,
> iostat 2
> The iostat util is a part of the "sysstat" package and shows transfers in the unit of sectors, i.e. divide the figure in "blocks per second" by two and you get a transfer rate in kBps...
>
> Do your disk drives run in UDMA mode? You should be able to get to know using
> hdparm -I /dev/drive
> Some IDE drivers also report that on boot (see dmesg).
>
> Are all your disk drives healthy? There are several possible clues.
>
> Firstly, if you try a sequential reading test (see above) on a single disk drive, with iostat measuring your throughput, does the transfer rate fluctuate? It should not, the drive should read at a fairly stable pace. If the transfer rate drops now and then almost to zero, and then goes back to normal, that's a sign that your drive has weaker areas, or that it's got "transparently remapped" sectors and has to seek during linear reading to reach for them. Try holding a suspicious disk drive in your hand - if you feel sudden seeking now and then (while the transfer rate drops), you know what the problem is.
>
> Secondly, download and compile smartmontools. You need 'smartctl'. Try
> smartctl -a /dev/hda
> Either you get some three pages of data, or the drive complains about unknown command (DriveReady SeekComplete ...). This probably means that S.M.A.R.T. is off. Turn it on and try again:
> smartctl -s on /dev/hda
> smartctl -a /dev/hda
> Focus on the SMART error log. If smartctl says that there are no errors in the SMART error log, that may or may not mean that the drive is OK. I believe the SMART error log 'sector' is stored on the disk itself, and some drives seem to be flawed to the point that they can't even log their own errors...
> If there *are* some errors in the SMART log, that should be enough evidence for an RMA of the disk drive.
>
> Do your IDE controllers share IRQ's, with each other or with other devices? If your machine has an APIC, is it enabled? (Provides additional IRQ lines, decreases the order of IRQ sharing.) If unsure, post a listing of
> cat /proc/interrupts
>
> It's also theoretically possible that your quad IDE controllers are flawed in their design to the point that their performance is impaired, but I've never actually met any such hardware... The driver would have to use polling IO to achieve transfer rates this low :-)
>
> Frank Rysanek
>
> n17ikh(a)gmail.com wrote:
>> Yes actually, I *do* have a hoard of PATA drives (got a good deal). The reason I'm looking to upgrade to a PCI-X solution is because right now I'm using two 4-drive Highpoint cards that are plain PCI, and with that many drives in RAID-6 the bottleneck in both reading and writing is the PCI bus itself, e.g. when it writes it needs to write to all 9 drives (one is on the onboard controller) at once, and since PCI has a bandwidth of 133MB/sec I get 133/9 MB/sec, or around 14 MB/sec. In practice it's even less, because usually that data comes from the network, which also uses the PCI bus, so I get a half or a third of that. Practically unacceptable, but when you're on a budget, what can you do, eh? Using the PCI-X slots on my board should solve or at least alleviate that problem.
>> -n17ikh
>> --
>> * Posted with NewsLeecher v3.0 Beta 7
>> * http://www.newsleecher.com/?usenet
From: General Schvantzkoph on 13 Dec 2006 20:14

On Tue, 05 Dec 2006 01:22:31 -0600, n17ikh(a)gmail.com wrote:

> I realize this is fairly specific, which is why google has failed me and I come to Usenet. I'm looking for a (obviously linux-compatible) ATA hard drive controller (RAID is OK, but won't be used, I use MD RAID). It needs to be PCI-X OR PCI-64/66 and have either 4 channels with the capability for two drives per channel, or more preferably, 8 channels with single-drive-per-channel capability. I've found one in the 3ware 7506-8, but all those have mysteriously vanished off the face of ebay. If someone knows where I can find one of those used or the model number/manufacturer of something else I could use, I'd be eternally grateful. Also preferably, I'd like to be able to find something under 200USD or even 150USD. This was something that I could have done with the 7506-8s but unfortunately they're gone, apparently for good. Anyways, if anyone knows a good alternative, I will love you forever. In a good way.
>
> TIA
> -n17ikh

http://www.newegg.com/Product/Product.asp?Item=N82E16816116020
From: n17ikh on 13 Dec 2006 21:29

General Schvantzkoph <schvantzkoph(a)yahoo.com> wrote in news:4ubmroF150t5rU22(a)mid.individual.net:

> On Tue, 05 Dec 2006 01:22:31 -0600, n17ikh(a)gmail.com wrote:
>
>> I realize this is fairly specific, which is why google has failed me and I come to Usenet. I'm looking for a (obviously linux-compatible) ATA hard drive controller (RAID is OK, but won't be used, I use MD RAID). It needs to be PCI-X OR PCI-64/66 and have either 4 channels with the capability for two drives per channel, or more preferably, 8 channels with single-drive-per-channel capability. I've found one in the 3ware 7506-8, but all those have mysteriously vanished off the face of ebay. If someone knows where I can find one of those used or the model number/manufacturer of something else I could use, I'd be eternally grateful. Also preferably, I'd like to be able to find something under 200USD or even 150USD. This was something that I could have done with the 7506-8s but unfortunately they're gone, apparently for good. Anyways, if anyone knows a good alternative, I will love you forever. In a good way.
>>
>> TIA
>> -n17ikh
>
> http://www.newegg.com/Product/Product.asp?Item=N82E16816116020

You missed one of my criteria, which was that it cost under $200 USD. I unfortunately don't have $400 to spend on a *new* 7506-8, as much as I would like to have one. :(

-n17ikh
From: Frantisek.Rysanek on 22 Dec 2006 13:10

Sorry for taking such a long time to respond... Thanks for posting your detailed data; unfortunately, I really can't suggest much more.

> dd if=/mnt/md0/test.img of=/dev/null
> 2000000+0 records in
> 2000000+0 records out
> 1024000000 bytes (1.0 GB) copied, 16.2642 s, 63.0 MB/s
>
> During the read, something else is going on: dd is pulling 54 percent CPU usage and md0_raid6 is pulling 12 percent! Not what I wanted to see; however, 63 MB/sec is a good deal faster than what I remember it testing at last time.

Try using "cp" instead of "dd" where possible, or tell "dd" to use a larger block size than the default 512 B (there's a quick sketch of this a bit further down). Your system could be stalled by the huge number of transactions per second that "dd" generates. The generic disk IO subsystem can only handle so many IOps, and RAID 5/6 is especially hurt by small transactions (unless it can combine them into larger chunks). Cp uses some reasonable transaction size for its sequential copying activity. Essentially, "cp" is only of any use in the reading test, as you need a file of some fixed length on the source side (/dev/zero would let your cp run forever). Again, try using "iostat 2" (from the sysstat package) on another console to watch throughput.

Alternatively, try Bonnie or Bonnie++. It does several different tests and tries to measure your IOps as well as sequential MBps, for reading and writing. You are only interested in the "sequential" numbers - all the other measurements put quite some load on the CPU. I believe Bonnie uses some sane (big) transaction size for the sequential writing tests.

> Anyway, I have tried putting my IDE cards in the PCI-X slots - however, the results were less than promising. I got nothing but DMA timeouts and the disks not working at all.

Wow. Something's seriously broken in that setup. Makes me wonder if perhaps the IRQ assignments could go completely wrong, with ACPI being a possible culprit. You can try fiddling with a couple of things:
- disable ACPI in your BIOS but leave MPS on
- add "pci=noacpi" or "acpi=off" or "pci=routeirq" at the Linux bootloader prompt
- try upgrading your motherboard's BIOS

> Supposedly, these Highpoints (using the HPT302 chipset and driver) can run in 32-bit/66 MHz mode in a PCI-X slot, but as far as I can tell, they just take a massive dump when in that mode. Maybe something's wrong with my system configuration.

Maybe the Highpoint hardware really just doesn't work in 66 MHz mode, or doesn't like your PCI bridge in that mode. Some motherboards can be forced into 33 MHz mode (by a jumper or by a BIOS option); most cannot. I've seen such a jumper on some Advantech "industrial" ATX motherboards, and I've seen relevant PCI clocking options on some recent SuperMicro motherboards for Xeon CPUs (i7520 and i5000 series chipsets).

> Last time I looked in SMART (not that long ago), the drives were all more or less healthy.

Good.
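Coming back to the block-size point above, a re-run of the sequential tests with 1 MB requests might look something like this (assuming your dd is a recent enough coreutils build to accept conv=fdatasync; if it isn't, drop that option and treat the write figure as partly cache-assisted):

# sequential write test: ~1 GB in 1 MB requests, flushed to disk before the rate is reported
dd if=/dev/zero of=/mnt/md0/test.img bs=1M count=1000 conv=fdatasync
# sequential read test with the same request size
dd if=/mnt/md0/test.img of=/dev/null bs=1M
# and on another console, to watch per-interval throughput:
iostat 2

That keeps the syscall count down to roughly a thousand per gigabyte instead of a million or two, which should take dd itself mostly out of the CPU picture.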
> cat /proc/interrupts
>            CPU0       CPU1
>   0:  706952388       1079    IO-APIC-edge  timer
>   4:   16495247          1    IO-APIC-edge  serial
>   7:          0          0    IO-APIC-edge  parport0
>   8: 2943013666          1    IO-APIC-edge  rtc
>   9:          0          0   IO-APIC-level  acpi
>  14:   52370981         12    IO-APIC-edge  ide0
>  15:    3111546         12    IO-APIC-edge  ide1
>  16: 1470092348         17   IO-APIC-level  ide4, ide5, ohci_hcd:usb3, eth0
>  17:  135039554         16   IO-APIC-level  ide2, ide3, ohci_hcd:usb2
>  18:        504          1   IO-APIC-level  ehci_hcd:usb1
>  19:      42474         78   IO-APIC-level  ohci_hcd:usb4
> NMI:          0          0
> LOC:  706911378  706911377
> ERR:          0
> MIS:          0
>
> I don't really know how to interpret this, but I assume it means IDE2 and 3 (on the same controller card) share an interrupt with a port of the USB 2.0 card I have in there, and IDE4 and 5 share an interrupt with another port of the USB card and the built-in 100 Mbit Ethernet. There's no gigabit Ethernet in this server (yet).

Exactly. If you can live without the USB ports, turn them off in the BIOS. If you cannot, that's your choice; I don't think this would help you much, though. It's good to know that you have the APIC enabled. It seems that your south bridge (APIC) has got only 4 PCI IRQ lines, and these are shared by all the PCI devices, internal and external. Modern servers usually have a separate set of IRQs for each PCI bus segment. Maybe you'd see additional IRQ numbers allocated if you managed to put some devices on the PCI-X segment.

> I'm still fairly convinced one of the main bottlenecks (the only one?) is the PCI bus, though, along with the fact that I'm running these 8 drives on 4 IDE channels (!!).

Yes, these two factors combined could explain 60 MBps on a 132 MBps bus.

On a slightly unrelated note, would you be willing to post a dump of lspci and lspci -vt? It's not that it would help me tune your performance, but at least I'd know what sort of a chipset you're using.

Frank Rysanek
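For reference, the output Frank asks for, plus the md RAID 6 benchmark he mentioned in his first reply, can be gathered with something along these lines (assuming the boot messages are still in the kernel's dmesg ring buffer):

lspci                  # one line per PCI device: chipset, NICs, and the Highpoint controllers
lspci -vt              # tree view, showing which devices sit behind which bus/bridge
dmesg | grep -i raid6  # the per-algorithm throughput figures the raid6 code prints at boot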