From: annalissa on 20 Jul 2010 06:12

Hi all,

The following is what I have read in a magazine named "Linux For You". To what extent is this true?

ideally dedicate a set of high performance disks spread across two or more controllers.

if swap space resides on a busy disk, then to reduce latency, it should be located as close to a busy partition like (/var) as possible, to reduce seek time for the drive head.

while using different partitions or hard disks of different speeds for swap, you can assign priorities to each partition so that the kernel will use the higher priority hard disk first.

in addition, the kernel will distribute visit counts in a round robin fashion across all devices with equal priorities.

Ex:-
/dev/sda6   /swap   swap   pri=4   0 0
/dev/sda8   /swap   swap   pri=4   0 0
/dev/sdb3   /swap   swap   pri=1   0 0
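For anyone who wants to try what the magazine describes, a minimal sketch of equal-priority swap entries and how to verify them follows. The device names are just the article's placeholders; note that the mount-point field for swap is conventionally written as "none" or "swap" rather than "/swap".

  # /etc/fstab -- two equal-priority swap areas on sda, a slower one on sdb
  /dev/sda6   none   swap   sw,pri=4   0 0
  /dev/sda8   none   swap   sw,pri=4   0 0
  /dev/sdb3   none   swap   sw,pri=1   0 0

  # activate everything marked as swap in fstab and check the priorities
  swapon -a
  cat /proc/swaps        # shows Filename, Type, Size, Used, Priority
  swapon -s              # same information in summary form

With equal priorities the kernel round-robins pages across the two sda areas; the sdb3 area is only used once the higher-priority ones are full.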
From: Aragorn on 20 Jul 2010 07:01

On Tuesday 20 July 2010 12:12 in comp.os.linux.setup, somebody identifying as annalissa wrote...

> Hi all,
>
> The following is what I have read in a magazine named "Linux For
> You". To what extent is this true?
>
> ideally dedicate a set of high performance disks spread across two or
> more controllers.

This in itself is a rather vague description. I presume that the above suggests the use of a RAID solution, although this need not necessarily be the case. But let's handle RAID first before we get into the other aspects...

When using a RAID solution with hard disks of the IDE/PATA variety, you will typically be using software RAID - true hardware PATA RAID controllers do exist, but they are few and far between. With such a software RAID set-up, it is generally advised to use separate disk controllers, due to the limited throughput of PATA - i.e. 133 MB/sec for the entire disk controller with UDMA enabled. In other words, if you have multiple disks connected to the same PATA controller, then the controller will be the bottleneck.

The same is true for parallel SCSI. I haven't followed the latest developments in parallel SCSI since I switched to SAS ("Serial Attached SCSI") for my own SCSI implementations, but as far as I know, the fastest parallel SCSI standard at the moment is Ultra 320 - it is possible that there's already Ultra 640 by now; again, I have not been following the evolution there - and that means that every SCSI channel has a maximum throughput of 320 MB per second.

For SAS, SATA and FireWire, however, things are different. These types of disks are connected via a point-to-point connection, so the controller itself no longer forms a bottleneck, except in the event that the controller has a higher throughput capacity than its PCI, PCI-X or PCIe bus allows - but then it is the bus that forms the bottleneck, not the controller.

However, as explained by several people in reply to an earlier post from you on this subject, an important thing to take into account is caching. Not only do the disks have a cache, but in the event of a hardware RAID solution, the RAID controller will also have a cache, and on top of all that, the Linux kernel also caches and buffers, on some filesystem types more than on others. (XFS and reiser4, for instance, are aggressively caching filesystem types.)

A lot also depends on the type of RAID that you choose to use. RAID 0 (striping) is the ultimate throughput solution, but RAID 0 is not redundant. RAID 1 (mirroring), on the other hand, offers RAID 0-like performance during reads, but not during writes, as the data has to be written to both disks - assuming we are talking about two disks here - in its entirety when the kernel flushes it to disk. During reads, the RAID controller - or the software RAID code in the Linux kernel, depending on your set-up - will behave somewhat like RAID 0 in that part of the data will be read from one disk and part from the other.

Now, all the above said, if you do not have a RAID set-up, but just a bunch of hard disks on your controller - and by this I do not mean a JBOD set-up, but just the regular scenario where a person has multiple hard disks in their machine - then you /may/ theoretically gain a bit in performance if you spread the filesystems with the highest I/O demands across multiple disks, e.g. "/var" on one disk and "/usr" on another disk.
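As an aside to the software RAID discussion above, a minimal sketch of what a two-disk mdadm mirror looks like in practice; the device names /dev/sdb1 and /dev/sdc1 and the mount point /mnt/raid are placeholders, not anything taken from this thread.

  # create a two-disk mirror (RAID 1); both partitions should be roughly
  # the same size and marked as Linux RAID members
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

  # watch the initial resync and inspect the array state
  cat /proc/mdstat
  mdadm --detail /dev/md0

  # put a filesystem on the array and mount it
  mkfs.ext3 /dev/md0
  mount /dev/md0 /mnt/raid

A RAID 0 stripe would use --level=0 instead, trading the redundancy for throughput as described above.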
But as explained in the replies to your earlier inquiry about this subject, and as I have mentioned myself higher up, this is all theoretical, because you have to take caching into account.

However, when you have two hard disks in your computer and your machine does not have a lot of RAM in it - in other words, if your machine has to use swap a lot - then you can set up a swap partition on each disk and give them equal priority, in which case the kernel will swap the data to each swap partition in an alternating way. Or you can set a higher priority for one of the swap partitions - e.g. if one of the two disks is faster than the other.

> if swap space resides on a busy disk, then to reduce latency, it
> should be located as close to a busy partition like (/var) as possible,
> to reduce seek time for the drive head.

Hmm... I think that's rather far-fetched. The Linux kernel has a couple of very good I/O schedulers, e.g. cfq ("completely fair queueing") and the anticipatory scheduler. The latter was the default for a long time and may still be the default on systems that for some reason require an older kernel, but for newer kernels "cfq" has become the default, and the older kernels support it as well, although you may have to tell the kernel to use it, either via a kernel commandline parameter at boot or via your bootloader configuration. In LILO, add "elevator=cfq" to the "append" line for your kernel's stanza. In GRUB, just add it to the kernel's boot options on the "kernel" line.

Note: All recent kernels use "cfq" by default, but if you're running one of the pre-2.6.26 kernels - I'm not sure of the exact kernel version in which "cfq" became the default, but it must have been around that release - then you may need to tell the kernel to use "cfq" instead of "anticipatory". To find out what your kernel is using, run...

dmesg | grep -i scheduler

On my machine here, this provides me with the following output...:

[12:52:54][localhost:/home/aragorn]
[aragorn] $> sudo dmesg | grep -i scheduler
Password:
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)

(Note: Whether you need superuser privileges for the use of "dmesg" depends on your distribution.)

And just to be clear on what kernel I'm running...:

[12:53:47][localhost:/home/aragorn]
[aragorn] $> uname -a
Linux localhost 2.6.26.8.tex3 #1 SMP Mon Jan 12 04:33:38 CST 2009 i686 AMD Athlon(TM) XP 2800+ GNU/Linux

> while using different partitions or hard disks of different speeds for
> swap, you can assign priorities to each partition so that the kernel
> will use the higher priority hard disk first.

True.

> in addition, the kernel will distribute visit counts in a round robin
> fashion across all devices with equal priorities.
>
> Ex:-
> /dev/sda6   /swap   swap   pri=4   0 0
> /dev/sda8   /swap   swap   pri=4   0 0
> /dev/sdb3   /swap   swap   pri=1   0 0

True. But, again, if you need to gain performance by distributing the swap across multiple partitions on multiple hard disks, then you've got a problem, i.e. your system has too little RAM. These days, RAM is relatively cheap, and the performance gain from putting a sufficiently large amount of RAM in your machine to avoid swapping is a lot better than the performance gain you get from distributing your swap space across multiple disks. ;-)

--
*Aragorn*
(registered GNU/Linux user #223157)
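As a side note to the dmesg check above: on kernels of that era the active I/O scheduler can also be inspected and switched per disk at runtime through sysfs, without rebooting. A minimal sketch, with sda as a placeholder device:

  # show the schedulers available for sda; the active one is in brackets
  cat /sys/block/sda/queue/scheduler
  # e.g.: noop anticipatory deadline [cfq]

  # switch sda to cfq for the running session (root required); this does
  # not persist across reboots - use the elevator=cfq boot option for that
  echo cfq > /sys/block/sda/queue/scheduler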
From: The Natural Philosopher on 20 Jul 2010 08:12

annalissa wrote:
> Hi all,
>
> The following is what I have read in a magazine named "Linux For
> You". To what extent is this true?
>
> ideally dedicate a set of high performance disks spread across two or
> more controllers.
>
> if swap space resides on a busy disk, then to reduce latency, it
> should be located as close to a busy partition like (/var) as possible,
> to reduce seek time for the drive head.
>
> while using different partitions or hard disks of different speeds for
> swap, you can assign priorities to each partition so that the kernel
> will use the higher priority hard disk first.
>
> in addition, the kernel will distribute visit counts in a round robin
> fashion across all devices with equal priorities.
>
> Ex:-
> /dev/sda6   /swap   swap   pri=4   0 0
> /dev/sda8   /swap   swap   pri=4   0 0
> /dev/sdb3   /swap   swap   pri=1   0 0

If you need to tune swap, you are already in such bad trouble that it's time to fit more RAM.

My machine has now got to the stage that if I run two graphics-intensive apps, it has to start swapping. Performance drops by a factor of about 10,000.

No amount of swap tuning is going to compensate for the fact that it's basically too small a machine for the jobs I am now asking it to do.

So I don't run two apps loaded with bitmaps together. End of story.
From: Grant on 20 Jul 2010 09:45

On Tue, 20 Jul 2010 13:12:45 +0100, The Natural Philosopher <tnp(a)invalid.invalid> wrote:

> annalissa wrote:
>> Hi all,
>>
>> The following is what I have read in a magazine named "Linux For
>> You". To what extent is this true?
>>
>> ideally dedicate a set of high performance disks spread across two or
>> more controllers.
>>
>> if swap space resides on a busy disk, then to reduce latency, it
>> should be located as close to a busy partition like (/var) as possible,
>> to reduce seek time for the drive head.
>>
>> while using different partitions or hard disks of different speeds for
>> swap, you can assign priorities to each partition so that the kernel
>> will use the higher priority hard disk first.
>>
>> in addition, the kernel will distribute visit counts in a round robin
>> fashion across all devices with equal priorities.
>>
>> Ex:-
>> /dev/sda6   /swap   swap   pri=4   0 0
>> /dev/sda8   /swap   swap   pri=4   0 0
>> /dev/sdb3   /swap   swap   pri=1   0 0
>
> If you need to tune swap, you are already in such bad trouble that it's
> time to fit more RAM.
>
> My machine has now got to the stage that if I run two graphics-intensive
> apps, it has to start swapping. Performance drops by a factor of about
> 10,000.
>
> No amount of swap tuning is going to compensate for the fact that it's
> basically too small a machine for the jobs I am now asking it to do.
>
> So I don't run two apps loaded with bitmaps together. End of story.

Not the end of story for some who run memory-intensive computations that rely on swap. Not everyone is interested in a nicely tuned interactive desktop ;)

The kernel will treat swaps on different drives like RAID 0 if they're set to the same priority. I usually put swap at partition five, the first of the logicals, on each drive, then run them at the same priority.

Large swap rarely comes in handy, but it is good for the occasional large or silly task. Better than having the kernel start killing off processes in response to out-of-memory. Yes, if memory usage goes into swap, things really slow down, but that's what computing used to be like all the time some years ago (hmm, maybe last century?).

The way I set up disks is that the OS lives on primaries at the fast end of the disk, and the slower archival stuff goes at the slow end. For example, on a two-disk machine I may put / on sda2 and /usr on sdb2. I allow for at least two OS installs so I can update the OS but keep the old one about until the new one is bedded down. Share the swaps, of course. If the box has windoze, I'll put the paging file on the other drive to the one where the OS is installed.

If, for some reason, I need lots of extra swap, it's easy to add some swapfiles somewhere convenient -- that happens rarely, but I have done it while processing quite large database tables a while back.

Grant.
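Grant's "add some swapfiles somewhere convenient" needs no repartitioning. A minimal sketch, with the path /var/tmp/swapfile1 and the 2 GB size chosen purely as placeholders:

  # create a 2 GB file, restrict its permissions, and format it as swap
  dd if=/dev/zero of=/var/tmp/swapfile1 bs=1M count=2048
  chmod 600 /var/tmp/swapfile1
  mkswap /var/tmp/swapfile1

  # enable it at a low priority so the existing swap partitions are used first
  swapon -p 1 /var/tmp/swapfile1

  # check that it shows up alongside the partition-based swap
  cat /proc/swaps

  # when the big job is done, drop it again
  swapoff /var/tmp/swapfile1
  rm /var/tmp/swapfile1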
From: Doug Freyburger on 20 Jul 2010 11:00
The Natural Philosopher wrote:
> If you need to tune swap, you are already in such bad trouble that it's
> time to fit more RAM.

Incidentally, this includes memory leaks. Since memory leaks are slow, there is no motivation to tune swap space to optimize for them. The solution is to patch the executable that's leaking memory, not to optimize swap.

> My machine has now got to the stage that if I run two graphics-intensive
> apps, it has to start swapping. Performance drops by a factor of about
> 10,000.

Cache inside the CPU chip may be one order of magnitude faster than main memory. Virtual memory on disk may be four orders of magnitude slower than main memory.

> No amount of swap tuning is going to compensate for the fact that it's
> basically too small a machine for the jobs I am now asking it to do.

If possible, keep enough RAM to not swap. Buy more machines to run the extra applications on. Hosts are cheap to add to the data center.

It's not always possible. A use can always be found that blows any RAM. The question to ask yourself is: should what I am doing blow the maximum RAM I can put in this host? If it shouldn't but it does, then it's time to trim down the application. If it should, that's when it is worth tuning swap.

Uses that will automatically blow any installed RAM are rare. If you can't easily explain why your use will blow any installed RAM, then your use shouldn't.

Early in my career I did VLSI CAD development. That's the sort of use that will blow any conceivable installed RAM. As with any other tuning operation, we got 1000 times the speed improvement from carefully tuning the internal loops to reduce paging than we got from any effort at tuning where the paging took place.

If you need more swap for memory leaks, backing store, reserved image pages (an HP-UX and AIX issue that does not seem to occur on Linux) and occasional spikes of usage, there will be no benefit to tuning swap. If you know exactly why your application should blow any conceivable installed RAM, then it's worth some effort.

At one point I supported engineers doing a very big mechanical engineering CADAM project. The RAM available was big enough for the designers of the individual parts; it took some initial benchmarking to ensure that. The machines that were swapping were the NASTRAN simulators and the ones used by the assembly testers.

For the NASTRAN simulators we got a better performance increase by rebuilding the hosts with a single larger swap partition than by adding extra swap partitions. That result surprised me, because I expected that a swap partition on each of the 4 internal drives would do better. It turned out the reinstall reorganized the files and slightly decreased the amount of data read into memory in the first place. Even a small decrease in the number of page faults to disk overwhelmed the benefit from using multiple spindles. Most likely what happened is that I had learned what I wanted to do with those hosts, so when I rebuilt them the second time I rebuilt them based on the previous experience, and the build was better in general for the exact usage.

So know your applications and consider how to rebuild your host specifically for that use. Making it a specialist box will work better than adding a random application to a generalist box and then worrying about performance.
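A practical way to act on "know your applications" is to measure whether the host actually pages under its real workload before spending any effort on swap layout. A minimal sketch with standard tools (the 5-second interval is arbitrary):

  # watch memory and paging activity while the workload runs;
  # sustained non-zero "si"/"so" (swap-in/swap-out) columns mean real paging
  vmstat 5

  # summary of RAM and swap actually in use
  free -m

  # per-swap-area usage and priorities
  cat /proc/swaps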