From: Keith Keller on 20 Jul 2010 17:17

On 2010-07-20, Grant <omg(a)grrr.id.au> wrote:
>
> I ignore runaways as a valid thing to plan for.  After all, in the
> last ten years I think the only time I lost a Linux box is when I
> played with a recursion bomb, out of curiosity.

I've had two different users unintentionally start runaway processes
on at least three different occasions in the past two years.  Obviously
there are ways to deal with this (ulimit resources, for one) other than
simply not utilizing swap, but there could be a legitimate reason
someone needs 100GB of memory for one process.

> And, in that circumstance,
> a large swap area can give one time to take action before the machine
> dies.

A large swap area makes things worse, I've found.  If you have a small
swap space, the OOM killer will be able to kill off processes without
letting them spend a ton of time swapping out.  If your swap is large,
then the OOM killer doesn't kick in right away, processes are swapping
like crazy, and even a task like getting a shell takes minutes.

There is lots of argument about the OOM killer and overcommitting of
memory by the Linux kernel.  A Google search will turn up numerous
links on the subject (about which I am decidedly not an expert).

--keith

-- 
kkeller-usenet(a)wombat.san-francisco.ca.us  (try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information
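For readers who want to try the ulimit approach mentioned above, per-user
limits are usually set either per-shell with ulimit or persistently in
/etc/security/limits.conf; the user name and sizes below are illustrative
examples only:

  # Cap the address space of processes started from this shell
  # (value is in kilobytes; roughly 4 GB here, purely illustrative):
  ulimit -v 4194304

  # Or a persistent per-user cap in /etc/security/limits.conf
  # (user name and value are examples only):
  #   alice   hard   as   4194304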
From: Grant on 20 Jul 2010 19:27

On Tue, 20 Jul 2010 21:11:58 GMT, unruh <unruh(a)wormhole.physics.ubc.ca> wrote:
>On 2010-07-20, Grant <omg(a)grrr.id.au> wrote:
>> On Tue, 20 Jul 2010 09:20:19 -0700, Keith Keller <kkeller-usenet(a)wombat.san-francisco.ca.us> wrote:
>>
>>>On 2010-07-20, Grant <omg(a)grrr.id.au> wrote:
>>>>
>>>> I usually put swap in at partition five, first in the logicals, on
>>>> each drive, then run them at the same priority.  Large swap rarely
>>>> comes in handy, but it is good for the occasional large or silly
>>>> task.  Better than having the kernel start killing off processes in
>>>> response to out-of-memory.
>>>
>>>I don't think this is necessarily true.  If your process is a runaway
>>>task, it's much much better to have the kernel kill it off right away
>>>than to let it fester in swap, dragging everything else down with it.
>>>This is of course assuming that the runaway process in question is
>>>using the most memory, which might not be the case if you have a big
>>>RDBMS running, for example.  The OOM killer can be customized in
>>>recent kernels to help protect certain classes of processes.
>>
>> I ignore runaways as a valid thing to plan for.  After all, in the
>> last ten years I think the only time I lost a Linux box is when I
>> played with a recursion bomb, out of curiosity.  And, in that
>> circumstance, a large swap area can give one time to take action
>> before the machine dies.  Can be a fun race, particularly if one
>> forgets the 'killall' command at the time.
>
>How in the world could you "lose" the box?  Do you mean it crashed, or
>that some irretrievable badness occurred (CPU caught fire, hard disk
>was erased, screen exploded in a shower of glass....) :-)

Lost, as in no services, not available, gone, deceased, dead, crashed.
Not a live box providing expected services, a navel gazer...

>>
>> Much more likely to lose the box on power failure.

Grant.
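For reference, the equal-priority arrangement Grant describes is normally
expressed with the pri= mount option in /etc/fstab, or with swapon -p;
the kernel then stripes pages across swap areas of equal priority.  The
device names below are just examples:

  # /etc/fstab -- two swap partitions run at the same priority
  /dev/sda5   none   swap   sw,pri=1   0 0
  /dev/sdb5   none   swap   sw,pri=1   0 0

  # or done by hand:
  swapon -p 1 /dev/sda5
  swapon -p 1 /dev/sdb5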
From: Grant on 20 Jul 2010 19:44

On Tue, 20 Jul 2010 14:17:40 -0700, Keith Keller <kkeller-usenet(a)wombat.san-francisco.ca.us> wrote:
>On 2010-07-20, Grant <omg(a)grrr.id.au> wrote:
>>
>> I ignore runaways as a valid thing to plan for.  After all, in the
>> last ten years I think the only time I lost a Linux box is when I
>> played with a recursion bomb, out of curiosity.
>
>I've had two different users unintentionally start runaway processes
>on at least three different occasions in the past two years.  Obviously
>there are ways to deal with this (ulimit resources, for one) other than
>simply not utilizing swap, but there could be a legitimate reason
>someone needs 100GB of memory for one process.

Yes, and that's the problem.  You could set reasonable limits for
users, allow the odd user more for a good reason?

I suppose a reasonable limit is where there are few problems, and
only a few requests for larger limits?

And, my viewpoint is from there being one user, me ;)  No idea what's
best for a box serving many users.

Back when I was at uni, anyone fork-bombing the system 'lost' their
password until they reported to the sysadmin for a gentle chat ;)
Usually runaway programs in the unix lab ate the local machine, not
the shared filesystem or the server box (IRIX, on Indy and O2 lab
machines; dunno what the server was).

>> And, in that circumstance,
>> a large swap area can give one time to take action before the machine
>> dies.
>
>A large swap area makes things worse, I've found.  If you have a small
>swap space, the OOM killer will be able to kill off processes without
>letting them spend a ton of time swapping out.  If your swap is large,
>then the OOM killer doesn't kick in right away, processes are swapping
>like crazy, and even a task like getting a shell takes minutes.

Yes, here I make sure there's a root console open if I'm playing
dangerous, or after a reboot, when I discover I am in fact playing
dangerous.

>There is lots of argument about the OOM killer and overcommitting of
>memory by the Linux kernel.  A Google search will turn up numerous
>links on the subject (about which I am decidedly not an expert).

I don't like the OOM killer (and, by implication, the over-commit that
it has to cope with), but then, I've not triggered the thing in recent
years.  There are techniques and tuning to better control it, but I'm
not in an environment that needs that tuning, so it's not an area I've
explored.

Grant.
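The tuning Grant alludes to usually means the kernel's overcommit
sysctls plus the per-process OOM adjustment files under /proc; which
file exists (oom_adj versus the newer oom_score_adj) depends on kernel
version, and the values shown here are only illustrative:

  # Refuse allocations beyond swap + 50% of RAM instead of overcommitting:
  sysctl -w vm.overcommit_memory=2
  sysctl -w vm.overcommit_ratio=50

  # Shield a critical daemon (sshd here, as an example) from the OOM
  # killer; on older kernels -17 in oom_adj disables OOM-killing of the
  # process, on newer kernels the equivalent is -1000 in oom_score_adj:
  echo -17 > /proc/$(pidof -s sshd)/oom_adj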
From: Keith Keller on 20 Jul 2010 22:27

On 2010-07-20, Grant <omg(a)grrr.id.au> wrote:
>
> Yes, and that's the problem.  You could set reasonable limits for
> users, allow the odd user more for a good reason?
>
> I suppose a reasonable limit is where there are few problems, and
> only a few requests for larger limits?
>
> And, my viewpoint is from there being one user, me ;)  No idea what's
> best for a box serving many users.

It depends a lot on the box and the users.  In my environment, we have
about a half-dozen regular users, plus another half-dozen occasional
users, plus about a dozen infrequent users.  The regular users all work
on the same project together, so if one writes a program that uses all
available RAM (on our dev boxes, never on our public-facing boxes!),
the worst that happens is a forced reset; next worst is the OOM killer;
next worst is I kill the process before it gets that bad.  The odds of
losing critical data (most of which is hosted on the fileserver, not on
the dev boxes) are negligible, and the odds of losing more than a few
hours' computation are also slim.  So usually the aftermath is ridicule
from the other developers, and therefore I tend to err on the side of
letting the developers work with no resource limits.

On a server with more risk, you would want to be less lenient about
resource limits.

--keith

-- 
kkeller-usenet(a)wombat.san-francisco.ca.us  (try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information
From: Doug Freyburger on 21 Jul 2010 10:31
Grant wrote:
>
>>There is lots of argument about the OOM killer and overcommitting of
>>memory by the Linux kernel.  A Google search will turn up numerous
>>links on the subject (about which I am decidedly not an expert).
>
> I don't like the OOM killer (and, by implication, the over-commit that
> it has to cope with), but then, I've not triggered the thing in recent
> years.

My latest run-in with the OOM killer is a system with Acronis doing
backups across a CIFS/SMB mount, plus Oracle RMAN doing backups across
a CIFS/SMB mount.  Every so often I get a ticket that the paging rate
has gone through the roof, and by the time I can get connected the
system does not respond to SSH.  If I don't reset it, it stays hung all
night.  In the syslog file are lots of OOM lines from just before it
hung.

I don't see why the client side of a CIFS/SMB mount would fill swap
space and hang the system, so I figure Acronis does that.  A
kernel-invasive program to do snapshot OS backups?  Such a program was
specified by the client and it was not my choice!  There is an endless
stream of updates to Acronis; clearly it is not ready for prime time.
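Confirming after the fact that the OOM killer fired, and watching the
paging rate while it is happening, can be done along these lines (the
log file path varies by distribution):

  # Look for OOM-killer activity in the kernel ring buffer and syslog:
  dmesg | grep -i 'out of memory'
  grep -i 'oom-killer\|out of memory' /var/log/messages

  # Watch paging live; the si/so columns show swap-in/swap-out per second:
  vmstat 5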