Prev: [HACKERS] Setting oom_adj on linux?
Next: [HACKERS] What's the current theory about derived files in VPATH builds?
From: Ron Mayer on 4 Jan 2010 16:34 Tom Lane wrote: > Magnus Hagander <magnus(a)hagander.net> writes: >> ...oom_adj... > > One interesting thing I read there is: > Swapped out tasks are killed first. Half of each child's memory size is > added to the parent's score if they do not share the same memory. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > This suggests that PG's shared memory ought not be counted in the > postmaster's OOM score, which would mean that the problem shouldn't be > quite as bad as we've believed. I wonder if that is a recent change? > Or maybe it's supposed to be that way and is not implemented correctly? The code for oom_kill.c looks fairly readable (link below [1]): 96 points = mm->total_vm; ..... 117 list_for_each_entry(child, &p->children, sibling) { 118 task_lock(child); 119 if (child->mm != mm && child->mm) 120 points += child->mm->total_vm/2 + 1; 121 task_unlock(child); 122 } Which seems to add points for each child who doesn't share the same mm structure as the parent. Which I think is a quite a bit stricter interpretation of "if they do not share the same memory". [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/oom_kill.c;h=f52481b1c1e5442c9a5b16b06b22221b75b9bb7c;hb=HEAD -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Alex Hunsaker on 7 Jan 2010 18:58 On Mon, Jan 4, 2010 at 09:55, Alvaro Herrera <alvherre(a)commandprompt.com> wrote: > Magnus Hagander wrote: > >> Right. Which is why I like the idea of disabling the OOM killer for >> the *postmaster*, but not the regular backends. Gives it a chance to >> recover. It's not nice, but it's better than nothing. > > It doesn't sound like the init script can reenable the killer for the > child processes though. Â So, if there's anything that the core code > ought to do, is re-enable OOM-killer for postmaster children, after > being disabled by the initscript. Exactly. FWIW here is the patch I run. Stupid as the patch may be, count it as a +1 for people in the field doing this. Hence a reason to think about doing something in core. maybe. This has some oddities like it does not reset oom to 0 for the (wal) writer process. Plus assuming you do oom, the stats collector has a good chance of being hit. Although normal backends will probably have a higher score. [ oom_adj gets set to -17 in the startup script. I run this on top of disabling overcommit, color me paranoid ] *** a/src/backend/postmaster/autovacuum.c --- b/src/backend/postmaster/autovacuum.c *************** *** 362,367 **** StartAutoVacLauncher(void) --- 362,370 ---- #ifndef EXEC_BACKEND case 0: /* in postmaster child ... */ + + oom_adjust(); + /* Close the postmaster's sockets */ ClosePostmasterPorts(false); *** a/src/backend/postmaster/fork_process.c --- b/src/backend/postmaster/fork_process.c *************** *** 65,68 **** fork_process(void) --- 65,84 ---- return result; } + void + oom_adjust(void) + { + /* adjust oom */ + FILE *oom = fopen("/proc/self/oom_adj", "w"); + + /* + * ignore errors we dont really care + */ + if (oom) + { + fprintf(oom, "0\n"); + fclose(oom); + } + } + #endif /* ! WIN32 */ *** a/src/backend/postmaster/pgarch.c --- b/src/backend/postmaster/pgarch.c *************** *** 161,166 **** pgarch_start(void) --- 161,169 ---- #ifndef EXEC_BACKEND case 0: /* in postmaster child ... */ + + oom_adjust(); + /* Close the postmaster's sockets */ ClosePostmasterPorts(false); *** a/src/backend/postmaster/pgstat.c --- b/src/backend/postmaster/pgstat.c *************** *** 622,627 **** pgstat_start(void) --- 622,630 ---- #ifndef EXEC_BACKEND case 0: /* in postmaster child ... */ + + oom_adjust(); + /* Close the postmaster's sockets */ ClosePostmasterPorts(false); *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** *** 3056,3061 **** BackendStartup(Port *port) --- 3056,3063 ---- { free(bn); + oom_adjust(); + /* * Let's clean up ourselves as the postmaster child, and close the * postmaster's listen sockets. (In EXEC_BACKEND case this is all *** a/src/backend/postmaster/syslogger.c --- b/src/backend/postmaster/syslogger.c *************** *** 530,535 **** SysLogger_Start(void) --- 530,538 ---- #ifndef EXEC_BACKEND case 0: /* in postmaster child ... */ + + oom_adjust(); + /* Close the postmaster's sockets */ ClosePostmasterPorts(true); *** a/src/include/postmaster/fork_process.h --- b/src/include/postmaster/fork_process.h *************** *** 13,17 **** --- 13,18 ---- #define FORK_PROCESS_H extern pid_t fork_process(void); + extern void oom_adjust(void); #endif /* FORK_PROCESS_H */
From: Tom Lane on 7 Jan 2010 22:26 Alex Hunsaker <badalex(a)gmail.com> writes: > FWIW here is the patch I run. Stupid as the patch may be, count it as > a +1 for people in the field doing this. Hence a reason to think > about doing something in core. maybe. Thanks for the patch --- it's certainly a fine starting point. We can either drop this in core (with a lot of #ifdef LINUX added) or expect Linux packagers to carry it as a patch. Given that the packagers would also have to modify their init scripts to go with, the patch route is not unreasonable. Comments? > This has some oddities like it does not reset oom to 0 for the (wal) > writer process. FWIW, I think that's probably a feature --- I'd vote for only resetting in regular backends and possibly autovac workers. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Alex Hunsaker on 7 Jan 2010 22:46 On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl(a)sss.pgh.pa.us> wrote: > Alex Hunsaker <badalex(a)gmail.com> writes: > We can either drop this in core (with a lot of #ifdef LINUX added) Any thoughts on doing something like (in fork_process.c) #ifdef LINUX void oom_adjust() { .... } #else void oom_adjust() {} #endif So there is only one #ifdef? It still leaves the ugly calls to the function... > or expect Linux packagers to carry it as a patch. Â Given that the > packagers would also have to modify their init scripts to go with, > the patch route is not unreasonable. Â Comments? Id plus +1 for core. The problem certainly does not look to be going away soon (if ever). >> This has some oddities like it does not reset oom to 0 for the (wal) >> writer process. > > FWIW, I think that's probably a feature --- I'd vote for only resetting > in regular backends and possibly autovac workers. I think that makes sense +1. In-fact thats why the patch has it as a separate function instead of hacked into fork_process(). However its mainly odd because IIRC I greped for all instances of fork_process() and added the oom_adjusting to the callers. Given that it seems the wall writer procs should also be set to 0. My guess is its a race with my startup script launching postgres and then setting oom_adj. Or maybe I missed a caller? Maybe they don't use fork_process()? Ill check it out. -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Magnus Hagander on 8 Jan 2010 07:07
On Fri, Jan 8, 2010 at 04:46, Alex Hunsaker <badalex(a)gmail.com> wrote: > On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl(a)sss.pgh.pa.us> wrote: >> Alex Hunsaker <badalex(a)gmail.com> writes: > >> We can either drop this in core (with a lot of #ifdef LINUX added) > > Any thoughts on doing something like (in fork_process.c) > > #ifdef LINUX > void oom_adjust() > { > ... > } > #else > void oom_adjust() {} > #endif > > So there is only one #ifdef? It still leaves the ugly calls to the function... Seems like a much better way, yes. Especially if we in the future want to do this for more than one platform (if it becomes necessary). >> or expect Linux packagers to carry it as a patch. Given that the >> packagers would also have to modify their init scripts to go with, >> the patch route is not unreasonable. Comments? > > Id plus +1 for core. The problem certainly does not look to be going > away soon (if ever). Yeah, I think core is better. It's not like it's enough code to cause a huge maintenance problem, I think. Do we need to make the value configurable? I'd certainly find it interesting to set backends to say 5 or something like that, that makes them less likely to be killed than any old "oops opened too big file in an editor"-process, but still possible to kill if the system is *really* running out of memory. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |