From: Ron Mayer on
Tom Lane wrote:
> Magnus Hagander <magnus(a)hagander.net> writes:
>> ...oom_adj...
>
> One interesting thing I read there is:
> Swapped out tasks are killed first. Half of each child's memory size is
> added to the parent's score if they do not share the same memory.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This suggests that PG's shared memory ought not be counted in the
> postmaster's OOM score, which would mean that the problem shouldn't be
> quite as bad as we've believed. I wonder if that is a recent change?
> Or maybe it's supposed to be that way and is not implemented correctly?

The code for oom_kill.c looks fairly readable (link below [1]):

96 points = mm->total_vm;
.....
117 list_for_each_entry(child, &p->children, sibling) {
118 task_lock(child);
119 if (child->mm != mm && child->mm)
120 points += child->mm->total_vm/2 + 1;
121 task_unlock(child);
122 }

Which seems to add points for each child who doesn't share the
same mm structure as the parent. Which I think is a quite a bit
stricter interpretation of "if they do not share the same memory".



[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/oom_kill.c;h=f52481b1c1e5442c9a5b16b06b22221b75b9bb7c;hb=HEAD


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alex Hunsaker on
On Mon, Jan 4, 2010 at 09:55, Alvaro Herrera <alvherre(a)commandprompt.com> wrote:
> Magnus Hagander wrote:
>
>> Right. Which is why I like the idea of disabling the OOM killer for
>> the *postmaster*, but not the regular backends. Gives it a chance to
>> recover. It's not nice, but it's better than nothing.
>
> It doesn't sound like the init script can reenable the killer for the
> child processes though.  So, if there's anything that the core code
> ought to do, is re-enable OOM-killer for postmaster children, after
> being disabled by the initscript.

Exactly.

FWIW here is the patch I run. Stupid as the patch may be, count it as
a +1 for people in the field doing this. Hence a reason to think
about doing something in core. maybe.

This has some oddities like it does not reset oom to 0 for the (wal)
writer process. Plus assuming you do oom, the stats collector has a
good chance of being hit. Although normal backends will probably have
a higher score.

[ oom_adj gets set to -17 in the startup script. I run this on top of
disabling overcommit, color me paranoid ]

*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 362,367 **** StartAutoVacLauncher(void)
--- 362,370 ----
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+
+ oom_adjust();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);

*** a/src/backend/postmaster/fork_process.c
--- b/src/backend/postmaster/fork_process.c
***************
*** 65,68 **** fork_process(void)
--- 65,84 ----
return result;
}

+ void
+ oom_adjust(void)
+ {
+ /* adjust oom */
+ FILE *oom = fopen("/proc/self/oom_adj", "w");
+
+ /*
+ * ignore errors we dont really care
+ */
+ if (oom)
+ {
+ fprintf(oom, "0\n");
+ fclose(oom);
+ }
+ }
+
#endif /* ! WIN32 */
*** a/src/backend/postmaster/pgarch.c
--- b/src/backend/postmaster/pgarch.c
***************
*** 161,166 **** pgarch_start(void)
--- 161,169 ----
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+
+ oom_adjust();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);

*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 622,627 **** pgstat_start(void)
--- 622,630 ----
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+
+ oom_adjust();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);

*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 3056,3061 **** BackendStartup(Port *port)
--- 3056,3063 ----
{
free(bn);

+ oom_adjust();
+
/*
* Let's clean up ourselves as the postmaster child, and close the
* postmaster's listen sockets. (In EXEC_BACKEND case this is all
*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
***************
*** 530,535 **** SysLogger_Start(void)
--- 530,538 ----
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+
+ oom_adjust();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(true);

*** a/src/include/postmaster/fork_process.h
--- b/src/include/postmaster/fork_process.h
***************
*** 13,17 ****
--- 13,18 ----
#define FORK_PROCESS_H

extern pid_t fork_process(void);
+ extern void oom_adjust(void);

#endif /* FORK_PROCESS_H */
From: Tom Lane on
Alex Hunsaker <badalex(a)gmail.com> writes:
> FWIW here is the patch I run. Stupid as the patch may be, count it as
> a +1 for people in the field doing this. Hence a reason to think
> about doing something in core. maybe.

Thanks for the patch --- it's certainly a fine starting point.

We can either drop this in core (with a lot of #ifdef LINUX added)
or expect Linux packagers to carry it as a patch. Given that the
packagers would also have to modify their init scripts to go with,
the patch route is not unreasonable. Comments?

> This has some oddities like it does not reset oom to 0 for the (wal)
> writer process.

FWIW, I think that's probably a feature --- I'd vote for only resetting
in regular backends and possibly autovac workers.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alex Hunsaker on
On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Alex Hunsaker <badalex(a)gmail.com> writes:

> We can either drop this in core (with a lot of #ifdef LINUX added)

Any thoughts on doing something like (in fork_process.c)

#ifdef LINUX
void oom_adjust()
{
....
}
#else
void oom_adjust() {}
#endif

So there is only one #ifdef? It still leaves the ugly calls to the function...

> or expect Linux packagers to carry it as a patch.  Given that the
> packagers would also have to modify their init scripts to go with,
> the patch route is not unreasonable.  Comments?

Id plus +1 for core. The problem certainly does not look to be going
away soon (if ever).

>> This has some oddities like it does not reset oom to 0 for the (wal)
>> writer process.
>
> FWIW, I think that's probably a feature --- I'd vote for only resetting
> in regular backends and possibly autovac workers.

I think that makes sense +1. In-fact thats why the patch has it as a
separate function instead of hacked into fork_process(). However its
mainly odd because IIRC I greped for all instances of fork_process()
and added the oom_adjusting to the callers. Given that it seems the
wall writer procs should also be set to 0. My guess is its a race
with my startup script launching postgres and then setting oom_adj. Or
maybe I missed a caller? Maybe they don't use fork_process()? Ill
check it out.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Magnus Hagander on
On Fri, Jan 8, 2010 at 04:46, Alex Hunsaker <badalex(a)gmail.com> wrote:
> On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Alex Hunsaker <badalex(a)gmail.com> writes:
>
>> We can either drop this in core (with a lot of #ifdef LINUX added)
>
> Any thoughts on doing something like (in fork_process.c)
>
> #ifdef LINUX
> void oom_adjust()
> {
> ...
> }
> #else
> void oom_adjust() {}
> #endif
>
> So there is only one #ifdef?  It still leaves the ugly calls to the function...

Seems like a much better way, yes. Especially if we in the future want
to do this for more than one platform (if it becomes necessary).


>> or expect Linux packagers to carry it as a patch.  Given that the
>> packagers would also have to modify their init scripts to go with,
>> the patch route is not unreasonable.  Comments?
>
> Id plus +1 for core.  The problem certainly does not look to be going
> away soon (if ever).

Yeah, I think core is better. It's not like it's enough code to cause
a huge maintenance problem, I think.

Do we need to make the value configurable? I'd certainly find it
interesting to set backends to say 5 or something like that, that
makes them less likely to be killed than any old "oops opened too big
file in an editor"-process, but still possible to kill if the system
is *really* running out of memory.



--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers