ipc semaphores: reduce ipc_lock contention in semtimedop [Kernel]

From: Chris Mason on 16 May 2010 18:50

On Sun, May 16, 2010 at 06:57:38PM +0200, Manfred Spraul wrote:
> On 04/13/2010 08:57 PM, Nick Piggin wrote:
> >On Tue, Apr 13, 2010 at 02:19:37PM -0400, Chris Mason wrote:
> >>I don't see anything in the docs about the FIFO order. I could add an
> >>extra sort on sequence number pretty easily, but is the starvation case
> >>really that bad?
> >Yes, because it's not just a theoretical livelock, it can be basically
> >a certainty, given the right pattern of semops.
> >
> >You could have two mostly-independent groups of processes, each taking
> >and releasing a different sem, which are always contended (eg. if it is
> >being used for a producer-consumer type situation, or even just mutual
> >exclusion with high contention).
> >
> >Then you could have some overall management process for example which
> >tries to take both sems. It will never get it.
> >
> The management process won't get the sem on Linux either:
> Linux implements FIFO, but there is no protection at all against starvation.
>
> If I understand the benchmark numbers correctly, a 4-core, 2 GHz
> Phenom is able to do ~ 2 million semaphore operations per second in
> one semaphore array.
> That's the limit - cache line thrashing on the sma structure prevents
> higher numbers.
>
> For a NUMA system, the limit is probably lower.
>
> Chris:
> Do you have an estimate of how many semop() calls your app will
> perform in one array?

There are two different workloads at play. The first is to just use
semaphores as a lock, which is a traditional mutex type operation (one
at a time). This isn't a problem with the current code, aside from lock
contention created by the second case.

The second case is batched wakeup. One process will wake hundreds or
more at once that are each waiting on their own semaphore.
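
As a rough sketch of the second case, assuming the waker posts to each
waiter's semaphore in the same array through semop(): the helper names
build_wakeup_ops() and wake_batch() and the chunk size WAKE_CHUNK are
made up for illustration and are not taken from the application or from
the patches discussed here.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose the System V IPC declarations */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Illustrative chunk size; the kernel caps ops per semop() call (SEMOPM). */
#define WAKE_CHUNK 32

/*
 * Hypothetical helper: fill ops[] with one "+1" operation per waiter
 * semaphore. Returns the number of entries written (at most max_ops).
 */
size_t build_wakeup_ops(struct sembuf *ops, size_t max_ops,
                        const unsigned short *waiter_sems, size_t n)
{
    assert(ops != NULL && waiter_sems != NULL);

    size_t count = n < max_ops ? n : max_ops;
    for (size_t i = 0; i < count; i++) {
        ops[i].sem_num = waiter_sems[i]; /* index within the array */
        ops[i].sem_op  = 1;              /* post: lets one waiter proceed */
        ops[i].sem_flg = 0;
    }
    return count;
}

/*
 * Hypothetical helper: wake every listed waiter, WAKE_CHUNK operations
 * per semop() call. Returns 0 on success, -1 if semop() fails.
 */
int wake_batch(int semid, const unsigned short *waiter_sems, size_t n)
{
    struct sembuf ops[WAKE_CHUNK];
    size_t done = 0;

    while (done < n) {
        size_t count = build_wakeup_ops(ops, WAKE_CHUNK,
                                        waiter_sems + done, n - done);
        if (semop(semid, ops, count) == -1)
            return -1;
        done += count;
    }
    return 0;
}
```

Every one of these calls operates on the same semaphore array, so in
this sketch the batch competes for that array's lock with any processes
using it in the mutex style of the first case.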

>
> Perhaps we should really remove the per-array list,
> sma->sem_perm.lock and sma->sem_otime.

So far for our uses the per-array list was the biggest trouble. I tried
to benchmark your patches on Friday, but these are preproduction systems
and it appears to have woken up in a bad mood.

The hardware guys are giving it some TLC and I'll be able to run on
Monday.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Nick Piggin on 18 May 2010 02:30

On Sun, May 16, 2010 at 06:57:38PM +0200, Manfred Spraul wrote:
> On 04/13/2010 08:57 PM, Nick Piggin wrote:
> >On Tue, Apr 13, 2010 at 02:19:37PM -0400, Chris Mason wrote:
> >>I don't see anything in the docs about the FIFO order. I could add an
> >>extra sort on sequence number pretty easily, but is the starvation case
> >>really that bad?
> >Yes, because it's not just a theoretical livelock, it can be basically
> >a certainty, given the right pattern of semops.
> >
> >You could have two mostly-independent groups of processes, each taking
> >and releasing a different sem, which are always contended (eg. if it is
> >being used for a producer-consumer type situation, or even just mutual
> >exclusion with high contention).
> >
> >Then you could have some overall management process for example which
> >tries to take both sems. It will never get it.
> >
> The management process won't get the sem on Linux either:
> Linux implements FIFO, but there is no protection at all against starvation.

Yeah I did realise this after I posted. But anyway I think FIFO
is reasonable to have, although you *may* be able to justify
removing it after your research of other UNIXes, if there are
sufficient gains.
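
The starvation pattern from the quoted exchange can be illustrated with
a toy model rather than real semaphores. This is only a sketch under
simplifying assumptions: the two groups release their semaphores
alternately, a queued waiter from the same group retakes a released
semaphore immediately, and the function name manager_acquires_at() is
invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code. Two binary semaphores are each used by
 * their own group of processes; a manager wants both in one atomic
 * operation. Returns the index of the release event at which the
 * manager gets both, or -1 if that never happens within `steps` events.
 */
int manager_acquires_at(int steps, bool always_contended)
{
    assert(steps >= 0);

    int val[2] = {0, 0};      /* both semaphores start out held */

    for (int step = 0; step < steps; step++) {
        int s = step % 2;     /* groups release alternately: 0, 1, 0, ... */

        val[s] = 1;           /* the current holder releases semaphore s */

        if (val[0] > 0 && val[1] > 0)
            return step;      /* the all-or-nothing operation can proceed */

        if (always_contended)
            val[s] = 0;       /* next waiter in the same group takes it */
    }
    return -1;
}
```

Under these assumptions manager_acquires_at(1000, true) finds no event
at which both semaphores are free, while with contention switched off
the manager succeeds at the second release.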

>
> If I understand the benchmark numbers correctly, a 4-core, 2 GHz
> Phenom is able to do ~ 2 million semaphore operations per second in
> one semaphore array.
> That's the limit - cache line thrashing on the sma structure prevents
> higher numbers.
>
> For a NUMA system, the limit is probably lower.
>
> Chris:
> Do you have an estimate of how many semop() calls your app will
> perform in one array?
>
> Perhaps we should really remove the per-array list,
> sma->sem_perm.lock and sma->sem_otime.
>
> --
> Manfred