Prev: [PATCH -next] eeepc-wmi: depends on INPUT_SPARSEKMAP
Next: jump label v4 - x86: Introduce generic jump patching without stop_machine
From: Manfred Spraul on 13 Apr 2010 13:20 Hi Chris, On 04/12/2010 08:49 PM, Chris Mason wrote: > /* > + * when a semaphore is modified, we want to retry the series of operations > + * for anyone that was blocking on that semaphore. This breaks down into > + * a few different common operations: > + * > + * 1) One modification releases one or more waiters for zero. > + * 2) Many waiters are trying to get a single lock, only one will get it. > + * 3) Many modifications to the count will succeed. > + * > Have you thought about odd corner cases: Nick noticed the last time that it is possible to wait for arbitrary values: in one semop: - decrease semaphore 5 by 10 - wait until semaphore 5 is 0 - increase semaphore 5 by 10. > SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops, > unsigned, nsops, const struct timespec __user *, timeout) > { > @@ -1129,6 +1306,8 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops, > struct sem_queue queue; > unsigned long jiffies_left = 0; > struct ipc_namespace *ns; > + struct sem *blocker = NULL; > + LIST_HEAD(pending); > > ns = current->nsproxy->ipc_ns; > > @@ -1168,6 +1347,14 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops, > alter = 1; > } > > + /* > + * try_atomic_semop takes all the locks of all the semaphores in > + * the sops array. We have to make sure we don't deadlock if userland > + * happens to send them out of order, so we sort them by semnum. > + */ > + if (nsops> 1) > + sort(sops, nsops, sizeof(*sops), sembuf_compare, NULL); > + > Does sorting preserve the behavior? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Zach Brown on 13 Apr 2010 14:30 > What about Zach's simpler wakeup API? It's making slow progress in the background as a longer-term experiment. http://oss.oracle.com/~zab/wake-many/ That URL still has an API description, patches, and little test utilities for the simple first draft. - z -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Manfred Spraul on 14 Apr 2010 12:20 On 04/13/2010 08:19 PM, Chris Mason wrote: > On Wed, Apr 14, 2010 at 04:09:45AM +1000, Nick Piggin wrote: > >> On Tue, Apr 13, 2010 at 01:39:41PM -0400, Chris Mason wrote: >> >> The other thing I don't know if your patch gets right is requeueing on >> of the operations. When you requeue from one list to another, then you >> seem to lose ordering with other pending operations, so that would >> seem to break the API as well (can't remember if the API strictly >> mandates FIFO, but anyway it can open up starvation cases). >> > I don't see anything in the docs about the FIFO order. I could add an > extra sort on sequence number pretty easily, but is the starvation case > really that bad? > > How do you want to determine the sequence number? Is atomic_inc_return() on a per-semaphore array counter sufficiently fast? >> I was looking at doing a sequence number to be able to sort these, but >> it ended up getting over complex (and SAP was only using simple ops so >> it didn't seem to need much better). >> >> We want to be careful not to change semantics at all. And it gets >> tricky quickly :( What about Zach's simpler wakeup API? >> > Yeah, that's why my patches include code to handle userland sending > duplicate semids. Zach's simpler API is cooking too, but if I can get > this done without insane complexity it helps with more than just the > post/wait oracle workload. > > What is the oracle workload, which multi-sembuf operations does it use? How many semaphores are in one array? When the last optimizations were written, I've searched a bit: - postgres uses per-process semaphores, with small semaphore arrays. [process sleeps on it's own semaphore and is woken up by someone else when it can make progress] - with google, I couldn't find anything relevant that uses multi-sembuf semop() calls. And I agree with Nick: We should be careful about changing the API. -- Manfred -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Manfred Spraul on 14 Apr 2010 15:20 On 04/14/2010 07:33 PM, Chris Mason wrote: > On Wed, Apr 14, 2010 at 06:16:53PM +0200, Manfred Spraul wrote: > >> On 04/13/2010 08:19 PM, Chris Mason wrote: >> >>> On Wed, Apr 14, 2010 at 04:09:45AM +1000, Nick Piggin wrote: >>> >>>> On Tue, Apr 13, 2010 at 01:39:41PM -0400, Chris Mason wrote: >>>> The other thing I don't know if your patch gets right is requeueing on >>>> of the operations. When you requeue from one list to another, then you >>>> seem to lose ordering with other pending operations, so that would >>>> seem to break the API as well (can't remember if the API strictly >>>> mandates FIFO, but anyway it can open up starvation cases). >>>> >>> I don't see anything in the docs about the FIFO order. I could add an >>> extra sort on sequence number pretty easily, but is the starvation case >>> really that bad? >>> >>> >> How do you want to determine the sequence number? >> Is atomic_inc_return() on a per-semaphore array counter sufficiently fast? >> > I haven't tried yet, but hopefully it won't be a problem. A later patch > does atomics on the reference count and it doesn't show up in the > profiles. > > >> >>>> I was looking at doing a sequence number to be able to sort these, but >>>> it ended up getting over complex (and SAP was only using simple ops so >>>> it didn't seem to need much better). >>>> >>>> We want to be careful not to change semantics at all. And it gets >>>> tricky quickly :( What about Zach's simpler wakeup API? >>>> >>> Yeah, that's why my patches include code to handle userland sending >>> duplicate semids. Zach's simpler API is cooking too, but if I can get >>> this done without insane complexity it helps with more than just the >>> post/wait oracle workload. >>> >>> >> What is the oracle workload, which multi-sembuf operations does it use? >> How many semaphores are in one array? >> >> When the last optimizations were written, I've searched a bit: >> - postgres uses per-process semaphores, with small semaphore arrays. >> [process sleeps on it's own semaphore and is woken up by someone >> else when it can make progress] >> > This is similar to Oracle (and the sembench program). Each process has > a semaphore and when it is waiting for a commit it goes to sleep on it. > They are woken up in bulk with semtimedop calls from a single process. > > Hmm. Thus you have: - single sembuf decrease operations that are waiting frequently. - multi-sembuf increase operations. What about optimizing for that case? Increase operations succeed immediately. Thus complex_count is 0. If we have performed an update operation, then we can scan all simple_lists that have seen an increase instead of checking the global list - as long as there are no complex operations waiting. Right now, we give up if the update operation was a complex operation - but that does not matter. All that matters are the sleeping operations, not the operation that did the wakeup. I've attached an untested idea. > But oracle also uses semaphores for locking in a traditional sense. > > Putting the waiters into a per-semaphore list is really only part of the > speedup. The real boost comes from the patch to break up the locks into > a per semaphore lock. > > Ok. Then simple tricks won't help. How many semaphores are in one array? -- Manfred
From: Manfred Spraul on 15 Apr 2010 12:40
On 04/14/2010 09:50 PM, Chris Mason wrote: > On a big system I saw about 4000 semaphores total. The database will > just allocate as many as it can into a single array and keep creating > arrays until it has all it needs. > > What happens if SEMMSL is reduced (first entry in /proc/sys/kernel/sem)? -- Manfred -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |