From: neuneudr on 5 Dec 2009 12:19

Hi all,

On O'Reilly's website there's a column about implementing multi-threaded algorithms, with an example of quicksort in Java. I'm scratching my head on something...

May Column: Multi-threaded Algorithm Implementations
http://broadcast.oreilly.com/2009/06/may-column-multithreaded-algor.html

I read the column and thought "that ain't working", then downloaded the code. Here's the array to be sorted:

    /** Elements to be sorted. */
    @SuppressWarnings("unchecked")
    final Comparable[] ar;

So it's a regular array. I fail to see what guarantees are made in the sorting code that swaps made by spawned threads are visible to someone observing the array.

Say we have 8,7,6,5,4,3,2,1 and the code makes swaps in the array so it is now 4,3,2,1,8,7,6,5, then asks two threads to take care of the 4,3,2,1 part and the 8,7,6,5 part respectively. I agree that the spawned threads are going to see correctly what they should. But what guarantee is there that an observer will indeed see the array sorted at the end?

To me, for this to work the code needs to be changed; the following doesn't seem correct:

    final Comparable[] ar;

I'd replace it with:

    final AtomicReferenceArray<Comparable> ar;

You can't simply spawn threads, pass them a reference to an array plus indexes, tell them "go ahead, swap and re-arrange the array as you like", and hope the changes are going to be visible. Java makes no such guarantee, right!? An AtomicReferenceArray, on the other hand, is guaranteed to work, but it incurs a performance hit compared to the plain-array version.

But then I guess I don't understand anything about the Java memory model, nor about the code posted (there's a tiny code.zip file; once you keep only the .java files related to quicksort, it's tiny).

Thoughts?
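To make the suggestion concrete, here's a minimal sketch of the AtomicReferenceArray idea (class and method names are mine, not from the column's code): each get() on an element has volatile read semantics and each set() has volatile write semantics, so per-element visibility across threads is guaranteed, though a multi-step swap is still not atomic as a whole.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch, not the column's code: AtomicReferenceArray gives
// each element volatile semantics, so a get() in one thread sees the
// latest set() from any other thread. Note the swap itself is still three
// separate operations, not one atomic step.
public class AtomicSwapExample {
    static void swap(AtomicReferenceArray<Comparable> ar, int i, int j) {
        Comparable tmp = ar.get(i);
        ar.set(i, ar.get(j));
        ar.set(j, tmp);
    }

    public static void main(String[] args) {
        AtomicReferenceArray<Comparable> ar =
            new AtomicReferenceArray<>(new Comparable[] {8, 7, 6, 5});
        swap(ar, 0, 3);
        System.out.println(ar.get(0) + " " + ar.get(3)); // prints "5 8"
    }
}
```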
From: neuneudr on 5 Dec 2009 12:28

re all,

adding to my own question... At the end of each spawned thread, we can see this:

    new Thread () {
        public void run () {
            // invoke single-thread qsort
            qsortN (left, pivotIndex - 1);
            synchronized (helpRequestedMutex) {
                helpersWorking--;
            }
        }
    }.start();

Is the synchronization on the mutex at the end of each sort enough to guarantee that changes made to the array are indeed visible (and hence guaranteed to work, given that due to quicksort's nature each thread is working on a different part of the array)?
From: neuneudr on 5 Dec 2009 12:54

Here's the code...

http://pastebin.com/m41b42a91

And even if synchronizing on 'helpRequestedMutex' were enough (would it be?), I fail to see how the caller can synchronize on it too.
From: markspace on 5 Dec 2009 14:04

neuneudr(a)yahoo.fr wrote:
> I'm scratching my head on something...
>
> May Column: Multi-threaded Algorithm Implementations
>
> http://broadcast.oreilly.com/2009/06/may-column-multithreaded-algor.html

I'm scratching my head also. Did you download the source code? At least one whole package of his seems to be missing: algs.model.array. It's not in the bin directory either.

Broken code, not tested. Hmm, maybe his book isn't great either.
From: Peter Duniho on 5 Dec 2009 15:31

neuneudr(a)yahoo.fr wrote:
> Here's the code...
>
> http://pastebin.com/m41b42a91
>
> And even if synchronizing on 'helpRequestedMutex' would
> be enough (would it be?) I fail to see how the caller
> can synchronize on it too.

The top-level thread waits on the volatile variable "helpersWorking". Since that variable is volatile, and is decremented to 0 only after the worker threads have completed their work, the use of "volatile" preserves the ordering of the operations, ensuring the main thread does not return until all of the sorting has in fact been completed.

Note that "volatile" affects not just the variable itself, but the relative ordering of operations within the thread altogether. _All_ writes that occur before the write to the volatile variable (i.e. including those to the array being sorted) will be seen by any thread that sees the write to the volatile variable. (It also works the other way: if you read a volatile variable, you are assured that all reads that occur later in program order will actually happen after the read of the volatile variable.)

That said, the code isn't defect-free. There's a relatively benign race condition that hasn't been handled properly, which can result in the number of worker threads exceeding the desired maximum limit. Using an AtomicInteger instead of the clumsy "object as mutex" would make that easy to fix.

A more serious problem is that the top-level thread spins waiting for the "helpersWorking" variable to reach zero. So one thread is completely consumed just _monitoring_ the progress. That's a terrible use of CPU time, just when one is trying to take full advantage of a multi-core system. The fewer worker threads there are, the more noticeable this problem will be (to the point of possibly occupying all of the CPU time for a whole CPU core).
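The volatile-publication argument above can be boiled down to a small sketch (my own toy example, not the column's code): ordinary writes to the plain array are published by the later write to the volatile counter, and a thread that reads the counter's new value is guaranteed to see them.

```java
// Sketch of the ordering described above: plain writes that precede a
// volatile write become visible to any thread that subsequently reads
// that volatile variable (happens-before via volatile).
public class VolatilePublish {
    static volatile int helpersWorking = 1;
    static final int[] ar = {3, 1, 2};

    public static void main(String[] args) {
        new Thread(() -> {
            java.util.Arrays.sort(ar);  // ordinary (non-volatile) writes...
            helpersWorking = 0;         // ...published by this volatile write
        }).start();

        while (helpersWorking != 0) { } // spin on the volatile read
        // Reading 0 above happens-after the volatile write, so the sorted
        // contents of ar are guaranteed visible here.
        System.out.println(java.util.Arrays.toString(ar)); // prints "[1, 2, 3]"
    }
}
```

(The spin loop reproduces the busy-wait criticized above; it's here only to keep the visibility argument minimal.)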
Ironically, on that last point, the spin-wait winds up mattering less because of another defect in the code. Ideally the number of active worker threads would equal the number of CPU cores, to minimize context-switching overhead, but his code sets the number of threads to some arbitrarily high number based on the size of the input and a user-selected ratio. With potentially lots of threads all contending for the CPU, most of which do real work when they eventually get it, the one thread contending for the CPU without doing useful work matters less relative to the total throughput of _this_ code, as long as there's a large number of threads (which for his code, there usually is).

He seems satisfied by what looks to me like a relatively marginal performance improvement. That is, sure... the code runs faster, but not nearly as much faster as it probably could with a better implementation.

A better approach to the termination condition would be a CountDownLatch, so that the main thread can simply block, using no CPU time, until the worker threads are actually done. That change, in conjunction with limiting the number of threads to the number of CPU cores, would probably produce a significant performance improvement over what his code already shows.

Pete
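The CountDownLatch pattern suggested above might look like this (a hypothetical sketch with names of my own; it just sorts fixed slices rather than doing a real quicksort partition, so only each slice ends up sorted). Each worker counts down when finished, and the caller blocks in await() instead of spinning; countDown()/await() also establish the happens-before needed for the caller to see the workers' array writes.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the CountDownLatch termination condition: the
// caller blocks in await(), burning no CPU, until every worker has
// called countDown(). A real parallel quicksort would partition first so
// the independently sorted slices combine into a fully sorted array.
public class LatchSketch {
    static void sortChunks(int[] data, int nWorkers) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(nWorkers);
        int chunk = data.length / nWorkers;
        for (int w = 0; w < nWorkers; w++) {
            final int from = w * chunk;
            final int to = (w == nWorkers - 1) ? data.length : from + chunk;
            new Thread(() -> {
                java.util.Arrays.sort(data, from, to); // worker sorts its slice
                done.countDown();                      // signal completion
            }).start();
        }
        done.await(); // block (no spinning) until all workers are done
    }

    public static void main(String[] args) throws InterruptedException {
        int[] data = {8, 7, 6, 5, 4, 3, 2, 1};
        sortChunks(data, 2); // two slices, as in the 4,3,2,1 / 8,7,6,5 example
        System.out.println(java.util.Arrays.toString(data)); // each half sorted
    }
}
```

In a real version one would also size nWorkers to Runtime.getRuntime().availableProcessors(), per the point about matching threads to cores.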