From: Scott A Crosby on 22 Nov 2006 17:44

On 21 Nov 2006 09:26:10 GMT, nmm1(a)cus.cam.ac.uk (Nick Maclaren) writes:

> While using functional code to express functional concepts is good for
> many reasons, contorting stateful designs into a functional style makes
> them much HARDER to parallelise, by obscuring their design.

Too true, but stateful code in a mostly functional program isn't per se a
problem for parallelism, as long as the statefulness is constrained to a
single thread. So a mostly functional program could still have all of the
positive parallelization properties of a purely functional program. In
fact, I believe that in Standard ML's CML it is *undefined* for more than
one thread to access mutable state.

Allowing mutable state to be shared --- weakening this assumption ---
introduces other complexities, because the compiler now has to ensure that
mutations to the shared state get propagated to all other threads, which
may have cached the old value in registers, the stack, etc. Even with
locking, the compiler has to make conservative assumptions about what
shared mutable state is protected by what lock.

Now, assume every piece of shared mutable state was paired with a lock,
where read/write of the state required grabbing the lock. Also assume that
the runtime system forbids all cross-thread access to non-shared mutable
state. (Depending on the type system, this might be statically verifiable
with whole-program analysis.) In this idea, all concurrent access to shared
state is explicit and must be explicitly thought about. And maybe a
compiler with good data-flow analysis could track which shared state is
protected by which lock, and not flush cached data and reread the shared
state.

If there was a language with access to shared state (and the lock required
for that access) and communication explicit in the language, how much
useful parallelism would be exposed, and is it possible for a compiler to
find it? Also, since all shared-state mutation and communication are
explicit, the system has the ability to eagerly parallelize and only commit
state changes when it knows they're good.

Ideas?

Scott
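A minimal sketch of the state-paired-with-lock idea in Haskell, whose MVar
is essentially such a pairing: the cell *is* the lock, and reading or
writing means taking it. The names and counts here are illustrative, not
from any post:

    -- Every read/write of the shared counter must take the MVar,
    -- which acts as both the storage cell and its lock.
    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.MVar

    main :: IO ()
    main = do
      shared <- newMVar (0 :: Int)   -- shared state and its lock, born together
      let bump = modifyMVar_ shared (\n -> return (n + 1))  -- take lock, mutate, release
      mapM_ (const (forkIO bump)) [1 .. 10 :: Int]
      threadDelay 100000             -- crude wait for the workers; fine for a sketch
      readMVar shared >>= print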
From: Nick Maclaren on 23 Nov 2006 05:02

In article <oyd3b8b2j9x.fsf(a)cs.rice.edu>, Scott A Crosby
<scrosby(a)cs.rice.edu> writes:
|>
|> > While using functional code to express functional concepts is good for
|> > many reasons, contorting stateful designs into a functional style makes
|> > them much HARDER to parallelise, by obscuring their design.
|>
|> Too true, but stateful code in a mostly functional program isn't per
|> se a problem for parallelism, as long as the statefulness is
|> constrained to a single thread. So a mostly functional program could
|> still have all of the positive parallelization properties of a purely
|> functional program. In fact, I believe that in Standard ML's CML it is
|> *undefined* for more than one thread to access mutable state.

You may be surprised, but Fortran was there first - by decades! That is
PRECISELY the rule for accessing global objects and arguments in Fortran
functions, and has been since Fortran 66 (with it being recommended
practice in Fortran II) ....

The problem is contorting algorithms and designs to fit into those rules;
the ones that fit naturally are no problem, and never have been, but not
all are so well-suited. In my experience, explicit, shared global update
(see below) is usually clearer and more reliable for such codes.

|> Now, assume every piece of shared mutable state was paired with a
|> lock, where read/write of the state required grabbing the lock. Also
|> assume that the runtime system forbids all cross-thread access to
|> non-shared mutable state. (Depending on the type system, this might be
|> statically verifiable with whole-program analysis.) In this idea, all
|> concurrent access to shared state is explicit and must be explicitly
|> thought about. And maybe a compiler with good data-flow analysis
|> could track which shared state is protected by which lock, and not
|> flush cached data and reread the shared state.

That turns a problem at one level (undefined behaviour) into a problem at
another (it is indeterminable whether deadlock or livelock can occur). Yes,
it helps by making the situation explicit, but that is the ONLY way in
which it does. But I agree that it is worthwhile, on the grounds that
making problems explicit is almost always worthwhile.

The claimed downside (that it forces the taking of a lock even when
unnecessary) is answered by your data-flow analysis. If the compiler can't
tell that the use is 'safe', then an unchecked access is a bug waiting to
bite you; if it can, it can optimise it. It neither helps nor hinders that.

Regards,
Nick Maclaren.
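A small Haskell sketch of the second-level problem Nick names: once state
is paired with locks, deadlock becomes a whole-program property. Two
threads taking the same two locks in opposite orders can each block forever
(the accounts and amounts here are illustrative):

    -- Lock-paired state makes sharing explicit, but says nothing about
    -- lock ORDER, which is where deadlock hides.
    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.MVar

    transfer :: MVar Int -> MVar Int -> Int -> IO ()
    transfer from to n =
      modifyMVar_ from $ \a -> do            -- take the first lock...
        modifyMVar_ to (\b -> return (b + n)) -- ...then the second, while holding the first
        return (a - n)

    main :: IO ()
    main = do
      x <- newMVar 100
      y <- newMVar 100
      _ <- forkIO (transfer x y 10)          -- locks x then y
      _ <- forkIO (transfer y x 10)          -- locks y then x: potential deadlock
      threadDelay 100000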
From: jsavard on 23 Nov 2006 07:26

Terje Mathisen wrote:
> ranjit_mathews(a)yahoo.com wrote:
> > How does the Itanium compare in terms of SpecInts PER GHz?
>
> Spec/GHz is very nearly totally meaningless.
>
> Why do you care?

In *itself*, yes. But it lets one translate (somewhat meaningful) GHz into
(much more meaningful) SpecInts. Also, it might be considered a figure of
merit for architectural elegance or something.

John Savard
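As a worked illustration of that conversion (with made-up numbers): a part
rated at 600 SPECint per GHz and clocked at 2.0 GHz translates to roughly
600 x 2.0 = 1200 SPECint, so the per-GHz figure is just the conversion
factor between the quoted clock and the more meaningful score.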
From: Brian Hurt on 27 Nov 2006 21:05

Sorry for the delay in getting back to this.

"Chris Thomasson" <cristom(a)comcast.net> writes:

>> The biggest problem with parallelized code is the race condition- which
>> arise from mutable data. Every piece of mutable data is a race condition
>> waiting to happen. Mutable data needs to be kept to an absolute minimum,
>> and then handled in such a way that it's correct in the presence of
>> threads.
>>
>http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b192c5ffe9b47926

>Your conclusion is wrong. No need to be afraid of mutable data... Just stick
>it all in the writer side of a lock-free reader pattern.

It's a question of correctness. Race conditions, deadlocks, livelocks, etc.
are like wild C pointers- it only takes one of them to totally screw up a
program. Can you write correct programs with C-style pointers? Yes.
Demonstrably so. But it only takes one stupid programmer on the project to
mess things up for the whole team- which is why pretty much the entire
evolution of programming languages since C has been trying to get as far
away from C pointers as possible.

Just like any pointer is potentially a wild pointer, any mutable variable
is a race condition waiting to happen. So the language needs to make sure
that every mutable variable is not a race condition *somehow*- like Java
makes sure every pointer dereference is safe. Note that every programming
language has mutable data- even Haskell. The question isn't *if* mutable
data, it's *how* mutable data- and, as a subset of that, how little?

>> There are two languages I know of in which it may
>> be possible to write non-trivial parallel programs correctly and
>> maintainably- concurrent haskell with STM and erlang- and both are, at
>> their core, purely functional languages.

>STM is a no go. Way too much overhead... Here are some of the reasons why:

First of all, I'll take one published paper going "we actually implemented
it and here are the benchmarks":
http://citeseer.ist.psu.edu/shavit95software.html

Second, I don't care how fast your program is if it keeps segfaulting- or
deadlocking, or acting weird because of race conditions. And neither do
most programmers, I'd comment- the definite trend of the last 25 or so
years has been towards languages that are lower performance but offer
greater ease of development and correctness. C and Fortran were replaced by
C++, which was slower (albeit not by much), which was replaced by Java and
C#, which were slower than C++, and they are in the process of being
replaced by Python and Ruby, which are slower yet. It's not CPU cycles
we're optimizing for, it's programmer cycles.

The trick with parallel programming is mainly not in expressing the
parallelism- well, there are some gains to be made there, but the real room
for improvement is correctness. Anyone who has actually tracked down a race
condition can tell you that they are the toughest category of bugs to track
down- because most of the time the code works. Unit tests are worthless.

Brian
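A tiny Haskell sketch of the "even Haskell has mutable data" point: an
IORef is a plain mutable cell, and a non-atomic read-modify-write on one is
exactly the latent race Brian describes (the counter and thread count are
illustrative):

    import Control.Concurrent (forkIO, threadDelay)
    import Data.IORef

    -- Read then write with no atomicity: another thread can interleave
    -- between the two steps, and one of the increments is silently lost.
    racyIncrement :: IORef Int -> IO ()
    racyIncrement ref = do
      n <- readIORef ref
      writeIORef ref (n + 1)

    -- The fix: make the read-modify-write a single atomic step.
    safeIncrement :: IORef Int -> IO ()
    safeIncrement ref = atomicModifyIORef ref (\n -> (n + 1, ()))

    main :: IO ()
    main = do
      ref <- newIORef (0 :: Int)
      mapM_ (const (forkIO (racyIncrement ref))) [1 .. 1000 :: Int]
      threadDelay 100000          -- crude wait; fine for a sketch
      readIORef ref >>= print     -- often prints less than 1000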
From: Brian Hurt on 27 Nov 2006 21:31
nmm1(a)cus.cam.ac.uk (Nick Maclaren) writes:

>In article <e0t8h.7975$lq.2268(a)newsread1.mlpsca01.us.to.verio.net>,
>Brian Hurt <bhurt(a)spnz.org> writes:
>|> nmm1(a)cus.cam.ac.uk (Nick Maclaren) writes:
>|>
>|> >That experience debunked the claims of the
>|> >functional programming brigade that such methodology gave automatic
>|> >parallelisation.
>|>
>|> Automatic parallelization, no. You're looking for a silver bullet that
>|> probably doesn't exist. On the other hand, functional programming makes
>|> writing parallel code much easier to do.

>Oh, no, I am not! I pointed out to those idiots at the time that silver
>bullets didn't work, but there is no arguing with religious fanatics :-(

OK.

>While using functional code to express functional concepts is good for
>many reasons, contorting stateful designs into a functional style makes
>them much HARDER to parallelise, by obscuring their design.

My response to that is to avoid stateful designs. At which point functional
programming generally makes the design *less* obscure.

To date, there are two categories of algorithms I can't do purely
functionally:

1) Algorithms that rely on the O(1) nature of hash tables, *and* where the
number of elements being hashed is large enough that the constant-factor
disadvantage of hashing is overtaken by the O(log N) cost of a tree
structure, *and* where the worst-case O(N) performance of the hash table is
acceptable. Remember that a string hash can easily be an order of magnitude
more expensive than a string compare, meaning that for small structures the
cost of the single hash function is larger than that of the several string
compares.

2) Graph algorithms, where I need to find the edges from a given node fast.
Arguably, this falls under 1.

>Sorry, no. That may be the biggest LOW-LEVEL problem, but there are much
>worse ones :-(

There are four problems with multithreaded programming I've encountered:
1) race conditions
2) deadlocks
3) livelocks
4) priority inversions

STM solves 2 and 4, and with Haskell's type system ensuring that all
mutable data is in STM, it also solves 1 (see the first sketch below). I'm
not sure it solves 3, livelocks, which is why I still waffle.

>I have come to the conclusion that it isn't even desirable - but that
>needs qualification.

>I fully agree that functional programming should be the first port of
>call. If your code can be expressed naturally in pure functional terms,
>then that should be done - and NOT just for parallelism! We are in
>absolute agreement there.

>It's when it can't that the problem arises. Replacing iteration by
>recursion with assumed tail removal is a neat idea that doesn't work;
>it ends up being equivalent in all respects, problems and all.

I'm not sure loops are the right place to be looking for parallelism.
Outside of dense numeric work (the bailiwick of Fortran), most loops simply
don't loop that often- a few dozen times at most.

For the foreseeable future, I don't see the compiler discovering
significant amounts of parallelism. It's just too complicated. Functional
programming might make it easier- but I don't think it'll make it easy
*enough*. This means the parallelism that is there has to be expressed by
the programmer explicitly. This means two things:

1) it has to be easy for the programmer to express the parallelism, to say
that code A and code B may or shall run in parallel (the may case is the
one I think is important- I'd love to see an STM Haskell with Cilk-style
microthreads; see the second sketch below);

2) both code A and code B have to be able to work together in a
multithreaded program, preferably by default- running both A and B in
parallel might not have been the plan when those pieces of code were
originally developed.

Brian
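A minimal Haskell sketch of the STM claim above: mutable state lives in
TVars, every access goes through atomically, and the STM type keeps
transactions from doing arbitrary I/O. There are no user-visible locks, so
lock-order deadlock cannot arise (account names and amounts are
illustrative):

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.STM

    -- A transaction: its reads and writes commit atomically, or it retries.
    transfer :: TVar Int -> TVar Int -> Int -> STM ()
    transfer from to n = do
      a <- readTVar from
      check (a >= n)             -- block (retry) until funds are available
      writeTVar from (a - n)
      b <- readTVar to
      writeTVar to (b + n)

    main :: IO ()
    main = do
      x <- atomically (newTVar (100 :: Int))
      y <- atomically (newTVar (100 :: Int))
      _ <- forkIO (atomically (transfer x y 10))
      _ <- forkIO (atomically (transfer y x 10))  -- opposite order, no deadlock
      threadDelay 100000
      ab <- atomically (do a <- readTVar x; b <- readTVar y; return (a, b))
      print ab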
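And a sketch of the "may run in parallel" distinction from point 1, using
GHC's par from the parallel package (not the Cilk-style microthreads Brian
wishes for, but the same advisory flavour): par merely sparks a
computation, and the runtime is free to ignore the hint.

    import Control.Parallel (par, pseq)

    -- 'a `par` b' says a MAY be evaluated in parallel with b;
    -- 'pseq' forces b first, giving the spark for a time to run elsewhere.
    fib :: Int -> Int
    fib n
      | n < 2     = n
      | otherwise = a `par` (b `pseq` (a + b))
      where
        a = fib (n - 1)
        b = fib (n - 2)

    main :: IO ()
    main = print (fib 30)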