From: Betov on 10 Mar 2006 16:35

annabee <fack(a)szmyggenpv.com> écrivait news:op.s57rx1o1ce7g4q(a)bonus:

> you cannot lie in asm :]]]

This is surely the reason why this nerd wrote HLA. :]]]

Betov.

< http://rosam.org >
From: Phil Carmody on 10 Mar 2006 16:53

"ldb" <ldb_nospam(a)hotmail.com> writes:
> However he also says they use "simd intrinsics". To me, using
> intrinsics is essentially assembly programming.

Dan Bernstein's written a C-like pseudo-assembly language that understands
the kinds of operations in most processors' SIMD instruction sets, such
that you can program in what appears to be a HLL, and yet where possible
you get a 1-1 mapping of operations onto instructions. (He has a pretty
nifty register allocator to this end. And a scheduler too.) Where that
isn't possible, it turns them into good old-fashioned blocks of SISD code.
Unfortunately, he hasn't exactly released this assembler yet, but he has
shown a few sample outputs that demonstrate it does what it says on the
tin.

http://cr.yp.to/

Phil
--
What is it: is man only a blunder of God, or God only a blunder of man?
-- Friedrich Nietzsche (1844-1900), The Twilight of the Gods
From: Dragontamer on 10 Mar 2006 18:04

ldb wrote:
> However he also says they use "simd intrinsics". To me, using
> intrinsics is essentially assembly programming. Doing your for-loop in
> C++, and having the meat being all SIMD intrinsics seems like a logical
> way to proceed. You aren't going to get much performance increase by
> switching the entire loop into assembly.

I agree, although there were a few libraries that basically had a
"vector" class with operations like multiply(vector_1, vector_2) or
something... I only remember the demo code for this library, and it's
been a while. So some abstraction can take place and keep everything
just as fast, with a great improvement in readability and reusability.

Although if he means SIMD intrinsics as in messing with prefetch and
stuff like that :) That is most definitely just assembly language.

> I think his main point (from skimming the talk) talks about 80% of the
> computation time being "embarrassingly" parallel and the need for a
> language that can encapsulate that more simply. Really, it seems, we
> are headed to a better GPU language that handles SIMD operations more
> efficiently.

I think that's the main point. I think parallel-based languages should
come out either way, as dual-core CPUs become more popular; who knows
how soon quad-core becomes popular, and then oct-core, and then
128-core :-p

--Dragontamer
From: randyhyde@earthlink.net on 10 Mar 2006 18:44

Dragontamer wrote:
> I think parallel-based languages should come out either way, as
> dual-core CPUs become more popular, who knows how soon quad-core
> become popular, and then oct-core, and then 128-core :-p

SMPs tend to reach the point of diminishing returns around 16
processors. At that point, cache and bus coherency problems tend to
cause too many delays. The few systems that have a large number of
processors tend to have *very* expensive busses and don't support the
kind of fine-grained multiprocessing you get with SMPs (if you want to
call that "fine-grained").

That's not to say that parallel languages won't be able to take
advantage of such systems. Quite the contrary, they may make such
systems more accessible. But I doubt you'll see typical PCs or
workstations exceeding 16 processors anytime soon (or even anytime far
away). At least, not in a configuration where a single application can
easily migrate between the processors during execution.

Cheers,
Randy Hyde
From: randyhyde@earthlink.net on 10 Mar 2006 19:55
Phil Carmody wrote:
> "ldb" <ldb_nospam(a)hotmail.com> writes:
> > However he also says they use "simd intrinsics". To me, using
> > intrinsics is essentially assembly programming.
>
> Dan Bernstein's written a C-like pseudo-assembly language that
> understands the kinds of operations in most processors' SIMD
> instruction sets, such that you can program in what appears to
> be a HLL, and yet where possible you get a 1-1 mapping of
> operations onto instructions. (He has a pretty nifty register
> allocator to this end. And a scheduler too.) If not, then it
> turns them into good old-fashioned blocks of SISD. Unfortunately,
> he's not exactly released this assembler yet, but has shown a
> few sample outputs that demonstrate it does what it says on the
> tin.
>
> http://cr.yp.to/

Hi Phil,
I couldn't find the reference. Which of the links on this page contains
the sample code?

Cheers,
Randy Hyde