From: MitchAlsup on 17 Aug 2007 13:53
On Aug 15, 2:44 pm, timcaff...(a)aol.com (Tim McCaffrey) wrote:
> I notice it breaks a lot of the PPro rules as well: Uses a GPR, one XMM,
> one (unrestricted alignment) memory access (or another XMM), and a GPR
> and a Flags result all in one instruction. Must cause all kinds of
> havoc synchronizing the execution pipes.

Once the FCMP*I instructions went in (FP comparison, int flags result), the pipeline guys built a dedicated bus from the FPU to EFLAGS to make these fast. The FPUs do both 80-bit and SSE, so having an EFLAGS result is straightforward (in the sense of interlocks and pipelining).

Mitch
From: Patrick de Zeester on 18 Aug 2007 15:10
Piotr Wyderski wrote:
> John Mashey wrote:
>
>> BUT, the very FIRST thing to do is to profile a wide range of
>> programs, see how much time is consumed by str* functions, and decide
>> whether this is even worth arguing about [it might, or might not be;
>> the last time I did this was long ago, but at the time, bcopy/memcpy
>> was the only high-runner.]
>
> In my case SIMD does help a lot, therefore I don't care about
> "a wide range of programs". ;-) My programs should provide
> the highest performance available, the remaining ones
> can be slow (most cases) or even should be slow
> (our competitors)...

If performance is a concern, why scan for the zero terminator to determine the length of a string? You could just store the length with the string itself.
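
The suggestion above (keep the length next to the bytes) might look roughly like the following C sketch. The `counted_str` type and its helper functions are hypothetical names made up for illustration; nothing in the thread specifies them. The buffer stays NUL-terminated here only so it can still be handed to ordinary C APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted-string type: the length travels with the bytes. */
typedef struct {
    size_t len;   /* number of bytes in data, excluding the terminator */
    char  *data;  /* kept NUL-terminated for interoperability with C APIs */
} counted_str;

/* Build a counted string from a C string.
 * Pays for one strlen() scan up front; returns 0 on success, -1 on failure. */
static int counted_str_init(counted_str *cs, const char *src)
{
    assert(cs != NULL && src != NULL);
    size_t n = strlen(src);
    char *p = malloc(n + 1);
    if (p == NULL)
        return -1;
    memcpy(p, src, n + 1);   /* copy the bytes plus the terminator */
    cs->len = n;
    cs->data = p;
    return 0;
}

/* O(1) length query: no scan for the terminator. */
static size_t counted_str_length(const counted_str *cs)
{
    assert(cs != NULL);
    return cs->len;
}

/* Release the buffer and reset the fields. */
static void counted_str_free(counted_str *cs)
{
    assert(cs != NULL);
    free(cs->data);
    cs->data = NULL;
    cs->len = 0;
}
```

The trade-off is that every operation which modifies the string has to keep `len` in sync, and the length only helps queries that need nothing but the length.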
From: Piotr Wyderski on 20 Aug 2007 10:29
Patrick de Zeester wrote:
> If performance is a concern why scan for the zero terminator to
> determine the length of a string? You could just store the length with
> the string itself.

strlen() is just a simple and clean example of what can be done the SIMD way, hence it is a toy function to play with. But the field of possible applications is much, much wider. Anyway, even in the case of memcmp()/strcmp() the cached length field doesn't help.

Best regards
Piotr Wyderski
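
For readers who have not seen strlen() done "the SIMD way", here is a minimal sketch using SSE2 intrinsics. It is not the code discussed in the thread; the function name `simd_strlen` is an assumption, and so is the use of the GCC/Clang builtin `__builtin_ctz`. The sketch relies on the common practice of issuing aligned 16-byte loads that may touch bytes outside the string. Such a load never crosses a page boundary, but it is outside what the C standard guarantees and memory checkers may flag it. On targets without SSE2 the code falls back to a plain byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Sketch: find the terminator 16 bytes at a time. */
static size_t simd_strlen(const char *s)
{
    assert(s != NULL);
    const __m128i zero = _mm_setzero_si128();

    /* Round the address down to a 16-byte boundary. */
    uintptr_t addr = (uintptr_t)s;
    const char *block = (const char *)(addr & ~(uintptr_t)15);
    unsigned skip = (unsigned)(addr & 15);   /* bytes in the block before s */

    /* Compare all 16 bytes with zero; bit i of mask is set if byte i == 0. */
    __m128i v = _mm_load_si128((const __m128i *)block);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero));
    mask &= ~0u << skip;                     /* ignore bytes that precede s */

    while (mask == 0) {
        block += 16;
        v = _mm_load_si128((const __m128i *)block);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero));
    }

    /* The lowest set bit marks the first zero byte at or after s. */
    return (size_t)((block + __builtin_ctz(mask)) - s);
}

#else

/* Portable fallback for targets without SSE2: one byte per iteration. */
static size_t simd_strlen(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

#endif
```

The same compare-and-movemask pattern carries over to strcmp()/memcmp()-style loops, where the bytes have to be examined whether or not the length is known in advance.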