From: Walter Banks on 15 Jan 2010 07:04

Hans-Bernhard Bröker wrote:

> David Brown wrote:
>
> > Nothing stops the compiler from doing this sort of thing in /theory/.
> > But /practice/ is a different matter.
>
> The same thing applies to the original question. If a compiler's
> full-blown optimizer, given practically infinite time to ponder the
> problem, can't get that analysis job done, then no other tool can, and
> certainly not while just looking over the programmers' shoulders as they
> type.

A compiler optimizer's primary goal is to map an application onto a
target processor. Most good optimizers have a lot of information about
the application code and the resources required to implement a specific
instance. A possible implementation might involve trying alternative
approaches and using compiler metrics. This could be automated.

Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com
From: Jon Kirwan on 15 Jan 2010 17:18

On Fri, 15 Jan 2010 12:19:11 -0800 (PST), -jg wrote:

>On Jan 16, 8:28 am, Jon Kirwan <j...(a)infinitefactors.org> wrote:
>>
>> Walter, they don't even do _that_ task well enough.
>>
>> Since this topic appears to suggest the idea of having a
>> compiler do something I consider almost crazy-minded at this
>> point in time.....
>
> Yes, the best reply to the original overall question, is a Pencil !!.

Hehe.

I can conceive (and accept) the idea that once good _human_ imagination
has been applied, these concepts can be more rigorously explored (along
the lines found in Knuth's early-1970s three-volume set) for their
boundary conditions and application space, and placed into a "dictionary
of moves" (as in some chess programs) that a compiler can use, together
with heuristics to decide which of the many options to choose as a
"fit." Yeah, I can imagine the idea. And perhaps some of the low-hanging
fruit there can be picked, too. Don't know.

But cripes! Some time ago, I engaged in a very interesting process
exploring various compilers' results on a very simple (we are talking
maybe 10-12 lines of code here?) and well-known algorithm for
calculating the GCD of two integers. The creative juices of several
people were applied, with the result of more than one excellent
topological reformation that, had a compiler observed it, would have
produced object code very close to the best a human can do alone.
(Once the slight reorganization was found, merely by using traditional
optimizations that have been implemented, not just known about, for two
decades and more.)

I have no problem discussing viable ideas about moving well forward. But
a journey of a thousand miles starts with a single step. I'd just like
to see some serious discussion about some of these earlier steps and
some actual implementation work exhibited. I'm not speaking from afar,
from some distanced or uninvolved position.
I'm willing to put in my own time here, to put my money where my mouth
is. I'm NOT a practiced compiler writer, having only a few experiences
of my own (a BASIC interpreter, with a library, and a toy C compiler
without any of the library system). But I'd be willing, under the
guidance of those more recently and broadly experienced, to cooperate
and supply time working on some ideas if it would help here. I'm
seriously interested in meaningful compiler advances. There is so much
yet to do that is right in front of us, though. I'd like to see success
there before wasting so much breath on what appears right now to be
pretty far out, and even then relatively ad hoc besides.

> However, there are some details, aside from the
>'find a new algorithm'(?!) request, that tools could be
>expected to offer.
>
> Taking this:
>[" It will be even more helpful if that tool
>also provides the cycle counts, cache
>usage, cache misses and lines of code
>also. "]

To add something else I'd like to have in a language above the level of
assembly, since we are chatting: the ability to specify that the two
code edges on either side of a branch must execute in the exact same
cycle count. In other words, that regardless of the condition, the
entire block of code, including the condition test, must have one and
only one execution time. I have an application for this.

But we may simply be discussing all this from a bottom-up approach,
listing various "leaf ideas" we'd like to see -- hoping perhaps that,
by induction, we may find an enclosing principle from that. I'd like to
see a more theoretical approach taken, where specific approaches are
_deduced_ readily by applying circumstances to theory. So here, and
regarding your suggestion about cache usage and cache misses, I'd refer
to this:

http://mitpress.mit.edu/catalog/item/default.asp?tid=5098&ttype=2

A 1985 dissertation.
Although it was more focused on DRAM bank organization, among other
things, I suspect that the theoretical ideas it presents could also be
used to deduce some thoughts addressing a variety of cache
optimizations, taking those specifics into account.

> Lines of code should already be there, in a listing output, and also
>in a debugger.

Yeah. It's been done. It just isn't done enough for embedded work, I
suppose.

> Likewise, on a chip/ICE with HW breakpoints and sys timers, you could
>expect a good system to be able to give time-of-flight on running
>code, if you really need that.
> The OP seems to expect all this in an EDITOR, which is misdirected,
>but it is possible in a full tool chain.

I suppose so. I was addressing myself to the idea of swapping in a
quick sort for a bubble sort, and similar thrusts, though. On that
point, there are some earlier steps to take first.

Jon
From: David Brown on 16 Jan 2010 11:11

Andy wrote:
> On Jan 14, 9:57 am, David Brown <da...(a)westcontrol.removethisbit.com>
> wrote:
>> I don't agree here (perhaps as a compiler writer you are thinking of
>> "implementation" in terms of generated target code - then I'd agree).
>> Kids use Logo to learn about programming concepts, and how to get the
>> computer to do what you want it to do. They learn to write things like:
>>
>> TO SQUARE :size
>> REPEAT 4 [ FD :size RT 90 ]
>> END
>>
>> This is entirely about writing an imperative implementation of how you
>> want the system to draw a square.
>>
>> Compare this to a sample program in a real functional programming
>> language, Haskell:
>>
>> factorial 0 = 1
>> factorial n = n * factorial (n - 1)
>>
>> Here you tell the system what you want - you give a mathematical
>> description of the results. You don't care how it gets them - maybe it
>> uses recursion, maybe it uses iteration, maybe it builds up a cache of
>> calculated values as it goes along.
>>
> The LOGO interpreter/compiler is just as free to implement alternative
> solutions to drawing a square as the Haskell compiler is to alter
> the described recursive implementation of a factorial. Whether the
> compiler is smart enough to do so has nothing to do with the language
> being "procedural" or "functional".

It is certainly true that the compiler is free to implement alternative
solutions in any language. However, the style of the language has a
great deal to say about how hard a task this is for the compiler writer.
In particular, with a language that does not have state or allow side
effects in functions, it is much easier for the compiler to have a
complete view of what a function does - and thus it has a better chance
of generating alternative implementations.

You can think of compiling (or interpreting - it's the same thing in
theory) as taking high-level source code down to a low-level
implementation in a series of steps.
At each step down, the compiler can pick one of many implementations at
the lower level. It can also re-arrange the code at the same level.
Moving up a level from an implementation is a much harder task, and will
rarely lead to a better result in the end - too much of the "overview"
information is already lost, and there are too many implementation
details that cannot be distinguished from requirements.

Thus an ideal compiler has the best chance of generating the best code
if the source is at a higher level, and if it is written in a language
that gives a clearer and more mathematically precise specification.

Of course, practical reality does not match the theory directly. In
particular, vastly more man-hours have been spent writing better
compilers for C than for Haskell - the result being that you can compile
C code to tighter target code than you can Haskell code.
From: Jon Kirwan on 16 Jan 2010 17:01

On Sat, 16 Jan 2010 13:01:24 -0800 (PST), -jg
<jim.granville(a)gmail.com> wrote:

>On Jan 16, 11:18 pm, Walter Banks <wal...(a)bytecraft.com> wrote:
>
>> To illustrate this point: your exact-timing example is not a
>> C compiler limitation but a language limitation. How do
>> you describe exact timing of all paths in a function in an
>> unambiguous way in C? Exact timing is an application
>> objective.
>
> Something like exact timing would be via local directives, and is
>going to be very locally dependent.

Oh, yes. On some machines (at one extreme, the modern x86 with branch
prediction logic, read-around-write data flows, reservation stations,
all manner of out-of-order execution, and multi-level caching), it would
be a nightmare to achieve. On others, quite achievable in practice. The
"requirement" would instead be a "goal" for the compiler language.
Something like what "register" is for a C variable -- a suggestion --
but perhaps with a little higher priority than 'register' carries
today. #pragma might be appropriate to bracket the code where all edges
should perform in similar times, though I'm sure there will be problems
to work out there, too.

> It almost comes into "better reporting"

Not just better reporting, for the case I'm talking about. If you can't
provide something that actually impacts the code generation, but have
very good reporting on what you get out of it, you will find yourself
fighting the darned thing at every turn of the code trying to work out
equal times. It's far better to provide the clues and hints a compiler
can then use in generating code.

> I remember we added a simple code-size report, for each code block,
>so pgmrs could quickly compare different choices. Trivial to do, and
>useful.
>
> We did think about reporting time, but that quickly degenerates as
>you do not know what path combinations will actually occur, and it can
>vary widely across same core:different vendor.
> So our solution here was always to verify on a scope,
>and patch as needed. That covers all complexity levels.
>
> On this 'exact timing' topic, I note the new Cortex M0 cores (NXP,
>Nuvoton?) claim a fixed interrupt-response time, so they remove the
>'current opcode' jitter. This was also discussed some years ago as a
>feature option for a faster 8051 that never got to silicon.
> It's a good thing to see devices that are moving 'less deterministic'
>offer a way to be 'more deterministic' :)
>[- and it side-steps all that SW work ]

The ADSP-21xx provides rock-solid timing. If the only interrupt source
is the timer, since that is in sync with the clock and doesn't need to
be asynchronously latched and tested, then even the interrupt latency
is _exactly_ known. Very sweet. I used that to advantage.

Jon

>
>-jg
From: Jon Kirwan on 16 Jan 2010 22:08
On Sat, 16 Jan 2010 14:43:11 -0800 (PST), -jg
<jim.granville(a)gmail.com> wrote:

>On Jan 17, 11:01 am, Jon Kirwan <j...(a)infinitefactors.org> wrote:
>> On Sat, 16 Jan 2010 13:01:24 -0800 (PST), -jg
>> <jim.granvi...(a)gmail.com> wrote:
>>
>> > It almost comes into "better reporting"
>>
>> Not just better reporting for the case I'm talking about. If
>> you can't provide something that actually impacts the code
>> generation but have very good reporting on what you get out
>> of it, you will find yourself fighting the darned thing at
>> every turn of the code trying to work out equal times. It's
>> far better to provide the clues and hints a compiler can then
>> use in generating code.
>
> In the simplest cases, the difference between manually adding NOPs
>and checking, or trawling the manual, setting a directive, and then
>also checking the tool really DID do what you hoped, is negligible.
>
> In the more complex cases, the tools are almost certain to drop the
>ball, so you'll also need to manually check.
>
> So that's why I favour 'better reporting' over 'granular smarts'
>anytime, as the reporting is ALWAYS useful, and the smarts are only
>case-by-case useful and with risky failure modes.
>[aka The illusion of intelligence is dangerous]
>
> The rare times I've needed to do this, it was no sweat to just use a
>pencil, and then do an operational check.

Well, what can I say? I did pretty much exactly this in the case where
I needed it -- wrote it in assembly and counted instructions. However,
I acknowledge that a compiler would do a reliable job of counting
cycles. This is one of those good examples where it's better to let the
compiler worry about the bookkeeping issues.

Jon