From: Kerem Gümrükcü on 9 Mar 2008 14:46

Hi Joseph,

>Inline assembly code? Scary...

I wouldn't say scary... I would rather say a "necessity" for some operations that must be performed as fast as possible. As a professional and very experienced developer you should know what I am talking about: especially in the case of mathematical calculations and algorithms, there is nothing that can beat fast, direct CPU operations with dedicated instructions and fast register access. You can strip down your code and compact it as much as possible, write your own prolog and epilog, and create your own operative mechanism. But you are right, many, many developers still "underrate" the power of assembly in high-level languages, and most of them still think that assembly is difficult and something like black magic. A friend of mine, a pro .NET developer, still thinks that assembly is dead. That's "scary" to me. I also do .NET development, and very often, but there is nothing that can beat assembly and C/C++. Operating directly on the hardware is the closest level to speed and fills the gap when a runtime library is too slow or is missing a feature. But on the other hand, if you are not used to assembly you can slow down something that could be much faster with the right high-level code and good compiler optimization. This is the case when someone does not have a good knowledge of low-level coding and produces a lot of "intermediate" assembly instructions, e.g. throwing out a lot of moves, pushes, jumps and arithmetic operations where a single dedicated instruction would do the job in one line and one cycle. I always highly recommend that everybody learn at least the basics of assembly language. Even if someone never uses assembly, it makes you understand how function calls, memory operations and "native debugging" work. A very good book on this was one I read years ago from John Robbins, named "Debugging Windows". Highly recommended! I don't know whether this book is still up to date (new releases?), but even today it is a very good book for learning how basic debugging works...

Regards

K.

--
-----------------------
Beste Grüsse / Best regards / Votre bien dévoué

Kerem Gümrükcü

Microsoft Live Space: http://kerem-g.spaces.live.com/
Latest Open-Source Projects: http://entwicklung.junetz.de
-----------------------
"This reply is provided as is, without warranty express or implied."
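To make "write your own prolog and epilog" concrete: with the x86 inline assembler under discussion, MSVC exposes this through naked functions. A minimal illustrative sketch (the function add_one is hypothetical; __declspec(naked) is x86-only, and the body must manage the stack itself because the compiler emits no frame code):

    /* x86 MSVC only: no prolog/epilog is generated for a naked function. */
    __declspec(naked) int __cdecl add_one(int x)
    {
        __asm {
            mov  eax, [esp + 4]   ; [esp] is the return address, so the
                                  ; first argument sits at [esp+4]
            inc  eax              ; result left in EAX, the return register
            ret                   ; __cdecl: the caller pops the argument
        }
    }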
From: Kerem Gümrükcü on 9 Mar 2008 14:52

This seems to be the latest release. .NET debugging... nah, ILDASM and go...

http://www.amazon.com/Debugging-Microsoft-NET-2-0-Applications/dp/0735622027/ref=pd_bbs_sr_1/105-9103892-2545257?ie=UTF8&s=books&qid=1205088615&sr=1-1

Regards

K.

--
-----------------------
Beste Grüsse / Best regards / Votre bien dévoué

Kerem Gümrükcü

Microsoft Live Space: http://kerem-g.spaces.live.com/
Latest Open-Source Projects: http://entwicklung.junetz.de
-----------------------
"This reply is provided as is, without warranty express or implied."
From: Joseph M. Newcomer on 9 Mar 2008 16:50

As a professional, one thing I know (and I spent years working on optimizing compilers, and worked for a company that produced them) is that nearly all the time the compiler can produce better code than a programmer writing assembly code. What is scary is that you are creating code which is expensive to write, expensive to debug, and expensive to maintain, without a quantitative justification for the performance improvement.

I have a friend who didn't like the code he was getting, so he wrote a program that transformed his computation into a collection of grossly ugly goto-style code. The generated code ran substantially faster, and it had the advantage that his algorithms were still written in C, then compiled into C that the (rather weak) optimizing compiler he had available would compile into more efficient code. Key here was that he never would actually write code as ugly as his tool produced, but it didn't matter: he didn't write that code. And his tool was generally useful for a variety of problems, not just one particular piece of code (all involved repetitive array computations). His machine had no cache, so he didn't worry about cache hits. He later extended it to do function inlining (his C compiler had no __inline directive) and actually did get an order-of-magnitude performance improvement.

Note that the only way to measure code performance is in the release version; no debug build can ever be used as a benchmark or a criterion for determining computational cost. Use #pragma to turn on every possible optimization in the code (note that some optimizations are not "generally" safe, such as assuming no aliasing, but in the context of, say, a mathematical computation on arrays they will buy a lot). When possible, use inlines. Optimizations such as strength reduction, loop unrolling, alpha motion, omega motion, and common subexpression elimination can often be done more efficiently by the compiler working on C/C++ than by hand. I have been using optimizing compilers since 1969, and the number of times I have found myself able to beat the compiler is vanishingly small. To me, assembly code is easy to read, but not cost-effective to write.

Example: we had to do an FFT of integer data. I converted the data from integer to double in a copy of the array, passed it to the FFT subroutine, took the converted array, converted it back to integers for plotting, and we could not detect the impact of this on the overall performance; it appeared to be operating in "real time". The major overheads of the operation were the new/delete of the double array and the new/delete of the int array, and in a real-time system they were still unnoticeable. The major computation was in the FFT algorithm, a proprietary algorithm developed in MATLAB by a numerical-methods expert; MATLAB emitted C code to do the computation. We carefully examined the compiler-generated code in the FFT subroutine. Two assembly-code experts could not come up with anything significantly faster; we might have managed to get 3%-5% out of it at best, not worth the effort.

Writing assembly code can, under extreme conditions, get you as much as a factor-of-2 performance improvement. Changing your code to maximize L1 and L2 cache hits can buy you a factor of 10 to 20, while remaining in C/C++. If you really care about performance, data access organization is vastly more important than the cost of an instruction in an inner loop.
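To see why access order alone can matter that much, here is a minimal sketch (illustrative names, not code from this thread): both functions compute the same sum over an n x n row-major array, but the second walks memory with a stride of n doubles, so nearly every access can touch a new cache line.

    #include <stddef.h>

    /* Build the release version with full optimization before timing. */
    double sum_row_major(const double *a, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)        /* walks memory sequentially,    */
            for (size_t j = 0; j < n; j++)    /* so consecutive elements share */
                s += a[i * n + j];            /* cache lines; misses are rare  */
        return s;
    }

    double sum_column_major(const double *a, size_t n)
    {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)        /* stride of n*sizeof(double):   */
            for (size_t i = 0; i < n; i++)    /* each access may land on a new */
                s += a[i * n + j];            /* cache line (or a new page)    */
        return s;
    }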
So if you are concentrating on instructions, you are missing the high-payoff optimizations, which are not code optimizations but architecture optimizations (you change your algorithm). Note that if you are working on large data arrays, paging can become the dominant problem. A page fault costs you about six orders of magnitude in performance. All the assembly code in the world will not "reimburse" you for a single page fault.

Some years ago, I wrote the world's fastest storage allocator, and to do this I did NOT consider assembly code, but used a high-level language comparable to Ada. It had four levels of abstraction between the user and the actual memory blocks. A good optimizing compiler, with strong hints from __inline directives, can reduce three levels of abstraction to half an instruction. The equivalent of malloc, had we actually had an __inline capability (which was going to be in a future compiler release), would have been

    __inline PVOID allocate(int n)
    {
        if(n > limit)
            return general_allocator(n);        /* too large for a quicklist */
        else
        {
            PVOID result = head[(n + quantum - 1) / quantum];  /* size bucket */
            if(result == NULL)
                return general_allocator(n);    /* bucket empty, fall back   */
            else
            {
                /* pop the head of the bucket's free list */
                head[(n + quantum - 1) / quantum] = *(PVOID *)result;
                return result;
            }
        }
    }

which in our compiler, had we done the inlining, would have generated the equivalent of

         mov  eax, head[7]
         test eax, eax
         jne  $1
         push 28
         call general_allocator
         jmp  $2
    $1:  mov  ebx, DWORD PTR [eax]
         mov  head[7], ebx
    $2:

Note that limit was a compile-time constant and n almost always was a compile-time constant. It would have taken 5 instructions to allocate storage in most cases, which means it would take, on a modern machine, <10ns to do a storage allocation (single-thread assumption here) [it took us 5us, because on that machine it was one instruction/us, 2000-3000 times slower than a modern machine]. We didn't need to write it in assembler to get that performance. (As it turned out, because of parameter passing, it took us 4 extra instructions to call, and because at that point the value n was no longer a CTC, it took 3 extra instructions to implement the if-test, so inlining would have bought nearly a factor-of-2 performance increase in allocation with zero effort on our part. It was not a high priority because the allocator accounted for < 1% of the total execution time in an allocation-heavy application where we would allocate and free tens of thousands of objects.)

I've written hundreds of thousands of lines of assembly code in my career; possibly as many as half a million. For cost-effectiveness, nothing beats a good optimizing compiler. For performance, with very rare exceptions, nothing beats a good optimizing compiler. If my goal was to write the fastest possible inner loop for a mathematical computation, I would be spending my time worrying about cache hits first. Maximize cache hits.

Hmmm. Now that I stop to think about it, my CPUID Explorer was done because I had to optimize some numeric code based on cache sizes, and needed to know the cache architecture... and yes, you can get an order-of-magnitude performance improvement. Without writing a single line of assembly code, I got better than a factor-of-10 improvement.

Assembly code is a last resort. The last three times I used it, I used it because I needed to execute very low-level code, such as CPUID and RDTSC, not supported in the C language.

Think algorithms, not code. Think architecture, not instructions.

joe
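The cache-size-driven restructuring described above usually amounts to blocking (tiling): process the data in chunks small enough to stay resident in L1/L2 between uses. A hedged sketch (the kernel and the tile size are illustrative; the right TILE value would come from the cache geometry CPUID reports, not from this constant):

    #include <stddef.h>

    #define TILE 64   /* illustrative guess; derive from the actual cache size */

    /* C += A * B for n x n row-major matrices, processed in TILE x TILE
       blocks so each block's working set stays in cache while it is reused. */
    void matmul_tiled(const double *A, const double *B, double *C, size_t n)
    {
        for (size_t ii = 0; ii < n; ii += TILE)
          for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
              for (size_t i = ii; i < ii + TILE && i < n; i++)
                for (size_t k = kk; k < kk + TILE && k < n; k++)
                {
                    double a = A[i * n + k];   /* reused across the whole j loop */
                    for (size_t j = jj; j < jj + TILE && j < n; j++)
                        C[i * n + j] += a * B[k * n + j];
                }
    }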
On Sun, 9 Mar 2008 19:46:23 +0100, "Kerem Gümrükcü" <kareem114(a)hotmail.com> wrote:

>Hi Joseph,
>
>>Inline assembly code? Scary...
>
>[...]

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Alexander Grigoriev on 10 Mar 2008 00:38

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:vcg8t313hoeo4q5e39sff6rimjkltgbfe8(a)4ax.com...
>
> Assembly code is a last resort. The last three times I used it, I used it because I
> needed to execute very low-level code, such as CPUID and RDTSC, not supported in the C
> language.
>

And now there are __rdtsc and __cpuid intrinsics.
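A minimal sketch of those two MSVC intrinsics in use (both live in <intrin.h>; the code being timed is a placeholder):

    #include <intrin.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int info[4];                       /* EAX, EBX, ECX, EDX */
        char vendor[13] = { 0 };
        unsigned __int64 t0, t1;

        __cpuid(info, 0);                  /* leaf 0: max leaf + vendor string */
        memcpy(vendor,     &info[1], 4);   /* vendor string is EBX:EDX:ECX */
        memcpy(vendor + 4, &info[3], 4);
        memcpy(vendor + 8, &info[2], 4);

        t0 = __rdtsc();
        /* ... code under measurement would go here ... */
        t1 = __rdtsc();

        printf("CPU vendor: %s, elapsed cycles: %llu\n", vendor, t1 - t0);
        return 0;
    }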
From: Joseph M. Newcomer on 10 Mar 2008 09:21
Which are important, because the x64 compilers do not support assembly code insertions.

joe

On Sun, 9 Mar 2008 21:38:52 -0700, "Alexander Grigoriev" <alegr(a)earthlink.net> wrote:

>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message news:vcg8t313hoeo4q5e39sff6rimjkltgbfe8(a)4ax.com...
>>[...]
>
>And now there are __rdtsc and __cpuid intrinsics.

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
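So on x64 the practical routes are intrinsics or a separate assembly file. A minimal hypothetical example of the latter, assembled with ml64.exe and linked in (Microsoft x64 convention: first integer argument in RCX, return value in RAX):

    ; add_one.asm -- standalone MASM, since x64 MSVC provides no __asm blocks
    PUBLIC add_one
    _TEXT SEGMENT
    add_one PROC
        lea rax, [rcx + 1]    ; first integer argument arrives in RCX
        ret                   ; result returned in RAX
    add_one ENDP
    _TEXT ENDS
    END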