Brett wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> Often you get the most efficient results by writing code clearly and
>> simply, so that the compiler can understand it better and generate
>> good object code. This is particularly true if you want the same
>> source to be used on different targets or different variants of a
>> target - few people can track the instruction scheduling and timings
>> on multiple processors better than a good compiler. (And the few
>> people who /can/ do that spend their time chatting in comp.arch
>> instead of writing code...) When you do hand-made micro-optimisations,
>> these can work against the compiler and give poorer results overall.
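To make the contrast concrete, here is a minimal sketch (an
illustration only, not code from David's post), using a simple array
sum:

    #include <stddef.h>

    /* Clear version: the compiler is free to unroll, schedule and
       vectorize this however suits the target. */
    long sum_clear(const int *a, size_t n)
    {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Hand-unrolled version: the two-way split bakes in a schedule
       chosen for one particular pipeline, and the extra accumulators
       and tail handling can get in the way of the compiler's own
       unrolling and vectorization on another target. */
    long sum_unrolled(const int *a, size_t n)
    {
        long s0 = 0, s1 = 0;
        size_t i = 0;
        for (; i + 2 <= n; i += 2) {
            s0 += a[i];
            s1 += a[i + 1];
        }
        if (i < n)
            s0 += a[i];
        return s0 + s1;
    }
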
> I know of no example where hand optimized code does worse on a newer
> CPU. A newer CPU with a bigger out-of-order engine will effectively
> unroll your code and schedule it even better.
Not true:

My favorite benchmark program for 20+ years was Word Count. I
re-optimized it for every new x86 generation, and on the Pentium I got
it down to 1.5 clock cycles per character (40 MB/s on a 60 MHz
Pentium).

When the PentiumPro came out, it hit a 10-20 cycle stall for every
pair of characters, making it about an order of magnitude slower in
cycle count. (But only about 3X slower in wall-clock time, since it
ran at 200 MHz instead of 60.)
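The shape of the inner loop in question is roughly this (a
reconstruction of the general table-driven technique, not the actual
benchmark source):

    #include <stddef.h>

    /* Table-driven word count: is_word_char[c] is 1 for bytes that can
       be part of a word, 0 otherwise.  Each 0 -> 1 transition of the
       classification marks the start of a new word. */
    size_t word_count(const unsigned char *buf, size_t len,
                      const unsigned char is_word_char[256])
    {
        size_t words = 0;
        unsigned in_word = 0;
        for (size_t i = 0; i < len; i++) {
            unsigned c = is_word_char[buf[i]];
            words += c & (in_word ^ 1);   /* word starts here */
            in_word = c;
        }
        return words;
    }

A plausible culprit, though an inference on my part: on the Pentium the
compiled loop could juggle bytes in the 8-bit sub-registers and still
pair instructions well, while the PentiumPro's P6 core stalled for
several cycles whenever an 8-bit register write was followed by a read
of the full 32-bit register - exactly what a byte-indexed table lookup
does.
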
> It's older, lesser CPUs where your hand optimized code might fail
> hard, and I know of few examples of that. None actually.

Right.
>> This is especially the case when code is moved around with inlining,
>> constant propagation, unrolling, link-time optimisation, etc.
>>
>> Long ago, it was a different matter - then compilers needed more help
>> to get good results. And compilers are far from perfect - there are
>> still times when "smart" code or assembly-like C is needed (such as
>> when taking advantage of some vector and SIMD facilities).
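A concrete example of such "smart" code (a sketch only - SSE2, so
x86-specific), applied to the array sum from earlier:

    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stddef.h>

    /* Explicit SSE2 version of the array sum, for the cases where the
       auto-vectorizer does not manage it.  Note the semantics differ
       subtly from the scalar long accumulator: the four 32-bit lanes
       wrap on overflow. */
    long sum_sse2(const int *a, size_t n)
    {
        __m128i acc = _mm_setzero_si128();
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_epi32(acc,
                    _mm_loadu_si128((const __m128i *)(a + i)));
        int lane[4];
        _mm_storeu_si128((__m128i *)lane, acc);
        long s = (long)lane[0] + lane[1] + lane[2] + lane[3];
        for (; i < n; i++)    /* scalar tail */
            s += a[i];
        return s;
    }

This is exactly the assembly-like-C trade-off: faster where it
applies, but tied to one instruction set and one set of overflow
semantics.
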
Terje