Bart <bc@freeuk.com> wrote:
On 09/09/2024 01:29, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
No. It is essential for efficiency to have 32-bit types. On 32-bit
machines doing otherwise would add useless instructions to object
code. More precisely, a really stupid compiler will generate useless
instructions even with my declarations; a really smart one will
notice that the variables fit in 32 bits and optimize accordingly.
But at least some gcc versions needed such declarations. Note
also that my version makes it clear that there is
symmetry (everything should be added using 64-bit precision);
you depend on promotion rules, which creates visual asymmetry
and requires reasoning to realize that the meaning is symmetric.
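To make that concrete, a minimal sketch of the style (illustrative
names, not the actual routine; the limbs and the carry are 32-bit,
and the addition is done in 64 bits with explicit casts on all three
operands):

    #include <stdint.h>

    /* Illustrative limb addition: all three operands are cast to
       64 bits, so the symmetry is visible in the source. */
    static uint32_t add_limbs(uint32_t *r, const uint32_t *x,
                              const uint32_t *y, int n)
    {
        uint32_t c = 0;                  /* carry, always 0 or 1 */
        for (int i = 0; i < n; i++) {
            uint64_t s = (uint64_t)x[i] + (uint64_t)y[i] + (uint64_t)c;
            r[i] = (uint32_t)s;          /* low 32 bits */
            c = (uint32_t)(s >> 32);     /* high bit is the carry */
        }
        return c;
    }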
Your posted code used 64-bit arithmetic. The 32-bit variables xext and c
were used in loops where they need to be widened to 64 bits anyway. The
new value of c is set from a 32-bit result.
Well, at the C level there is a 64-bit type. The intent is that the C
compiler should notice that the result is 32 bits plus a carry flag.
Ideally the compiler should notice that c has only one bit and can be
kept in the carry flag. On i386 the comparison needed for loop control
would destroy the carry flag, so there must be code using the value of
the carry in a register and code to save the carry to a register. But
one addition of the high parts can be skipped. On 32-bit ARM the
compiler can use special machine instructions, and it actually
generated code which is close to optimal.
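For comparison, a sketch using a gcc/clang builtin that maps more
directly onto the carry flag (this assumes a compiler providing
__builtin_add_overflow; it is not the style of the routine under
discussion):

    #include <stdint.h>
    #include <stdbool.h>

    /* __builtin_add_overflow reports the carry out of each 32-bit
       addition; at most one of the two additions can carry. */
    static uint32_t add_limbs_ovf(uint32_t *r, const uint32_t *x,
                                  const uint32_t *y, int n)
    {
        uint32_t c = 0;
        for (int i = 0; i < n; i++) {
            uint32_t s;
            bool c1 = __builtin_add_overflow(x[i], y[i], &s);
            bool c2 = __builtin_add_overflow(s, c, &r[i]);
            c = (uint32_t)(c1 | c2);
        }
        return c;
    }

On ARM the hope is that such a loop compiles down to add-with-carry
(ADDS/ADCS style) instruction pairs.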
(Have you tried 64-bit versions of xext, yext, c to see if it makes any
difference? I may try it myself if I can set up a suitable test, but I
can only test on a 64-bit machine.
I did test this, and when I tested, 64-bit declarations generated
extra instructions. IIRC 32-bit ones gave a good result (no extra
instructions). Re-checking now with gcc12 in 32-bit mode seems to
produce extra instructions. Maybe I remembered wrong, or maybe there
is a regression on i386.
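(For anyone wanting to re-check: one way, assuming a gcc with 32-bit
multilib support and using a hypothetical file name, is

    gcc -m32 -O2 -S add_limbs.c

and then inspecting the .s output for redundant widening of the 32-bit
variables.)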
Just an extra remark: this is one of several routines which use a
similar style and "the same" declarations. So even if in this
routine the optimization does not work as intended, it made a difference
in other routines, and I want to keep the declarations in all routines
consistent with each other.
Do you still have 32-bit machines around? I haven't been able to find
one for a decade and a half!)
I have a few old PCs. In the best one the power supply currently
does not work, but the last time I checked, two others were
operational. I have a complete 32-bit Linux userland on
a machine with a 64-bit kernel, so I can produce and run
32-bit binaries on that machine (just now I have no access
to it).
I also have a bunch of ARM boards; except for one, the
others are 32-bit. The oldest one is from 2012, and one or two were
bought a few years ago. It seems that the Raspberry Pi with a 32-bit
CPU is still in shops, and there were other brands. I do
not make much use of them, but I do use them from time to
time. Actually I have a lot of ARM boards; the ones I mentioned
before are the "powerful" ones, with hundreds of MB of RAM and at least
hundreds of MHz clock. The others are microcontroller boards; those
are small, usually with less than 1 MB of RAM, so not suitable for PC
class software. The small ones are 32-bit and new ones keep
appearing. Basically, at this size there is no motivation
to go to 64 bits.
Coming back to the arithmetic routines, I plan to use 64-bit
units on 64-bit machines, and then the optimization issue will
be the same as on 32-bit ones.
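A sketch of what the 64-bit-unit version might look like, assuming a
compiler with the unsigned __int128 extension (gcc and clang provide
it on 64-bit targets):

    #include <stdint.h>

    /* Same scheme with 64-bit limbs; unsigned __int128 is a
       gcc/clang extension, not standard C. */
    static uint64_t add_limbs64(uint64_t *r, const uint64_t *x,
                                const uint64_t *y, int n)
    {
        uint64_t c = 0;
        for (int i = 0; i < n; i++) {
            unsigned __int128 s = (unsigned __int128)x[i]
                                + (unsigned __int128)y[i]
                                + (unsigned __int128)c;
            r[i] = (uint64_t)s;
            c = (uint64_t)(s >> 64);
        }
        return c;
    }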
And if
you do not see the benefits, well, that is your loss.
The average number of local variables in a half-dozen C codebases I
surveyed was 3 per function. So I find it hard to see the point of
splitting them up into different scopes!
My style tends to produce more local variables than the older style.
Some functions are big, and in those the benefits are greatest.
But even if there is only one variable, a wrong or missing
initialization may be a problem. My style minimizes such issues.
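A contrived example of the difference (not from the actual code):

    /* Older style: declarations at the top, initialization later;
       an assignment can be forgotten on some path. */
    int sum_old(const int *a, int n)
    {
        int i, acc;
        acc = 0;            /* easy to omit by mistake */
        for (i = 0; i < n; i++)
            acc += a[i];
        return acc;
    }

    /* Declaring each variable in the smallest scope, initialized
       at the point of declaration, removes that failure mode. */
    int sum_new(const int *a, int n)
    {
        int acc = 0;
        for (int i = 0; i < n; i++)
            acc += a[i];
        return acc;
    }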
-- Waldek Hebisch