Subject: Re: What integer C type to use
From: already5chosen (at) *nospam* yahoo.com (Michael S)
Newsgroups: comp.arch
Date: 12 Mar 2024, 13:44:28
Organization: A noiseless patient Spider
Message-ID: <20240312144428.000063f5@yahoo.com>
User-Agent : Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
On Tue, 12 Mar 2024 11:14:47 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 11/03/2024 20:56, MitchAlsup1 wrote:
David Brown wrote:
On 23/02/2024 20:55, MitchAlsup1 wrote:
Thomas Koenig wrote:
Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
I know no implementation of a 64-bit architecture where ALU
operations (except maybe division where present) are slower in
64 bits than in 32 bits. I would have chosen ILP64 at the
time, so I can only guess at their reasons:
A guess: people did not want sizeof(float) != sizeof(int).
float is certainly faster than double.
Now, only in cache footprint. All your std operations take the same
number of cycles, DP vs. SP. Except for cache footprint latency:
perf(DP) == perf(SP)
That's true - except when it is not.
It is not true when you are using vector instructions, and you can
do twice as many SP instructions as DP instructions in the same
register and instruction.
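The lane arithmetic behind that point can be sketched in a few lines of C. This is only an illustration: simd_lanes is a hypothetical helper, and the register width is a parameter of the sketch, not a claim about any machine mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of elem_bytes bytes that fit in one SIMD register
   of register_bits bits. Hypothetical helper, for illustration only. */
size_t simd_lanes(size_t register_bits, size_t elem_bytes)
{
    assert(elem_bytes > 0);
    return register_bits / (8 * elem_bytes);
}
```

With a 256-bit register, 4-byte SP elements give 8 lanes and 8-byte DP elements give 4, so one instruction covers twice as many SP values.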
You use the word vector where you mean SIMD.
Yes, I was using the word somewhat interchangeably, as I was talking
in general terms. Perhaps I should have been more precise. I know
this thread talked about "Cray style vectors", but I thought this
branch had diverged - I don't know anywhere near enough about the
details of Cray machines to talk much about them.
Even for Cray/NEC-style vectors, the same throughput for different
precisions is not a universal property. Cray's and NEC's vector
processors happen to be designed like that, but one can easily imagine
vector processors of similar style that have 2 or even 3 times higher
throughput for SP vs DP.
I personally never encountered such machines, but I would be surprised
if none were ever built and sold by one or another of the usual suspects
(maybe Fujitsu?) back in the days when designers liked Cray's style.
Which, of course, leaves the question of what property makes a vector
processor Cray-style. Just having an ALU/FPU several times narrower than
the VR is, IMHO, not enough to be considered Cray-style.
In my book, the critical distinction is that at least one size of
partial (chopped) non-load-store vector operation has higher
throughput (and hopefully, but not necessarily, lower latency) than full
vector operations of the same type.
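One way to read that criterion is as a toy cycle count. The sketch below is an assumption-laden model, not a description of any real machine: both function names are made up, and it ignores startup latency, chaining and memory.

```c
#include <assert.h>

/* Cray-style in the sense above: a chopped operation on vl elements
   occupies the pipes for ceil(vl / lanes) cycles, so shorter vectors
   finish sooner. */
unsigned cycles_cray_style(unsigned vl, unsigned lanes)
{
    assert(lanes > 0);
    return (vl + lanes - 1) / lanes;
}

/* Narrow ALU behind a wide register, but no benefit from chopping:
   the operation always costs as much as a full register of max_vl
   elements, however few elements are actually wanted. */
unsigned cycles_fixed_length(unsigned vl, unsigned max_vl, unsigned lanes)
{
    assert(lanes > 0 && vl <= max_vl);
    return (max_vl + lanes - 1) / lanes;
}
```

With 64-element registers and 4 lanes, a 16-element operation costs 4 cycles in the first model and 16 in the second. Only the first passes the test proposed above.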
A CRAY-YMP doing single precision would not be twice as fast because it
was designed for 1 FADD + 1 FMUL + 2 LD + 1 ST per cycle continuously.
CRAY-YMP is the epitome of a Vector machine.
The alternative word would be short-vector instead of SIMD.
It is not true when you are using accelerators of various kinds,
such as graphics card processors.
And it is not true on smaller processors, such as in the embedded
world. On microcontrollers with floating point hardware for
single and double precision, SP can be up to twice as fast as DP.
And for many of the more popular microcontrollers, you can have
hardware SP but DP is done in software - the difference there is
clearly massive.
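A common way to hit that software DP path by accident in C is an unsuffixed floating constant, which has type double. A minimal sketch follows; the function names are hypothetical, and whether the double multiply really becomes a library call depends on the target and compiler flags.

```c
#include <assert.h>

/* 0.5 has type double, so x is converted to double, the multiply is
   done in double, and the result is converted back to float. */
float scale_via_double(float x)
{
    return x * 0.5;
}

/* The f suffix makes the constant a float, so the whole expression
   has type float. */
float scale_in_float(float x)
{
    return x * 0.5f;
}
```

Both functions return the same value for inputs like 3.0f. On a part with an SP-only FPU, only the second is expected to stay entirely in hardware. GCC's -Wdouble-promotion warning flags the first form.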
But for big processors doing non-vector adds and multiplies, DP
and SP are usually equal in clock cycles (other than memory and
cache effects).