Subject: Re: What integer C type to use
From: david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups: comp.arch
Date: 12 Mar 2024, 11:14:47
Organization: A noiseless patient Spider
Message-ID: <usp9un$7pij$1@dont-email.me>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

On 11/03/2024 20:56, MitchAlsup1 wrote:
> David Brown wrote:
>> On 23/02/2024 20:55, MitchAlsup1 wrote:
>>> Thomas Koenig wrote:
>>>>
>>>> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>>>> I know of no implementation of a 64-bit architecture where ALU operations
>>>>> (except maybe division, where present) are slower in 64 bits than in 32
>>>>> bits. I would have chosen ILP64 at the time, so I can only guess at
>>>>> their reasons:
>>>>
>>>> A guess: people did not want sizeof(float) != sizeof(int). float
>>>> is certainly faster than double.
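A minimal sketch, not from the thread, of what the data-model choice means in C. The hypothetical helper `data_model()` classifies the compiler's model from the sizes of `int`, `long` and pointers, and the hypothetical `float_bits_as_int()` shows the kind of code that quietly assumes `sizeof(int) == sizeof(float)`; on an ILP64 target its assertion would fail. It assumes IEEE-754 `float`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: classify the data model from the sizes of int,
   long and pointers. Only the four common models are recognised. */
const char *data_model(void)
{
    if (sizeof(int) == 8 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "ILP64";  /* int, long and pointers all 64-bit */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";   /* typical 64-bit Unix-like systems */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64";  /* 64-bit Windows */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    return "other";
}

/* Hypothetical example of code relying on int and float having the same
   size: copy the bit pattern of a float into an int. Under ILP64, int is
   8 bytes while float is still 4, so the assertion fails there. */
int float_bits_as_int(float f)
{
    int bits = 0;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof f);
    return bits;
}
```

On a typical x86-64 Linux system `data_model()` returns `"LP64"`, and `float_bits_as_int(1.0f)` gives `0x3F800000`.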
>>>
>>> Now, only in cache footprint. All your standard operations take the same number
>>> of cycles DP vs. SP. Except for cache footprint latency:: perf(DP) == perf(SP)
>
>> That's true - except when it is not.
>> It is not true when you are using vector instructions, and you can do twice as many SP operations as DP operations in the same register and instruction.
> You use the word vector where you mean SIMD.
Yes, I was using the word somewhat interchangeably, as I was talking in general terms. Perhaps I should have been more precise. I know this thread talked about "Cray style vectors", but I thought this branch had diverged - I don't know anywhere near enough about the details of Cray machines to talk much about them.
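A small sketch of the SIMD (short-vector) point: with a 16-byte register, one vector add handles four `float`s but only two `double`s, so the same loop needs half as many vector adds in single precision. It assumes GCC or Clang, since it uses their `vector_size` extension; the helper names `add_sp` and `add_dp` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: a 16-byte vector holds 4 floats or 2 doubles. */
typedef float  v4sf __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));

enum {
    SP_LANES = sizeof(v4sf) / sizeof(float),   /* 4 */
    DP_LANES = sizeof(v2df) / sizeof(double)   /* 2 */
};

/* out[i] = a[i] + b[i] in single precision; returns the number of vector
   adds performed. n must be a multiple of SP_LANES. */
int add_sp(const float *a, const float *b, float *out, int n)
{
    int ops = 0;
    assert(n % SP_LANES == 0);
    for (int i = 0; i < n; i += SP_LANES) {
        v4sf va, vb, vr;
        memcpy(&va, a + i, sizeof va);  /* memcpy sidesteps alignment issues */
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                   /* typically one SIMD add, 4 lanes */
        memcpy(out + i, &vr, sizeof vr);
        ops++;
    }
    return ops;
}

/* Same loop in double precision: half as many lanes per vector add. */
int add_dp(const double *a, const double *b, double *out, int n)
{
    int ops = 0;
    assert(n % DP_LANES == 0);
    for (int i = 0; i < n; i += DP_LANES) {
        v2df va, vb, vr;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                   /* typically one SIMD add, 2 lanes */
        memcpy(out + i, &vr, sizeof vr);
        ops++;
    }
    return ops;
}
```

For 8 elements, `add_sp` issues 2 vector adds while `add_dp` issues 4. As Mitch notes, a Cray-style vector machine designed around a fixed per-cycle throughput would not see this doubling.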
> A CRAY-YMP doing single precision would not be twice as fast, because it was designed for 1 FADD + 1 FMUL + 2 LD + 1 ST per cycle continuously. CRAY-YMP is the epitome of a vector machine.
> The alternative term would be "short-vector" instead of SIMD.
>> It is not true when you are using accelerators of various kinds, such as graphics card processors.
>> And it is not true on smaller processors, such as in the embedded world. On microcontrollers with floating point hardware for single and double precision, SP can be up to twice as fast as DP. And for many of the more popular microcontrollers, you can have hardware SP but DP is done in software - the difference there is clearly massive.
>> But for big processors doing non-vector adds and multiplies, DP and SP are usually equal in clock cycles (other than memory and cache effects).
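A related embedded pitfall, sketched under the assumption of a microcontroller whose FPU handles only single precision: in C an unsuffixed floating constant such as `0.5` has type `double`, so `x * 0.5` is evaluated in double precision (in software on such a part), while `x * 0.5f` stays in single precision. The function names are hypothetical; the type rules checked by the `static_assert`s are standard C11.

```c
#include <assert.h>

/* The unsuffixed constant 0.5 is a double, so x is converted to double,
   the multiply is done in double precision (in software on an SP-only
   FPU), and the result is narrowed back to float on return. */
float half_promoted(float x) { return x * 0.5; }

/* With the f suffix both operands are float: single precision throughout. */
float half_single(float x) { return x * 0.5f; }

/* Compile-time checks of the type rules relied on above. */
static_assert(_Generic(1.0f * 0.5,  double: 1, default: 0), "float * 0.5 is double");
static_assert(_Generic(1.0f * 0.5f, float:  1, default: 0), "float * 0.5f is float");
```

Both functions return the same value for inputs like `3.0f`; the difference is only in which hardware (or library) does the arithmetic. GCC's `-Wdouble-promotion` warning flags the first form.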