Subject: Re: What integer C type to use
From: mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups: comp.arch
Date: 12 Mar 2024, 18:18:36
Organization: Rocksolid Light
Message-ID: <19da68f1b874758d42b64203741c325b@www.novabbs.org>
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
User-Agent: Rocksolid Light
Michael S wrote:
On Tue, 12 Mar 2024 11:14:47 +0100
David Brown <david.brown@hesbynett.no> wrote:
You use the word vector where you mean SIMD.
Yes, I was using the word somewhat interchangeably, as I was talking
in general terms. Perhaps I should have been more precise. I know
this thread talked about "Cray style vectors", but I thought this
branch had diverged - I don't know anywhere near enough about the
details of Cray machines to talk much about them.
>
Even for Cray/NEC-style vectors, the same throughput for different
precision is not a universal property. Cray's and NEC's vector
processors happen to be designed like that, but one can easily imagine
vector processors of similar style that have 2 or even 3 times higher
throughput for SP vs DP.
I personally never encountered such machines, but would be surprised if
none were ever built and sold by one or another of the usual suspects
(maybe Fujitsu?) in the days when designers liked Cray's style.
While theoretically possible, they did not do this because both halves
of a 2×SP would not necessarily arrive from memory simultaneously.
{Consider a gather load: you need a vector of addresses 2× as long
for pairs of SP going into a single vector register element.}
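To make the gather point concrete, here is a minimal C sketch, not a model of any real machine. It assumes a hypothetical register element type `sp_pair` that packs two SP values; `gather_address_count` and `gather_sp_pairs` are illustrative names only. Each packed element needs two independent indices, so the index vector is twice as long as the register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register element holding two packed SP values (2xSP). */
typedef struct {
    float lo;
    float hi;
} sp_pair;

/* Number of addresses a gather needs: one per value, not one per element. */
size_t gather_address_count(size_t n_elems, size_t values_per_elem)
{
    return n_elems * values_per_elem;
}

/* Gather n_elems packed elements; idx must hold 2 * n_elems indices into mem. */
void gather_sp_pairs(sp_pair *vreg, const float *mem,
                     const size_t *idx, size_t n_elems)
{
    assert(vreg != NULL && mem != NULL && idx != NULL);
    for (size_t i = 0; i < n_elems; i++) {
        /* The two halves come from unrelated addresses, so in hardware
           they may arrive at different times. */
        vreg[i].lo = mem[idx[2 * i]];
        vreg[i].hi = mem[idx[2 * i + 1]];
    }
}
```

For a 64-element register, `gather_address_count(64, 2)` gives 128 addresses, against 64 for a DP gather of the same register.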
Which, of course, leaves the question of what property makes vector
processor Cray-style. Just having ALU/FPU several times narrower than
VR is, IMHO, not enough to be considered Cray-style.
That property is that the length of the vector register is chosen to
absorb the latency to memory. A SIMD register is too short to have this property.
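As a back-of-the-envelope reading of that property (an interpretation, not a formula from any manual): the register has to hold at least as many elements as the pipeline consumes while one memory access is in flight. The function name and the numbers below are hypothetical.

```c
#include <assert.h>

/* Elements consumed while one memory access is in flight.
   A register at least this long can cover the memory latency. */
unsigned min_vector_length(unsigned latency_cycles, unsigned elems_per_cycle)
{
    assert(elems_per_cycle > 0);
    return latency_cycles * elems_per_cycle;
}
```

With an assumed 64-cycle memory latency and one element per cycle, `min_vector_length(64, 1)` gives 64 elements. A SIMD register holding 4 or 8 elements falls well short of that under the same assumptions.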
In my book, the critical distinction is that at least one size of
partial (chopped) non-load-store vector operations has higher
throughput (and hopefully, but not necessarily, lower latency) than full
vector operations of the same type.
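The criterion in the paragraph above can be sketched with a toy cycle model. It assumes an operation on `vl` elements occupies a `lanes`-wide functional unit for ceil(vl / lanes) cycles and ignores startup overhead. `op_cycles` and all figures are hypothetical.

```c
#include <assert.h>

/* Cycles a vector operation occupies the functional unit,
   ignoring startup: ceil(vl / lanes). */
unsigned op_cycles(unsigned vl, unsigned lanes)
{
    assert(lanes > 0);
    return (vl + lanes - 1) / lanes;
}
```

With a 64-element register on 4 lanes, a full operation takes `op_cycles(64, 4)` = 16 cycles and a half-length one takes `op_cycles(32, 4)` = 8. The chopped operation therefore completes at twice the rate. With an 8-element register on 8 lanes, full and half-length operations both take 1 cycle, so chopping gains nothing.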