Re: What integer C type to use

Subject : Re: What integer C type to use
From : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups : comp.arch
Date : 12. Mar 2024, 20:00:46
Organization : Rocksolid Light
Message-ID : <db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org>
User-Agent : Rocksolid Light
Michael S wrote:

On Tue, 12 Mar 2024 17:18:36 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:

Michael S wrote:
 
On Tue, 12 Mar 2024 11:14:47 +0100
David Brown <david.brown@hesbynett.no> wrote: 
 
 You use the word vector where you mean SIMD.   
 Yes, I was using the word somewhat interchangeably, as I was
talking in general terms.  Perhaps I should have been more
precise.  I know this thread talked about "Cray style vectors",
but I thought this branch had diverged - I don't know anywhere
near enough about the details of Cray machines to talk much about
them.
 
Even for Cray/NEC-style vectors, the same throughput for different
precisions is not a universal property. Cray's and NEC's vector
processors happen to be designed like that, but one can easily
imagine vector processors of similar style that have 2 or even 3
times higher throughput for SP vs DP.
I have personally never encountered such machines, but would be
surprised if one was never built and sold by one or another of the
usual suspects (maybe Fujitsu?) back in the days when designers liked
Cray's style.
While theoretically possible, they did not do this because both halves
of a 2×SP would not necessarily arrive from memory simultaneously.
{Consider a gather load: you need a vector of addresses 2× as long
for pairs of SP going into a single vector register element.}
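
A minimal sketch of the address-count argument above, in C. The 64-element, 64-bit vector register matches the Cray-1; everything else (the function names, index-based addressing, and packing two floats into a `uint64_t`) is an illustrative assumption, not a description of any real machine's gather instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VL = 64 };  /* elements per vector register, 64 bits each */

/* DP gather: one index per register element, so VL indices. */
void gather_dp(double dst[VL], const double *base, const uint32_t idx[VL])
{
    assert(base != NULL);
    for (size_t i = 0; i < VL; i++)
        dst[i] = base[idx[i]];
}

/* Packed SP gather (hypothetical): two floats per 64-bit element,
 * so the index vector must be 2*VL long. The two halves of one
 * element come from unrelated addresses and may return at
 * different times. Assumes sizeof(float) == 4. */
void gather_sp_pairs(uint64_t dst[VL], const float *base,
                     const uint32_t idx[2 * VL])
{
    assert(base != NULL);
    for (size_t i = 0; i < VL; i++) {
        float pair[2] = { base[idx[2 * i]], base[idx[2 * i + 1]] };
        memcpy(&dst[i], pair, sizeof pair);
    }
}
```

The only point of the sketch is the parameter types: the packed variant needs an index vector twice as long as the data register it fills.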
 

Doctor, it hurts when I do this!
So, what prevents you from providing no gather with resolution
below 64 bits?
Well, then, you have SP values in a container that could hold 2 and you
don't get any SIMD speedup.

Which, of course, leaves the question of what property makes vector
processor Cray-style. Just having ALU/FPU several times narrower
than VR is, IMHO, not enough to be considered Cray-style. 
That property is that the length of the vector register is chosen to
absorb the latency to memory. SIMD is too short to have this property.
 

I don't like this definition at all.
For starters, what is "memory"? Does L1D cache count, or only L2 and
higher?
Those machines had no L1 or L2 (or LLC) caches. Consider the problems
for which they were designed--arrays as big as the memory (sometimes
bigger !!) and processed over and over again with numerical algorithms.
Caches would simply miss on each memory reference (ignoring TLB effects).
With the caches never supplying data to the calculations, why have them
at all ??

Then, what is "absorb" ?
Absorb means that the first data of a vector arrives and can start
calculation before the last address of the memory reference goes out.
This, in turn, means that one can create a continuous stream of outbound
addresses forever, and thus one can create a stream of continuous
calculations forever. {{Where 'forever' means thousands of cycles but
nowhere near the lifetime of the universe.}}

Now, obviously, this means the memory system has to be able to make
forward progress on all those memory accesses continuously.
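
The "absorb" condition above can be sketched as a one-line check in C. The timing model is an assumption made for illustration: one address goes out per cycle starting at cycle 0, and the data for the first element comes back `latency` cycles after its address. The function name and the latency figures in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the first element's data arrives before the
 * last address of the same vector memory reference has gone out.
 * Assumed model: address i is issued in cycle i (0-based), and the
 * data for element 0 arrives in cycle `latency`. */
bool latency_absorbed(unsigned vl, unsigned latency)
{
    assert(vl > 0);
    unsigned first_data_cycle = latency;
    unsigned last_addr_cycle  = vl - 1;
    return first_data_cycle < last_addr_cycle;
}
```

Under this model, a 64-element register with an assumed 20-cycle memory latency gives `latency_absorbed(64, 20) == true`, while a 4-element SIMD register at the same latency gives `latency_absorbed(4, 20) == false`.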

Is the whole vector register file part of the absorbent, or should the
latency be covered by one register?
A single register covers a single memory reference latency.
Is OoO machinery part of the absorbent?
The only OoO in the CRAYs was delivery of gather data back to the
vector register*. Scatter stores were sent out in order, as were the
addresses of the gather loads.

(*) Bank conflicts would delay conflicting accesses but not those
of other banks, creating an OoO effect in the returning data. This was
re-ordered back to IO (in order) prior to forwarding data into calculation.
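
A toy timing model of the bank-conflict effect described above, in C. All parameters are hypothetical (4 banks, a bank busy for 4 cycles per access, one address issued per cycle, one element forwarded per cycle); it is not a model of any specific CRAY memory system, only a sketch of how in-order addresses can produce out-of-order returns that are then put back in order.

```c
#include <assert.h>
#include <stddef.h>

enum { NBANKS = 4, BANK_BUSY = 4 };  /* hypothetical values */

/* addr[i]    : word address of gather element i (issued in cycle i)
 * done[i]    : cycle in which the bank returns element i
 * deliver[i] : cycle in which element i is forwarded, in index order */
void gather_timing(const unsigned addr[], size_t n,
                   unsigned done[], unsigned deliver[])
{
    unsigned bank_free[NBANKS] = {0};  /* cycle each bank becomes idle */
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        unsigned bank  = addr[i] % NBANKS;
        unsigned issue = (unsigned)i;
        /* A busy bank delays only accesses to that same bank. */
        unsigned start = issue > bank_free[bank] ? issue : bank_free[bank];
        done[i] = start + BANK_BUSY;
        bank_free[bank] = done[i];
        /* Re-order: element i cannot be forwarded before element i-1. */
        unsigned earliest = (i == 0) ? 0 : deliver[i - 1] + 1;
        deliver[i] = done[i] > earliest ? done[i] : earliest;
    }
}
```

With addresses `{0, 4, 1}`, elements 0 and 1 collide on bank 0 while element 2 uses bank 1, so element 2 comes back from memory before element 1 but is still forwarded after it.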

Is HW threading part of the absorbent?
Absolutely not--none of the CRAYs did this--later XMPs and YMPs did
use lanes (SIMD with vector) but always did calculations in order
and always sent out addresses (and data when appropriate) in order.

And for any of your possible answers I have my "Why?".
No harm in asking.

In my book, the critical distinction is that at least one size of
partial (chopped) non-load-store vector operations has higher
throughput (and hopefully, but not necessarily, lower latency) than
full vector operations of the same type.
