Subject : Re: What integer C type to use
From : paaronclayton (at) *nospam* gmail.com (Paul A. Clayton)
Newsgroups : comp.arch
Date : 15. Mar 2024, 04:56:04
Other headers
Organisation : A noiseless patient Spider
Message-ID : <ut0gsm$23m79$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.0
On 3/13/24 3:24 PM, MitchAlsup1 wrote:
> Stefan Monnier wrote:
[snip]
>> So, short vectors have a fairly free hand at shuffling data across their
>> vector (e.g. bitmatrix transpose), and they can be
>> implemented/scheduled/dispatched just like any other instruction, but
>> the vector length tends to be severely limited and exposed all over
>> the place.
> Consuming OpCode space like nobody's business.
Is that necessarily the case? Excluding the shuffle operations, I
think only loads and stores would need to have length specifiers.
Shuffle operations become much more expensive with larger
'vectors', so providing the same granularity of shuffle for larger
vectors seems questionable. (With scatter/gather, permute/shuffle
may be less useful anyway.)
The metadata would not even _have_ to be saved, though saving it
would be better than unnecessarily saving/restoring huge contexts,
and if one can support variable-sized contexts one can support
additional metadata.
If loads and stores were masked, the number of instruction
"encodings" would not need to be increased, but using zero-
extended masks to indicate smaller vector size seems less than
ideal.
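As a minimal sketch of the masked alternative, the C model below encodes a vector length as a mask whose low bits are set and whose upper bits are zero, then uses that mask on a load and a store. Everything named here (`MAX_LANES`, `mask_for_length`, `masked_load`, `masked_store`, 32-bit lanes, a 16-lane maximum) is an assumption made for illustration, not a description of any particular ISA.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LANES 16u  /* hypothetical hardware maximum lane count */

/* Encode a vector length as a mask: low `len` bits set, the rest zero. */
static uint16_t mask_for_length(unsigned len)
{
    assert(len <= MAX_LANES);
    return (uint16_t)((1u << len) - 1u);
}

/* Masked load: lanes whose mask bit is clear are zeroed and the
 * corresponding memory locations are never read. */
static void masked_load(int32_t dst[MAX_LANES], const int32_t *src,
                        uint16_t mask)
{
    for (unsigned i = 0; i < MAX_LANES; i++)
        dst[i] = ((mask >> i) & 1u) ? src[i] : 0;
}

/* Masked store: only lanes whose mask bit is set are written. */
static void masked_store(int32_t *dst, const int32_t src[MAX_LANES],
                         uint16_t mask)
{
    for (unsigned i = 0; i < MAX_LANES; i++)
        if ((mask >> i) & 1u)
            dst[i] = src[i];
}
```

In this model a three-element vector is loaded with `mask_for_length(3)`, i.e. 0x0007. The load and store need no extra encodings, but the length has to be carried around as a full-width mask value.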
Lane-based operations with different length operands would
presumably narrow the result (the later elements of the longer
operand would not be used) and allow the possible error to be
detected.
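A rough C model of the metadata approach may make this concrete. Here only the load names a length, which is recorded alongside the register; the lane-wise add has no length parameter, takes the shorter of its operands' lengths, and reports a mismatch. The names `vreg`, `vload`, `vadd`, and `MAX_LANES`, the 32-bit lanes, and the choice to report the mismatch through a flag are all assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LANES 16u  /* hypothetical hardware maximum lane count */

/* A vector register that carries its own length as metadata. */
typedef struct {
    uint32_t lane[MAX_LANES];
    unsigned len;            /* number of valid lanes, set by the load */
} vreg;

/* The load is the only operation that names a length. */
static vreg vload(const uint32_t *src, unsigned len)
{
    assert(len <= MAX_LANES);
    vreg r = { .len = len };         /* unused lanes are zero-initialized */
    for (unsigned i = 0; i < len; i++)
        r.lane[i] = src[i];
    return r;
}

/* Lane-wise add with no length operand: the result is narrowed to the
 * shorter operand, and differing lengths are flagged as a possible error. */
static vreg vadd(vreg a, vreg b, bool *mismatch)
{
    vreg r = { .len = (a.len < b.len) ? a.len : b.len };
    *mismatch = (a.len != b.len);
    for (unsigned i = 0; i < r.len; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
```

Adding a 4-lane operand to a 3-lane operand in this model gives a 3-lane result with the mismatch flag set; whether real hardware would trap, set a status bit, or ignore the difference is left open.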
Unlike My 66000's VVM, different-sized elements would require
unpack/pack operations (where VVM needs to implicitly pack an
operand if it wanted to take advantage of reduced storage). This
would not increase instruction encodings as vector length is
increased. (VVM also provides sequential exceptions while SIMD is
conceptually all at once.)
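For illustration, explicit unpack/pack for mixed element sizes might look like the C sketch below, loosely in the spirit of the widening and narrowing operations found in existing SIMD instruction sets. The operations are distinguished by element size only; the register width is a parameter (`REG_BYTES`) that does not appear in any operation name. The type and function names, the unsigned elements, and the saturating behavior of the pack are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define REG_BYTES 16  /* hypothetical register width in bytes */
static_assert(REG_BYTES % 2 == 0, "register must split into two halves");

typedef struct { uint8_t  b[REG_BYTES];     } vreg8;   /* byte lanes   */
typedef struct { uint16_t h[REG_BYTES / 2]; } vreg16;  /* 16-bit lanes */

/* Unpack: zero-extend the low half of the byte lanes to 16 bits. */
static vreg16 unpack_lo(vreg8 v)
{
    vreg16 r;
    for (int i = 0; i < REG_BYTES / 2; i++)
        r.h[i] = v.b[i];
    return r;
}

/* Unpack: zero-extend the high half of the byte lanes to 16 bits. */
static vreg16 unpack_hi(vreg8 v)
{
    vreg16 r;
    for (int i = 0; i < REG_BYTES / 2; i++)
        r.h[i] = v.b[i + REG_BYTES / 2];
    return r;
}

/* Pack: narrow two 16-bit registers into one byte register,
 * saturating values above 255. */
static vreg8 pack_sat(vreg16 lo, vreg16 hi)
{
    vreg8 r;
    for (int i = 0; i < REG_BYTES / 2; i++) {
        r.b[i]                 = (uint8_t)(lo.h[i] > 255 ? 255 : lo.h[i]);
        r.b[i + REG_BYTES / 2] = (uint8_t)(hi.h[i] > 255 ? 255 : hi.h[i]);
    }
    return r;
}
```

Changing `REG_BYTES` changes how many lanes each operation processes without adding any new operation to the model.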
Am I missing something when I assume that lane-based (SIMD)
operations do not need size information in the instruction? The
extra metadata is not free (perhaps especially as it controls
execution, at least for efficiency), but if opcode explosion is so
undesirable, using metadata might be preferred.