Subject : Re: My 66000 and High word facility
From : paaronclayton (at) *nospam* gmail.com (Paul A. Clayton)
Newsgroups : comp.arch
Date : 16 Oct 2024, 18:40:13
Organisation : A noiseless patient Spider
Message-ID : <veotq0$2be5m$1@dont-email.me>
References : 1 2 3 4 5 6 7
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.0
On 8/12/24 2:22 AM, Terje Mathisen wrote:
Brett wrote:
[snip]
I can understand the reluctance to go to 6-bit register specifiers; it
burns up your opcode space and makes encoding everything more difficult.
But today that is an unserved market which will get customers to give you
a look. Put out some vaporware and see what customers say.
The solution (?) has always looked obvious to me: some form of Huffman encoding of register specifiers, so that the most common ones (the bottom 16 or 32) require just a small amount of space (as today), with either a prefix or a suffix providing extra bits when you want to use the higher register numbers. Mitch's CARRY sets up a single extra register for a set of operations; a WIDE prefix could contain two extra register bits for four registers over the next 2 or 3 instructions.
As long as this doesn't make the decoder a speed limiter, it would be zero cost for regular code and still quite cheap, except for increasing code size by 33-50% in the inner loops of algorithms that need 64 or even 128 regs.
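In rough C, the WIDE-prefix idea above might decode something like
the following. (The 32-bit format, the 5-bit field positions, and a
two-extra-bits-per-specifier payload are all invented for
illustration, not My 66000's actual encoding.)

#include <stdint.h>

/* Hypothetical 32-bit format: three 5-bit register fields (positions invented). */
struct regs { unsigned rd, rs1, rs2; };

/* Assumed WIDE-prefix payload: two extra high bits per register field,
   widening 5-bit specifiers to 7 bits (128 registers). */
struct wide { unsigned hi_rd, hi_rs1, hi_rs2; };   /* each 0..3 */

static struct regs decode_regs(uint32_t insn, const struct wide *w)
{
    struct regs r;
    r.rd  = (insn >> 21) & 0x1f;
    r.rs1 = (insn >> 16) & 0x1f;
    r.rs2 = (insn >> 11) & 0x1f;
    if (w) {                        /* prefix in effect: splice in the high bits */
        r.rd  |= w->hi_rd  << 5;
        r.rs1 |= w->hi_rs1 << 5;
        r.rs2 |= w->hi_rs2 << 5;
    }
    return r;
}

The common case (no prefix) pays nothing beyond the test for a
pending prefix.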
Fujitsu's SPARC64 VIIIfx had a Set XAR (eXtended Arithmetic
Register) instruction that provided three extra bits for each of
four register fields, plus a SIMD bit, for up to two instructions.
(Of course, x86-64's REX prefix adds one bit to each register
field of a single instruction. AVX512/AVX10 further extend the
vector register set to 32 entries.)
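In code terms that REX extension is just a bit splice (the helper
names below are mine; the REX.R/REX.B and ModRM bit positions are
the architectural ones):

#include <stdint.h>

/* x86-64: REX.R and REX.B each supply one high bit for a 3-bit ModRM field. */
static unsigned modrm_reg(uint8_t rex, uint8_t modrm)
{
    return (((rex >> 2) & 1u) << 3) | ((modrm >> 3) & 7u);  /* REX.R : ModRM.reg */
}

static unsigned modrm_rm(uint8_t rex, uint8_t modrm)
{
    return ((rex & 1u) << 3) | (modrm & 7u);                /* REX.B : ModRM.rm */
}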
An alternative would be to place opcode bits variably such that
the register fields are always in the same positions but are
truncated for some encodings (based on fixed opcode information);
i.e., the extra register-field bits could be interpreted as opcode
bits or as register bits (or perhaps as immediate bits).
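A toy decoder for that idea (the two formats and every field
position below are invented, purely to show fixed register
positions with opcode-dependent truncation):

#include <stdint.h>

/* Invented encoding for illustration only.
 * Format A (major opcode 0..5): rd/rs1/rs2 are full 6-bit fields.
 * Format B (major opcode 6,7):  the top bit of each register field is
 *                               reinterpreted as a minor-opcode bit,
 *                               leaving 5-bit register specifiers.   */
struct dec { unsigned opcode, rd, rs1, rs2; };

static struct dec decode(uint32_t insn)
{
    struct dec d;
    unsigned major = (insn >> 29) & 7;
    unsigned f0 = (insn >> 23) & 0x3f;       /* fixed field positions */
    unsigned f1 = (insn >> 17) & 0x3f;
    unsigned f2 = (insn >> 11) & 0x3f;
    if (major < 6) {                         /* format A: 6-bit registers */
        d.opcode = major;
        d.rd = f0;  d.rs1 = f1;  d.rs2 = f2;
    } else {                                 /* format B: top bits become opcode */
        d.opcode = (major << 3) | ((f0 >> 5) << 2) | ((f1 >> 5) << 1) | (f2 >> 5);
        d.rd = f0 & 0x1f;  d.rs1 = f1 & 0x1f;  d.rs2 = f2 & 0x1f;
    }
    return d;
}

The register read ports can be indexed from the fixed field
positions speculatively; only the masking of the top bits waits on
the major opcode.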
Another possibility would be to have smaller special-purpose
register fields that, when expanded, become general purpose. E.g.,
8 address and 8 data registers might be expanded to 32 general
purpose registers.
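A toy indexing sketch of that expansion (the "expanded" format and
the field widths are invented):

/* Narrow format: a 3-bit specifier plus a bank implied by the field's role. */
enum bank { DATA_BANK = 0, ADDR_BANK = 1 };

static unsigned narrow_reg(enum bank b, unsigned spec3)
{
    return (unsigned)b * 8u + (spec3 & 7u);   /* D0..D7 -> 0..7, A0..A7 -> 8..15 */
}

/* Expanded format: the same field widened to 5 bits names any of 32 GPRs. */
static unsigned expanded_reg(unsigned spec5)
{
    return spec5 & 31u;
}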
The tradeoffs of contiguity of fields (register, opcode,
immediate) and consistency of interpretation would seem to depend
on criticality in the pipeline. Except for control flow
instructions, register names seem more critical than opcodes.
Immediate fields are usually least critical. Relative control flow
is one exception; recognizing a branch and its offset early can be
beneficial. Similarly, loads from a stable base (e.g., global
pointer, stack pointer, also signature caches) might start early
with an offset that is known earlier. Also, some immediate
additions might be merged early (comparable to trace cache
optimizations, but not cached) or converted to a single dependency
delay. For example,
ADD R5 ← R5 #100;
BEZ R7 TARGET; // predicted not taken
ADD R5 ← R5 #32;
could be converted to
ADD R5 ← R5 #132;
(This would require either recovering more slowly from an earlier
checkpoint on a branch misprediction or some means of inserting a
SUB R5 ← R5 #32 to correct the state.)
ADD R6 ← R5 #100;
ADD R5 ← R6 #32;
could be converted to
ADD R6 ← R5 #100;
ADD R5 ← R5 #132;
(Intel recently started optimizing this case.)
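In C-like terms, the merging test a decode/rename stage might
apply is just a rebase-and-fold on renamed operands (the micro-op
struct and names are invented; misprediction recovery is not
shown):

#include <stdbool.h>
#include <stdint.h>

/* Minimal renamed micro-op view for illustration: 'src' and 'dst' name
   values (physical registers), not architectural register numbers.   */
struct uop { unsigned dst, src; int32_t imm; bool is_add_imm; };

/* If b consumes the value a produces, fold the immediates so b depends
   only on a's input value.  This covers both sequences above: the
   chain through R5 and the chain through R6. */
static bool try_merge(const struct uop *a, struct uop *b)
{
    if (!a->is_add_imm || !b->is_add_imm)
        return false;
    if (b->src != a->dst)          /* b must read a's result value */
        return false;
    b->src  = a->src;              /* rebase b on a's input value ... */
    b->imm += a->imm;              /* ... and fold the immediates     */
    return true;
}

On renamed operands the fold is value-correct whether or not the
first ADD's own result stays architecturally live.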
An argument might also be made that the operands for multiple
operations might be beneficially encoded to exploit variability
in the number of register operands, possibly including temporary
results and destructive operations. (Maybe not a strong or a
sensible argument, but an argument.☺)