Subject: Re: Misc: BGBCC targeting RV64G, initial results...
From: paaronclayton (at) *nospam* gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Date: 28 Sep 2024, 02:44:10
Organization: A noiseless patient Spider
Message-ID: <vdcbe5$1s6so$1@dont-email.me>
References: 1 2
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.0

On 9/27/24 11:52 AM, MitchAlsup1 wrote:
> On Fri, 27 Sep 2024 9:46:01 +0000, BGB wrote:
[snip]
>> RV's selection of 3R compare ops is more limited:
>>   RV: SLT, SLTU
>>   BJX2: CMPEQ, CMPNE, CMPGT, CMPGE, CMPHI, CMPHS, TST, NTST
>> A lot of these cases require a multi-op sequence to implement with just
>> SLT and SLTU.
>
> My 66000 can do:: 1 < i && i <= MAX in 1 instruction

Did you mean "0 < i && i <= MAX" (Fortran-IN comparison result) or
"1 <= i && i <= MAX" (which is the same, for unsigned)? Or am I
missing a capability of My 66000?
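
For reference, the usual software version of such a range test is sketched below. This is not a description of what My 66000 does; it is the common trick of biasing by the lower bound and doing one unsigned compare, which on RV64 would be roughly an ADDI followed by an SLTU. The function names are made up for illustration, and the one-compare form assumes max >= 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Straightforward form: two compares and a logical AND. */
bool in_range_two_compares(int64_t i, int64_t max)
{
    return 1 <= i && i <= max;
}

/* One-compare form: subtract the lower bound, then compare unsigned.
 * Any i below 1 wraps around to a very large unsigned value and so
 * fails the test. Assumes max >= 0 (illustrative assumption). */
bool in_range_one_compare(int64_t i, int64_t max)
{
    return ((uint64_t)i - 1u) < (uint64_t)max;
}
```

Both functions should agree for every i whenever max is non-negative; for example, i = 0 and i = max + 1 are rejected and i = 1 and i = max are accepted.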
Itanium comparison instructions were interesting in that the
one-bit result (which was stored in two condition registers, one
storing the complement) could be ANDed or ORed with another
condition register as part of the instruction. This merging
apparently allowed some complex comparison merging to be done
in one cycle with multiple compare instructions.
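
A rough C model of that kind of merging is sketched below. It is a simplification for illustration, not the exact IA-64 semantics: the type and function names are invented, and the complement is always kept consistent here. The point it tries to show is that an AND-type compare only ever clears the predicate and an OR-type compare only ever sets it, so several of them can target the same predicate without caring about order.

```c
#include <stdbool.h>
#include <stdint.h>

/* A predicate pair: p holds a condition, np holds its complement. */
typedef struct { bool p, np; } pred_pair;

/* Ordinary compare: writes both the result and its complement. */
pred_pair pred_set(bool cond)
{
    pred_pair r = { cond, !cond };
    return r;
}

/* AND-type merge: a false condition clears p; a true one changes nothing. */
pred_pair pred_and(pred_pair acc, bool cond)
{
    if (!cond) { acc.p = false; acc.np = true; }
    return acc;
}

/* OR-type merge: a true condition sets p; a false one changes nothing. */
pred_pair pred_or(pred_pair acc, bool cond)
{
    if (cond) { acc.p = true; acc.np = false; }
    return acc;
}

/* lo <= i && i <= hi built from one plain compare and one AND-type compare. */
bool in_range(int64_t i, int64_t lo, int64_t hi)
{
    pred_pair p = pred_set(lo <= i);
    p = pred_and(p, i <= hi);
    return p.p;
}
```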
Itanium's method may not be the best way of merging conditions,
but there may be some benefit from not using sequential branches
to perform this function (or SHIFT and OR/AND to merge a bit in a
comparison result). (I suppose a microarchitecture could use one
BTB entry for both branches, but detecting such seems a fair
amount of work for an uncommon case.)
Another weird concept that came to mind would be providing an
8-bit (e.g.) field that enumerated a set of interesting
conditions. Combined with a Table Transfer instruction (which
jumps either to an indexed table position for execution or uses
the indexed table entry as a jump target address) this _might_
be useful, though I doubt there are many cases where a general
condition test could set a dense enumeration of cases that are
of interest for the specific use. Yet perhaps mentioning this
weirdness might stir someone else's *useful* creativity.
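
To make the idea slightly more concrete, here is a small C sketch, under the assumption that the "enumerated condition" is just a dense three-way code (less, equal, greater) and that the table holds jump targets, modeled here as function pointers. The enumeration, the handlers and the table are all hypothetical; a real encoding would presumably enumerate more cases.

```c
#include <stdint.h>

/* Hypothetical dense enumeration of one comparison's outcome. */
enum { COND_LT = 0, COND_EQ = 1, COND_GT = 2, COND_COUNT = 3 };

/* Produce the enumerated condition: 0 if a < b, 1 if a == b, 2 if a > b. */
unsigned classify(int64_t a, int64_t b)
{
    return (unsigned)(a >= b) + (unsigned)(a > b);
}

/* Stand-ins for the code at each jump target. */
typedef int (*handler_fn)(void);
int on_lt(void) { return -1; }
int on_eq(void) { return 0; }
int on_gt(void) { return 1; }

/* The table consulted by the (modeled) Table Transfer step. */
const handler_fn cond_table[COND_COUNT] = { on_lt, on_eq, on_gt };

/* One compare produces the index; one indexed jump selects the case. */
int dispatch(int64_t a, int64_t b)
{
    return cond_table[classify(a, b)]();
}
```

With these definitions, dispatch(1, 2) takes the "less" entry, dispatch(2, 2) the "equal" entry, and dispatch(3, 2) the "greater" entry.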
(For small local switch-like jumps, using 16-bit table entries
might be practical starting immediately after the instruction.
A 64-bit immediate would provide four targets with the next
word being a "free" target specification. 8-bit offsets might
be practical for small switches, especially if accumulated (i.e.,
as if performing a sequence of short branches); adding four
8-bit values would be fairly fast. Yes, more weirdness.)
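
A sketch of the two table encodings, with the bit layout chosen arbitrarily for illustration (entry 0 in the low-order bits, offsets unsigned, four entries per word). The function names are hypothetical. The fall-through location after the immediate would serve as the extra "free" target and is not modeled here.

```c
#include <stdint.h>

/* Four 16-bit table entries packed into one 64-bit immediate.
 * Assumed layout: entry 0 occupies the low 16 bits. */
uint16_t entry16(uint64_t imm, unsigned index)
{
    return (uint16_t)(imm >> (16u * (index & 3u)));
}

/* Four 8-bit offsets packed into one 32-bit word, accumulated as if the
 * cases were laid out one after another: the target for case k is the
 * sum of offsets 0..k (a prefix sum of at most four small values). */
uint32_t accumulated_offset(uint32_t packed, unsigned index)
{
    uint32_t sum = 0;
    for (unsigned k = 0; k <= (index & 3u); k++)
        sum += (packed >> (8u * k)) & 0xFFu;
    return sum;
}
```

For example, with the offsets 1, 2, 3, 4 packed as 0x04030201, the accumulated targets are 1, 3, 6 and 10, so four 8-bit fields can reach up to 1020 units away rather than 255.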