Re: My 66000 and High word facility

Subject: Re: My 66000 and High word facility
From: ggtgp (at) *nospam* yahoo.com (Brett)
Newsgroups: comp.arch
Date: 12 Aug 2024, 04:23:00
Organization: A noiseless patient Spider
Message-ID: <v9brm4$33kmd$1@dont-email.me>
References: 1 2 3 4 5
User-Agent: NewsTap/5.5 (iPad)
BGB <cr88192@gmail.com> wrote:
On 8/11/2024 9:33 AM, Anton Ertl wrote:
Brett <ggtgp@yahoo.com> writes:
The lack of CPUs with 64 registers is what makes for a market: the 4%
that could benefit have no options to pick from.
 
They had:
 
SPARC: Ok, only 32 GPRs available at a time, but more in hardware
through the Window mechanism.
 
AMD29K: IIRC a 128-register stack and 64 additional registers
 
IA-64: 128 GPRs and 128 FPRs with register stack and rotating register
files to make good use of them.
 
The additional registers obviously did not give these architectures a
decisive advantage.
 
When ARM designed A64, when the RISC-V people designed RISC-V, and
when Intel designed APX, each of them had the opportunity to go for 64
GPRs, but they decided not to.  Apparently the benefits do not
outweigh the disadvantages.
 
 
In my experience:
  For most normal code, the advantage of 64 GPRs is minimal;
  But, there is some code, where it does have an advantage.
    Mostly involving big loops with lots of variables.
 
 
Sometimes, it is preferable to be able to map functions entirely to
registers, and 64 does increase the probability of being able to do so
(though, neither achieves 100% of functions; and functions which map
entirely to GPRs with 32 will not see an advantage with 64).
 
Well, and to some extent the compiler needs to be selective about which
functions it allows to use all of the registers, since in some cases
saving/restoring more registers in the prolog/epilog can cost more than
the register spills it avoids.
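
The trade-off described above can be put into a toy cost model. This is only a sketch under simple assumptions (one store in the prolog and one load in the epilog per extra callee-saved register, and one memory operation per spill access); the function name and both parameters are made up for illustration and are not taken from any real compiler.

```python
def extra_regs_pay_off(extra_saved_regs, spill_accesses_avoided):
    """Toy model: is it worth letting a function use more callee-saved registers?

    extra_saved_regs: additional registers the prolog/epilog must save/restore.
    spill_accesses_avoided: spill loads/stores per call that the extra
        registers would eliminate.
    Assumes every save, restore, and spill access costs one memory operation.
    """
    save_restore_cost = 2 * extra_saved_regs  # one store in prolog, one load in epilog
    return spill_accesses_avoided > save_restore_cost


# Short function: saving 10 more registers costs 20 memory ops to avoid 6 spills.
print(extra_regs_pay_off(10, 6))    # False
# Big loop body: the same 10 registers avoid 400 spill accesses per call.
print(extra_regs_pay_off(10, 400))  # True
```

A real compiler would weigh this with loop trip counts and call frequency; the point here is only that the fixed save/restore cost can dominate for small functions.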


Another benefit of 64 registers is more inlining, which removes calls.

A call can cause a significant amount of garbage code all around that call,
as it splits your function and burns registers that would otherwise get
used.

I can understand the reluctance to go to 6-bit register specifiers: they
burn up your opcode space and make encoding everything more difficult.
But today that is an unserviced market which will get customers to give you
a look. Put out some vaporware and see what customers say.


But, have noted that 32 GPRs can get clogged up pretty quickly when
using them for FP-SIMD and similar (if working with 128-bit vectors as
register pairs); or otherwise when working with 128-bit data as pairs.
 
Similarly, one can't fit a 4x4 matrix multiply entirely in 32 GPRs, but
can in 64 GPRs. Where it takes 8 registers to hold a 4x4 Binary32
matrix, and 16 registers to perform a matrix-transpose, ...
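
The register counts above follow from simple packing arithmetic. The sketch below assumes 64-bit GPRs (implied by 128-bit vectors being held as register pairs) and tightly packed Binary32 elements; the helper name is made up for illustration.

```python
GPR_BITS = 64        # assumed GPR width
BINARY32_BITS = 32   # one single-precision float


def regs_for_matrix(rows, cols, elem_bits=BINARY32_BITS, reg_bits=GPR_BITS):
    """Number of registers needed to hold a rows x cols matrix, tightly packed."""
    total_bits = rows * cols * elem_bits
    return -(-total_bits // reg_bits)  # ceiling division


per_matrix = regs_for_matrix(4, 4)
print(per_matrix)      # 8  (one 4x4 Binary32 matrix)
print(2 * per_matrix)  # 16 (transpose: source and destination both live)
print(3 * per_matrix)  # 24 (multiply: two inputs and one output live)
```

On this reading, a multiply already occupies 24 registers before temporaries, the stack pointer, and other reserved registers are counted, which would explain why it does not fit in 32 but does in 64.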
 
Granted, arguably, doing a matrix-multiply directly in registers using
SIMD ops is a bit niche (traditional option being to use scalar
operations and fetch numbers from memory using "for()" loops, but this
is slower). Most programs don't need fast MatMult, though.
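
For reference, the "traditional option" mentioned above looks roughly like the following. It is a plain scalar sketch, not anyone's actual code: every operand is fetched from memory inside the innermost loop, which is the part an in-register SIMD version avoids.

```python
def matmul4x4_scalar(a, b):
    """Multiply two 4x4 matrices (lists of rows) one scalar at a time."""
    c = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = 0.0
            for k in range(4):
                acc += a[i][k] * b[k][j]  # each operand is fetched from memory
            c[i][j] = acc
    return c


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
m = [[float(4 * i + j) for j in range(4)] for i in range(4)]
print(matmul4x4_scalar(m, identity) == m)  # True
```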
 
 
 
Annoyingly, it has led to my ISA fragmenting into two variants:
  Baseline: Primarily 32 GPR, 16/32/64/96 encoding;
    Supports R32..R63 for only a subset of the ISA for 32-bit ops.
    For ops outside this subset, needs 64-bit encodings in these cases.
  XG2: Supports R32..R63 everywhere, but loses 16-bit ops.
    By itself, would be easier to decode than Baseline,
      as it drops a bunch of wonky edge cases.
    Though, some cases were dropped from Baseline when XG2 was added.
      "Op40x2" was dropped as it was hairy and became mostly moot.
 
Then, a common subset exists known as Fix32, which can be decoded in
both Baseline and XG2 Mode, but only has access to R0..R31.
 
 
Well, and a 3rd sub-variant:
  XG2RV: Uses XG2's encodings but RISC-V's register space.
    R0..R31 are X0..X31;
    R32..R63 are F0..F31.
 
Arguably, the main use-case for XG2RV mode is for ASM blobs intended to be
called natively from RISC-V mode; but...
 
It is debatable whether such an operating mode actually makes sense, and
it might have made more sense to simply fake it in the ASM parser:
  ADD R24, R25, R26  //Uses BJX2 register numbering.
  ADD X14, X15, X16  //Uses RISC-V register remapping.
 
Likely, as a sub-mode of either Baseline or XG2 Mode.
Since, the register remapping scheme is known as part of the ISA spec,
it could be done in the assembler.
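
Faking it in the ASM parser could be as small as a table lookup on register operands. The sketch below is an assumption-heavy illustration: the function name is made up, and the table is a placeholder, NOT the real BJX2 mapping (the three non-identity entries were chosen only so that the two example lines above line up; the real table is whatever the ISA spec defines).

```python
import re


def remap_registers(asm_line, x_to_r):
    """Rewrite X<n> operands to R<m> using the table x_to_r (dict: int -> int).

    Operands already written as R<n> are left untouched.
    """
    def repl(match):
        return "R%d" % x_to_r[int(match.group(1))]
    return re.sub(r"\bX(\d+)\b", repl, asm_line)


# Placeholder table, NOT the real BJX2 mapping.
toy_table = {n: n for n in range(32)}
toy_table[14], toy_table[15], toy_table[16] = 24, 25, 26  # hypothetical entries

print(remap_registers("ADD X14, X15, X16", toy_table))  # ADD R24, R25, R26
print(remap_registers("ADD R24, R25, R26", toy_table))  # ADD R24, R25, R26
```

Doing the remap at parse time means the encoder only ever sees native register numbers, so no separate operating mode is needed for this purpose.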
 
It is possible that XG2RV mode may eventually be dropped due to "lack of
relevance".
 
 
Well, and similarly any ABI thunks would need to be done in Baseline or
XG2 mode, since neither RV mode nor XG2RV Mode has access to all the
registers used for argument passing in BJX2.
In this case, RISC-V mode only has ~ 26 GPRs (the remaining 6, X0..X5,
being SPRs or CRs). In the RV modes R0/R4/R5/R14 are inaccessible.
 
 
Well, and likewise one wants to limit the number of inter-ISA branches,
as the branch-predictor can't predict these, and they need a full
pipeline flush (a few extra cycles are needed to make sure the L1 I$ is
fetching in the correct mode). Technically also the L1 I$ needs to flush
any cache-lines which were fetched in a different mode (the I$ uses
internal tag-bits to figure out things like instruction length and
bundling and to try to help with Superscalar in RV mode, *; mostly for
timing/latency reasons, ...).
 
 
*: The way the BJX2 core deals with superscalar is to essentially
pretend as if RV64 had WEX flag bits, which can be synthesized partly
when fetching cache lines (putting some of the latency in the I$ Miss
handling, rather than during instruction-fetch). In the ID stage, it
sees the longer PC step and infers that two instructions are being
decoded as superscalar.
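
A toy model of that idea, for illustration only: when a cache line is filled, mark each instruction that may issue together with its successor, and let decode infer pairing from the resulting PC step. The pairing rule here (no register dependence between neighbours) and both function names are assumptions; the real core's rules are certainly more involved.

```python
def synthesize_pair_flags(instrs):
    """instrs: list of (dest, src1, src2) register numbers, in program order.

    Returns one boolean per instruction: True means "issue together with the
    next instruction". Toy rule: pair two neighbours only if the second neither
    reads nor writes the first one's destination.
    """
    flags = [False] * len(instrs)
    i = 0
    while i + 1 < len(instrs):
        dest, _, _ = instrs[i]
        ndest, nsrc1, nsrc2 = instrs[i + 1]
        if dest not in (ndest, nsrc1, nsrc2):
            flags[i] = True
            i += 2  # both slots of the pair are consumed
        else:
            i += 1
    return flags


def pc_steps(flags, instr_bytes=4):
    """PC increments seen by decode: 8 for a pair, 4 for a single instruction."""
    steps, i = [], 0
    while i < len(flags):
        width = 2 if flags[i] else 1
        steps.append(width * instr_bytes)
        i += width
    return steps


line = [(10, 11, 12), (13, 14, 15), (16, 13, 10), (17, 16, 16)]
flags = synthesize_pair_flags(line)
print(flags)            # [True, False, False, False]
print(pc_steps(flags))  # [8, 4, 4]
```

The dependence check runs once per cache-line fill rather than on every fetch, which is the latency shift into I$ miss handling described above.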
 
...
 
 
Where is your 4% number coming from?
 
 
 
I guess it could make sense, arguably, to try to come up with test cases
to try to get a quantitative measurement of the effect of 64 GPRs for
programs which can make effective use of them...
 
Would be kind of a pain to test as 64 GPR programs couldn't run on a
kernel built in 32 GPR mode, but TKRA-GL runs most of its backend in
kernel-space (and is the main thing in my case that seems to benefit
from 64 GPRs).
 
But, technically, a 32 GPR kernel couldn't run RISC-V programs either.
 
 
So, would likely need to switch GLQuake and similar over to baseline
mode (and probably mess with "timedemo").
 
 
 
 
Checking, as-is, timedemo results for "demo1" are "969 frames 150.5
seconds 6.4 fps", but this is with my experimental FP8U HDR mode (would
be faster with RGB555 LDR), at 50 MHz.
 
GLQuake, LDR RGB555 mode: "969 frames 119.0 seconds 8.1 fps".
 
But, yeah, both are with builds that use 64 GPRs.
 
 
Software Quake: "969 frames 147.4 seconds 6.6 fps"
Software Quake (RV64G): "969 frames 157.3 seconds 6.2 fps"
 
Not going to bother with GLQuake in RISC-V mode, would likely take a
painfully long time.
 
Well, decided to run this test anyways:
  "969 frames 687.3 seconds 1.4 fps"
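
The timedemo figures quoted above can be cross-checked by dividing frames by seconds. The labels in the table below are my own shorthand for the runs described in the post; the numbers are copied from the quoted output.

```python
results = {
    "GLQuake, FP8U HDR": (969, 150.5),
    "GLQuake, RGB555 LDR": (969, 119.0),
    "Software Quake": (969, 147.4),
    "Software Quake (RV64G)": (969, 157.3),
    "GLQuake (RV64G)": (969, 687.3),
}


def fps(frames, seconds):
    """Average frames per second over a timedemo run."""
    return frames / seconds


for name, (frames, seconds) in results.items():
    print("%-24s %.1f fps" % (name, fps(frames, seconds)))
```

Rounded to one decimal place, each result matches the fps value that timedemo reported.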
 
 
IOW: TKRA-GL runs horribly in RV64G mode (and not much can be done
to make it fast within the limits of RV64G). Though, this is with it
running GL entirely in RV64 mode (it might fare better as a userland
application where the GL backend is running in kernel space in BJX2 mode).
 
Though, much of this is likely due more to RV64G's lack of SIMD and
similar, rather than due to having fewer GPRs.


