Re: My 66000 and High word facility

Subject: Re: My 66000 and High word facility
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 11 Aug 2024, 20:59:51
Organization: A noiseless patient Spider
Message-ID: <v9b57p$2rkrq$1@dont-email.me>
References: 1 2 3 4
User-Agent: Mozilla Thunderbird
On 8/11/2024 9:33 AM, Anton Ertl wrote:
> Brett <ggtgp@yahoo.com> writes:
>> The lack of CPU’s with 64 registers is what makes for a market, that 4%
>> that could benefit have no options to pick from.
>
> They had:
>
> SPARC: Ok, only 32 GPRs available at a time, but more in hardware
> through the Window mechanism.
>
> AMD29K: IIRC a 128-register stack and 64 additional registers
>
> IA-64: 128 GPRs and 128 FPRs with register stack and rotating register
> files to make good use of them.
>
> The additional registers obviously did not give these architectures a
> decisive advantage.
>
> When ARM designed A64, when the RISC-V people designed RISC-V, and
> when Intel designed APX, each of them had the opportunity to go for 64
> GPRs, but they decided not to.  Apparently the benefits do not
> outweigh the disadvantages.
 
In my experience:
   For most normal code, the advantage of 64 GPRs is minimal;
   But, there is some code, where it does have an advantage.
     Mostly involving big loops with lots of variables.
Sometimes, it is preferable to be able to map functions entirely to registers, and 64 does increase the probability of being able to do so (though neither 32 nor 64 covers 100% of functions; and functions which already map entirely to GPRs with 32 will not see an advantage with 64).
Well, and to some extent the compiler needs to be selective about which functions it allows to use all of the registers, since in some cases saving/restoring more registers in the prolog/epilog can cost more than the register spills it avoids.
But, have noted that 32 GPRs can get clogged up pretty quickly when using them for FP-SIMD and similar (if working with 128-bit vectors as register pairs); or otherwise when working with 128-bit data as pairs.
Similarly, one can't fit a 4x4 matrix multiply entirely in 32 GPRs, but can in 64 GPRs. Where it takes 8 registers to hold a 4x4 Binary32 matrix, and 16 registers to perform a matrix-transpose, ...
Granted, arguably, doing a matrix-multiply directly in registers using SIMD ops is a bit niche (traditional option being to use scalar operations and fetch numbers from memory using "for()" loops, but this is slower). Most of the programs don't need fast MatMult though.
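The register budget behind the matrix claim can be worked out roughly as follows. This sketch assumes 64-bit GPRs, and assumes both inputs and the result are held live at once; the function names are made up for illustration.

```c
#include <assert.h>

/* GPRs needed to hold a rows x cols matrix of elem_bits-wide elements
   in gpr_bits-wide registers (rounded up). */
static int regs_for_matrix(int rows, int cols, int elem_bits, int gpr_bits)
{
    assert(rows > 0 && cols > 0 && elem_bits > 0 && gpr_bits > 0);
    int total_bits = rows * cols * elem_bits;
    return (total_bits + gpr_bits - 1) / gpr_bits;
}

/* Registers left over after keeping n_live matrices resident. */
static int regs_left(int total_gprs, int n_live, int regs_each)
{
    assert(total_gprs >= 0 && n_live >= 0 && regs_each >= 0);
    return total_gprs - n_live * regs_each;
}
```

With these assumptions, regs_for_matrix(4, 4, 32, 64) gives 8, so three live matrices take 24 registers. That leaves 8 of 32 GPRs for the stack pointer, temporaries, and everything else, versus 40 of 64.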
Annoyingly, it has led to my ISA fragmenting into two variants:
   Baseline: Primarily 32 GPR, 16/32/64/96 encoding;
     Supports R32..R63 for only a subset of the ISA in 32-bit encodings.
     Ops outside this subset need 64-bit encodings to use R32..R63.
   XG2: Supports R32..R63 everywhere, but loses 16-bit ops.
     By itself, would be easier to decode than Baseline,
       as it drops a bunch of wonky edge cases.
     Though, some cases were dropped from Baseline when XG2 was added.
       "Op40x2" was dropped as it was hairy and became mostly moot.
Then, a common subset exists known as Fix32, which can be decoded in both Baseline and XG2 Mode, but only has access to R0..R31.
Well, and a 3rd sub-variant:
   XG2RV: Uses XG2's encodings but RISC-V's register space.
     R0..R31 are X0..X31;
     R32..R63 are F0..F31.
Arguably, the main use-case for XG2RV mode is for ASM blobs intended to be called natively from RISC-V mode; but...
It is debatable whether such an operating mode actually makes sense, and it might have made more sense to simply fake it in the ASM parser:
   ADD R24, R25, R26  //Uses BJX2 register numbering.
   ADD X14, X15, X16  //Uses RISC-V register remapping.
Likely, as a sub-mode of either Baseline or XG2 Mode.
Since, the register remapping scheme is known as part of the ISA spec, it could be done in the assembler.
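Faking it in the ASM parser could look something like the sketch below. The function name is hypothetical, and the remapping table is deliberately left to the caller: its real contents would come from the ISA spec and are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a register operand written as "Rn" (BJX2 numbering) or "Xn"
   (RISC-V numbering) and return the BJX2 register number, or -1 if the
   token is not a valid register.
   rv_to_bjx2 is a caller-supplied 32-entry table giving the BJX2 register
   for each RISC-V X register; this sketch does not know the real mapping. */
static int parse_reg(const char *tok, const int rv_to_bjx2[32])
{
    assert(tok != NULL && rv_to_bjx2 != NULL);
    char kind = tok[0];
    if (kind != 'R' && kind != 'X')
        return -1;
    if (tok[1] < '0' || tok[1] > '9')
        return -1;              /* also rejects a bare "R" or "X" */
    char *end;
    long n = strtol(tok + 1, &end, 10);
    if (*end != '\0')
        return -1;              /* trailing junk after the number */
    if (kind == 'R')
        return (n <= 63) ? (int)n : -1;
    return (n <= 31) ? rv_to_bjx2[n] : -1;
}
```

The rest of the assembler would then only ever see BJX2 register numbers, so no separate operating mode is needed for the X-register spelling.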
It is possible that XG2RV mode may eventually be dropped due to "lack of relevance".
Well, and similarly any ABI thunks would need to be done in Baseline or XG2 mode, since neither RV mode nor XG2RV Mode has access to all the registers used for argument passing in BJX2.
In this case, RISC-V mode only has ~ 26 GPRs (the remaining 6, X0..X5, being SPRs or CRs). In the RV modes R0/R4/R5/R14 are inaccessible.
Well, and likewise one wants to limit the number of inter-ISA branches, as the branch predictor can't predict these, and they need a full pipeline flush (a few extra cycles are needed to make sure the L1 I$ is fetching in the correct mode). Technically, the L1 I$ also needs to flush any cache lines which were fetched in a different mode (the I$ uses internal tag bits to figure out things like instruction length and bundling, and to try to help with superscalar in RV mode *; mostly for timing/latency reasons, ...).
*: The way the BJX2 core deals with superscalar is to essentially pretend as if RV64 had WEX flag bits, which can be synthesized partly when fetching cache lines (putting some of the latency in the I$ miss handling, rather than during instruction fetch). In the ID stage, it sees the longer PC step and infers that two instructions are being decoded as superscalar.
...

> Where is your 4% number coming from?
 
I guess it could make sense, arguably, to try to come up with test cases to try to get a quantitative measurement of the effect of 64 GPRs for programs which can make effective use of them...
Would be kind of a pain to test as 64 GPR programs couldn't run on a kernel built in 32 GPR mode, but TKRA-GL runs most of its backend in kernel-space (and is the main thing in my case that seems to benefit from 64 GPRs).
But, technically, a 32 GPR kernel couldn't run RISC-V programs either.
So, would likely need to switch GLQuake and similar over to baseline mode (and probably mess with "timedemo").
Checking, as-is, timedemo results for "demo1" are "969 frames 150.5 seconds 6.4 fps", but this is with my experimental FP8U HDR mode (would be faster with RGB555 LDR), at 50 MHz.
GLQuake, LDR RGB555 mode: "969 frames 119.0 seconds 8.1 fps".
But, yeah, both are with builds that use 64 GPRs.
Software Quake: "969 frames 147.4 seconds 6.6 fps"
Software Quake (RV64G): "969 frames 157.3 seconds 6.2 fps"
Not going to bother with GLQuake in RISC-V mode, would likely take a painfully long time.
Well, decided to run this test anyways:
   "969 frames 687.3 seconds 1.4 fps"
IOW: TKRA-GL runs horribly in RV64G mode (and not much can be done to make it fast within the limits of RV64G). Though, this is with it running GL entirely in RV64 mode (it might fare better as a userland application where the GL backend is running in kernel space in BJX2 mode).
Though, much of this is likely due more to RV64G's lack of SIMD and similar, rather than due to having fewer GPRs.
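For reference, the fps figures quoted above are just frames divided by seconds, and the runs can be compared by their time ratio. A minimal sketch (helper names are made up):

```c
#include <assert.h>

/* Frames per second for a timedemo run. */
static double fps(int frames, double seconds)
{
    assert(frames >= 0 && seconds > 0.0);
    return frames / seconds;
}

/* How many times longer a run took than a baseline run of the same demo. */
static double slowdown(double seconds, double baseline_seconds)
{
    assert(seconds >= 0.0 && baseline_seconds > 0.0);
    return seconds / baseline_seconds;
}
```

For the software-renderer runs, slowdown(157.3, 147.4) is about 1.07, so RV64G takes roughly 7% longer. For GLQuake, 687.3 seconds is about 4.6x the HDR run (150.5) and about 5.8x the LDR run (119.0).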
...

> - anton
