Re: Misc: BGBCC targeting RV64G, initial results...

Subject : Re: Misc: BGBCC targeting RV64G, initial results...
From : cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups : comp.arch
Date : 04. Oct 2024, 07:39:02
Organisation : A noiseless patient Spider
Message-ID : <vdo2i8$4mc6$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12
User-Agent : Mozilla Thunderbird
On 10/3/2024 10:51 PM, Robert Finch wrote:
On 2024-10-03 4:20 a.m., BGB wrote:
On 10/1/2024 5:00 AM, Robert Finch wrote:
On 2024-09-29 10:19 p.m., BGB wrote:
On 9/29/2024 2:11 PM, MitchAlsup1 wrote:
On Sat, 28 Sep 2024 4:30:12 +0000, BGB wrote:
>
On 9/27/2024 7:43 PM, MitchAlsup1 wrote:
On Fri, 27 Sep 2024 23:53:22 +0000, BGB wrote:
>
One of the reasons reservation stations came into vogue.
>
>
Possibly, but is a CPU feature rather than a compiler feature...
>
A good compiler should be able to make use of 98% of the instruction
set.
>
Yes, but a reservation station is not part of the ISA proper...
>
>
>
------------
>
Saw a video not too long ago where he was making code faster by undoing
a lot of loop unrolling, as the code was apparently spending more in I$
misses than it was gaining by being unrolled.
>
I noticed this in 1991 when we got Mc88120 simulator up and running.
GBOoO chips are <nearly> best served when there is the smallest number
of instructions.
>
>
Looking it up, seems the CPU in question (MIPS R4300) was:
   16K L1 I$ cache;
   8K L1 D$ cache;
   No L2 cache (but could be supported off-die);
   1-wide scalar, 32 or 64 bit
   Non pipelined FPU and multiplier;
   ...
>
>
Oddly, a fair number of these older CPUs seem to have a larger I$ than D$, whereas IME the D$ tends to have the higher miss rate (so a bigger D$ is the easier one to justify).
>
>
------------
>
In contrast, a jumbo prefix by itself does not make sense; its meaning
depends on the thing that is being prefixed. Also, the decoder will
decode a jumbo prefix and its suffix instruction at the same time.
>
How many bits does one of these jumbo prefixes consume ?
>
The prefix itself is 32 bits.
   In the context of XG3, it supplies 23 or 27 bits.
>
>
For RISC-V ops, they could supply 21 or 26 bits.
>
    23+10 = 33 (XG3)
    21+12 = 33 (RV op)
27+27+10 = 64 (XG3)
26+26+12 = 64 (RV op)
>
J27 could synthesize an immediate for non-immediate ops:
   27+6 = 33 (XG3)
   27+5 = 32 (RV)
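
As a rough illustration of the 21+12 = 33 case: a minimal sketch of gluing prefix bits onto an Imm12 and sign-extending the result. The bit placement (prefix bits as the high part, Imm12 as the low part) and the function names are assumptions made for the example; the actual field layout is whatever the spec defines.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def join_immediate(prefix_bits: int, base_imm: int,
                   prefix_width: int = 21, base_width: int = 12) -> int:
    """Concatenate prefix payload (high) with the base immediate (low).

    ASSUMPTION: high/low placement is illustrative only, not taken
    from the posted spec.
    """
    hi = prefix_bits & ((1 << prefix_width) - 1)
    lo = base_imm & ((1 << base_width) - 1)
    return sign_extend((hi << base_width) | lo, prefix_width + base_width)


# 21 prefix bits + 12 immediate bits give a signed 33-bit range,
# which covers both signed and unsigned 32-bit constants.
print(join_immediate(0x0FFFFF, 0xFFF))  # largest positive value
print(join_immediate(0x100000, 0x000))  # most negative value
```

Passing `prefix_width=23, base_width=10` gives the same 33-bit total for the XG3 split.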
>
>
For BJX2, the prefixes supply 24 bits (can be stretched to 27 bits in XG2).
   24+ 9/10=33 (Base)
   24+24+16=64 (Base)
   27+27+10=64 (XG2)
>
>
>
But, yeah, perhaps unsurprisingly, the RISC-V people are not so optimistic about the idea of jumbo prefixes...
>
>
Also apparently it seems "yeah, here is a prefix whose primary purpose is just to make the immediate field bigger for the following instruction" is not such an obvious or intuitive idea as I had thought.
>
>
Well, and people obsessing on what happens if an interrupt somehow occurs "between" the prefix and prefixed instruction.
>
One reason I prefer postfix immediates. They are much easier to work with. Interrupts do not cause issues. The instruction plus postfix can be faked to be treated as one giant instruction. The bits following the instruction are often already present on the cache line. It is just a matter of checking for a postfix when decoding the immediate constants.
Q+ had postfixes that could override a register spec. as well as supply additional constant bits. If an interrupt occurs between the instruction and the postfix, the postfix can be treated as a NOP at the return point.
>
>
Interrupts are not an issue for prefixes either, if one assumes that the prefix and the following instruction are always decoded at the same time (forming a single 64-bit instruction), which also makes them faster.
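
A minimal sketch of that "decode as one unit" view, assuming a hypothetical `is_prefix` predicate (the real prefix encoding is not reproduced here). Prefix words are never emitted on their own, so there is no architectural boundary between a prefix and its instruction for an interrupt to land on.

```python
from typing import Callable, List, Tuple


def fuse_prefixes(words: List[int],
                  is_prefix: Callable[[int], bool]) -> List[Tuple[int, ...]]:
    """Group each run of prefix words with the instruction word after it."""
    units: List[Tuple[int, ...]] = []
    pending: List[int] = []
    for w in words:
        pending.append(w)
        if not is_prefix(w):
            # A non-prefix word closes the current unit.
            units.append(tuple(pending))
            pending = []
    if pending:
        # Dangling prefix at end of stream: malformed, kept for inspection.
        units.append(tuple(pending))
    return units


# ASSUMPTION: top nibble 0xF marks a prefix; purely illustrative.
def demo_is_prefix(w: int) -> bool:
    return (w >> 28) == 0xF


stream = [0x00000013, 0xF0000001, 0x00000013,
          0xF0000002, 0xF0000003, 0x00000033]
for unit in fuse_prefixes(stream, demo_is_prefix):
    print([hex(w) for w in unit])
```

Only the start of each tuple is a valid resume point, which is the sense in which the prefix and its instruction behave as one 64-bit (or 96-bit) instruction.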
>
Yes, that would work. But I still think postfixes are a little easier to work with. One can assume no register fetches are needed for the postfix, so the last decoder slot does not need to mux registers. If there was a prefix, there could be an extra set of register ports required. Unless one gets into muxing the ports for only instructions that need them.
 
Possible workarounds:
   Transposed fetch/decode;
     Where, the fetched words are transposed in the IF stage or IF->ID.
   Right-aligned fetch / decode.
     Basically the same idea, except that the words are right-aligned.
Ironically, would work both for prefix coding, and in WEX bundling where generally the Lane 1 operation is always the last operation in the bundle.
Similar may also be relevant to superscalar, where sometimes valid combinations of instructions may be encountered but not necessarily in the right order for the lane rules (so ability to switch instruction words around between lanes may gain higher ILP).
Though, my current core doesn't do it this way, and instead MUX'es on the output side of the ID stage.
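
The two fetch arrangements can be sketched roughly as follows. This is only a software model with made-up names (`right_align`, `transpose`, string placeholders for instruction words); the point is that in both cases the final word of a group lands in a fixed slot regardless of how many prefixes precede it.

```python
from typing import List, Optional, Sequence


def right_align(group: Sequence[str], n_slots: int) -> List[Optional[str]]:
    """Pad on the left so the last word always sits in the last slot."""
    if len(group) > n_slots:
        raise ValueError("group wider than the decoder")
    return [None] * (n_slots - len(group)) + list(group)


def transpose(group: Sequence[str], n_slots: int) -> List[Optional[str]]:
    """Reverse the words so the last word always sits in slot 0."""
    if len(group) > n_slots:
        raise ValueError("group wider than the decoder")
    return list(reversed(group)) + [None] * (n_slots - len(group))


# The register-reading op is always found in the same slot.
print(right_align(["pfx", "op"], 3))          # op in slot 2
print(right_align(["pfx", "pfx", "op"], 3))   # op in slot 2
print(transpose(["pfx", "op"], 3))            # op in slot 0
```

With the op pinned to one slot, only that slot needs register-port muxing, which is the concern raised about prefixes above.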
But, yeah, I put a quick working spec of my current set of RV extensions here:
https://pastebin.com/76FCCYHQ
Ironically, this was enough to get a fairly significant performance boost relative to plain RV64G...
Still not quite as fast as XG2 for Doom, but not too far behind (and, at the moment, is now ~ 30% faster than the "gcc -O3" output; and around 80% faster than BGBCC targeting plain RV64G).
Still 26% worse than XG2 in terms of code density though (and still around 19% behind in terms of speed).
But, some highlights:
Can extend Imm12 encodings to have 33-bit immediate values;
Can glue immediate values onto some other instructions which lack them;
Can re-merge the X and F registers (not yet used);
Adds register-indexed load/store;
Adds a few other misc ops.
Not that I really expect this will sway the RISC-V people...
Who will most likely continue to deny that any issues exist in this area.
So, present stats (Doom, text size and fps at start of E1M1):
   BGBCC, XG2      :  291K, 25 fps
   BGBCC, RV64+JExt:  365K, 21 fps
   GCC, RV64G      :  445K, 16 fps
   BGBCC, RV64G    :  423K, 12 fps
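
For anyone wanting to redo the comparisons, a small sketch that derives relative figures from the table. The table entries are rounded (whole K and whole fps), so the results land within a few points of the percentages quoted earlier rather than matching them exactly.

```python
# (text size in K, fps at start of E1M1), copied from the table above.
stats = {
    "BGBCC, XG2":       (291, 25),
    "BGBCC, RV64+JExt": (365, 21),
    "GCC, RV64G":       (445, 16),
    "BGBCC, RV64G":     (423, 12),
}


def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


jx_size, jx_fps = stats["BGBCC, RV64+JExt"]
xg2_size, xg2_fps = stats["BGBCC, XG2"]
gcc_fps = stats["GCC, RV64G"][1]
rv_fps = stats["BGBCC, RV64G"][1]

print(f"JExt text vs XG2:        +{percent_more(jx_size, xg2_size):.0f}%")
print(f"XG2 fps vs JExt:         +{percent_more(xg2_fps, jx_fps):.0f}%")
print(f"JExt fps vs GCC RV64G:   +{percent_more(jx_fps, gcc_fps):.0f}%")
print(f"JExt fps vs BGBCC RV64G: +{percent_more(jx_fps, rv_fps):.0f}%")
```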
But, TBD what the future holds in the face of BJX2/XG2 vs "RISC-V with the worst of the foot-gun performance issues addressed"...
Well, except that probably no one will adopt this stuff, and so most likely RV64 will remain in its state of lackluster performance (and people still denying that any such issues exist).
As for "why do binaries get smaller when most of the new encodings are bigger?":
Well, if 3 or 4 or so 32-bit instructions are replaced by a single 64-bit encoding, well, 2 instruction words is less than 3 or 4...
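
A quick byte count for one such case, adding a 32-bit constant to a register. The plain RV64G sequence (LUI + ADDI to build the constant, then ADD) is standard; the prefixed form is written with a placeholder mnemonic, since the exact assembler syntax is an assumption here.

```python
WORD_BYTES = 4  # every encoding counted here is built from 32-bit words

# Plain RV64G: build the constant in a temporary, then add it.
plain_rv64g = [
    "lui  t0, %hi(K)",
    "addi t0, t0, %lo(K)",
    "add  a0, a0, t0",
]

# With a jumbo prefix: one 64-bit encoding, i.e. two 32-bit words.
# ASSUMPTION: "<jumbo prefix>" is a placeholder, not real syntax.
with_prefix = [
    "<jumbo prefix>",
    "addi a0, a0, K",
]


def code_bytes(words: list) -> int:
    """Size in bytes of a sequence of 32-bit instruction words."""
    return len(words) * WORD_BYTES


print(code_bytes(plain_rv64g), "bytes plain")
print(code_bytes(with_prefix), "bytes with prefix")
```

The prefixed form also frees up the temporary register and drops two dependent instructions from the chain, which is presumably where part of the speed difference comes from.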
