Re: Misc: BGBCC targeting RV64G, initial results...

Subject: Re: Misc: BGBCC targeting RV64G, initial results...
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 30 Sep 2024, 03:19:41
Organization: A noiseless patient Spider
Message-ID: <vdd1s0$22tpk$1@dont-email.me>
References: 1 2 3 4 5 6 7 8
User-Agent: Mozilla Thunderbird
On 9/29/2024 2:11 PM, MitchAlsup1 wrote:
On Sat, 28 Sep 2024 4:30:12 +0000, BGB wrote:
 
On 9/27/2024 7:43 PM, MitchAlsup1 wrote:
On Fri, 27 Sep 2024 23:53:22 +0000, BGB wrote:
>
One of the reasons reservation stations came into vogue.
>
>
Possibly, but is a CPU feature rather than a compiler feature...
 A good compiler should be able to make use of 98% of the instruction
set.
Yes, but a reservation station is not part of the ISA proper...

>
------------
>
Saw a video not too long ago where he was making code faster by undoing
a lot of loop unrolling, as the code was apparently spending more in I$
misses than it was gaining by being unrolled.
 I noticed this in 1991 when we got Mc88120 simulator up and running.
GBOoO chips are <nearly> best served when there is the smallest number
of instructions.
Looking it up, it seems the CPU in question (MIPS R4300) had:
   16K L1 I$ cache;
   8K L1 D$ cache;
   No L2 cache (but could be supported off-die);
   1-wide scalar, 32 or 64 bit;
   Non-pipelined FPU and multiplier;
   ...
Oddly, a number of these older CPUs seem to have a larger I$ than D$, whereas IME the D$ tends to have the higher miss rate (so it is easier to justify making the D$ the bigger one).

------------
>
In contrast, a jumbo prefix by itself does not make sense; its meaning
depends on the thing that is being prefixed. Also the decoder will
decode a jumbo prefix and suffix instruction at the same time.
 How many bits does one of these jumbo prefixes consume ?
The prefix itself is 32 bits.
In the context of XG3, it supplies 23 or 27 bits.
For RISC-V ops, they could supply 21 or 26 bits.
   23+10    = 33 (XG3)
   21+12    = 33 (RV op)
   27+27+10 = 64 (XG3)
   26+26+12 = 64 (RV op)
J27 could synthesize an immediate for non-immediate ops:
   27+6 = 33 (XG3)
   27+5 = 32 (RV)
For BJX2, the prefixes supply 24 bits (can be stretched to 27 bits in XG2).
   24+9/10  = 33 (Base)
   24+24+16 = 64 (Base)
   27+27+10 = 64 (XG2)
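
As a rough check on the bit counts above, here is a minimal Python sketch of how prefix payload bits could be stacked above a base instruction's immediate. The layout (payloads as the high-order bits, result sign-extended) and the name extend_immediate are assumptions made for illustration; they are not taken from the XG3 or RISC-V encodings.

```python
def extend_immediate(prefix_payloads, base_imm, base_bits, payload_bits):
    """Concatenate jumbo-prefix payloads (most significant first) above the
    base instruction's immediate field, then sign-extend the result.

    Assumed layout, for illustration only: [payload_0 | payload_1 | base_imm].
    """
    value = 0
    for payload in prefix_payloads:
        assert 0 <= payload < (1 << payload_bits)
        value = (value << payload_bits) | payload
    assert 0 <= base_imm < (1 << base_bits)
    value = (value << base_bits) | base_imm
    total_bits = len(prefix_payloads) * payload_bits + base_bits
    # Treat the top bit of the combined field as the sign bit.
    if value & (1 << (total_bits - 1)):
        value -= 1 << total_bits
    return value, total_bits


# One 21-bit prefix plus a 12-bit RV immediate: 33 bits in total.
print(extend_immediate([0x0FFFFF], 0xFFF, base_bits=12, payload_bits=21))

# Two 26-bit prefixes plus a 12-bit RV immediate: 64 bits in total.
print(extend_immediate([0x3FFFFFF, 0x3FFFFFF], 0xFFF, base_bits=12, payload_bits=26))
```

The 33-bit case is the interesting one: a signed 33-bit field covers both the signed and the unsigned 32-bit ranges, so one prefix is enough for any 32-bit constant.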
But, yeah, perhaps unsurprisingly, the RISC-V people are not so optimistic about the idea of jumbo prefixes...
Also, apparently, "yeah, here is a prefix whose primary purpose is just to make the immediate field bigger for the following instruction" is not as obvious or intuitive an idea as I had thought.
Well, that, and people obsessing over what happens if an interrupt somehow occurs "between" the prefix and the prefixed instruction.
Which, as I have tended to implement them, is simply not possible, since everything is fetched and decoded at the same time.
Granted, yes, it does add the drawback of needing to have tag-bits to remember the mode, and maybe the CPU hiding mode bits in the high order bits of the link register and similar is not such an elegant idea.
But, as I see it, still preferable to:
Hey, why not just define a bunch of 48-bit encodings for ALU operations with 32-bit immediate fields?...
But, like, blarg, this is what I did originally.
And, I dropped all this in favor of jumbo prefixes, because jumbo prefixes did this job better.
Might still experiment with an "Extended RISC-V" and see whether adding things like jumbo prefixes will in fact make as much of a difference as I expect.
Well, probably along with indexed load/store and Zba instructions and similar.
I guess an open question is whether a modified RISC-V variant could be made more performance-competitive with BJX2 without making too much of a mess of things.
I could maybe do so, but probably no one would be interested.
Though, looking online, seems I am really the only one calling them "jumbo prefixes". Not sure if there is another more common term used for these things.

-----
>
>
For the jumbo prefix:
   Recognize that it is a jumbo prefix;
   Inform the decoder for the following instruction of this fact
     (via internal flag bits);
   Provide the prefix's data bits to the corresponding decoder.
>
Unlike a "real" instruction, a jumbo prefix does not need to provide
behavior of its own, merely be able to be identified as such and to
provide payload data bits.
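
The three steps above could be modeled roughly as follows. This is only a sketch: the opcode value JUMBO_OPCODE, the 21-bit payload position, and the DecodedOp record are hypothetical, chosen for illustration rather than taken from BJX2, XG3, or any RISC-V proposal.

```python
from dataclasses import dataclass

JUMBO_OPCODE = 0x7F   # hypothetical 7-bit opcode marking a jumbo prefix
PAYLOAD_BITS = 21     # payload width assumed for this sketch


@dataclass
class DecodedOp:
    word: int          # the base instruction word
    has_prefix: bool   # internal flag: a jumbo prefix preceded this op
    payload: int       # prefix data bits (0 when there is no prefix)
    length: int        # bytes consumed from the fetch window


def is_jumbo(word):
    """Step 1: recognize that the word is a jumbo prefix."""
    return (word & 0x7F) == JUMBO_OPCODE


def decode(fetch_window):
    """Decode one operation from a list of 32-bit words fetched together."""
    first = fetch_window[0]
    if is_jumbo(first):
        # Steps 2 and 3: flag the following op and hand it the payload bits.
        payload = (first >> 7) & ((1 << PAYLOAD_BITS) - 1)
        return DecodedOp(fetch_window[1], True, payload, 8)
    return DecodedOp(first, False, 0, 4)


prefix = (0x12345 << 7) | JUMBO_OPCODE
print(decode([prefix, 0x00000013]))
print(decode([0x00000013]))
```

Note that the prefix never produces a DecodedOp of its own; prefix and base word are consumed as a single 8-byte unit.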
>
>
For now, there are not any encodings larger than 96 bits.
Partly this is because 128 bit fetch would likely add more cost and
complexity than it is worth at the moment.
 For your implementation, yes. For all others:: <at best> maybe.
Maybe.
I could maybe consider widening fetch/decode to 128-bits if there were a compelling use case.

>
>
>
>
Likewise, no one seems to be bothering with 64-bit ELF FDPIC for RV64
(there does seem to be some interest for ELF FDPIC but limited to 32-bit
RISC-V ...). Ironically, ideas for doing FDPIC in RV aren't too far off
from PBO (namely, using GP for a global section and then chaining the
sections for each binary).
>
How are you going to do dense PIC switch() {...} in RISC-V ??
>
Already implemented...
With pseudo-instructions:
    SUB Rs, $(MIN), R10
    MOV $(MAX-MIN), R11
    BGTU R11, R10, Lbl_Dfl
>
    MOV   .L0, R6      //AUIPC+ADD
    SHAD  R10, 2, R10  //SLLI
    ADD   R6, R10, R6
    JMP   R6           //JALR X0, X6, 0
>
    .L0:
    BRA  Lbl_Case0     //JAL X0, Lbl_Case0
    BRA  Lbl_Case1
    ...
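
For readers less used to this idiom, the sequence above behaves roughly like the following Python model. The names dispatch, handlers, and default are illustrative only, and the 64-bit wraparound stands in for register arithmetic on RV64.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def dispatch(value, min_case, max_case, handlers, default):
    """Model of the sequence above: bias, unsigned range check, table jump."""
    index = (value - min_case) & MASK64   # SUB, wrapped like a 64-bit register
    limit = max_case - min_case           # MOV $(MAX-MIN)
    if index > limit:                     # unsigned compare -> Lbl_Dfl
        return default
    # In the real sequence the index is shifted left by 2 because each
    # table slot is one 4-byte branch instruction; a list index models that.
    return handlers[index]


handlers = ["case10", "case11", "case12"]
for v in (9, 10, 11, 12, 13):
    print(v, dispatch(v, 10, 12, handlers, "default"))
```

The single unsigned compare handles both ends of the range: a value below MIN wraps around to a very large unsigned index and so fails the same test as a value above MAX.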
>
Compared to::
//      ADD        Rt,Rswitch,#-min
        JTT        Rt,#max
        .jttable   min, ... , max, default
adder:
>
The ADD is not necessary if min == 0
>
The JTT instruction compares Rt with 0 on the low side and max
on the high side. If Rt is out of bounds, default is selected.
>
The table displacements come in {B,H,W,D} selected in the JTT
(jump through table) instruction. Rt indexes the table, its
signed value is <<2 and added to address which happens to be
address of JTT instruction + #(max+1)<<entry. {{The table is
fetched through the ICache with execute permission}}
>
Thus, the table is PIC; and generally 1/4 the size of typical
switch tables.
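
A loose Python model of the described behavior follows. It rests on several assumptions: the default entry is stored right after the in-range entries, displacements are counted in 4-byte words, and the base address is left as a parameter because the exact base is not pinned down here. The names jtt_target and table_bytes are hypothetical.

```python
def jtt_target(rt, max_index, table, base):
    """Hypothetical model of a jump-through-table instruction.

    table[0..max_index] hold signed displacements in instruction words;
    table[max_index + 1] is the default entry. `base` is whatever address
    the hardware adds the scaled displacement to.
    """
    assert len(table) == max_index + 2
    in_range = 0 <= rt <= max_index
    disp = table[rt] if in_range else table[max_index + 1]
    return base + (disp << 2)


def table_bytes(num_entries, entry_size):
    """Size of a table with the given entry count and entry size in bytes."""
    return num_entries * entry_size


table = [4, -3, 8, 16]                      # three cases plus a default
print(hex(jtt_target(1, 2, table, 0x1000)))  # in-range entry
print(hex(jtt_target(5, 2, table, 0x1000)))  # out of range: default entry
print(table_bytes(4, 1), table_bytes(4, 4))  # byte entries vs 4-byte slots
```

With byte-sized displacements the table takes one quarter of the space of a table built from 4-byte entries, which is where the 1/4 figure comes from.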
-----
>
Potentially it could be more compact.
 Both more compact and just as fast; many times faster.
Something like this might be worth considering.
Would likely be a pretty useful instruction for something like a bytecode interpreter, which would be more sensitive to the performance of things like "switch()", ...

