Re: Tonights Tradeoff

Subject: Re: Tonights Tradeoff
From: robfi680 (at) *nospam* gmail.com (Robert Finch)
Newsgroups: comp.arch
Date: 10 Sep 2024, 15:58:30
Organization: A noiseless patient Spider
Message-ID: <vbpmqr$30vto$1@dont-email.me>
User-Agent: Mozilla Thunderbird
On 2024-09-10 3:00 a.m., BGB wrote:
> I haven't really understood how it could be implemented.
> But, granted, my pipeline design is relatively simplistic, and my priority has usually been to make a "fast but cheap and simple" pipeline rather than a "clever" one.
> Still not as cheap or simple as I would want.

Qupls has RISC-V-style vector/SIMD registers. In Q+, every instruction can be a vector instruction, because the instruction has bits indicating which of its registers are vector registers. All the scalar instructions become vector instructions, which cuts down on some of the bloat in the ISA: there are only a handful of vector-specific instructions (about eight, I think). The drawback is that the ISA is 48 bits wide. However, the code bloat is less than the 50% that 48-bit versus 32-bit instructions would suggest, because some instructions perform dual operations. Branches can increment or decrement and loop. Bigfoot uses a postfix word to select the vector form of an instruction. Bigfoot's code density is a lot better because it is variable length, but I suspect it will not run as fast. Bigfoot and Q+ share a lot of the same code; I am trying to make the guts of the cores generic.
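Here is a sketch of what a dual-operation loop branch might do. It assumes the counter is stepped by +1 or -1 and that the branch is taken while the updated counter is nonzero (a DBNZ-style rule). The post does not give Q+'s actual condition or encoding, so `loop_branch` and `run_loop` are hypothetical. The point is that one instruction replaces a separate add plus compare-and-branch, which is why the 48-bit code grows by less than 50%.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fused loop branch: step the counter, then report
 * whether the branch is taken. Both happen in one instruction. */
static bool loop_branch(int64_t *counter, int64_t step)
{
    assert(step == 1 || step == -1);
    *counter += step;          /* first operation: step the counter */
    return *counter != 0;      /* second: branch taken while nonzero */
}

/* Runs a loop body closed by one fused branch and returns how many
 * times the body executed. Counts down from a positive start or up
 * from a negative one, so the counter always reaches zero. */
static int run_loop(int64_t start, int64_t step)
{
    assert((start > 0 && step == -1) || (start < 0 && step == 1));
    int64_t counter = start;
    int iterations = 0;
    do {
        iterations++;          /* loop body would go here */
    } while (loop_branch(&counter, step));
    return iterations;
}
```

With a start of 5 and a step of -1 the body runs five times, and no separate decrement or compare instruction is needed.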
>
> In my case, the core ended up generic enough that it can support both BJX2 and RISC-V. It could almost make sense to lean more heavily into this (trying to consolidate more things and better optimize costs).
> I did also recently get around to more-or-less implementing support for the 'C' extension, dog-chewed as it is, and even though it does not efficiently utilize the encoding space.
> It burns a lot of encoding space on 6- and 8-bit immediate fields (with 11-bit branch displacements), more 5-bit register fields than ideal, ... so it has relatively few unique instructions, but:
> Many of the instructions it does have are left with 3-bit register fields;
> It has a few too many immediate-field layouts, as it just sort of shoehorns immediate fields into whatever bits are left.
> Though, it turns out I could skip a few things due to them being N/E in RV64 (RV32, RV64, and RV128 get a slightly different selection of ops in the C extension).
> Like, RV land makes many "annoying and kinda poor" design choices.
> Then again, if one assumes that the role of 'C' is mostly:
>   Doing SP-relative loads/stores and MOV-RR.
> Well, it does do this at least...
> Never mind if you want to use any of the ALU ops (besides ADD), or non-stack-relative Load/Store; well then, enjoy the 3-bit register fields.
> And, there are still way too many immediate-field encodings for what is effectively load/store and a few ALU ops.
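BGB's complaint about 3-bit register fields and scattered immediates shows up clearly when you decode two real RVC loads. The sketch below follows the C.LW (CL format) and C.LWSP (CI format) encodings from the RISC-V unprivileged specification:

- C.LW can only name x8 to x15, and it splits its scaled offset across two non-adjacent fields.
- The SP-relative C.LWSP gets a full 5-bit rd, but uses yet another offset layout.

The struct and function names are illustrative, not taken from any existing decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of an RVC 32-bit load. */
typedef struct {
    unsigned rd;      /* destination register, x0..x31 */
    unsigned rs1;     /* base register, x0..x31 */
    uint32_t offset;  /* zero-extended byte offset */
} RvcLoad;

/* C.LW: funct3=010, op=00. The 3-bit rd'/rs1' fields reach only x8..x15. */
static RvcLoad decode_c_lw(uint16_t insn)
{
    assert((insn & 0xE003) == 0x4000);
    RvcLoad r;
    r.rd  = 8 + ((insn >> 2) & 0x7);          /* rd'  = insn[4:2]        */
    r.rs1 = 8 + ((insn >> 7) & 0x7);          /* rs1' = insn[9:7]        */
    r.offset = (((insn >> 10) & 0x7u) << 3)   /* offset[5:3] = insn[12:10] */
             | (((insn >> 6) & 0x1u) << 2)    /* offset[2]   = insn[6]     */
             | (((insn >> 5) & 0x1u) << 6);   /* offset[6]   = insn[5]     */
    return r;
}

/* C.LWSP: funct3=010, op=10. SP-relative, so rd gets a full 5-bit field. */
static RvcLoad decode_c_lwsp(uint16_t insn)
{
    assert((insn & 0xE003) == 0x4002);
    RvcLoad r;
    r.rd  = (insn >> 7) & 0x1F;               /* rd = insn[11:7]         */
    assert(r.rd != 0);                        /* rd = x0 is reserved     */
    r.rs1 = 2;                                /* implicit base: x2 (sp)  */
    r.offset = (((insn >> 12) & 0x1u) << 5)   /* offset[5]   = insn[12]    */
             | (((insn >> 4) & 0x7u) << 2)    /* offset[4:2] = insn[6:4]   */
             | (((insn >> 2) & 0x3u) << 6);   /* offset[7:6] = insn[3:2]   */
    return r;
}
```

For example, 0x4188 decodes as `c.lw a0, 0(a1)` and 0x4502 as `c.lwsp a0, 0(sp)`. The two formats describe the same operation, yet they share neither register-field width nor offset bit placement.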
> I am not as much a fan of RISC-V's 'V' extension, mostly in that it would require essentially doubling the size of the register file.

The register file in Q+ is huge; that is one of the drawbacks of supporting vectors. There were 1024 physical registers to support them. I reduced that to 512, and it still may be too many. There was a 4kb-wide mapping RAM, which produced a warning message. I may have to split components up into multiple copies to get the desired size to work.

> And, if I were to do something like 'V', I would likely do some things differently:
> Rather than having an instruction to load vector control state into CSRs, it would make more sense IMO to use bigger 64-bit instructions and encode the vector state directly into them.
> While this would be worse for code density, it would avoid needing to burn instructions setting up vector state, and would have less penalty (in terms of clock cycles) when working with heterogeneous vectors.
> Say, one possibility could be a combo-SIMD op with a control field:
>   2b vector size
>     64 / 128 / resv / resv
>   2b element size
>     8 / 16 / 32 / 64
>   2b category
>     wrap / modulo
>     float
>     signed saturate
>     unsigned saturate
>   6b operator
>     add, sub, mul, mac, mulhi, ...

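Here is a minimal sketch of the 12-bit control field BGB lists above. The bit layout is hypothetical (vector size in bits 1:0, element size in 3:2, category in 5:4, operator in 11:6), and so is the operator numbering. What it illustrates is that the lane count and operation type travel with each instruction. No separate state-setting instruction is needed between, say, a 4 x Binary32 op and an 8 x 8-bit saturating op.

```c
#include <assert.h>
#include <stdint.h>

enum VecSize  { VS_64 = 0, VS_128 = 1 };              /* 2 and 3 reserved */
enum ElemSize { ES_8 = 0, ES_16 = 1, ES_32 = 2, ES_64 = 3 };
enum Category { CAT_WRAP = 0, CAT_FLOAT = 1, CAT_SSAT = 2, CAT_USAT = 3 };
enum Operator { OP_ADD = 0, OP_SUB = 1, OP_MUL = 2, OP_MAC = 3, OP_MULHI = 4 };

/* Pack the four fields into a 12-bit control field (hypothetical layout). */
static uint16_t simd_ctl_pack(unsigned vsize, unsigned esize,
                              unsigned cat, unsigned op)
{
    assert(vsize < 4 && esize < 4 && cat < 4 && op < 64);
    return (uint16_t)(vsize | (esize << 2) | (cat << 4) | (op << 6));
}

/* Lanes implied by the control field, or 0 for a reserved vector size. */
static unsigned simd_ctl_lanes(uint16_t ctl)
{
    unsigned vsize = ctl & 0x3;
    unsigned esize = (ctl >> 2) & 0x3;
    if (vsize > VS_128)
        return 0;                      /* reserved encoding */
    unsigned vbits = 64u << vsize;     /* 64 or 128 */
    unsigned ebits = 8u << esize;      /* 8, 16, 32 or 64 */
    return vbits / ebits;
}

static unsigned simd_ctl_category(uint16_t ctl) { return (ctl >> 4) & 0x3; }
static unsigned simd_ctl_operator(uint16_t ctl) { return (ctl >> 6) & 0x3F; }
```

A decoder would read the lane count and category straight from the instruction word. This is the trade BGB describes: more bits per instruction in exchange for no vector-state setup.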
Q+ is set up almost that way. It uses 48b instructions. There is a 2b precision field in instructions that determines the lane/sub-element size (8/16/32/64). The precision field also applies to scalar registers. The category is wrapped up in the opcode, which is seven bits. One can do a float add on a vector register, then a bitwise operation on the same register. The vector registers work the same way as the scalar ones: unlike RISC-V, there is no type state associated with them. To control the length (which lanes are active), there is a global mask register instead of a vector length register.
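The Q+ description has two parts: a 2-bit precision field picks the lane size, and a global mask register, rather than a vector length, picks which lanes are active. The sketch below models one wrapping add on a single 64-bit register. Several details are assumptions, not taken from the post:

- the 64-bit register width,
- mask bit i enabling lane i,
- inactive lanes keeping their old destination value.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise wrapping add on a 64-bit register (width assumed).
 * prec selects lane size 8 << prec bits; mask bit i enables lane i.
 * Inactive lanes keep the old dst value (an assumption). */
static uint64_t masked_add(uint64_t dst, uint64_t a, uint64_t b,
                           unsigned prec, uint64_t mask)
{
    assert(prec < 4);
    unsigned lane_bits = 8u << prec;                 /* 8/16/32/64 */
    unsigned lanes = 64u / lane_bits;
    uint64_t lane_mask = (lane_bits == 64) ? ~0ull
                                           : ((1ull << lane_bits) - 1);
    uint64_t result = dst;
    for (unsigned i = 0; i < lanes; i++) {
        if (!((mask >> i) & 1))
            continue;                                /* lane inactive */
        unsigned sh = i * lane_bits;
        /* Truncating to the lane width gives the wrapping sum and
         * stops carries from crossing into the next lane. */
        uint64_t sum = ((a >> sh) + (b >> sh)) & lane_mask;
        result = (result & ~(lane_mask << sh)) | (sum << sh);
    }
    return result;
}
```

The same register bits can then be read at a different precision by the next instruction, since no lane type is stored with the register.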
Sign control plus a vector indicator for each register spec results in a seven-bit spec. There are four registers encoded in an instruction, which uses 28 bits; combined with a seven-bit opcode, that is 35 bits. There was just no way the instruction set was going to fit in 32b. For a while the ISA was 40-bit, but I figured it was better to go 48-bit and then add some additional functionality to make up for the wider ISA.
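The bit budget above can be checked mechanically. This C11 sketch encodes the stated field widths with static assertions: four 7-bit register specs, a 7-bit opcode, and the 2-bit precision field from the previous paragraph. It then packs them into a 48-bit word using a hypothetical layout. The split of each 7-bit spec into a 5-bit register number, a sign-control bit and a vector bit is also an assumption, inferred from the description.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths from the post; all bit positions are hypothetical. */
enum {
    OPCODE_BITS = 7,
    SPEC_BITS   = 7,   /* assumed: 5-bit register + sign + vector flag */
    NUM_SPECS   = 4,
    PREC_BITS   = 2,
    INSN_BITS   = 48
};

/* 4 * 7 + 7 = 35 bits, already too many for a 32-bit instruction. */
static_assert(NUM_SPECS * SPEC_BITS + OPCODE_BITS == 35, "35-bit core");
static_assert(NUM_SPECS * SPEC_BITS + OPCODE_BITS > 32, "does not fit 32b");
static_assert(NUM_SPECS * SPEC_BITS + OPCODE_BITS + PREC_BITS <= INSN_BITS,
              "fits in 48b with room to spare");

/* Hypothetical spec layout: reg in [4:0], sign in [5], vector in [6]. */
static unsigned qplus_make_spec(unsigned reg, unsigned sign, unsigned vec)
{
    assert(reg < 32 && sign < 2 && vec < 2);
    return reg | (sign << 5) | (vec << 6);
}

static int qplus_is_vector(unsigned spec) { return (spec >> 6) & 1; }

/* Hypothetical layout: opcode [6:0], spec i at bit 7 + 7*i,
 * precision [36:35]; bits [47:37] are left for other fields. */
static uint64_t qplus_pack(unsigned opcode, const unsigned spec[NUM_SPECS],
                           unsigned prec)
{
    assert(opcode < 128 && prec < 4);
    uint64_t insn = opcode;
    for (int i = 0; i < NUM_SPECS; i++) {
        assert(spec[i] < 128);
        insn |= (uint64_t)spec[i] << (OPCODE_BITS + SPEC_BITS * i);
    }
    insn |= (uint64_t)prec << (OPCODE_BITS + SPEC_BITS * NUM_SPECS);
    return insn;
}

static unsigned qplus_spec(uint64_t insn, int i)
{
    return (unsigned)(insn >> (OPCODE_BITS + SPEC_BITS * i)) & 0x7F;
}

static unsigned qplus_prec(uint64_t insn)
{
    return (unsigned)(insn >> (OPCODE_BITS + SPEC_BITS * NUM_SPECS)) & 0x3;
}
```

Under this layout, 11 of the 48 bits remain free. That is the kind of headroom the post says was used for additional functionality.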

> Though, not every combination would necessarily be allowed.
> Say, for example, if the implementation limits FP-SIMD to 4 or 8 vector elements.
> Though, it may make sense to be asymmetric as well:
>   2-wide vectors can support Binary64
>   4-wide can support Binary32
>   8-wide can support Binary16 ( + 4x FP16 units)
>   16-wide can support FP8 ( + 8x FP8 units)
> Whereas, say, 16x Binary32-capable units would be infeasible.
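The asymmetric menu above works out because each row fills the same 128-bit datapath: 2 x 64, 4 x 32, 8 x 16 and 16 x 8 bits are all 128 bits. By contrast, 16 x Binary32 would need 512 bits of FP lanes. Here is a tiny sketch of that arithmetic, assuming a fixed 128-bit datapath (`DATAPATH_BITS`, `lanes_for` and `fits_datapath` are illustrative names):

```c
#include <assert.h>

#define DATAPATH_BITS 128u   /* assumed fixed SIMD datapath width */

/* Lanes available for a given element width on the fixed datapath. */
static unsigned lanes_for(unsigned elem_bits)
{
    assert(elem_bits != 0 && DATAPATH_BITS % elem_bits == 0);
    return DATAPATH_BITS / elem_bits;
}

/* Whether a lanes x elem_bits vector fits in the datapath at all. */
static int fits_datapath(unsigned lanes, unsigned elem_bits)
{
    return lanes * elem_bits <= DATAPATH_BITS;
}
```

A 16 x 32 configuration is four times the datapath width, which is consistent with BGB calling it infeasible.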
> Well, as opposed to defining encodings one at a time in the 32-bit encoding space.
> It could be tempting to consider using pipelining and multi-stage decoding to allow some ops as well. Say, possibly handling 8-wide vectors internally as 2x 4-wide operations, or maybe allowing 256-bit vector ops in the absence of 256-bit vectors in hardware.
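One way to read the "2x 4-wide" idea is that the decoder cracks an 8-wide (or 256-bit) op into back-to-back passes through 4-wide hardware. The sketch below models that sequencing in plain C. `HW_LANES`, `fadd4`, `fadd_wide` and `passes` are hypothetical names, and real hardware would pipeline the passes rather than loop over them.

```c
#include <assert.h>
#include <stddef.h>

#define HW_LANES 4   /* hypothetical hardware width: 4 x Binary32 */

/* One pass through a hypothetical 4-wide FP adder. */
static void fadd4(float *dst, const float *a, const float *b)
{
    for (int i = 0; i < HW_LANES; i++)
        dst[i] = a[i] + b[i];
}

/* Number of hardware passes an n-lane op is cracked into. */
static size_t passes(size_t n)
{
    return (n + HW_LANES - 1) / HW_LANES;
}

/* An architectural n-lane add (8-wide, or 256 bits of Binary32),
 * sequenced as back-to-back 4-wide passes. */
static void fadd_wide(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % HW_LANES == 0);
    for (size_t p = 0; p < passes(n); p++)
        fadd4(dst + p * HW_LANES, a + p * HW_LANES, b + p * HW_LANES);
}

/* Usage: 8-wide add of {1..8} and 0.5; returns the top lane. */
static float demo_top_lane(void)
{
    float a[8], b[8], d[8];
    for (int i = 0; i < 8; i++) {
        a[i] = (float)(i + 1);
        b[i] = 0.5f;
    }
    fadd_wide(d, a, b, 8);
    return d[7];
}
```

The architectural vector width and the hardware width become independent; the cost is extra cycles per wide op rather than extra lanes.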
> ...

Q+ has two ALUs, which may at some point be joined by two more ALUs with reduced functionality.
It sounds great, but I cannot seem to get Q+ to synthesize correctly. The synthesis report gives the size as 45k LUTs, but I know the size is about double that, based on previous synthesis runs. A bunch of the components are showing up as zero-sized in the synthesis report. Figuring out why stuff is being stripped out is a challenge. It runs in simulation, at least for a few instructions. If components are being stripped out, why does it work in SIM? Scratches head. It does break the magical IPC of 1.0.
