On 10/9/2024 11:19 AM, MitchAlsup1 wrote:
> On Wed, 9 Oct 2024 10:44:08 +0000, Robert Finch wrote:
>>
>> Been thinking some about the carry and overflow and what to do about
>> register spills and reloads during expression processing. My thought was
>> that on the machine with 256 registers, simply allocate a ridiculous
>> number of registers for expression processing, for example 25 or even
>> 50. Then if the expression is too complex, have the compiler spit out an
>> error message to the programmer to simplify the expression. Remnants of
>> the ‘expression too complex’ error in BASIC.
> Both completely unacceptable, and in your case completely unnecessary.
> In 967 subroutines I read out of My 66000 LLVM compile, I only have
> 3 cases of spill-fill, and that is with only 32 registers with
> universal constants.
Tends to be a bit higher IME, but granted my compiler is a bit more naive:
Either it can static-assign everything;
Or, it needs to use spill-and-fill.
In RISC-V mode:
  Static-assign everything, Leaf: 13%
  Partial assign, Leaf: 7.1%
  Static-assign everything, Non-Leaf: 1.8%
  Partial assign, Non-Leaf: 85%
  Average: ~4.6 variables static-assigned, out of 16.6 variables per function.
In XG2 mode:
  Static-assign everything, Leaf: 16%
  Partial assign, Leaf: 0.7%
  Static-assign everything, Non-Leaf: 1.9%
  Partial assign, Non-Leaf: 82%
  Average: ~4.8 variables static-assigned, out of 16.8 variables per function.
Theoretically, the number of static-assigned variables and fully static-assigned functions could be higher, but it looks like the compiler is excluding a lot of them for some reason (may need to look into it).
> Of the RISC-V code I read alongside with 32+32 registers, I counted 8.
With 64 GPRs there can be less spill/fill, without any increase in the total number of hardware registers vs RV64G's 32+32 scheme.
Rarely is register pressure equally balanced in this way, and more often it is one of:
High integer register pressure, little or no FP pressure (most code);
Very high FP register pressure, low integer pressure (say, unrolled matrix multiply).
Where, an even-split X/F scheme serves neither, and a bigger unified register space serves both.
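As a rough illustration of the FP-heavy case (a made-up kernel, not output from either compiler): a 4x4-blocked matrix-multiply inner loop keeps 16 FP accumulators plus a row/column of loaded operands live at once, while the integer side only needs a few pointers and the loop counter.

  /* Rough illustration only: computes the top-left 4x4 block of C=A*B
     for n x n row-major matrices.  The 16 accumulators plus 8 loaded
     operands are all live FP values; integer pressure is just the
     three pointers, n, and k.  */
  void mm_block4x4(const double *a, const double *b, double *c, int n)
  {
      double c00=0, c01=0, c02=0, c03=0, c10=0, c11=0, c12=0, c13=0;
      double c20=0, c21=0, c22=0, c23=0, c30=0, c31=0, c32=0, c33=0;
      for (int k = 0; k < n; k++) {
          double a0=a[0*n+k], a1=a[1*n+k], a2=a[2*n+k], a3=a[3*n+k];
          double b0=b[k*n+0], b1=b[k*n+1], b2=b[k*n+2], b3=b[k*n+3];
          c00+=a0*b0; c01+=a0*b1; c02+=a0*b2; c03+=a0*b3;
          c10+=a1*b0; c11+=a1*b1; c12+=a1*b2; c13+=a1*b3;
          c20+=a2*b0; c21+=a2*b1; c22+=a2*b2; c23+=a2*b3;
          c30+=a3*b0; c31+=a3*b1; c32+=a3*b2; c33+=a3*b3;
      }
      c[0*n+0]=c00; c[0*n+1]=c01; c[0*n+2]=c02; c[0*n+3]=c03;
      c[1*n+0]=c10; c[1*n+1]=c11; c[1*n+2]=c12; c[1*n+3]=c13;
      c[2*n+0]=c20; c[2*n+1]=c21; c[2*n+2]=c22; c[2*n+3]=c23;
      c[3*n+0]=c30; c[3*n+1]=c31; c[3*n+2]=c32; c[3*n+3]=c33;
  }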
Though, I guess the usual argument for split GPR/FPR spaces is that with unified register spaces, both ALU and FPU need to use the same pipeline.
But, if it is a shared register pipeline, one can also leverage the ALU for a lot of edge cases, like FPU compare.
If one uses a longer pipeline for FPU ops vs ALU ops, it seems like one will still need to pay the cost of the longer FPU pipeline regardless of whether there is a single register file or separate ones.
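For example, ignoring NaNs, IEEE-754 values order the same way as their bit patterns once the sign is folded in, so an FP compare can be done entirely in the integer datapath. A minimal C sketch of the idea (function names made up):

  #include <stdint.h>
  #include <string.h>

  /* Map a double's bits to a uint64_t whose unsigned ordering matches
     the floating-point ordering (NaNs excluded; -0.0 keys just below
     +0.0).  Negative values: flip all bits; non-negative: set the
     sign bit.  */
  uint64_t fp64_order_key(double d)
  {
      uint64_t u;
      memcpy(&u, &d, sizeof u);
      return (u >> 63) ? ~u : (u | 0x8000000000000000ULL);
  }

  /* "FPU compare" done with nothing but integer compares. */
  int fp64_lt(double a, double b)
  {
      return fp64_order_key(a) < fp64_order_key(b);
  }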
Apparently, similar reasoning applies to the V extension using separate vector registers (vs just aliasing them with the F registers), but I don't really want to implement the V extension.
Almost more tempting to do a cut-down, non-conforming "V in F" style implementation (a rough register-mapping sketch follows the list):
* Aliases V to F register pairs;
** TBD if better to use V0..V15 or even-only numbering.
** Or, V0..V31 exist (if aliased) for 64b vectors,
** but only even for 128b.
* Will drop mask bits and other more advanced features.
* Trying to set up V properly would result in the instructions faulting.
** Could allow the possibility of adding proper V later.
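A toy decode helper sketching the even-pair variant of the aliasing being considered (names and rules here are an assumption on my part, not any spec):

  #include <stdbool.h>

  /* V0..V31 alias F0..F31 directly for 64-bit vectors; 128-bit
     vectors accept only even Vn and occupy the pair F(n):F(n+1),
     while odd Vn would fault.  */
  typedef struct { int f_lo, f_hi; bool valid; } vf_alias;

  vf_alias map_vreg(int vn, bool is_128b)
  {
      vf_alias a = { vn, vn, true };
      if (is_128b) {
          if (vn & 1)
              a.valid = false;   /* odd Vn has no pair partner: fault */
          else
              a.f_hi = vn + 1;   /* e.g. V2 -> F2:F3 */
      }
      return a;
  }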
> With those statistics and 256 registers, if you can't get to essentially
> 0 spill-fill, the problem is not with your architecture but with your
> compiler.
With 256 registers, probably 99% of functions could use a "statically assign every variable to a register" strategy (though this assumes registers can be reused for temporary values).
Where, most temporary values are created and used within a single basic block; if no references to a given temporary exist outside that block (and it is not marked with a phi operator), its value can simply be assumed to disappear at the end of the block. This also allows such temporaries to be allocated into scratch registers.
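A toy sketch of that rule (data structures invented for illustration, not from my actual compiler): temporaries that are neither live-out nor phi-merged take scratch registers, and the scratch pool is treated as free again at the start of the next block.

  #include <stdbool.h>

  #define NUM_SCRATCH 8   /* size of the scratch pool (made up) */

  typedef struct {
      int  id;
      bool live_out;   /* value referenced from a later block? */
      bool is_phi;     /* merged at a join point? */
      int  reg;        /* assigned register, -1 = needs static assign */
  } Temp;

  /* Assign registers for one basic block: block-local temporaries die
     at the end of the block, so they can share the scratch registers;
     everything else falls back to a statically assigned register.  */
  void alloc_block_temps(Temp *temps, int n)
  {
      int next_scratch = 0;   /* pool resets for each block */
      for (int i = 0; i < n; i++) {
          if (!temps[i].live_out && !temps[i].is_phi &&
              next_scratch < NUM_SCRATCH)
              temps[i].reg = next_scratch++;
          else
              temps[i].reg = -1;
      }
  }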
My own thought though is that going much bigger in terms of the main register file likely isn't worth it.
The only really compelling use for a bigger register file (much over 64) at the moment would be optimizing interrupts and context switches.