Newsportal USENET - Re: Tonights Tradeoff - Background Execution Buffers

Subject : Re: Tonights Tradeoff - Background Execution Buffers
From : robfi680 (at) *nospam* gmail.com (Robert Finch)
Groups : comp.arch
Date : 04 Oct 2024, 16:54:40

Other headers

Organization : A noiseless patient Spider
Message-ID : <vdp343$9d38$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : Mozilla Thunderbird

On 2024-10-04 2:19 a.m., Anton Ertl wrote:

> Robert Finch <robfi680@gmail.com> writes:
>> Today I am wondering how many predicate registers are enough. Scanning
>> webpages reveals a variety. The Itanium has 64 predicates, but they are
>> used for modulo loops and rotated. Rotating registers are Itanium's method
>> of register renaming, so it needs more visible registers. In a classic
>> superscalar design with a RAT where registers are renamed, it seems like
>> 64 would be far too many.
>
> Would it? Zen5 has 192 flags registers
> <https://i0.wp.com/chipsandcheese.com/wp-content/uploads/2024/09/hc2024_zen5_spec_uplift.png?ssl=1>,
> and I assume that means it has 192 C, 192 V, and 192 NZP registers
> (physical), for one architectural flags register.
>
>> I cannot see the compiler making use of very many predicate registers
>> simultaneously.
>
> Maybe not, but what are the alternatives:
>
> 1) Have one flags register, like AMD64 and ARM A32, T32, and A64, or
> the carry flag of Power and 88K, and the flags result of most Power
> instructions. Then the compiler typically only knows that other
> instructions will overwrite that register, and is forced to consume
> the flag right away. This leads to bad code generation, as shown in
> <2021Mar15.104123@mips.complang.tuwien.ac.at>:
>
> |E.g., in
> |<2016May24.093059@mips.complang.tuwien.ac.at> we see that gcc-5.3.0
> |compiles
> |
> |   cf = _addcarry_u64(cf, src1[1], src2[1], &dst[1]);
> |   cf = _addcarry_u64(cf, src1[2], src2[2], &dst[2]);
> |
> |into
> |
> | d: 48 8b 42 08 mov 0x8(%rdx),%rax
> |11: 41 80 c1 ff add $0xff,%r9b
> |15: 49 13 40 08 adc 0x8(%r8),%rax
> |19: 41 0f 92 c1 setb   %r9b
> |1d: 48 89 41 08 mov %rax,0x8(%rcx)
> |21: 48 8b 42 10 mov 0x10(%rdx),%rax
> |25: 41 80 c1 ff add $0xff,%r9b
> |29: 49 13 40 10 adc 0x10(%r8),%rax
> |2d: 41 0f 92 c1 setb   %r9b
> |31: 48 89 41 10 mov %rax,0x10(%rcx)
> |
> |Here gcc reifies the carry bit in a GPR (r9b) with the instructions at
> |19 and 2d, and also converts it from a GPR into a carry flag in 11 and
> |25. This shows that the compiler does not trust itself to preserve
> |the carry flag from one adc to the next.
>
> 2) Have multiple flags registers, like IA-64. The compiler will
> certainly be able to deal with that, but extra instructions are needed
> for generating the flags.
>
> 3) Use the GPRs for flags. This also often requires additional
> instructions for generating the flags, as in MIPS, 88K, or RISC-V
> (with quite a bit of difference between the MIPS/Alpha/RISC-V
> approach and the 88K approach). This disadvantage is often mitigated
> by having compare-and-branch instructions or instructions that branch
> on certain properties of a register's content.
>
> 4) Keep the flags results along with GPRs: have carry and overflow as
> bit 64 and 65, N is bit 63, and Z tells something about bits 0-63.
> The advantage is that you do not have to track the flags separately
> (and, in case of AMD64, track each of C, O, and NZP separately), but
> instead can use the RAT that is already there for the GPRs. You can
> find a preliminary paper on that at
> <https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf>.
>
>> Since they are not used simultaneously, and register
>> renaming is in effect, there should not be a great need for predicate
>> registers.
>
> You need to preserve one instance for every recovery point, i.e.,
> every instruction that branches or can trap, and that has not yet
> been committed. You also need to preserve one instance if there is
> any consumer that has not yet proceeded through execution. The
> simplest way to satisfy both requirements is to just preserve any
> flags result until the generating instruction retires. And if most
> instructions generate flags, that means a lot of instances of the
> flags. There is a reason why Zen5 has 192.
>
> - anton

I was thinking more along the lines of architectural predicate registers, and of reserving bits in the instruction to select them. The 192 flags registers of Zen5 are physical registers. Q+ has the predicate registers as a subset of the GPRs. There are 512 physical registers, so there are potentially plenty of registers available for renaming predicates. Alternative #3 is what is in use: GPRs hold the flags.
Q+ has a three-input add instruction to help support multi-precision arithmetic. The idea was that the carry input could be calculated separately and fed in through the third register. The carry value would be generated by an add instruction (addgc) that produces only the carry bit, given the same argument registers as the add whose carry is needed. But that is ugly and takes an extra instruction.
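To make the cost concrete, here is a rough C model of that scheme. The names add3 and mp_add are made up for illustration, and the exact operand form of addgc (taking the carry-in as a third operand, with the carry-in being 0 or 1) is an assumption, not the actual Q+ definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a three-input add: a + b + c, wrapping modulo 2^64. */
static uint64_t add3(uint64_t a, uint64_t b, uint64_t c)
{
    return a + b + c;
}

/* Hypothetical model of addgc: same operands as add3, but the result is
 * only the carry out (0 or 1). Assumes the carry-in c is 0 or 1. */
static uint64_t addgc(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t s  = a + b;
    uint64_t c1 = (s < a);        /* carry out of a + b */
    uint64_t c2 = ((s + c) < s);  /* carry out of adding the carry-in */
    return c1 | c2;               /* both cannot be set when c <= 1 */
}

/* dst = src1 + src2 over n 64-bit limbs, least significant limb first.
 * Returns the carry out of the most significant limb. */
static uint64_t mp_add(uint64_t *dst, const uint64_t *src1,
                       const uint64_t *src2, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* The extra instruction per limb: compute the next carry first,
         * so that dst may alias src1 or src2. */
        uint64_t next = addgc(src1[i], src2[i], carry);
        dst[i] = add3(src1[i], src2[i], carry);
        carry = next;
    }
    return carry;
}
```

Each limb costs two dependent-on-carry operations (addgc and add3) where an adc-style instruction would need one, which is the extra instruction complained about above.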
One solution, not mentioned in your article, is to support arithmetic with two bits fewer than the number of bits a register can hold, so that the carry and overflow can be stored in the register itself. On a 64-bit machine, all operations would use only 62 bits. That would solve the issue of how to load or store the carry and overflow bits associated with a register. Arithmetic is sometimes performed with fewer bits anyway, as for pointer representation. I wonder if pointer masking could somehow be involved. It may be useful to have a bit indicating the presence of a pointer. I am also thinking about how to track a binary point position for fixed-point arithmetic. Perhaps using the whole upper byte of a register for status/control bits would work.
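A minimal C sketch of the 62-bit idea, under assumptions of my own: the carry lives in bit 62 and the overflow in bit 63, and status bits in the source operands are ignored. None of this is an existing Q+ encoding, and the names (add62, carry62, overflow62) are hypothetical.

```c
#include <stdint.h>

#define VALUE_BITS 62
#define VALUE_MASK ((UINT64_C(1) << VALUE_BITS) - 1)
#define SIGN_BIT   (UINT64_C(1) << (VALUE_BITS - 1)) /* sign of the 62-bit value */
#define CARRY_BIT  (UINT64_C(1) << 62)               /* assumed position */
#define OVF_BIT    (UINT64_C(1) << 63)               /* assumed position */

/* Add the low 62 bits of a and b. The result holds the 62-bit sum in
 * bits 0-61, the carry out in bit 62 and the signed overflow in bit 63. */
static uint64_t add62(uint64_t a, uint64_t b)
{
    uint64_t x = a & VALUE_MASK;   /* drop any stale status bits */
    uint64_t y = b & VALUE_MASK;
    uint64_t sum = x + y;          /* at most 63 bits: carry lands in bit 62 */

    /* Signed overflow: operands have the same sign, result sign differs. */
    uint64_t ovf = (~(x ^ y) & (x ^ sum)) & SIGN_BIT;

    return sum | (ovf ? OVF_BIT : 0);
}

/* Accessors for the three fields of a result register. */
static uint64_t value62(uint64_t r)    { return r & VALUE_MASK; }
static uint64_t carry62(uint64_t r)    { return (r >> 62) & 1; }
static uint64_t overflow62(uint64_t r) { return (r >> 63) & 1; }
```

Because the status travels in the register, an ordinary 64-bit load or store saves and restores the carry and overflow along with the value. The price is that every operation has to mask its inputs, and multi-precision limbs shrink to 62 bits.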
It may be possible with Q+ to support a second destination register that is in a subset of the GPRs. For example, one of eight registers could be specified to hold the carry/overflow status. That effectively ties up a second ALU, though, because the instruction needs an extra write port.
