Re: Tonights Tradeoff - Background Execution Buffers

From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 04 Oct 2024, 07:19:31
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Oct4.081931@mips.complang.tuwien.ac.at>
User-Agent: xrn 10.11

Robert Finch <robfi680@gmail.com> writes:
> Today I am wondering how many predicate registers are enough. Scanning
> webpages reveals a variety. The Itanium has 64 predicate registers, but
> they are used for modulo loops and rotated. Register rotation is
> Itanium's method of register renaming, so it needs more visible
> registers. In a classic superscalar design with a RAT where registers
> are renamed, it seems like 64 would be far too many.

Would it?  Zen5 has 192 flags registers
<https://i0.wp.com/chipsandcheese.com/wp-content/uploads/2024/09/hc2024_zen5_spec_uplift.png?ssl=1>,
and I assume that means it has 192 C, 192 V, and 192 NZP registers
(physical), for one architectural flags register.
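The assumption that one architectural flags register is backed by separate C, V, and NZP pools can be sketched as a small rename model. Everything here is hypothetical: the pool size of 192 comes from the slide, the three-way split from the assumption above, and the names `flag_pool` and `rename_flags_write` are made up for illustration. Freeing at retirement and misprediction recovery are not modelled.

```c
#define NPHYS 192  /* physical registers per flag group: assumed, per the post */

enum { FG_C, FG_V, FG_NZP, FG_COUNT };
#define W_C   (1u << FG_C)
#define W_V   (1u << FG_V)
#define W_NZP (1u << FG_NZP)

struct flag_pool {
    int free_list[NPHYS]; /* stack of free physical register numbers */
    int nfree;            /* number of entries on the stack */
    int map;              /* physical register holding the newest value */
};

static void pool_init(struct flag_pool *p) {
    p->map = 0;           /* phys reg 0 holds the initial architectural value */
    p->nfree = 0;
    for (int r = NPHYS - 1; r >= 1; r--)
        p->free_list[p->nfree++] = r;
}

/* Rename one flags-writing instruction. Only the groups in write_mask get a
   fresh physical register; the other groups keep their old mapping, so a
   later reader of C after an x86-style INC still finds the older carry.
   Returns 0 on success, or -1 (rename stalls, nothing allocated) if any
   needed pool is empty. Freeing at retirement is not modelled here. */
static int rename_flags_write(struct flag_pool pools[FG_COUNT], unsigned write_mask) {
    for (int g = 0; g < FG_COUNT; g++)
        if ((write_mask & (1u << g)) && pools[g].nfree == 0)
            return -1;
    for (int g = 0; g < FG_COUNT; g++)
        if (write_mask & (1u << g))
            pools[g].map = pools[g].free_list[--pools[g].nfree];
    return 0;
}
```

For example, after `pool_init` on all three pools, renaming an ADD (`W_C | W_V | W_NZP`) and then an INC (`W_V | W_NZP`, since INC leaves CF alone) leaves 190 free C registers and 189 free V and NZP registers. Keeping the groups separate means a partial flags write needs no merge with the old value, presumably at the cost of three free lists and three map entries.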

> I cannot see the compiler making use of very many predicate registers
> simultaneously.

Maybe not, but what are the alternatives:

1) Have one flags register, like AMD64 and ARM A32, T32, and A64, or
the carry flag of Power and 88K, and the flags result of most Power
instructions.  Then the compiler typically knows only that other
instructions will overwrite that register, and is forced to consume
the flag right away.  This leads to bad code generation, as shown in
<2021Mar15.104123@mips.complang.tuwien.ac.at>:

|E.g., in
|<2016May24.093059@mips.complang.tuwien.ac.at> we see that gcc-5.3.0
|compiles
|
|   cf = _addcarry_u64(cf, src1[1], src2[1], &dst[1]);
|   cf = _addcarry_u64(cf, src1[2], src2[2], &dst[2]);
|
|into
|
| d: 48 8b 42 08          mov    0x8(%rdx),%rax
|11: 41 80 c1 ff          add    $0xff,%r9b
|15: 49 13 40 08          adc    0x8(%r8),%rax
|19: 41 0f 92 c1          setb   %r9b
|1d: 48 89 41 08          mov    %rax,0x8(%rcx)
|21: 48 8b 42 10          mov    0x10(%rdx),%rax
|25: 41 80 c1 ff          add    $0xff,%r9b
|29: 49 13 40 10          adc    0x10(%r8),%rax
|2d: 41 0f 92 c1          setb   %r9b
|31: 48 89 41 10          mov    %rax,0x10(%rcx)
|
|Here gcc reifies the carry bit in a GPR (r9b) with the instructions at
|19 and 2d, and also converts it from a GPR into a carry flag in 11 and
|25.  This shows that the compiler does not trust itself to preserve
|the carry flag from one adc to the next.

2) Have multiple flags registers, like IA-64.  The compiler will
certainly be able to deal with that, but extra instructions are needed
for generating the flags.

3) Use the GPRs for flags.  This also often requires additional
instructions for generating the flags, as in MIPS, 88K, or RISC-V
(with quite a bit of difference between the MIPS/Alpha/RISC-V
approach and the 88K approach).  This disadvantage is often mitigated
by having compare-and-branch instructions or instructions that branch
on certain properties of a register's content.

4) Keep the flags results along with the GPRs: have carry and overflow
as bits 64 and 65; N is bit 63, and Z is derived from bits 0-63.
The advantage is that you do not have to track the flags separately
(and, in the case of AMD64, track each of C, O, and NZP separately),
but can instead use the RAT that is already there for the GPRs.  You
can find a preliminary paper on that at
<https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf>.
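To make the contrast between options 3 and 4 concrete, here is a hedged C sketch of a two-limb (128-bit) add under each. The function names and the `xreg` struct are hypothetical, the instruction names in the comments are only an assumed mapping for a MIPS/RISC-V-style machine, and the way the option-4 add picks up its carry (from bit 64 of the low limb's result) is an assumption of this sketch, not necessarily the design in the paper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Option 3: the carry lives in an ordinary GPR and is produced by
   extra unsigned compares (sltu on MIPS/RISC-V). */
static uint64_t add_limb_gpr(uint64_t a, uint64_t b, uint64_t cin, uint64_t *cout) {
    uint64_t s  = a + b;    /* add  s, a, b     */
    uint64_t c1 = s < a;    /* sltu c1, s, a    */
    uint64_t t  = s + cin;  /* add  t, s, cin   */
    uint64_t c2 = t < s;    /* sltu c2, t, s    */
    *cout = c1 | c2;        /* or   cout, c1, c2 (at most one is set) */
    return t;
}

/* Option 4 (hypothetical model): a 66-bit register. Bits 0-63 hold the value,
   bit 64 the carry, bit 65 the signed overflow. N is bit 63 of val and Z is
   val == 0, so neither needs separate storage. */
struct xreg {
    uint64_t val;  /* bits 0-63 */
    bool c;        /* bit 64    */
    bool v;        /* bit 65    */
};

/* One add that writes value, carry and overflow into a single destination. */
static struct xreg xadd(uint64_t a, uint64_t b, bool cin) {
    struct xreg r;
    uint64_t s = a + b;
    uint64_t t = s + cin;
    r.val = t;
    r.c = (s < a) || (t < s);
    /* overflow: operands have the same sign and the result's sign differs */
    r.v = ((~(a ^ b) & (a ^ t)) >> 63) & 1;
    return r;
}

static bool xreg_n(struct xreg r) { return (r.val >> 63) & 1; }
static bool xreg_z(struct xreg r) { return r.val == 0; }

/* 128-bit add, option 3: up to four extra instructions per limb
   just to move the carry along. */
static void add128_gpr(const uint64_t a[2], const uint64_t b[2],
                       uint64_t d[2], uint64_t *cout) {
    uint64_t c;
    d[0] = add_limb_gpr(a[0], b[0], 0, &c);
    d[1] = add_limb_gpr(a[1], b[1], c, cout);
}

/* 128-bit add, option 4: the low limb's carry travels in bit 64 of its own
   result register, so no separate flags register has to be renamed. */
static void add128_x(const uint64_t a[2], const uint64_t b[2], struct xreg d[2]) {
    d[0] = xadd(a[0], b[0], false);
    d[1] = xadd(a[1], b[1], d[0].c);
}
```

In the option-3 version the carry is just another GPR value, which is why compilers handle it easily but pay extra instructions; in the option-4 version the carry costs no extra instructions and rides along in the existing GPR renaming.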

> Since they are not used simultaneously, and register
> renaming is in effect, there should not be a great need for predicate
> registers.

You need to preserve one instance for every recovery point, i.e.,
every instruction that branches or can trap and has not yet been
committed.  You also need to preserve an instance as long as any of
its consumers has not yet gone through execution.  The
simplest way to satisfy both requirements is to just preserve any
flags result until the generating instruction retires.  And if most
instructions generate flags, that means a lot of instances of the
flags.  There is a reason why Zen5 has 192.
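The counting argument can be sketched in C under two simplifying assumptions, both mine: the out-of-order window always holds exactly `window` consecutive instructions of the dynamic stream, and every flags result is kept until its producer retires, plus one committed architectural copy. The function name `peak_flag_instances` is hypothetical; real cores free registers at other points and stall when the pool runs dry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Peak number of flags instances needed if each flags result is preserved
   until its producer retires, plus one committed (architectural) copy.
   writes_flags[i] says whether dynamic instruction i produces flags. */
static size_t peak_flag_instances(const bool *writes_flags, size_t n, size_t window) {
    size_t live = 0, peak = 0;
    for (size_t i = 0; i < n; i++) {
        if (writes_flags[i])
            live++;                     /* instruction i enters the window */
        if (i >= window && writes_flags[i - window])
            live--;                     /* instruction i - window retires */
        if (live > peak)
            peak = live;
    }
    return peak + 1;                    /* + the committed copy */
}
```

If every instruction writes flags, the result is the window size plus one; if only every other instruction does, it is roughly half of that. Either way, the number of instances grows with the window, not with the number of architectural flags registers.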

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
