Re: auto predicating branches

Subject: Re: auto predicating branches
From: mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups: comp.arch
Date: 23 Apr 2025, 22:34:51
Organization: Rocksolid Light
Message-ID: <126700f99b6f97d7483bb5355d68c361@www.novabbs.org>
References: 1 2 3 4 5 6 7 8 9 10 11 12
User-Agent: Rocksolid Light

On Wed, 23 Apr 2025 17:44:56 +0000, Anton Ertl wrote:

> mitchalsup@aol.com (MitchAlsup1) writes:
>> I do not see 2 LDDs being performed in parallel unless the execution
>> width is at least 14-wide. In any event loop recurrence restricts the
>> overall retirement to 0.5 LDDs per cycle--it is the recurrence that
>> feeds the iterations (i.e., retirement).
>
> Yes.  But with loads that take longer than two cycles (very common in
> OoO microarchitectures even for L1 hits), the second load starts
> before the first finishes.  And in the case where the branchy version
> is profitable (when the load latency is longer than the misprediction
> penalty due to cache misses), many loads will start before the first
> finishes (most of them will be canceled due to misprediction, but even
> an average of two useful parallel loads produces a good speedup).

We still have a 2-cycle loop recurrence, so even if we could perform
1000 LDs per cycle, we are fundamentally SRA and SUB bound around the
loop from iteration to iteration.
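
A toy model of that bound, as a sketch only: the function name and its
parameters are made up for illustration, and it assumes the SRA and SUB
are 1 cycle each and both sit on the loop-carried chain. The steady-state
rate is the smaller of the recurrence bound and the load-port bound, so
adding load ports stops helping once the recurrence dominates.

```python
def iterations_per_cycle(recurrence_cycles, loads_per_iteration, load_ports):
    """Upper bound on steady-state loop throughput in a toy model.

    recurrence_cycles:   latency of the loop-carried dependence chain
                         (assumed here: SRA + SUB, 1 cycle each = 2).
    loads_per_iteration: loads issued per iteration (hypothetical parameter).
    load_ports:          loads the machine can start per cycle (hypothetical).
    """
    recurrence_bound = 1.0 / recurrence_cycles
    resource_bound = load_ports / loads_per_iteration
    return min(recurrence_bound, resource_bound)


if __name__ == "__main__":
    # One load per iteration, 2-cycle recurrence, growing load bandwidth.
    for ports in (1, 2, 1000):
        print(ports, iterations_per_cycle(2, 1, ports))
```

With one load per iteration, every line reports 0.5 iterations (and thus
0.5 loads) per cycle, whether the machine has 1 load port or 1000. The
model says nothing about the branchy version, where prediction removes
the dependence from the recurrence.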

> [EricP:]
>>> [*] I want to see the asm because Intel's CMOV always executes the
>>> operand operation, then tosses the result if the predicate is false.
>>
>> Use a less-stupid ISA
>
> The ISA does not require that.  It could just as well be implemented
> as waiting for the condition, and only then perform the operation.
> And with a more sophisticated implementation one could even do that
> for operations that are not part of the CMOV instruction, but produce
> one of the source operands of the CMOV instruction.  However,
> apparently such implementations have enough disadvantages (probably in
> performance) that nobody has gone there AFAIK.  AFAIK everyone,
> including implementations of different ISAs, implements
> CMOV/predication as performing the operation and then conditionally
> squashing the result.

That is difficult with renaming. In order for the later instructions
to wait on the CMOV renamed result register or the earlier predicted
value, each entry in each station has to be able to wait on one or
the other. In general, it is far easier to make CMOV be able to deliver
either result so nothing downstream of CMOV has any added complexity.
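
A minimal sketch of that last point, assuming a much-simplified rename
scheme (the `Renamer` class and its method names are hypothetical, not
any real design): CMOV is treated as a three-input select that reads the
condition, the new source, and the old destination value, and always
writes a fresh physical register. Consumers then wait on exactly one
physical register, whichever way the condition goes.

```python
def cmov(cond, new_value, old_value):
    """CMOV as a select: both candidate values are already available."""
    return new_value if cond else old_value


class Renamer:
    """Toy rename table: architectural name -> index into a physical file."""

    def __init__(self):
        self.table = {}   # architectural register -> physical register index
        self.phys = []    # physical register file (values only)

    def write(self, arch, value):
        # Every write allocates a fresh physical register.
        self.phys.append(value)
        self.table[arch] = len(self.phys) - 1
        return self.table[arch]

    def read(self, arch):
        return self.phys[self.table[arch]]

    def exec_cmov(self, cond, dst, src):
        # The old destination is read as an ordinary source operand, so the
        # result register is defined whether or not the move "happens".
        result = cmov(cond, self.read(src), self.read(dst))
        return self.write(dst, result)


if __name__ == "__main__":
    r = Renamer()
    r.write("r1", 10)
    r.write("r2", 20)
    p = r.exec_cmov(False, "r1", "r2")   # condition false: r1 keeps its value
    print(p, r.read("r1"))
    p = r.exec_cmov(True, "r1", "r2")    # condition true: r1 takes r2's value
    print(p, r.read("r1"))
```

The alternative described above, where a consumer waits on either the
CMOV's renamed result or the earlier value, would need each waiting entry
to track two possible producers; the select form keeps that choice inside
the CMOV itself.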

> - anton

