OK, you did state that A1 depends on D0, but then showed a bit later that neither A nor D depended on C, so you could use that as a filler.

In article <v04tpb$pqus$1@dont-email.me>,
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>MitchAlsup1 wrote:
>>BGB wrote:
>>>On 4/20/2024 5:03 PM, MitchAlsup1 wrote:
>>>
>>>Like, in-order superscalar isn't going to do crap if nearly every
>>>instruction depends on every preceding instruction. Even pipelining
>>>can't help much with this.
>>
>>Pipelining CREATED this (back to back dependencies). No amount of
>>pipelining can eradicate RAW data dependencies.
>>
>>>The compiler can shuffle the instructions into an order to limit the
>>>number of register dependencies and better fit the pipeline. But,
>>>then, most of the "hard parts" are already done (so it doesn't take
>>>much more for the compiler to flag which instructions can run in
>>>parallel).
>>
>>Compiler scheduling works for exactly 1 pipeline implementation and
>>is suboptimal for all others.
>
>Well, yeah.
>
>OTOH, if your (definitely not my!) compiler can schedule a 4-wide static
>ordering of operations, then it will be very nearly optimal on 2-wide
>and 3-wide as well. (The difference is typically in a bit more loop
>setup and cleanup code than needed.)
>
>Hand-optimizing Pentium asm code did teach me to "think like a cpu",
>which is probably the only part of the experience which is still kind of
>relevant. :-)
>
>Terje
>
>-- 
>- <Terje.Mathisen at tmsw.no>
>"almost all programming can be viewed as an exercise in caching"

This is a late reply, but optimal static ordering for N-wide may be
very non-optimal for N-1 (or N-2, etc.). As an example, assume a perfectly
scheduled 4-wide sequence of instructions with the instructions labeled
with the group number, and letter A-D for the position in the group.
Each instruction depends on the one with the same letter in the previous
group (A to A, B to B, etc.), and each A also depends on the previous
group's D. Here's what the instruction groupings look like on a 4-wide machine:
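The following minimal Python sketch makes the dependency pattern concrete. It encodes only the two rules stated above: INSTn_X needs INST(n-1)_X, and INSTn_A also needs INST(n-1)_D. The tuple encoding and the helper names `deps_of`, `make_program` and `name` are hypothetical illustrations, not anything from the thread.

```python
# Hypothetical model of the example's dependencies (names are illustrative).
# The tuple (n, "X") stands for INSTn_X.

LETTERS = "ABCD"


def deps_of(inst):
    """Return the set of instructions whose results INSTn_X reads."""
    n, letter = inst
    if n == 0:
        return set()  # the example starts at group 0
    deps = {(n - 1, letter)}  # A->A, B->B, C->C, D->D chains
    if letter == "A":
        deps.add((n - 1, "D"))  # plus the previous group's D feeding A
    return deps


def make_program(num_groups):
    """The 4-wide static order: INST0_A..D, then INST1_A..D, and so on."""
    return [(n, x) for n in range(num_groups) for x in LETTERS]


def name(inst):
    n, letter = inst
    return f"INST{n}_{letter}"


if __name__ == "__main__":
    for inst in make_program(2):
        preds = sorted(name(d) for d in deps_of(inst))
        print(name(inst), "<-", preds)
```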
INST0_A
INST0_B
INST0_C
INST0_D
-------
INST1_A
INST1_B
INST1_C
INST1_D
-------
INST2_A
There will obviously be other dependencies (say, INST2_A depends on INST0_B)
but they don't affect how this will be executed.
The ------- lines indicate group boundaries. All instructions in a group
execute in the same cycle. So the first 8 instructions take just 2 clocks
on a 4-wide machine.
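The 2-clock figure, and the claim that extra dependencies on older groups change nothing, can be checked mechanically. The sketch below follows the post in assuming that a group issues in one cycle. It also assumes every result is ready for the next group (1-cycle latency), which the thread does not state. `check_groups` and `FOUR_WIDE` are hypothetical names. `FOUR_WIDE` also fills in all of group 2, although the listing shows only INST2_A.

```python
# Sketch of a static-schedule checker (hypothetical helper, not from the post).
# Assumption: results are usable one cycle later, so an instruction may only
# depend on instructions from strictly earlier groups.


def deps_of(inst):
    """INSTn_X depends on INST(n-1)_X; INSTn_A also depends on INST(n-1)_D."""
    n, letter = inst
    if n == 0:
        return set()
    deps = {(n - 1, letter)}
    if letter == "A":
        deps.add((n - 1, "D"))
    return deps


def check_groups(groups, width, extra_deps=None):
    """Return the number of cycles if `groups` is a legal schedule, else raise."""
    extra_deps = extra_deps or {}
    done = set()  # instructions issued in earlier cycles
    for cycle, group in enumerate(groups):
        if len(group) > width:
            raise ValueError(f"cycle {cycle}: {len(group)} ops exceed width {width}")
        for inst in group:
            needed = deps_of(inst) | extra_deps.get(inst, set())
            missing = needed - done
            if missing:
                raise ValueError(f"cycle {cycle}: {inst} waits on {sorted(missing)}")
        done.update(group)
    return len(groups)


# The 4-wide listing: INST0_A..D, INST1_A..D, INST2_A..D.
FOUR_WIDE = [[(n, x) for x in "ABCD"] for n in range(3)]

if __name__ == "__main__":
    print(check_groups(FOUR_WIDE[:2], 4))  # prints 2 (first 8 instructions)
    # An extra INST2_A -> INST0_B dependency is already satisfied two groups earlier.
    print(check_groups(FOUR_WIDE, 4, extra_deps={(2, "A"): {(0, "B")}}))  # prints 3
```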
If you run this sequence on a 3-wide, then the groupings will become:
INST0_A
INST0_B
INST0_C
-------
INST0_D
-------
INST1_A
INST1_B
INST1_C
-------
INST1_D
-------
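Here is a rough sketch of why the 4-wide order splits this way on a 3-wide in-order machine. Instructions are packed greedily in program order, and a group closes when it is full or when the next instruction needs a result from the same group. The 1-cycle-latency model and the `in_order_groups` helper are assumptions for illustration, not a description of any real decoder.

```python
# Sketch: how an in-order W-wide machine might regroup a fixed static order.
# Assumptions (not from the post): results are usable next cycle, and the
# issue logic never reorders instructions.


def deps_of(inst):
    """INSTn_X depends on INST(n-1)_X; INSTn_A also depends on INST(n-1)_D."""
    n, letter = inst
    if n == 0:
        return set()
    deps = {(n - 1, letter)}
    if letter == "A":
        deps.add((n - 1, "D"))
    return deps


def in_order_groups(program, width):
    """Greedily pack `program` (in order) into issue groups of at most `width`."""
    groups, current = [], []
    for inst in program:
        # In program order every producer is either in an earlier group (ready)
        # or in `current` (not ready this cycle), so only `current` matters.
        if len(current) == width or deps_of(inst) & set(current):
            groups.append(current)
            current = []
        current.append(inst)
    if current:
        groups.append(current)
    return groups


PROGRAM = [(n, x) for n in range(2) for x in "ABCD"]  # the 8 instructions above

if __name__ == "__main__":
    for width in (4, 3, 2):
        groups = in_order_groups(PROGRAM, width)
        print(width, len(groups), [["INST%d_%s" % i for i in g] for g in groups])
```

With `width=3` this yields the four groups listed above. INST0_D and INST1_D each issue alone because the following A needs their result. With `width=4` it yields 2 groups. With `width=2` it yields four pairs, the minimum for 8 instructions on a 2-wide machine.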

INST0_A
INST0_B
INST0_D
-------
INST1_A
INST0_C
INST1_B
-------
INST1_C
INST1_D

Obviously you cannot follow this up with INST2_A; you would need INST2_B here and then A/C/D on the next cycle.
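The reordered 3-wide listing can be checked with the same kind of sketch. So can the point that INST2_A cannot take the spare slot in the last group while INST2_B can. It again assumes 1-cycle latency. `is_legal` and `RESCHEDULED` are hypothetical names.

```python
# Sketch: legality check for the reordered 3-wide schedule (hypothetical helper).
# Assumption: a result is usable by any later group, never by its own group.


def deps_of(inst):
    """INSTn_X depends on INST(n-1)_X; INSTn_A also depends on INST(n-1)_D."""
    n, letter = inst
    if n == 0:
        return set()
    deps = {(n - 1, letter)}
    if letter == "A":
        deps.add((n - 1, "D"))
    return deps


def is_legal(groups, width):
    """True if no group exceeds `width` and every producer is in an earlier group."""
    done = set()
    for group in groups:
        if len(group) > width:
            return False
        if any(not deps_of(inst) <= done for inst in group):
            return False
        done.update(group)
    return True


RESCHEDULED = [
    [(0, "A"), (0, "B"), (0, "D")],
    [(1, "A"), (0, "C"), (1, "B")],
    [(1, "C"), (1, "D")],
]

if __name__ == "__main__":
    print(is_legal(RESCHEDULED, 3))  # prints True: 8 instructions in 3 cycles
    head = RESCHEDULED[:2]
    # INST2_A in the spare slot fails: it needs INST1_D from the same group.
    print(is_legal(head + [[(1, "C"), (1, "D"), (2, "A")]], 3))  # prints False
    # INST2_B fits, and A/C/D of group 2 can issue together next cycle.
    print(is_legal(head + [[(1, "C"), (1, "D"), (2, "B")],
                           [(2, "A"), (2, "C"), (2, "D")]], 3))  # prints True
```

Under these assumptions the reordered listing covers the 8 instructions in 3 cycles. An in-order 3-wide machine needs 4 cycles for the original 4-wide order.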
The messages displayed come from Usenet.