Re: Memory ordering

Liste des GroupesRevenir à c arch 
Sujet : Re: Memory ordering
De : anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Groupes : comp.arch
Date : 01. Aug 2024, 16:54:55
Autres entêtes
Organisation : Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID : <2024Aug1.175455@mips.complang.tuwien.ac.at>
References : 1 2 3 4 5 6 7 8 9
User-Agent : xrn 10.11
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 30 Jul 2024 9:51:46 +0000, Anton Ertl wrote:
>
mitchalsup@aol.com (MitchAlsup1) writes:
The depth of the execution window may be smaller than the time it takes
to send the required information around and have this core recognize
that it is out-of-order wrt memory.
>
So if we don't want to stall for memory accesses all the time, we need
a bigger execution window, either by making the reorder buffer larger,
or by using a different, cheaper mechanism.
>
Mc 88120 had a 96-wide execution window, which could be filled up in
16 cycles (optimistically) and was filled up in 32 cycles (average).
Given that DRAM is not going to be less than 20 ns and a 5GHz core,
the execution window is 1/3rd that which would be required to absorb
an cache miss all the way to DRAM.

Relevant numbers for current cores are 400-600 instructions in the
reorder buffer, 6-8 instructions per cycle, and the core-to-core
latency is (for Bergamo) 30-40ns (90-120 cycles) within a CCX,
100-120ns (300-360 cyles) within a socket, with a 212ns (636 cycles)
worst case across sockets (data from
<https://chipsandcheese.com/2024/06/22/testing-amds-bergamo-zen-4c-spam/>);
I computed with 3GHz clock rate, which fits Bergamo.  On the fast
desktop chips the clock rate is higher, but the latency is lower in
both ns and cycles (in particular, no dual-socket penalty); e.g., on
the Ryzen 7950X the core-to-core latency is <80ns (456 cycles)
<https://images.anandtech.com/doci/17585/AMD%20Ryzen%209%207950X%20Core%20to%20Core%20Latency%20Final.jpg>.

The reorder buffers (and integer register files and store buffers)
would need to be even larger to cover the 456 or 636 cycles (and maybe
even more than that latency is needed before you are sure that a load
or store is sequentially consistent), or alternatively one would need
a cheaper mechanism.

Concerning the cheaper mechanism, what I am thinking of is hardware
checkpointing every, say, 200 cycles or so (subject to fine-tuning).
The idea here is that communication between cores is very rare, so
rolling back more cycles than the minimal necessary amount costs
little on average (except that it looks bad on cache ping-pong
microbenchmarks).
>
You lost me::
>
Colloquially, there are 2 uses of the word checkpointing:: a) what
HW does each time it inserts a branch into the EW, b) what an OS or
application does to be able to recover from a crash (from any
mechanism).

What is "EW"?

Anyway, here checkpointing would be a hardware mechanism that allows
rolling back to the state at the point when the checkpoint was made.

An MEMBAR requires the memory order to catch up to the current point
before adding new AGENs to the problem space. If the memory order
is already SC then MEMBAR has nothing to do and is pushed through
the pipeline without delay.

Yes, that's the slow implementation.  The fast implementation is to
implement sequential consistency all the time (by predicting and
speculating that memory accesses do not interfer with those of other
cores, and recovering from that speculation when the speculation turns
out to be wrong).  In such an implementation memory barriers are noops
(and thus fast), because the hardware already provides sequential
consistency.

Then consider 2 Vector processors performing 2 STs (1 each) to
non-overlapping addresses but with bank aliasing. Consider that
the STs are scatter based and the back conflicts random. There
is no way to determine which store happened first or which
element of each vector store happened first.

It's up to the architecture to define the order of stores and loads of
a given core.  For sequential consistency you then interleave the
sequences coming from the cores in some convenient order.  It does not
matter what happens earlier in some inertial system.  It only matters
what your hardware decides should be treated as being earlier.  The
hardware has a lot of freedom here, but the end result as visible to
the cores must be sequentially consistent (or, with a weaker memory
consistency model, consistent with that model).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Date Sujet#  Auteur
24 Jul 24 * Arguments for a sane ISA 6-years later63MitchAlsup1
25 Jul 24 `* Re: Arguments for a sane ISA 6-years later62BGB
25 Jul 24  +* Re: Arguments for a sane ISA 6-years later57Chris M. Thomasson
26 Jul 24  i`* Re: Arguments for a sane ISA 6-years later56Anton Ertl
26 Jul 24  i +* Re: Arguments for a sane ISA 6-years later20BGB
29 Jul 24  i i`* Re: Arguments for a sane ISA 6-years later19Anton Ertl
29 Jul 24  i i +* Intel overvoltage (was: Arguments for a sane ISA 6-years later)2Thomas Koenig
29 Jul 24  i i i`- Re: Intel overvoltage1BGB
29 Jul 24  i i `* Re: Arguments for a sane ISA 6-years later16BGB
30 Jul 24  i i  `* Re: Arguments for a sane ISA 6-years later15Anton Ertl
30 Jul 24  i i   `* Re: Arguments for a sane ISA 6-years later14BGB
30 Jul 24  i i    +* Re: Arguments for a sane ISA 6-years later2Chris M. Thomasson
31 Jul 24  i i    i`- Re: Arguments for a sane ISA 6-years later1BGB
1 Aug 24  i i    `* Re: Arguments for a sane ISA 6-years later11Anton Ertl
1 Aug 24  i i     +- Re: Arguments for a sane ISA 6-years later1Michael S
1 Aug 24  i i     +* Re: Arguments for a sane ISA 6-years later8MitchAlsup1
1 Aug 24  i i     i+- Re: Arguments for a sane ISA 6-years later1Michael S
2 Aug 24  i i     i`* Re: Arguments for a sane ISA 6-years later6MitchAlsup1
2 Aug 24  i i     i +- Re: Arguments for a sane ISA 6-years later1Michael S
4 Aug 24  i i     i `* Re: Arguments for a sane ISA 6-years later4MitchAlsup1
5 Aug 24  i i     i  `* Re: Arguments for a sane ISA 6-years later3Stephen Fuld
5 Aug 24  i i     i   `* Re: Arguments for a sane ISA 6-years later2Stephen Fuld
5 Aug 24  i i     i    `- Re: Arguments for a sane ISA 6-years later1MitchAlsup1
1 Aug 24  i i     `- Re: Arguments for a sane ISA 6-years later1BGB
26 Jul 24  i +* Re: Arguments for a sane ISA 6-years later20MitchAlsup1
27 Jul 24  i i+- Re: Arguments for a sane ISA 6-years later1BGB
29 Jul 24  i i`* Memory ordering (was: Arguments for a sane ISA 6-years later)18Anton Ertl
29 Jul 24  i i +* Re: Memory ordering15MitchAlsup1
29 Jul 24  i i i+* Re: Memory ordering6Chris M. Thomasson
29 Jul 24  i i ii`* Re: Memory ordering5MitchAlsup1
30 Jul 24  i i ii `* Re: Memory ordering4Michael S
31 Jul 24  i i ii  `* Re: Memory ordering3Chris M. Thomasson
31 Jul 24  i i ii   `* Re: Memory ordering2Chris M. Thomasson
31 Jul 24  i i ii    `- Re: Memory ordering1Chris M. Thomasson
30 Jul 24  i i i`* Re: Memory ordering8Anton Ertl
30 Jul 24  i i i +* Re: Memory ordering2Chris M. Thomasson
30 Jul 24  i i i i`- Re: Memory ordering1Chris M. Thomasson
31 Jul 24  i i i `* Re: Memory ordering5MitchAlsup1
31 Jul 24  i i i  +- Re: Memory ordering1Chris M. Thomasson
1 Aug 24  i i i  `* Re: Memory ordering3Anton Ertl
1 Aug 24  i i i   `* Re: Memory ordering2MitchAlsup1
2 Aug 24  i i i    `- Re: Memory ordering1Anton Ertl
29 Jul 24  i i `* Re: Memory ordering2Chris M. Thomasson
30 Jul 24  i i  `- Re: Memory ordering1Chris M. Thomasson
29 Jul 24  i +* Re: Arguments for a sane ISA 6-years later13Chris M. Thomasson
29 Jul 24  i i+* Re: Arguments for a sane ISA 6-years later9BGB
29 Jul 24  i ii`* Re: Arguments for a sane ISA 6-years later8Chris M. Thomasson
29 Jul 24  i ii +- Re: Arguments for a sane ISA 6-years later1Chris M. Thomasson
29 Jul 24  i ii +* Re: Arguments for a sane ISA 6-years later2BGB
29 Jul 24  i ii i`- Re: Arguments for a sane ISA 6-years later1Chris M. Thomasson
30 Jul 24  i ii `* Re: Arguments for a sane ISA 6-years later4jseigh
30 Jul 24  i ii  `* Re: Arguments for a sane ISA 6-years later3Chris M. Thomasson
31 Jul 24  i ii   `* Re: Arguments for a sane ISA 6-years later2jseigh
31 Jul 24  i ii    `- Re: Arguments for a sane ISA 6-years later1Chris M. Thomasson
29 Jul 24  i i+- Memory ordering (was: Arguments for a sane ISA 6-years later)1Anton Ertl
29 Jul 24  i i`* Re: Arguments for a sane ISA 6-years later2MitchAlsup1
29 Jul 24  i i `- Re: Arguments for a sane ISA 6-years later1BGB
6 Aug 24  i `* Re: Arguments for a sane ISA 6-years later2Chris M. Thomasson
6 Aug 24  i  `- Re: Arguments for a sane ISA 6-years later1Chris M. Thomasson
26 Jul 24  `* Re: Arguments for a sane ISA 6-years later4MitchAlsup1
27 Jul 24   +- Re: Arguments for a sane ISA 6-years later1BGB
28 Jul 24   `* Re: Arguments for a sane ISA 6-years later2Paul A. Clayton
28 Jul 24    `- Re: Arguments for a sane ISA 6-years later1MitchAlsup1

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal