Re: Memory ordering

Liste des GroupesRevenir à c arch 
Sujet : Re: Memory ordering
De : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Groupes : comp.arch
Date : 31. Jul 2024, 01:27:11
Autres entêtes
Organisation : Rocksolid Light
Message-ID : <249b2217b1dc1c8911eb45c5735d4aa9@www.novabbs.org>
References : 1 2 3 4 5 6 7 8
User-Agent : Rocksolid Light
On Tue, 30 Jul 2024 9:51:46 +0000, Anton Ertl wrote:

mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 29 Jul 2024 13:21:10 +0000, Anton Ertl wrote:
A problem with that approach is that this requires enough reorder
buffering (or something equivalent, there may be something cheaper for
this particular problem) to cover at least the shared-cache latency
(usually L3, more with multiple sockets).
>
The depth of the execution window may be smaller than the time it takes
to send the required information around and have this core recognize
that it is out-of-order wrt memory.
>
So if we don't want to stall for memory accesses all the time, we need
a bigger execution window, either by making the reorder buffer larger,
or by using a different, cheaper mechanism.
Mc 88120 had a 96-wide execution window, which could be filled up in
16 cycles (optimistically) and was filled up in 32 cycles (average).
Given that DRAM is not going to be less than 20 ns and a 5GHz core,
the execution window is 1/3rd that which would be required to absorb
an cache miss all the way to DRAM. Add in 12-ish cycles for L2, 10-
cycles for transporting L2 miss to Memory controller, 5-cycles between
memory controller and DRAM controller-----and sooner or later it gets
hard. So nobody tries to make the execution window big enough to do
that (actually Mc88120 was to ECL buss technology and was only 100 MHz
so its puny 16-cycle EW actually was big enough to absorb a Cache
miss all the way to DRAM.....but that is for another day.)

Concerning the cheaper mechanism, what I am thinking of is hardware
checkpointing every, say, 200 cycles or so (subject to fine-tuning).
The idea here is that communication between cores is very rare, so
rolling back more cycles than the minimal necessary amount costs
little on average (except that it looks bad on cache ping-pong
microbenchmarks).
You lost me::
Colloquially, there are 2 uses of the word checkpointing:: a) what
HW does each time it inserts a branch into the EW, b) what an OS or
application does to be able to recover from a crash (from any
mechanism).
Neither is used to describe interactions between cores.
<snip>

The operations themselves are not slow.
>
Citation needed.
>
A MEMBAR dropped into the pipeline, when nothing is speculative, takes
no more time than an integer ADD. Only when there is speculation does
it have to take time to relax the speculation.
>
Not sure what kind of speculation you mean here.  On in-order cores
like the non-Fujitsu SPARCs from before about 2010 memory barriers are
expensive AFAIK, even though there is essentially no branch
speculation on in-order cores.
You dropped 64 instructions into the EW, and AGEN performs 15 address
generations in the order permitted by operand arrival. These addresses
are routed to the L1s to determine who hits and who misses--all OoO.
Thus, the Addresses are only in Operand order and they touched the
caches
in operand order.
There could be branch mispredictions in EW also causing many of the
AGENs to get thrown away after the branch is discovered to be poorly
predicted.
And ono top of all of this several FP instructions may have raised
exceptions.
EW pipelines are generally designed to "sort all this stuff out at
retirement"; occasionally, memory ordering issues are sorted out
prior to retirement by replaying OoO memory references.
>
Of course, if you mean speculation about the order of loads and
stores, yes, if you don't have such speculation, the memory barriers
are fast, but then loads are extremely slow.
An MEMBAR requires the memory order to catch up to the current point
before adding new AGENs to the problem space. If the memory order
is already SC then MEMBAR has nothing to do and is pushed through
the pipeline without delay.
So, the delay has to do with catching up with memory order not in
pushing the MEMBAR through the pipeline.

>
Memory consistency is defined wrt what several processors do.  Some
processor performs some reads and writes and another performs some
read and writes, and memory consistency defines what a processor sees
about what the other does, and what ends up in main memory.  But as
long as the processors, their caches, and their interconnect gets the
memory ordering right, the main memory is just the backing store that
eventually gets a consistent result of what the other components did.
So it does not matter whether the main memory has one bank or 256.
>
NEC SX is a multi-processor vector machine with the property that
addresses are spewed out as fast as AGEN can perform. These addresses
are routed to banks based on bus-segment and can arrive OoO wrt
how they were spewed out.
>
So two processors accessing the same memory using vector LDs will
see a single vector having multiple memory orderings. P[0]V[0] ordered
before P[1]V[0] but P[1]V[1] ordered before P[0]V[1], ...
>
As long as no stores happen, who cares about the order of the loads?
When stores happen, the loads are ordered wrt these stores (with
stronger memory orderings giving more guarantees).  So the number of
memory banks does not matter for implementing a strong ordering
efficiently.
Then consider 2 Vector processors performing 2 STs (1 each) to
non-overlapping addresses but with bank aliasing. Consider that
the STs are scatter based and the back conflicts random. There
is no way to determine which store happened first or which
element of each vector store happened first.

- anton

Date Sujet#  Auteur
24 Jul 24 * Arguments for a sane ISA 6-years later63MitchAlsup1
25 Jul 24 `* Re: Arguments for a sane ISA 6-years later62BGB
25 Jul 24  +* Re: Arguments for a sane ISA 6-years later57Chris M. Thomasson
26 Jul 24  i`* Re: Arguments for a sane ISA 6-years later56Anton Ertl
26 Jul 24  i +* Re: Arguments for a sane ISA 6-years later20BGB
29 Jul 24  i i`* Re: Arguments for a sane ISA 6-years later19Anton Ertl
29 Jul 24  i i +* Intel overvoltage (was: Arguments for a sane ISA 6-years later)2Thomas Koenig
29 Jul 24  i i i`- Re: Intel overvoltage1BGB
29 Jul 24  i i `* Re: Arguments for a sane ISA 6-years later16BGB
30 Jul 24  i i  `* Re: Arguments for a sane ISA 6-years later15Anton Ertl
30 Jul 24  i i   `* Re: Arguments for a sane ISA 6-years later14BGB
30 Jul 24  i i    +* Re: Arguments for a sane ISA 6-years later2Chris M. Thomasson
31 Jul 24  i i    i`- Re: Arguments for a sane ISA 6-years later1BGB
1 Aug 24  i i    `* Re: Arguments for a sane ISA 6-years later11Anton Ertl
1 Aug 24  i i     +- Re: Arguments for a sane ISA 6-years later1Michael S
1 Aug 24  i i     +* Re: Arguments for a sane ISA 6-years later8MitchAlsup1
1 Aug 24  i i     i+- Re: Arguments for a sane ISA 6-years later1Michael S
2 Aug 24  i i     i`* Re: Arguments for a sane ISA 6-years later6MitchAlsup1
2 Aug 24  i i     i +- Re: Arguments for a sane ISA 6-years later1Michael S
4 Aug 24  i i     i `* Re: Arguments for a sane ISA 6-years later4MitchAlsup1
5 Aug 24  i i     i  `* Re: Arguments for a sane ISA 6-years later3Stephen Fuld
5 Aug 24  i i     i   `* Re: Arguments for a sane ISA 6-years later2Stephen Fuld
5 Aug 24  i i     i    `- Re: Arguments for a sane ISA 6-years later1MitchAlsup1
1 Aug 24  i i     `- Re: Arguments for a sane ISA 6-years later1BGB
26 Jul 24  i +* Re: Arguments for a sane ISA 6-years later20MitchAlsup1
27 Jul 24  i i+- Re: Arguments for a sane ISA 6-years later1BGB
29 Jul 24  i i`* Memory ordering (was: Arguments for a sane ISA 6-years later)18Anton Ertl
29 Jul 24  i i +* Re: Memory ordering15MitchAlsup1
29 Jul 24  i i i+* Re: Memory ordering6Chris M. Thomasson
29 Jul 24  i i ii`* Re: Memory ordering5MitchAlsup1
30 Jul 24  i i ii `* Re: Memory ordering4Michael S
31 Jul 24  i i ii  `* Re: Memory ordering3Chris M. Thomasson
31 Jul 24  i i ii   `* Re: Memory ordering2Chris M. Thomasson
31 Jul 24  i i ii    `- Re: Memory ordering1Chris M. Thomasson
30 Jul 24  i i i`* Re: Memory ordering8Anton Ertl
30 Jul 24  i i i +* Re: Memory ordering2Chris M. Thomasson
30 Jul 24  i i i i`- Re: Memory ordering1Chris M. Thomasson
31 Jul 24  i i i `* Re: Memory ordering5MitchAlsup1
31 Jul 24  i i i  +- Re: Memory ordering1Chris M. Thomasson
1 Aug 24  i i i  `* Re: Memory ordering3Anton Ertl
1 Aug 24  i i i   `* Re: Memory ordering2MitchAlsup1
2 Aug 24  i i i    `- Re: Memory ordering1Anton Ertl
29 Jul 24  i i `* Re: Memory ordering2Chris M. Thomasson
30 Jul 24  i i  `- Re: Memory ordering1Chris M. Thomasson
29 Jul 24  i +* Re: Arguments for a sane ISA 6-years later13Chris M. Thomasson
29 Jul 24  i i+* Re: Arguments for a sane ISA 6-years later9BGB
29 Jul 24  i ii`* Re: Arguments for a sane ISA 6-years later8Chris M. Thomasson
29 Jul 24  i ii +- Re: Arguments for a sane ISA 6-years later1Chris M. Thomasson
29 Jul 24  i ii +* Re: Arguments for a sane ISA 6-years later2BGB
29 Jul 24  i ii i`- Re: Arguments for a sane ISA 6-years later1Chris M. Thomasson
30 Jul 24  i ii `* Re: Arguments for a sane ISA 6-years later4jseigh
30 Jul 24  i ii  `* Re: Arguments for a sane ISA 6-years later3Chris M. Thomasson
31 Jul 24  i ii   `* Re: Arguments for a sane ISA 6-years later2jseigh
31 Jul 24  i ii    `- Re: Arguments for a sane ISA 6-years later1Chris M. Thomasson
29 Jul 24  i i+- Memory ordering (was: Arguments for a sane ISA 6-years later)1Anton Ertl
29 Jul 24  i i`* Re: Arguments for a sane ISA 6-years later2MitchAlsup1
29 Jul 24  i i `- Re: Arguments for a sane ISA 6-years later1BGB
6 Aug 24  i `* Re: Arguments for a sane ISA 6-years later2Chris M. Thomasson
6 Aug 24  i  `- Re: Arguments for a sane ISA 6-years later1Chris M. Thomasson
26 Jul 24  `* Re: Arguments for a sane ISA 6-years later4MitchAlsup1
27 Jul 24   +- Re: Arguments for a sane ISA 6-years later1BGB
28 Jul 24   `* Re: Arguments for a sane ISA 6-years later2Paul A. Clayton
28 Jul 24    `- Re: Arguments for a sane ISA 6-years later1MitchAlsup1

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal