Newsportal USENET - Re: Memory ordering

On Tue, 30 Jul 2024 9:51:46 +0000, Anton Ertl wrote:

mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 29 Jul 2024 13:21:10 +0000, Anton Ertl wrote:
A problem with that approach is that this requires enough reorder
buffering (or something equivalent, there may be something cheaper for
this particular problem) to cover at least the shared-cache latency
(usually L3, more with multiple sockets).
>
The depth of the execution window may be smaller than the time it takes
to send the required information around and have this core recognize
that it is out-of-order wrt memory.
>
So if we don't want to stall for memory accesses all the time, we need
a bigger execution window, either by making the reorder buffer larger,
or by using a different, cheaper mechanism.

Mc 88120 had a 96-wide execution window, which could be filled up in
16 cycles (optimistically) and was filled up in 32 cycles (average).
Given that DRAM is not going to be less than 20 ns and a 5GHz core,
the execution window is 1/3rd that which would be required to absorb
an cache miss all the way to DRAM. Add in 12-ish cycles for L2, 10-
cycles for transporting L2 miss to Memory controller, 5-cycles between
memory controller and DRAM controller-----and sooner or later it gets
hard. So nobody tries to make the execution window big enough to do
that (actually Mc88120 was to ECL buss technology and was only 100 MHz
so its puny 16-cycle EW actually was big enough to absorb a Cache
miss all the way to DRAM.....but that is for another day.)

Concerning the cheaper mechanism, what I am thinking of is hardware
checkpointing every, say, 200 cycles or so (subject to fine-tuning).
The idea here is that communication between cores is very rare, so
rolling back more cycles than the minimal necessary amount costs
little on average (except that it looks bad on cache ping-pong
microbenchmarks).

You lost me::
Colloquially, there are 2 uses of the word checkpointing:: a) what
HW does each time it inserts a branch into the EW, b) what an OS or
application does to be able to recover from a crash (from any
mechanism).
Neither is used to describe interactions between cores.
<snip>

The operations themselves are not slow.
>
Citation needed.
>
A MEMBAR dropped into the pipeline, when nothing is speculative, takes
no more time than an integer ADD. Only when there is speculation does
it have to take time to relax the speculation.
>
Not sure what kind of speculation you mean here. On in-order cores
like the non-Fujitsu SPARCs from before about 2010 memory barriers are
expensive AFAIK, even though there is essentially no branch
speculation on in-order cores.

You dropped 64 instructions into the EW, and AGEN performs 15 address
generations in the order permitted by operand arrival. These addresses
are routed to the L1s to determine who hits and who misses--all OoO.
Thus, the Addresses are only in Operand order and they touched the
caches
in operand order.
There could be branch mispredictions in EW also causing many of the
AGENs to get thrown away after the branch is discovered to be poorly
predicted.
And ono top of all of this several FP instructions may have raised
exceptions.
EW pipelines are generally designed to "sort all this stuff out at
retirement"; occasionally, memory ordering issues are sorted out
prior to retirement by replaying OoO memory references.

>
Of course, if you mean speculation about the order of loads and
stores, yes, if you don't have such speculation, the memory barriers
are fast, but then loads are extremely slow.

An MEMBAR requires the memory order to catch up to the current point
before adding new AGENs to the problem space. If the memory order
is already SC then MEMBAR has nothing to do and is pushed through
the pipeline without delay.
So, the delay has to do with catching up with memory order not in
pushing the MEMBAR through the pipeline.

>
Memory consistency is defined wrt what several processors do. Some
processor performs some reads and writes and another performs some
read and writes, and memory consistency defines what a processor sees
about what the other does, and what ends up in main memory. But as
long as the processors, their caches, and their interconnect gets the
memory ordering right, the main memory is just the backing store that
eventually gets a consistent result of what the other components did.
So it does not matter whether the main memory has one bank or 256.
>
NEC SX is a multi-processor vector machine with the property that
addresses are spewed out as fast as AGEN can perform. These addresses
are routed to banks based on bus-segment and can arrive OoO wrt
how they were spewed out.
>
So two processors accessing the same memory using vector LDs will
see a single vector having multiple memory orderings. P[0]V[0] ordered
before P[1]V[0] but P[1]V[1] ordered before P[0]V[1], ...
>
As long as no stores happen, who cares about the order of the loads?
When stores happen, the loads are ordered wrt these stores (with
stronger memory orderings giving more guarantees). So the number of
memory banks does not matter for implementing a strong ordering
efficiently.

Then consider 2 Vector processors performing 2 STs (1 each) to
non-overlapping addresses but with bank aliasing. Consider that
the STs are scatter based and the back conflicts random. There
is no way to determine which store happened first or which
element of each vector store happened first.

- anton

Date	Sujet	#	Auteur
24 Jul 24	Arguments for a sane ISA 6-years later	63	MitchAlsup1
25 Jul 24	Re: Arguments for a sane ISA 6-years later	62	BGB
25 Jul 24	Re: Arguments for a sane ISA 6-years later	57	Chris M. Thomasson
26 Jul 24	Re: Arguments for a sane ISA 6-years later	56	Anton Ertl
26 Jul 24	Re: Arguments for a sane ISA 6-years later	20	BGB
29 Jul 24	Re: Arguments for a sane ISA 6-years later	19	Anton Ertl
29 Jul 24	Intel overvoltage (was: Arguments for a sane ISA 6-years later)	2	Thomas Koenig
29 Jul 24	Re: Intel overvoltage	1	BGB
29 Jul 24	Re: Arguments for a sane ISA 6-years later	16	BGB
30 Jul 24	Re: Arguments for a sane ISA 6-years later	15	Anton Ertl
30 Jul 24	Re: Arguments for a sane ISA 6-years later	14	BGB
30 Jul 24	Re: Arguments for a sane ISA 6-years later	2	Chris M. Thomasson
30 Jul 24	Re: Arguments for a sane ISA 6-years later	1	BGB
1 Aug 24	Re: Arguments for a sane ISA 6-years later	11	Anton Ertl
1 Aug 24	Re: Arguments for a sane ISA 6-years later	1	Michael S
1 Aug 24	Re: Arguments for a sane ISA 6-years later	8	MitchAlsup1
1 Aug 24	Re: Arguments for a sane ISA 6-years later	1	Michael S
2 Aug 24	Re: Arguments for a sane ISA 6-years later	6	MitchAlsup1
2 Aug 24	Re: Arguments for a sane ISA 6-years later	1	Michael S
4 Aug 24	Re: Arguments for a sane ISA 6-years later	4	MitchAlsup1
5 Aug 24	Re: Arguments for a sane ISA 6-years later	3	Stephen Fuld
5 Aug 24	Re: Arguments for a sane ISA 6-years later	2	Stephen Fuld
5 Aug 24	Re: Arguments for a sane ISA 6-years later	1	MitchAlsup1
1 Aug 24	Re: Arguments for a sane ISA 6-years later	1	BGB
26 Jul 24	Re: Arguments for a sane ISA 6-years later	20	MitchAlsup1
27 Jul 24	Re: Arguments for a sane ISA 6-years later	1	BGB
29 Jul 24	Memory ordering (was: Arguments for a sane ISA 6-years later)	18	Anton Ertl
29 Jul 24	Re: Memory ordering	15	MitchAlsup1
29 Jul 24	Re: Memory ordering	6	Chris M. Thomasson
29 Jul 24	Re: Memory ordering	5	MitchAlsup1
30 Jul 24	Re: Memory ordering	4	Michael S
31 Jul 24	Re: Memory ordering	3	Chris M. Thomasson
31 Jul 24	Re: Memory ordering	2	Chris M. Thomasson
31 Jul 24	Re: Memory ordering	1	Chris M. Thomasson
30 Jul 24	Re: Memory ordering	8	Anton Ertl
30 Jul 24	Re: Memory ordering	2	Chris M. Thomasson
30 Jul 24	Re: Memory ordering	1	Chris M. Thomasson
31 Jul 24	Re: Memory ordering	5	MitchAlsup1
31 Jul 24	Re: Memory ordering	1	Chris M. Thomasson
1 Aug 24	Re: Memory ordering	3	Anton Ertl
1 Aug 24	Re: Memory ordering	2	MitchAlsup1
2 Aug 24	Re: Memory ordering	1	Anton Ertl
29 Jul 24	Re: Memory ordering	2	Chris M. Thomasson
30 Jul 24	Re: Memory ordering	1	Chris M. Thomasson
29 Jul 24	Re: Arguments for a sane ISA 6-years later	13	Chris M. Thomasson
29 Jul 24	Re: Arguments for a sane ISA 6-years later	9	BGB
29 Jul 24	Re: Arguments for a sane ISA 6-years later	8	Chris M. Thomasson
29 Jul 24	Re: Arguments for a sane ISA 6-years later	1	Chris M. Thomasson
29 Jul 24	Re: Arguments for a sane ISA 6-years later	2	BGB
29 Jul 24	Re: Arguments for a sane ISA 6-years later	1	Chris M. Thomasson
30 Jul 24	Re: Arguments for a sane ISA 6-years later	4	jseigh
30 Jul 24	Re: Arguments for a sane ISA 6-years later	3	Chris M. Thomasson
31 Jul 24	Re: Arguments for a sane ISA 6-years later	2	jseigh
31 Jul 24	Re: Arguments for a sane ISA 6-years later	1	Chris M. Thomasson
29 Jul 24	Memory ordering (was: Arguments for a sane ISA 6-years later)	1	Anton Ertl
29 Jul 24	Re: Arguments for a sane ISA 6-years later	2	MitchAlsup1
29 Jul 24	Re: Arguments for a sane ISA 6-years later	1	BGB
6 Aug 24	Re: Arguments for a sane ISA 6-years later	2	Chris M. Thomasson
6 Aug 24	Re: Arguments for a sane ISA 6-years later	1	Chris M. Thomasson
25 Jul 24	Re: Arguments for a sane ISA 6-years later	4	MitchAlsup1
26 Jul 24	Re: Arguments for a sane ISA 6-years later	1	BGB
28 Jul 24	Re: Arguments for a sane ISA 6-years later	2	Paul A. Clayton
28 Jul 24	Re: Arguments for a sane ISA 6-years later	1	MitchAlsup1