Subject: Re: Memory ordering
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 02. Aug 2024, 09:14:21
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Aug2.101421@mips.complang.tuwien.ac.at>
References: 1 2 3 4 5 6 7 8 9 10
User-Agent: xrn 10.11

mitchalsup@aol.com (MitchAlsup1) writes:
>On Thu, 1 Aug 2024 15:54:55 +0000, Anton Ertl wrote:
>
>> mitchalsup@aol.com (MitchAlsup1) writes:
>>> On Tue, 30 Jul 2024 9:51:46 +0000, Anton Ertl wrote:
>>>
>>>> mitchalsup@aol.com (MitchAlsup1) writes:
>>>>
>>>>> An MEMBAR requires the memory order to catch up to the current point
>>>>> before adding new AGENs to the problem space. If the memory order
>>>>> is already SC then MEMBAR has nothing to do and is pushed through
>>>>> the pipeline without delay.
>>>>
>>>> Yes, that's the slow implementation. The fast implementation is to
>>>> implement sequential consistency all the time (by predicting and
>>>> speculating that memory accesses do not interfere with those of other
>>>> cores, and recovering from that speculation when the speculation turns
>>>> out to be wrong). In such an implementation memory barriers are noops
>>>> (and thus fast), because the hardware already provides sequential
>>>> consistency.
>>>
>>> Why does SC need any MEMBARs ??
>>
>> A program written for sequential consistency does not need them. But
>> if you have a program written for a weaker memory model, the memory
>> barriers in that program will be noops and therefore really cheap.
>>
>>> Then consider 2 Vector processors performing 2 STs (1 each) to
>>> non-overlapping addresses but with bank aliasing. Consider that
>>> the STs are scatter based and the bank conflicts random. There
>>> is no way to determine which store happened first or which
>>> element of each vector store happened first.
>>
>> It's up to the architecture to define the order of stores and loads of
>> a given core. For sequential consistency you then interleave the
>> sequences coming from the cores in some convenient order.
>
>Insufficient:: If OoO processor orders LDs and STs as they leave AGEN
>you cannot just interleave multiple core access streams and achieve
>sequential consistency.

Architecture is defined in the architecture manual. Implementation
concepts like OoO and AGEN don't (or shouldn't) play a role there.

WRT memory ordering, most architectures define clearly what happens
for single-threaded programs: loads and stores happen exactly in the
architectural execution order of the instructions. And the
implementations actually deliver that for single-threaded programs.

Then they take back some of these guarantees for multi-processing, and
add some instructions (memory barriers, lock prefixes, etc.) to
reestablish these guarantees when needed, in an expensive way.

Sequential consistency is what you get if you do not take back these
guarantees.
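
To make the interleaving view concrete, here is a minimal sketch in
Python. It is only a toy model, not any real architecture: the names
interleavings, run and outcomes are made up for this illustration, and
"taking back a guarantee" is modelled crudely by letting a load pass
the earlier store of the same core (roughly what a store buffer does).
It runs the classic store-buffering litmus test under both rules.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        slots = set(a_slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(trace):
    """Execute one global order of operations against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, loc, arg in trace:
        if op == "st":
            mem[loc] = arg        # arg is the value to store
        else:
            regs[arg] = mem[loc]  # "ld": arg names the destination register
    return regs["r1"], regs["r2"]

# Store-buffering litmus test: each core stores to one location,
# then loads the other one.
core0 = [("st", "x", 1), ("ld", "y", "r1")]
core1 = [("st", "y", 1), ("ld", "x", "r2")]

def outcomes(allow_store_load_reorder):
    """Collect all (r1, r2) results over all permitted global orders."""
    # Assumption: the weaker model is represented only by also permitting
    # each core's load to be performed before its own earlier store.
    orders0 = [core0] + ([core0[::-1]] if allow_store_load_reorder else [])
    orders1 = [core1] + ([core1[::-1]] if allow_store_load_reorder else [])
    return {run(t)
            for o0 in orders0
            for o1 in orders1
            for t in interleavings(o0, o1)}

sc = outcomes(allow_store_load_reorder=False)
weak = outcomes(allow_store_load_reorder=True)
print("sequentially consistent:", sorted(sc))
print("store->load reordering: ", sorted(weak))
```

In this model the result r1 = r2 = 0 never shows up when every core's
program order is kept, and it does show up once the load may pass the
store. That extra result is what a memory barrier between the store
and the load is there to exclude on a weaker model.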

Concerning vector instructions, what do architectures say about the
memory order here? An ideal would be if they were treated as atomic,
i.e., a vector read access is performed in its entirety after any
earlier and before any later memory access in the stream of executed
instructions. But even without multi-processing this tends to be
inefficient, and it has problems with page faults and with the number
of pages that must be in memory at the same time, especially with
gather/scatter accesses and very long vector memory-memory
instructions as on the NEC SX (IIRC). But of course, the NEC SX is a
supercomputer architecture, and a certain amount of architectural
nonsense is not unusual there.
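
The difference between an atomic and an element-wise vector access can
be sketched in the same toy style (Python; the function names and the
two-element "vector" are assumptions made for this illustration, and
the elements are taken in index order, which a real scatter would not
even guarantee). One core stores 1 to both elements, the other loads
both elements, and we collect what the load can observe.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        slots = set(a_slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def observed(store_steps, load_steps):
    """Return every (element0, element1) pair the vector load can see.

    Each step is (kind, indices); all indices of one step are accessed
    without any other core's step coming in between.
    """
    results = set()
    for trace in interleavings(store_steps, load_steps):
        mem = [0, 0]   # both elements start out as 0
        seen = {}
        for kind, indices in trace:
            for i in indices:
                if kind == "st":
                    mem[i] = 1
                else:
                    seen[i] = mem[i]
        results.add((seen[0], seen[1]))
    return results

# Atomic: the whole vector access is a single step.
atomic = observed([("st", (0, 1))], [("ld", (0, 1))])

# Element-wise: every element access is its own step.
elementwise = observed([("st", (0,)), ("st", (1,))],
                       [("ld", (0,)), ("ld", (1,))])

print("atomic:      ", sorted(atomic))
print("element-wise:", sorted(elementwise))
```

With atomic treatment the load sees either the old or the new vector.
With element-wise treatment it can also see a mix of old and new
elements, and the architecture then has to say which mixes are allowed.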

Given such difficulties, vector instructions, at least those with
gather loads and scatter stores (whether strided or indirect), are not
a good idea (and a recent Intel hardware vulnerability shows another
reason why gather is not a good idea). Your VVM OTOH allows a clean
architectural definition.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>