Chris M. Thomasson wrote:
Interesting! I wrote about so-called "tagged" memory order a while back
on this group. Just shooting the breeze, so to speak. Having some fun.

On 12/19/2024 10:33 AM, MitchAlsup1 wrote:
I had an idea a few weeks back of a different way to do membars

On Thu, 5 Dec 2024 7:44:19 +0000, Chris M. Thomasson wrote:
On 12/4/2024 8:13 AM, jseigh wrote:
On 12/3/24 18:37, Stefan Monnier wrote:
If there are places in the code it doesn't know this can't happen,
it won't optimize across it, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
We could start with something like

    critical_region {
        ...
    }

such that the compiler must refrain from any code motion within those
sections but is free to move things outside of those sections as if
execution was single-threaded.
C/C++11 already defines what lock acquire/release semantics are.
Roughly, you can move stuff from outside a critical section into it,
but not vice versa.

Java uses synchronized blocks to denote the critical section. C++ (the
society for using RAII for everything) has scoped_lock if you want to
use RAII for your critical section. It's not always obvious what the
actual critical section is, so I usually put it inside its own brace
block to make the extent obvious:

    {
        std::scoped_lock m(mutex);
        // ... critical section
    }
I'm not a big fan of C/C++ using acquire and release memory order
directives on everything, since apart from a few situations it's not
intuitively obvious what they do in all cases. You can look at compiler
assembler output, but you have to be real careful generalizing from
what you see.
The release on the unlock can allow some following stores and things to
sort of "bubble up" before it?

Acquire and release confine things to the "critical section"; the
release can allow some following things to go above it, so to speak.
This is making me think of Alex over on c.p.t.!
This sounds dangerous: if the thing allowed to go above it is
unCacheable while the lock release is cacheable, the cacheable lock can
arrive at another core before the unCacheable store arrives at its
destination.

Humm... Need to ponder on that. Wrt SPARC:

    membar #LoadStore | #StoreStore

can allow following stores to bubble up before it. If we want to block
that, then we would use a #StoreLoad. However, a #StoreLoad is not
required for unlocking a mutex.
It should be more flexible and controllable (if that's a good thing),
so I thought I'd toss it out there for comments.
This hypothetical ISA has normal LD and ST instructions, to which I
would add an LW Load-for-Write instruction to optimize moving shared
lines between caches. There are also the atomic fetch-and-op
instructions AFADD, AFAND, AFOR, AFXOR, plus ASWAP and ACAS, and
LL Load-Locked / SC Store-Conditional, for various sizes of naturally
aligned data and with various address modes.
Here is the new part:
To the above instructions is added a 3-bit Coherence Group (CG) field.
This allows one to specify different groups that various above data
accesses belong to.
The ISA has a membar instruction: MBG Memory Barrier for Group
MBG has three fields:
- one 4-bit field where each bit enables which operations this barrier
applies to, in older-younger order: Load-Load, Load-Store, Store-Load,
and Store-Store.
- two 8-bit fields where each bit selects which sets of Coherence Group(s)
this barrier applies to, one field for the older (before the membar) sets,
one for the younger (after the membar) sets.
Also the Load Store Queue is assumed to be self coherent - that loads
and stores to the same address by a single core are performed in order,
and that nothing can bypass a load or store with an unresolved address.
The CG numbers are assigned by convention, probably by the OS designers
when they define the ABI for this ISA.
Here I assigned CG:0 to be thread normal access, CG:1 to be atomic items,
CG:2 to be shared memory sections. The remaining 5 CG's can be used to
indicate different shared memory sections if their locks can overlap.
E.g., an MBG with op bits for Load-Load and Load-Store, a before-CG set
of {1}, and an after-CG set of {3,4} would block all younger loads and
stores in groups 3 and 4 from starting execution until all older loads
in group 1 have completed.
Loads and stores in all other groups are free to reorder, within the
LSQ self coherence rules.
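That example might look like the following in an invented assembly
syntax (the mnemonic spellings, the .gN group suffix, and the operand
order are my assumptions, not part of the proposal):

```
    ; Older: atomic accesses in coherence group 1
    LD.g1   r1, [lock]        ; e.g. read a lock word, CG:1

    ; Block younger CG:3/CG:4 loads and stores until all older CG:1
    ; loads complete: op bits = LL|LS, before = {1}, after = {3,4}
    MBG     LL|LS, before={1}, after={3,4}

    ; Younger: accesses to two shared sections, groups 3 and 4
    LD.g3   r2, [shared_a]    ; may not start until the barrier clears
    ST.g4   [shared_b], r3    ; likewise

    ; CG:0 (thread-normal) traffic is unaffected and may reorder
    ; freely, within the LSQ self-coherence rules
    LD.g0   r4, [private]
```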
An MBG with all op bits and all CG bits set is a full membar.
Also, if one is, say, juggling multiple shared sections with multiple
spinlocks or mutexes, then one can use multiple membars applied to
different groups to achieve specific bypass-blocking effects.
An MBG instruction completes and retires when no older groups of
selected loads or stores are incomplete.