Subject : Re: arm ldxr/stxr vs cas
From : jseigh_es00 (at) *nospam* xemaps.com (jseigh)
Newsgroups : comp.arch
Date : 05. Sep 2024, 22:49:32
Organization : A noiseless patient Spider
Message-ID : <vbd91c$g5j0$1@dont-email.me>
References : 1 2 3 4 5
User-Agent : Mozilla Thunderbird
On 9/5/24 16:34, Chris M. Thomasson wrote:
On 9/5/2024 12:46 PM, MitchAlsup1 wrote:
On Thu, 5 Sep 2024 11:33:23 +0000, jseigh wrote:
>
On 9/4/2024 5:27 PM, MitchAlsup1 wrote:
On Mon, 2 Sep 2024 17:27:57 +0000, jseigh wrote:
>
I read that arm added the cas instruction because they didn't think
ldxr/stxr would scale well. It wasn't clear to me as to why that
would be the case. I would think the memory lock mechanism would
have really low overhead vs cas having to do an interlocked load
and store. Unless maybe the memory lock size might be large
enough to cause false sharing issues. Any ideas?
>
A pipeline lock between the LD part of a CAS and the ST part of a
CAS is essentially FREE. But the same is true for LL followed by
a later SC.
>
Older machines with looser than sequential consistency memory models
and running OoO have a myriad of problems with LL - SC. This is
why My 66000 architecture switches from causal consistency to
sequential consistency when it encounters <effectively> LL and
switches back after seeing SC.
>
No Fences necessary with causal consistency.
>
>
I'm not sure I entirely follow. I was thinking of the effects on
cache. In theory the SC could fail without having to get the current
cache line exclusive or at all. CAS has to get it exclusive before
it can definitively fail.
>
A LL that takes a miss in L1 will perform a fetch with intent to modify,
so will a CAS. However, LL is allowed to silently fail if exclusive is
not returned from its fetch, deferring atomic failure to SC, while CAS
will fail when exclusive fails to return.
CAS should only fail when the comparands are not equal to each other. Well, then there is the damn weak and strong CAS in C++11... ;^o
LL-SC is designed so that
when a failure happens, failure is visible at SC not necessarily at LL.
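
For readers unfamiliar with the C++11 weak/strong distinction mentioned above, here is a minimal sketch. The function names `fetch_add_weak` and `try_claim` are made up for illustration and do not come from the thread. `compare_exchange_weak` is allowed to fail spuriously even when the comparands are equal (which is what lets a compiler lower it to a bare LL/SC pair), so it belongs in a retry loop; `compare_exchange_strong` fails only when the values actually differ.

```cpp
#include <atomic>

// Hypothetical helper: atomically add `delta` and return the previous value.
// The weak CAS may fail spuriously (for example if an LL/SC reservation is
// lost), so it is always wrapped in a loop.
int fetch_add_weak(std::atomic<int>& counter, int delta) {
    int expected = counter.load(std::memory_order_relaxed);
    while (!counter.compare_exchange_weak(expected, expected + delta,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        // On failure, `expected` now holds the value that was observed,
        // so the next iteration retries against fresh data.
    }
    return expected;
}

// Hypothetical helper: one-shot attempt to move a flag from 0 to 1.
// The strong CAS fails only if the flag was not 0, so no loop is needed.
bool try_claim(std::atomic<int>& flag) {
    int expected = 0;
    return flag.compare_exchange_strong(expected, 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed);
}
```

On a target with a native CAS instruction the two forms may well compile to the same code; the difference only shows up where the strong form has to be built from LL/SC with an inner retry loop.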
>
There are coherence protocols that allow the 2nd party to determine
if it returns exclusive or not. The example I know is when the 2nd
party is already performing an atomic event and it is better to fail
the starting atomic event than to fail an ongoing atomic event.
In My 66000 the determination is made under the notion of priority:
the higher priority thread is allowed to continue while the lower
priority thread takes the failure. The higher priority thread can
be the requestor (1st party) or the holder of data (2nd party)
while all interested observers (3rd parties) are in a position
to see what transpired and act accordingly (causal).
>
I'm not so sure about making the memory lock granularity the same as
cache line size but that's an implementation decision I guess.
I do like the idea of detecting potential contention at the
start of LL/SC so you can do back off. Right now the only way I
can detect contention is after the fact when the CAS fails and
I probably have the cache line exclusive at that point. It's
pretty problematic.
Joe Seigh
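
To make the last point concrete, here is a rough sketch of after-the-fact contention detection with a CAS loop. It is only an illustration under assumptions: `increment_with_backoff`, `CasResult`, and the `max_spins` cap are hypothetical names, and yielding the thread is just one possible backoff policy. The only contention signal available is the CAS failure itself, by which time the line has probably already been pulled in exclusive.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical result type: the value seen before the update and how many
// CAS attempts failed, which is the only contention signal available here.
struct CasResult {
    std::uint64_t old_value;
    unsigned failures;
};

// Hypothetical helper: increment `target`, backing off exponentially
// (capped at `max_spins` yields) each time the CAS reports a conflict.
CasResult increment_with_backoff(std::atomic<std::uint64_t>& target,
                                 unsigned max_spins = 1024) {
    unsigned failures = 0;
    unsigned spins = 1;
    std::uint64_t expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_strong(expected, expected + 1,
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed)) {
        ++failures;  // contention detected, but only after the fact
        for (unsigned i = 0; i < spins; ++i) {
            std::this_thread::yield();
        }
        if (spins < max_spins) {
            spins *= 2;
        }
        // Re-read after backing off so the next attempt uses a fresh value.
        expected = target.load(std::memory_order_relaxed);
    }
    return {expected, failures};
}
```

An LL/SC pair that reported likely contention at the LL would let the backoff happen before any exclusive request is issued; nothing in standard C++ exposes that, so the sketch can only react to the failed CAS.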