Sujet : Re: arm ldxr/stxr vs cas
De : chris.m.thomasson.1 (at) *nospam* gmail.com (Chris M. Thomasson)
Groupes : comp.archDate : 08. Sep 2024, 21:28:07
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <vbl1co$22022$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
User-Agent : Mozilla Thunderbird
On 9/8/2024 4:53 AM, jseigh wrote:
On 9/8/24 02:35, Chris M. Thomasson wrote:
On 9/7/2024 5:59 PM, MitchAlsup1 wrote:
On Sat, 7 Sep 2024 23:16:35 +0000, Chris M. Thomasson wrote:
>
On 9/7/2024 4:14 PM, Chris M. Thomasson wrote:
[...]
When I am using CAS I don't really expect it to fail willy nilly even if
the comparands are still the same. Weak vs Strong. Still irks me a bit.
;^)
>
There are algorithms out there, usually state machines that depend on
strong cas. When a CAS fails, it depends on it failing because the
comparands were actually different...
>
Leading to ABA failures::
>
Do you really want the following CAS to succeed ??
>
LD R19,[someMemoryValue]
..
interrupt delays program execution for 1 week
..
CAS R17,R19,[someMemoryLocation]
>
Given that the someMemoryLocation is accessible to other programs
while tis one is sleeping ??
>
Thus, it seems reasonable to fail a CAS when one cannot determine
if the memory location has been changed and changed back in the
mean time.
>
ABA, well that can happen with CAS and certain algorithms that use them. The good ol' version counter is pretty nice, but it still can fail. 64 bit words, 128 bit CAS. Loads to boot... ;^) Actually, I think Joe mentioned something interesting about CAS a long time ago wrt IBM... Candy Cane Striped books (Joe do you remember?) about a way to avoid live lock and failing in the os. I think Windows has some like it with their SList and SEH?
You can have an ABA problem with single word CAS if your values (usually
pointers) are one word in size.
Well, it depends on what you are using CAS for...? Fair enough? Is it for actual nodes of a linked list? Or version numbers for something else... A state machine that depends on strong CAS?
Wrt nodes of a linked list, ABA can kill the integrity of said list. There is a nice appendix in the IBM principles of operation... Free Pool Manipulation iirc. They explicitly mention the ABA problem. Then they mention PLO or something, some locking primitive... Remember that one? I think its kind of like my deadlock free multex locking algorithm:
https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/Ti8LFyH4CgAJOne little trick is to use indexes instead of raw node pointers (aka, words). So on a 64-bit system the anchor can be:
struct anchor
{
uint16_t node_index;
// use the rest of the bits for a 48-bit version count...
};
;^D
Remember your interesting atomic 63-bit counter you posted a long time ago, Joe?
Some algorithms that use CAS simply do not not suffer from ABA. The ABA sensitive ones are usually those that use pointers for dynamic linked data structures...
The solution to that is to use double
word CAS, DWCAS, with the other word being a monotonic sequence number. The logic being that it would take longer to wrap and reuse the sequence
number than any reasonable delay in execution. In the 70's s370
CDS was 64 bits (2 32 bit words) and it was estimated that it would
take a year to wrap a 32 bit counter.
Yup. However, it has a certain little "time frame" to wrap that counter in... So, it took _several_ specific conditions to all occur in "perfect order" in a certain "time frame" for ABA to raise its nasty head up and cause real damage... The appendix in the IBM docs mentions all of this in pretty good detail indeed.
The risc-v arch manual mentions that you need DWCAS or LL/SC to avoid
the ABA problem and that they chose LL/SC to avoid dealing with 128
bit atomic data sizes (their loss but I'm not going into that here).
Anyway if you want to avoid the ABA problem, you can't do it with
C/C++ atomics since they don't support DWCAS.
They should and there is no reason why they should not support native DWCAS _if_ the underlying architecture supports it directly. cmpxchg16 works well for 64 bit words. Perhaps the fun cmp8xchg16 instruction would be a pain in C/C++'s hind end... ;^)
GRRRR...
The candy striped Principle of Operation was an internal restricted
document with architecture and engineering notes added. The note
was that for short CAS loops fairness should be guaranteed so all
processors could make equal forward progress.
Ahhhhh... I knew I remembered you mentioning something like that many moons ago: Thanks. :^)
Speaking of live lock, in the 80's, VM/XA spin locks broke on a 4300 box. VMXA used a test and set loop without any back off. The 4300 TS
microcode timed out getting a lock and failed silently. So all the
cpu's spun forever trying to get a lock that was free.
Holy shit! Wow... Humm... Actually, remember your old rw spinlock? I am having a hard time finding it over on comp.programming.threads. I know you wrote about it and showed an implementation. It might of been a bakery algorithm.
We sort
of figured out when I provided a patch that put a load and test
of the lock so we could put in a hardware break point to see if the
lock was actually not free. It suddenly started working because
the patch provide enough backoff for the microcode locking
to start working again.
Wow! Thanks for sharing that story. The test saved the day wrt providing a bit of a backoff... ;^)