Subject: Re: arm ldxr/stxr vs cas
From: chris.m.thomasson.1 (at) *nospam* gmail.com (Chris M. Thomasson)
Groups: comp.arch
Date: 10. Sep 2024, 19:52:39
Organization: A noiseless patient Spider
Message-ID: <vbq4hn$33j93$5@dont-email.me>
References: 1 2 3 4 5 6 7 8 9 10
User-Agent: Mozilla Thunderbird
On 9/10/2024 7:51 AM, jseigh wrote:
On 9/10/24 05:22, Terje Mathisen wrote:
jseigh wrote:
On 9/9/24 03:14, Terje Mathisen wrote:
jseigh wrote:
>
I'm not so sure about making the memory lock granularity the same as
the cache line size, but that's an implementation decision, I guess.
>
Just make sure you never have multiple locks residing inside the same cache line!
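A minimal sketch of the point above, assuming a 64-byte cache line (C++17 also offers std::hardware_destructive_interference_size as a portable value); the struct name is made up for illustration:

```cpp
#include <atomic>
#include <cstdint>

// Pad each lock out to its own cache line, so contended LL/SC or CAS
// traffic on one lock never invalidates the line holding a neighbor.
struct alignas(64) padded_lock {
    std::atomic<int> locked{0};
};

// The alignas padding guarantees adjacent array elements land on
// distinct cache lines.
static_assert(sizeof(padded_lock) == 64, "one lock per cache line");

padded_lock locks[8];
```

Without the alignas, eight bare std::atomic<int> locks would share one or two cache lines and false-share badly.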
>
This is the terminology ARM uses when describing their LL/SC
implementation. It is not the best choice of terminology.
>
>
I do like the idea of detecting potential contention at the
start of LL/SC so you can back off. Right now the only way I
can detect contention is after the fact, when the CAS fails, and
I probably have the cache line exclusive at that point. It's
pretty problematic.
>
I do prefer LOCK XADD instead of CAS (CmpXchg*), because the return value will also tell you which queue entry to pick/work on.
>
It will not be optimal when really contended, but at least one participant will make forward progress, and typically several of them.
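A small sketch of the XADD idea above, not a full queue: the value fetch_add returns is a unique index, so each participant learns which slot it owns and makes progress even under contention. The names here are made up for illustration:

```cpp
#include <atomic>

// A fixed array of work slots claimed via atomic fetch-add
// (LOCK XADD on x86). Each caller gets a distinct index back,
// so at least one participant always makes forward progress.
constexpr unsigned kSlots = 8;
int work[kSlots];
std::atomic<unsigned> next_slot{0};

// Returns the index of the slot the caller now owns.
// (Callers must stop claiming once kSlots is reached.)
unsigned claim_slot() {
    return next_slot.fetch_add(1, std::memory_order_acq_rel);
}
```

Unlike a CAS loop, claim_slot() never retries; two concurrent callers simply receive consecutive indices.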
>
>
I'm not aware of any lock-free queue algorithms that use
atomic_fetch_add that are actually lock-free, error free,
and/or don't have an ABA problem. I'm not saying there
aren't, just that I'm not aware of them.
>
I'm not sure either: I have written test code that should be general purpose, but I never actually used it in any production system.
>
The one time I used this for critical work (a maximum throughput ntpd server) I had a single writer and N-1 readers, so I actually decided to create one queue per reader (cpu core), making them much easier to get right. :-)
>
Each queue was a power of two in length, the write and read indices were full 32-bit unsigned variables that I masked before accessing the work list, so I never needed to worry about queue end wraparound.
>
The writer simply allocated work to each queue in round robin fashion.
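The single-writer ring described above can be sketched roughly like this, under the stated assumptions (power-of-two capacity, free-running 32-bit indices masked on access); the type and member names are invented:

```cpp
#include <atomic>
#include <cstdint>

// Single-producer/single-consumer ring. The write/read indices run
// freely over the full 32-bit range and are masked only when indexing
// the buffer, so queue-end wraparound never needs special handling.
template <uint32_t N>
struct spsc_ring {
    static_assert((N & (N - 1)) == 0, "capacity must be a power of two");
    uint32_t buf[N];
    std::atomic<uint32_t> wr{0}, rd{0};

    bool push(uint32_t v) {                      // writer side only
        uint32_t w = wr.load(std::memory_order_relaxed);
        if (w - rd.load(std::memory_order_acquire) == N)
            return false;                        // full
        buf[w & (N - 1)] = v;                    // mask, don't wrap
        wr.store(w + 1, std::memory_order_release);
        return true;
    }

    bool pop(uint32_t* out) {                    // reader side only
        uint32_t r = rd.load(std::memory_order_relaxed);
        if (wr.load(std::memory_order_acquire) == r)
            return false;                        // empty
        *out = buf[r & (N - 1)];
        rd.store(r + 1, std::memory_order_release);
        return true;
    }
};
```

With one such ring per reader core and the writer distributing round robin, no CAS is needed at all.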
>
You should look at the RSEQ (restartable sequences) code for SPMC
queue. It's in assembler (basically because RSEQ needs to know
the exact address of the commit instruction), but essentially an
enqueue operation is
tail->next = &added_node;
tail = &added_node; // the commit instruction
If you get interrupted before the tail gets updated, it doesn't
matter, because the next successful enqueue will work fine. You
don't even have to null the next pointer of the node you
are enqueuing.
The only downside is that if you are on a thousand-processor
box, you will have 1000 SPMC queues.
Check this one out:
https://groups.google.com/g/comp.lang.c++/c/Skv1PoQsUZo/m/XI3Qw64xAAAJ :^)