Subject: Re: arm ldxr/stxr vs cas
From: terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Date: 10 Sep 2024, 10:22:05
Organization: A noiseless patient Spider
Message-ID: <vbp33u$2sr79$1@dont-email.me>
References: 1 2 3 4 5 6 7 8
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 SeaMonkey/2.53.18.2
jseigh wrote:
> On 9/9/24 03:14, Terje Mathisen wrote:
>> jseigh wrote:
>>> I'm not so sure about making the memory lock granularity the same as
>>> the cache line size, but that's an implementation decision I guess.
>>
>> Just make sure you never have multiple locks residing inside the same
>> cache line!
>
> This is the terminology ARM uses when describing their LL/SC
> implementation. It is not the best choice of terminology.
>
> I do like the idea of detecting potential contention at the
> start of LL/SC so you can back off. Right now the only way I
> can detect contention is after the fact, when the CAS fails, and
> I probably have the cache line exclusive at that point. It's
> pretty problematic.
>
>> I do prefer LOCK XADD instead of CAS (CmpXchg*), because the return
>> value will also tell you which queue entry to pick/work on.
>>
>> It will not be optimal when really contended, but at least one
>> participant will make forward progress, and typically several of them.
>
> I'm not aware of any lock-free queue algorithms that use
> atomic_fetch_add that are actually lock-free, error free,
> and/or don't have an ABA problem. I'm not saying there
> aren't, just that I'm not aware of them.
I'm not sure either: I have written test code that should be general purpose, but I never actually used it in any production system.
The one time I used this for critical work (a maximum-throughput ntpd server) I had a single writer and N-1 readers, so I actually decided to create one queue per reader (cpu core), making them much easier to get right. :-)
Each queue was a power of two in length, and the write and read indices were full 32-bit unsigned variables that I masked before accessing the work list, so I never needed to worry about queue-end wraparound.
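For what it's worth, the slot-reservation part of the XADD idea is simple enough to sketch. This is an untested illustration in C11 atomics, not code from any of the systems discussed here; the names and the queue size are made up, and it deliberately leaves out the hard part (safely publishing and consuming a slot, which is where the ABA and lock-freedom questions jseigh raises actually live):

```c
#include <stdatomic.h>
#include <stdint.h>

#define QSIZE 1024u  /* power of two, illustrative */

typedef struct {
    _Atomic uint32_t write_idx;  /* only ever advanced with fetch_add */
    void *slots[QSIZE];
} xadd_queue;

/* Each caller reserves a unique index with a single fetch_add
   (LOCK XADD on x86), so at least one participant always makes
   forward progress and the return value tells you which entry
   to work on.  The full 32-bit counter is masked only when
   indexing, so wraparound needs no special handling. */
static uint32_t reserve_slot(xadd_queue *q)
{
    uint32_t idx = atomic_fetch_add_explicit(&q->write_idx, 1,
                                             memory_order_relaxed);
    return idx & (QSIZE - 1);
}
```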
The writer simply allocated work to each queue in round robin fashion.
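The masked-index scheme above can be sketched like this: a single-producer/single-consumer ring where head and tail are full 32-bit counters and only the array access is masked, so tail - head is the fill level and empty/full tests need no wraparound logic. This is a from-scratch C11 illustration of the idea, not the actual ntpd code; all names are mine:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u  /* power of two */

/* One queue per reader: exactly one producer and one consumer,
   which is what makes the relaxed/acquire/release pairing below
   sufficient. */
typedef struct {
    _Atomic uint32_t head;  /* reader position, full 32-bit counter */
    _Atomic uint32_t tail;  /* writer position, full 32-bit counter */
    void *ring[RING_SIZE];
} spsc_queue;

static int spsc_push(spsc_queue *q, void *item)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == RING_SIZE)
        return 0;                        /* full */
    q->ring[tail & (RING_SIZE - 1)] = item;  /* mask only on access */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}

static void *spsc_pop(spsc_queue *q)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return NULL;                     /* empty */
    void *item = q->ring[head & (RING_SIZE - 1)];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return item;
}
```

With one such queue per reader core, the writer just round-robins spsc_push calls across them, exactly as described above.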
Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"