Sujet : Re: arm ldxr/stxr vs cas
De : jseigh_es00 (at) *nospam* xemaps.com (jseigh)
Groupes : comp.archDate : 10. Sep 2024, 15:40:05
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <vbplo5$30k42$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9
User-Agent : Mozilla Thunderbird
On 9/9/24 17:09, Chris M. Thomasson wrote:
On 9/9/2024 7:29 AM, jseigh wrote:
On 9/9/24 03:14, Terje Mathisen wrote:
jseigh wrote:
>
I'm not so sure about making the memory lock granularity same as
cache line size but that's an implementation decision I guess.
>
Just make sure you never have multiple locks residing inside the same cache line!
>
This the the terminology ARM uses when describing their LL/SC
implementation. It is not the best choice in terminology.
>
>
I do like the idea of detecting potential contention at the
start of LL/SC so you can do back off. Right now the only way I
can detect contention is after the fact when the CAS fails and
I probably have the cache line exclusive at that point. It's
pretty problematic.
>
I do prefer LOCK XADD instead of CAS (CmpXchg*), because the return value will also tell you which queue entry to pick/work on.
>
It will not be optimal when really contended, but at least one participant will make forward progress, and typically several of them.
>
>
I'm not aware of any lock-free queue algorithms that use
atomic_fetch_add that are actually lock-free, error free,
and/or don't have an ABA problem. I'm not saying there
aren't, just that I'm not aware of them.
Here is an interesting one I did. A tweak from another algorithm. Basically a bakery algorithm:
<pseudo code, membars aside for a moment>
______________________________________________
struct cell { uint32_t ver; double state; };
uint32_t head = 0;
uint32_t tail = 0;
cell cells[N]; // N must be a power of 2
void init() {
for (uint32_t i = 0; i < N; ++i) cells[i].ver = i;
}
void producer(double state) {
uint32_t ver = XADD(&head, 1);
cell& c = cells[ver & (N - 1)];
while (LOAD(&c.ver) != ver) backoff();
c.state = state;
STORE(&c.ver, ver + 1);
}
double consumer() {
uint32_t ver = XADD(&tail, 1);
cell& c = cells[ver & (N - 1)];
while (LOAD(&c.ver) != ver + 1) backoff();
double state = c.state;
STORE(&c.ver, ver + N);
return state;
}
______________________________________________
One of the things I have in my implementation is that an enqueue
operation can detect if wrap occurred if it got a interrupted by
a time slice end and log how far behind it got. I'm seeing about
200,000 enqueue operations in those cases. That would have been
a huge performance hit if my queue wasn't lock-free.
Joe Seigh