Subject: Re: arm ldxr/stxr vs cas
From: chris.m.thomasson.1 (at) *nospam* gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Date: 11. Sep 2024, 08:00:51
Organization: A noiseless patient Spider
Message-ID : <vbrf73$3fb6u$2@dont-email.me>
References : 1 2 3 4 5 6 7 8
User-Agent : Mozilla Thunderbird
On 9/10/2024 9:15 PM, Paul A. Clayton wrote:
> On 9/9/24 3:14 AM, Terje Mathisen wrote:
>> jseigh wrote:
>>> I'm not so sure about making the memory lock granularity the same
>>> as the cache line size, but that's an implementation decision, I
>>> guess.
>>
>> Just make sure you never have multiple locks residing inside the
>> same cache line!
>
> Never?
> I suspect that, at least theoretically, conditions could exist where
> having more than one lock within a cache line would be beneficial.
> If lock B is always acquired after lock A, then sharing a cache
> line might (I think) improve performance. One would lose that
> line's capacity to prefetch the data protected by lock A and lock
> B. This assumes simple locks (e.g., not readers-writer locks).
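
A minimal layout sketch of that trade-off (the 64-byte line size and
the plain spinlock words are assumptions for illustration, nothing
specified above):

#include <atomic>

// Two lock words deliberately sharing one cache line: acquiring A
// pulls the shared line in, so B is already cached when it is taken.
// The cost: the rest of the line cannot hold protected data, and
// contention on either lock pingpongs both.
struct alignas(64) colocated_locks {
    std::atomic<int> lock_a{0};   // always acquired first
    std::atomic<int> lock_b{0};   // acquired only while A is held
};

Contrast this with the usual one-lock-per-line layout sketched at the
end of this post.
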
> It seems to me that the pingpong problem may be less important
> than spatial locality, depending on the contention for the cache
> line and the cache-hierarchy locality of that contention
> (pingponging from a shared level of cache would be less
> expensive).
>
> If work behind highly active locks is preferentially or forcefully
> localized, pingponging would be less of a problem, it seems.
> Instead of an arbitrary core acquiring a lock's cache line and
> doing some work, the core could send a message to the natural
> owner of the cache line to do the work.
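
In software, this "ship the work to the lock's home core" idea looks
roughly like delegation (or flat combining). A minimal sketch,
assuming C++11 atomics and an intrusive Treiber-style MPSC stack; all
names here are illustrative:

#include <atomic>
#include <functional>

struct request {
    std::function<void()> work;   // the critical-section body
    request* next = nullptr;
};

struct home_queue {
    std::atomic<request*> head{nullptr};

    void post(request* r) {       // any core: one CAS on the queue
        r->next = head.load(std::memory_order_relaxed);
        while (!head.compare_exchange_weak(
                   r->next, r,
                   std::memory_order_release,
                   std::memory_order_relaxed)) {
        }                         // on failure, r->next is refreshed
    }

    void drain() {                // home core only: apply all work
        request* r = head.exchange(nullptr, std::memory_order_acquire);
        while (r) {
            request* next = r->next;
            r->work();            // runs where the data lives
            r = next;
        }
    }
};

Note that drain() runs requests in LIFO order; a real implementation
would reverse the list first. The lock-manager idea in the next
paragraph is the reply-based variant: send back granted/not-granted
instead of executing the work at the home core.
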
> If communication between cores were low latency and simple messages
> used little bandwidth, one might also conceive of having a lock
> manager that tracks the lock state and sends a granted or not-
> granted message back. This assumes that the memory location of the
> lock itself is separate from the data guarded by the lock.
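
Such request/grant messages might look something like the following;
the field names and widths are pure invention for illustration:

#include <cstdint>

enum class lock_msg : uint8_t { acquire, release, granted, not_granted };

struct lock_request {             // core -> lock manager
    lock_msg op;                  // acquire or release
    uint32_t lock_id;             // names the lock, not the guarded data
    uint16_t requester;           // core id, so the reply can be routed
};

struct lock_reply {               // lock manager -> core
    lock_msg op;                  // granted or not_granted
    uint32_t lock_id;
};
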
> Being able to grab a snapshot of some data briefly, without
> requiring a (longer-term) ownership change, might be useful even
> beyond lock probing (where a conventional MESI protocol would
> downgrade the M-state cache line to S, forcing a request for
> ownership when the lock is released). I recall a paper that
> proposed expiring cache-line ownership to reduce coherence
> overhead.
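
A software analogue of that short-lived snapshot is a seqlock:
readers never write the shared line, so they never force an ownership
change on the writer's M-state copy. A minimal single-writer sketch,
assuming C++11 atomics (payload size and names are illustrative):

#include <atomic>
#include <cstdint>

struct seqlock {
    std::atomic<uint64_t> seq{0};           // odd while an update is live
    std::atomic<uint64_t> payload[2] = {};  // guarded data

    void write(uint64_t a, uint64_t b) {    // single writer assumed
        uint64_t s = seq.load(std::memory_order_relaxed);
        seq.store(s + 1, std::memory_order_relaxed);       // mark busy
        std::atomic_thread_fence(std::memory_order_release);
        payload[0].store(a, std::memory_order_relaxed);
        payload[1].store(b, std::memory_order_relaxed);
        seq.store(s + 2, std::memory_order_release);       // publish
    }

    bool try_read(uint64_t& a, uint64_t& b) {
        uint64_t s0 = seq.load(std::memory_order_acquire);
        if (s0 & 1) return false;           // writer active; retry later
        a = payload[0].load(std::memory_order_relaxed);
        b = payload[1].load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        return seq.load(std::memory_order_relaxed) == s0;  // consistent?
    }
};

A caller simply retries try_read() until it returns true; the reader
side never dirties the line.
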
> Within a multiple-cache-line atomic operation/memory transaction,
> I _think_ that if the write set is owned, the read set could be
> grabbed as such snapshots. I.e., I think any remote write to the
> read set could be ordered "after" the atomic/transaction commits.
> (Such a scheme might be too difficult to get right while still
> providing any benefit.)
>
> (Weird side-thought: I wonder if a conservative filter might be
> useful for locking, particularly for writer locks. On the one
> hand, such a filter would increase the pingpong in the filter when
> writer locks are set/cleared; on the other hand, reader locks
> could use a remote increment within the filter check atomic to
> avoid slight cache pollution.)

Generally, one wants the mutex state to be completely isolated:
padded up to at least an L2 cache line or, if using LL/SC, perhaps
even a full reservation granule... Not only properly padded, but also
correctly aligned on an L2 cache-line or reservation-granule
boundary. This helps prevent false sharing and makes life a little
better for the underlying architecture...
You also don't want mutex traffic to interfere with the critical section, or locked region if you will...
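
Concretely, a sketch of that isolation (64 bytes is a stand-in for
whatever the target's real L2 line or reservation-granule size is; on
ARMv8 the exclusives granule is implementation defined and can be
much larger than a line, so check before hardcoding anything):

#include <atomic>
#include <cstddef>

constexpr std::size_t kIsolate = 64;  // assumed line/granule size

// One lock word per granule, padded *and* aligned, so no other lock
// or unrelated data can false-share its line.
struct alignas(kIsolate) isolated_mutex {
    std::atomic<int> word{0};
    char pad[kIsolate - sizeof(std::atomic<int>)];
};
static_assert(sizeof(isolated_mutex) == kIsolate, "one full granule");

// The guarded data starts on the next granule, so lock traffic
// (spins, CAS or LDXR/STXR retries) never hits the data's line.
struct alignas(kIsolate) guarded {
    isolated_mutex lock;
    long long data[kIsolate / sizeof(long long)];
};
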