Subject: Re: MSI interrupts
From: cross (at) *nospam* spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.arch
Date: 25 Mar 2025, 16:08:18
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vrugt2$lqf$1@reader1.panix.com>
References: 1 2 3 4
User-Agent: trn 4.0-test77 (Sep 1, 2010)
In article <8qyEP.1257951$FVcd.630606@fx10.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <684d32898ec85b91bcb9dcdb97d8065a@www.novabbs.org>,
MitchAlsup1 <mitchalsup@aol.com> wrote:
>
My ATOMIC stuff allows for the construction, in SW, of pretty
much any ATOMIC primitive anyone would like. It is, in general,
not suited to guard critical regions, but to grab the lock which
does.
>
Ok, that's good to understand.
>
Most uses of My ATOMIC stuff is to grab/change/update data
that then guards various critical regions. More like a multi-
valued version of LDL-STC.
>
I don't see how constructing the primitive that way is terribly
useful compared to just issuing multiple load-locked/store-cond
instructions; in particular, it doesn't seem like the sort of
thing that composes well: every sequence that uses it must be
coded to use it, explicitly, instead of just nesting calls to
"lock".
>
LL/SC has been shown to be less efficient for contended
locks than atomic operations when the core count exceeds
single digits.
>
It's difficult to argue against the standard set of atomic
primitives (the PCIe set: compare and swap, load and add,
swap) for scalability; ARM has extended the list to
include exclusive or, set-bit, clear-bit, min, max, et al.
>
Analysis comparing with LL/SC:
>
https://cpufun.substack.com/p/atomics-in-aarch64
That analysis is interesting; thanks for pointing it out. I'm
not sure that it is directly applicable to locks, though, since
it's primarily concerned with atomic increment. Semaphores,
perhaps; something like `ldaddal` would work well in the
implementation of `xadd`, as in this paper by Mullender and Cox:
https://swtch.com/semaphore.pdf

Highly contended spin locks based on CAS loops tend not to scale
well as core counts increase, hence more elaborate lock types
like MCS locks, CLH locks, and so on. Boyd-Wickizer et al. go
into this in detail in
https://people.csail.mit.edu/nickolai/papers/boyd-wickizer-locks.pdf

I rather doubt that single-instruction atomics versus LL-SC are
going to make a big difference here. But I also wonder at the
internal implementation of these instructions in e.g. that
Fujitsu chip that allows throughput to stay more or less
constant as the number of cores increases; I'd think that cache
coherency traffic as threads contend for a single cache line
would create a bottleneck leading to (at least) a linear
slowdown. This suggests to me that perhaps the performance
characteristics depend less on the specific instructions used
(single-instruction atomics versus LL-SC) than on how those
instructions are implemented with respect to the coherency
protocol.
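
To make the comparison concrete, here is a minimal sketch in C11 of
the same atomic increment written two ways: as a single
read-modify-write, and as a CAS retry loop. The function names are
mine, purely for illustration; they come from neither the article nor
the papers above. On AArch64 with LSE I would expect a compiler to
lower the first to an `ldaddal`-style instruction, and without LSE to
an LL-SC loop, but that is an assumption about code generation, not
something the C source guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: increment with one atomic read-modify-write.
 * Returns the value the counter held before the increment. */
static inline uint64_t incr_fetch_add(_Atomic uint64_t *ctr)
{
    return atomic_fetch_add_explicit(ctr, 1, memory_order_acq_rel);
}

/* Hypothetical helper: the same increment as a CAS retry loop.
 * On failure, compare_exchange reloads `old` with the current
 * value, so each retry works from fresh data. */
static inline uint64_t incr_cas_loop(_Atomic uint64_t *ctr)
{
    uint64_t old = atomic_load_explicit(ctr, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               ctr, &old, old + 1,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* retry; under contention this loop may spin many times */
    }
    return old;
}

/* Single-threaded sanity check: both helpers return the prior value. */
static inline void incr_sanity_check(void)
{
    _Atomic uint64_t c = 0;
    assert(incr_fetch_add(&c) == 0);
    assert(incr_cas_loop(&c) == 1);
    assert(atomic_load(&c) == 2);
}
```

Either way, every contending thread still has to gain ownership of the
same cache line, which is why I suspect the implementation underneath
matters more than the choice of instruction.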
- Dan C.