Subject: Re: MSI interrupts
From: terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Date: 27. Mar 2025, 10:55:08
Organization: A noiseless patient Spider
Message-ID: <vs379s$3vr1j$1@dont-email.me>
References: 1 2 3 4 5 6
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20
Stefan Monnier wrote:
>> a going to make a big difference here. But I also wonder at the
>> internal implementation of these instructions in e.g. that
>> Fujitsu chip that allows throughput to stay more or less
>> constant as the number of cores increases; I'd think that cache
>> coherency traffic as threads contend for a single cache line
>> would create a bottleneck leading to (at least) a linear
>> slow down.
> IIUC the idea is that the atomic-add is sent to the memory and performed
> there, instead of moving the data to the cache. So there's no
> contention for the cache line because the data is not in a cache at all.
This is one I have never understood the use of or need for, except to provide global shared counters of some sort.

In that case it would make perfect sense to let any core send (and immediately forget) a message saying "increment counter X by 1". But at least in my own coding, every LOCK XADD has been issued because I then need to do something with either the previous or the updated value of the counter.
Having the atomic-add performed by a memory controller, returning the result, would make it almost perfectly scalable, but would also limit the speed of this operation to the bus transaction speed, even when there is zero contention, right?
I guess if the atomic-add were implemented at all (or most) cache levels as well, then it would indeed perform better across all workloads; the problem is that under contention you still have to flush the variable out to the RAM array, right?
Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"