Subject: Re: DMA is obsolete
From: terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Date: 26. Apr 2025, 18:28:21
Organization: A noiseless patient Spider
Message-ID: <vuj53m$2s0jv$1@dont-email.me>
References: 1 2
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20
Lars Poulsen wrote:
> On 2025-04-26, John Levine <johnl@taugh.com> wrote:
>> Well, not entirely. This preprint argues that in environments with
>> lots of cores and where latency is an issue, programmed I/O can outperform
>> DMA.
>>
>> Rethinking Programmed I/O for Fast Devices, Cheap Cores, and Coherent Interconnects
>>
>> Anastasiia Ruzhanskaia, Pengcheng Xu, David Cock, Timothy Roscoe
>> [snip]
>>
>> https://arxiv.org/abs/2409.08141
>
> What is the difference between DMA and message-passing to another core
> doing a CMOV loop at the ISA level?
> DMA means doing that in the micro-engine instead of at the ISA level.
> Same difference.
> What am I missing?
I think, in the end it all comes down to power:
If the DMA engine can move n GB of data using less total power than having a regular core do it with programmed I/O, then the DMA engine wins.
OTOH, I have argued here in c.arch that for most data input streams, a regular core is going to look at the data eventually, and in that case the same core can do the work itself: either process the data directly (in register-file-sized or smaller blocks), or work as a prefetcher that first loads an $L1-sized block and then processes that chunk.
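
To make the second variant concrete, here is a minimal structural sketch of "pull in one $L1-sized block, then process it while it is still resident". Python cannot control caches, so this only shows the loop shape; the 32 KiB block size, the byte-sum "processing" step and the name checksum_in_blocks are all assumptions made up for the illustration.

```python
# Assumed L1 data cache size in bytes; the real value varies by CPU.
L1_BLOCK = 32 * 1024


def checksum_in_blocks(source, block_size=L1_BLOCK):
    """Sum all bytes of `source`, staging one block at a time.

    `source` is any bytes-like object standing in for a device buffer.
    """
    staging = bytearray(block_size)   # one small buffer, reused for every block
    view = memoryview(source)         # avoids copying when slicing the source
    total = 0
    for start in range(0, len(view), block_size):
        chunk = view[start:start + block_size]
        n = len(chunk)
        staging[:n] = chunk           # step 1: pull the block in (the "prefetch" pass)
        total += sum(staging[:n])     # step 2: process it while it is still close
    return total & 0xFFFFFFFF         # keep the result in 32 bits
```

In a real implementation step 1 would be the programmed I/O loads from the device and step 2 the actual parsing or filtering; the point is only that both passes run on the same core over the same small block.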
On the gripping hand, if the data is either going out, or you only need to look at a small percentage of the incoming cache lines' worth of data, then the more power-efficient DMA engine can still win.
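
The trade-off can be written down as a toy per-cache-line energy model. This is a rough sketch under stated assumptions, not measured data: e_dma, e_pio and e_touch are hypothetical energy costs in arbitrary units, and the model assumes that with programmed I/O the core processes the data as part of moving it, so it pays no separate "touch" cost.

```python
def dma_wins(e_dma, e_pio, e_touch, fraction_touched):
    """Return True if DMA costs less energy per cache line than programmed I/O.

    e_dma    -- energy for the DMA engine to move one line (hypothetical)
    e_pio    -- energy for a core to move one line itself (hypothetical)
    e_touch  -- extra energy for a core to read a DMA-delivered line later
    fraction_touched -- share of lines the core actually looks at, 0.0 to 1.0
    """
    if not 0.0 <= fraction_touched <= 1.0:
        raise ValueError("fraction_touched must be between 0 and 1")
    dma_total = e_dma + fraction_touched * e_touch
    return dma_total < e_pio


def break_even_fraction(e_dma, e_pio, e_touch):
    """Fraction of lines the core may touch before programmed I/O gets cheaper."""
    if e_touch <= 0:
        raise ValueError("e_touch must be positive")
    return max(0.0, min(1.0, (e_pio - e_dma) / e_touch))
```

With made-up costs e_dma=1.0, e_pio=3.0, e_touch=4.0, break_even_fraction gives 0.5: DMA wins as long as the core looks at fewer than half of the incoming lines, and outgoing data corresponds to fraction_touched=0.0.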
Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"