Re: DMA is obsolete

Subject: Re: DMA is obsolete
From: theom+news (at) *nospam* chiark.greenend.org.uk (Theo)
Newsgroups: comp.arch
Date: 27 Apr 2025, 20:13:47
Organization: University of Cambridge, England
Message-ID: <Frc*6b6aA@news.chiark.greenend.org.uk>
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-28-amd64 (x86_64))
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Sat, 26 Apr 2025 17:29:06 +0000, Scott Lurndal wrote:
 
However,  I expect there are still benefits in using DMA for bulk data
transfer, particularly for network packet handling where
throughput is more interesting than PCI MMIO latency.
 
I would like to add a thought to the concept under discussion::
 
Does the paper's conclusion hold better or worse if/when the
core ISA contains both LDM/STM and MM instructions? LDM/STM
allow for several sequential registers to move to/from MMI/O
memory in a single interconnect transaction, while MM allows
for up-to page-sized transfers in a single instruction and
only 2 interconnect transactions.

I think this depends on the scale of your core.  For, say, a NIC <-> CPU link,
maybe the CPU has MM instructions, but perhaps the microcontroller on the
NIC doesn't.  That means eg the CPU can push packets to transmit, but the
NIC is not set up to push packets it has received - it has to ask the CPU to
pull, which will slow things down.

You can add that feature of course, but then isn't it just becoming a DMA
engine?

ie it's about control path and datapath.  A controller doesn't need a wide
datapath (it isn't doing much compute) but the data transfer does need a
wide datapath.  If you size a CPU for a wide datapath then you end up
paying costs for that (eg wide GP registers when you don't need them).
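
To make the push/pull and control-path/datapath split concrete, here is a minimal C sketch. It is not taken from the paper or from any real driver: `struct fake_nic`, `pio_push` and `dma_post` are hypothetical names, and the "device" is ordinary memory standing in for an MMIO window. The return values count CPU stores aimed at the device in this toy model only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device model: plain memory standing in for real hardware. */
enum { TX_WINDOW_BYTES = 2048 };

struct fake_nic {
    volatile uint64_t tx_window[TX_WINDOW_BYTES / 8]; /* stand-in for an MMIO aperture */
    volatile uint32_t tx_len;                         /* stand-in for a length register */
    /* In a real system the descriptor would normally sit in host memory. */
    struct { uint64_t addr; uint32_t len; uint32_t ready; } desc;
    volatile uint32_t doorbell;
};

/* PIO push: the CPU carries every word of the payload itself.
 * Returns the number of stores aimed at the device, or -1 if too large. */
static int pio_push(struct fake_nic *nic, const void *pkt, size_t len)
{
    if (len > TX_WINDOW_BYTES)
        return -1;
    const uint8_t *src = pkt;
    int stores = 0;
    for (size_t off = 0; off < len; off += 8) {
        uint64_t word = 0;
        size_t n = (len - off < 8) ? len - off : 8;
        memcpy(&word, src + off, n);      /* zero-pad a short final word */
        nic->tx_window[off / 8] = word;   /* one store per 8 bytes */
        stores++;
    }
    nic->tx_len = (uint32_t)len;          /* tell the device how much is valid */
    return stores + 1;
}

/* DMA post: the CPU only describes the buffer; the device moves the data.
 * Returns the number of stores aimed at the device (the doorbell only). */
static int dma_post(struct fake_nic *nic, const void *pkt, size_t len)
{
    nic->desc.addr = (uint64_t)(uintptr_t)pkt;
    nic->desc.len = (uint32_t)len;
    nic->desc.ready = 1;
    nic->doorbell = 1;
    return 1;
}
```

The PIO path needs a wide datapath on whichever side executes the loop, while the DMA path only needs enough control logic to write a descriptor. That is the asymmetry in question when the NIC's microcontroller is the one that would have to push.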

One concern that arises from the paper is the security
implications of device access to the cache coherency
protocol.   Not an issue for a well-behaved device, but
potentially problematic in a secure environment with
third-party CXL-mem devices.
 
Citation please !?!

CXL's protection model isn't very good:
https://dl.acm.org/doi/pdf/10.1145/3617580
(declaration: I'm a coauthor)

Also note:: device DMA goes through I/O MMU which adds a
modicum of security-fencing around device DMA accesses
but also adds latency.

Indeed, and page-based lookups are both slow (if you miss in the IOTLB) and
have spatial and temporal security issues.
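
A toy illustration of the spatial side of that, assuming 4 KiB pages. `struct toy_iommu`, `iommu_map` and `iommu_allows` are hypothetical names for this sketch, not any real IOMMU interface. The point is only that a page-granular check cannot tell a 64-byte buffer from the rest of the page it sits in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed parameters for the sketch: 4 KiB pages, a tiny mapping table. */
enum { PAGE_SHIFT = 12, MAX_MAPPED = 8 };

struct toy_iommu {
    uint64_t mapped_pfn[MAX_MAPPED]; /* page frame numbers the device may touch */
    int count;
};

/* Map a buffer for device access. Every page the buffer overlaps
 * is mapped whole. Returns false if len is 0 or the table is full. */
static bool iommu_map(struct toy_iommu *mmu, uint64_t addr, uint64_t len)
{
    if (len == 0)
        return false;
    uint64_t first = addr >> PAGE_SHIFT;
    uint64_t last = (addr + len - 1) >> PAGE_SHIFT;
    for (uint64_t pfn = first; pfn <= last; pfn++) {
        if (mmu->count == MAX_MAPPED)
            return false;
        mmu->mapped_pfn[mmu->count++] = pfn;
    }
    return true;
}

/* The access check sees only the page number, never the buffer bounds. */
static bool iommu_allows(const struct toy_iommu *mmu, uint64_t addr)
{
    uint64_t pfn = addr >> PAGE_SHIFT;
    for (int i = 0; i < mmu->count; i++)
        if (mmu->mapped_pfn[i] == pfn)
            return true;
    return false;
}
```

Mapping 64 bytes at 0x10040 makes 0x10F00 accessible too, because both fall in page 0x10. The temporal side (a mapping that stays usable until it is torn down and invalidated) is not modelled here.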

Most modern CPUs support "allocate" hints on inbound DMA
that will automatically place the data in the right CPU cache as
quickly as possible.

Decomposing that packet transfer into CPU loads and stores
in a coherent fabric doesn't gain much, and burns more power
on the "device" than a DMA engine.
 
That was my initial thought--core performing lots of LD/ST to
MMI/O is bound to consume more power than device DMA.
 
Secondarily, using 1-few cores to perform PIO is not going to
have the data land in the cache of the core that will run when
the data has been transferred. The data lands in the cache doing
PIO and not in the one to receive control after I/O is done.
{{It may still be "closer than" memory--but several cache
coherence protocols take longer cache-cache than dram-cache.}}

I think this is an 'it depends'.  If you're doing RPC type operations, it
takes more work to warm up the DMA than it does to just do PIO.  If you're
an SSD pulling a large file from flash, DMA is more efficient.  If you're
moving network packets, which involve multiple scatter-gathers per packet,
then maybe some heavy lifting is useful for the address handling.
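
The 'it depends' can be put as a simple break-even model: DMA pays a fixed setup cost, PIO pays more per byte. A minimal C sketch follows. The struct, the function names and every number fed into it are assumptions for illustration, not measurements from the paper or from any hardware.

```c
#include <stddef.h>

/* All fields are hypothetical inputs, not measured figures. */
struct xfer_costs {
    double dma_setup_ns;    /* descriptor build, doorbell, completion handling */
    double dma_ns_per_byte; /* streaming cost once the engine is running */
    double pio_ns_per_byte; /* cost of CPU loads/stores to the device */
};

/* PIO has no setup cost in this model, only a per-byte cost. */
static double pio_cost_ns(const struct xfer_costs *c, size_t bytes)
{
    return c->pio_ns_per_byte * (double)bytes;
}

/* DMA pays the setup once, then a (usually lower) per-byte cost. */
static double dma_cost_ns(const struct xfer_costs *c, size_t bytes)
{
    return c->dma_setup_ns + c->dma_ns_per_byte * (double)bytes;
}

/* Transfer size above which DMA is cheaper.
 * Returns -1.0 if PIO is never more expensive per byte. */
static double break_even_bytes(const struct xfer_costs *c)
{
    double delta = c->pio_ns_per_byte - c->dma_ns_per_byte;
    if (delta <= 0.0)
        return -1.0;
    return c->dma_setup_ns / delta;
}
```

With made-up inputs of 1000 ns setup, 0.25 ns/byte for DMA and 0.75 ns/byte for PIO, the crossover lands at 2000 bytes: a small RPC stays on the PIO side, a large flash read on the DMA side. Scatter-gather adds per-segment costs that this two-term model leaves out.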

It was also the only ARM64 processor chip we built with a cache-coherent
interconnect until the recent CXL-based products.

Overall, a very interesting paper.
 
Reminds me of trying to sell a micro x86-64 to AMD as a project.
The µ86 is a small x86-64 core made available as IP in Verilog
where it has/runs the same ISA as main GBOoO x86, but is placed
"out in the PCIe" interconnect--performing I/O services
topologically adjacent to the device itself. This allows 1ns access
latencies to DCRs and performing OS queueing of DPCs,... without
bothering the GBOoO cores.
 
AMD didn't buy the arguments.

Intel tried that with the Quark line of 'microcontrollers', which appeared
to be a warmed over P54 Pentium (whether it shared microarchitecture or RTL
I'm not sure).  They were too power hungry and unwieldy to be
microcontrollers - they also couldn't run Debian/x86 despite having an MMU
because they were too old for the LOCK CMPXCHG instruction Debian used (P54
didn't need to worry about concurrency, but we do now).
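
For readers who haven't met the instruction: compare-and-swap is the usual primitive behind lock-free updates, and on x86 GCC and Clang typically lower the C11 call below to a `lock cmpxchg`. This is a minimal sketch; `bounded_increment` is an illustrative name and is not taken from Debian or any particular package.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *counter unless it has already reached limit.
 * Safe against concurrent callers without taking a lock. */
static bool bounded_increment(atomic_int *counter, int limit)
{
    int seen = atomic_load(counter);
    while (seen < limit) {
        /* Succeeds only if *counter still equals seen; on failure,
         * seen is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(counter, &seen, seen + 1))
            return true;
    }
    return false;
}
```

Code built around this pattern assumes the atomic read-modify-write works as specified on every core it runs on, which is why a part that cannot execute it correctly falls outside what a stock distribution build can support.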

I think at the end of the day there isn't actually a whole lot of benefit to
running the same ISA on your I/O as on your CPU - there tends to be a fairly
hard line between 'drivers' (on the CPU) and 'firmware' (on the device), and
on the firmware side it's easier to throw in a small RISC (eg RISC-V
nowadays) than anything complicated.

Theo

