Newsportal USENET - Re: DMA is obsolete

On 2025-05-03 2:11 a.m., Anton Ertl wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <2025May2.073450@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
I think it's the same thing as Greenspun's tenth rule: First you find
that a classical DMA engine is too limiting, then you find that an A53
is too limiting, and eventually you find that it would be practical to
run the ISA of the main cores. In particular, it allows you to use
the toolchain of the main cores for developing them,
These are issues solvable with the software architecture and
build system for the host OS.
Certainly, one can work around many bad decisions, and in reality one
has to work around some bad decisions, but the issue here is not
whether "the issues are solvable", but which decision leads to better
or worse consequences.
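A classical DMA engine, in the sense used in the quoted text above, typically walks a chain of descriptors, each saying "copy N bytes from here to there." The following Python sketch simulates such a scatter-gather engine over a bytearray. The `Descriptor` layout and `run_dma` are hypothetical and not modeled on any particular controller; the point is that the engine can only perform operations its descriptor format encodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """One entry in a hypothetical scatter-gather descriptor chain."""
    src: int                                  # source offset in simulated memory
    dst: int                                  # destination offset in simulated memory
    length: int                               # number of bytes to copy
    next_desc: Optional["Descriptor"] = None  # next descriptor, or None to stop


def run_dma(memory: bytearray, head: Optional[Descriptor]) -> int:
    """Walk the descriptor chain, copying bytes; return total bytes moved.

    The engine can only do what the descriptor format encodes (plain
    copies here). A checksum, a format conversion, or any conditional
    logic would need a new descriptor type or a programmable core.
    """
    moved = 0
    desc = head
    while desc is not None:
        # Slicing a bytearray makes a copy, so overlapping ranges behave
        # like memmove rather than corrupting the source mid-copy.
        memory[desc.dst:desc.dst + desc.length] = memory[desc.src:desc.src + desc.length]
        moved += desc.length
        desc = desc.next_desc
    return moved


# Copy two 5-byte pieces using a two-descriptor chain.
mem = bytearray(b"HELLOworld" + bytes(10))
chain = Descriptor(0, 10, 5, Descriptor(5, 15, 5))
run_dma(mem, chain)
print(mem[10:20])  # bytearray(b'HELLOworld')
```

Replacing such an engine with a small core means the "descriptor format" becomes an arbitrary program, which is where the question of which ISA that core should run comes from.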

The important characteristic is
that the software coupling makes architectural sense, and that
simply does not require using the same ISA across IPs.
IP? Internet Protocol? Software Coupling sounds to me like a concept
from Constantine out of my software engineering class. I guess you
meant neither, but it's unclear what you mean.
In any case, I have made arguments why it would make sense to use the
same ISA as for the OS for programming the cores that replace DMA
engines. I will discuss your counterarguments below, but the most
important one to me seems to be that these cores would cost more than
cores with a different ISA. There is something to that, but when the
application ISA is cheap to implement (e.g., RV64GC), that cost is
small; it may be more an argument for also selecting the
cheap-to-implement ISA for the OS/application cores.

Indeed, consider AMD's Zen CPUs; the PSP/ASP/whatever it's
called these days is an ARM core while the big CPUs are x86.
I'm pretty sure there's an Xtensa DSP in there to do DRAM and
timing and PCIe link training.
The PSPs are not programmable by the OS or application programmers, so
using the same ISA would not benefit the OS or application
programmers. By contrast, the idea for the DMA replacement engines is
that they are programmable by the OS and maybe the application
programmers, and that changes whether the same ISA is beneficial.
What is "ASP/whatever"?

Similarly with the ME on Intel.
Last I read about it, the ME uses a core developed by Intel that implements IA-32 or
AMD64; but in any case, the ME is not programmable by OS or
application programmers, either.

A BMC might be running on whatever.
Again, a BMC is not programmable by OS or application programmers.

We increasingly see ARM
based SBCs that have small RISC-V microcontroller-class cores
embedded in the SoC for exactly this sort of thing.
That's interesting; it points to RISC-V being cheaper to implement
than ARM. As for "that sort of thing", none of them is programmable
by OS or application programmers, so see above.

Our hardware RoT
?

The problem is when such service cores are hidden (as they are
in the case of the PSP, SMU, MPIO, and similar components, to
use AMD as the example) and treated like black boxes by
software. It's really cool that I can configure the IO crossbar
in useful ways tailored to specific configurations, but it's much
less cool that I have to do what amounts to an RPC over the SMN
to some totally undocumented entity somewhere in the SoC to do
it. Bluntly, as an OS person, I do not want random bits of code
running anywhere on my machine that I am not at least aware of
(yes, this includes firmware blobs on devices).
Well, one goes with the other. If you design the hardware for being
programmed by the OS programmers, you use the same ISA for all the
cores that the OS programmers program, whereas if you design the
hardware as programmed by "firmware" programmers, you use a
cheap-to-implement ISA and design the whole thing such that it is
opaque to OS programmers and only offers certain capabilities to
OS programmers.
And that's not just limited to ISAs. A very successful example is the
way that flash memory is usually exposed to OSs: as a block device
like a plain old hard disk, and all the idiosyncrasies of flash are
hidden in the device behind a flash translation layer that is
implemented by a microcontroller on the device.
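The flash translation layer described above can be illustrated with a toy model. The Python sketch below (the class `TinyFTL` is hypothetical) shows only the core idea: flash pages cannot be overwritten in place, so each logical write goes to a fresh physical page and a logical-to-physical map is updated. Real FTLs also handle multi-page erase blocks, relocation of live data, wear leveling, and power-loss safety, none of which is modeled here.

```python
class TinyFTL:
    """Toy flash translation layer (hypothetical, greatly simplified)."""

    def __init__(self, num_pages: int):
        self.pages = [None] * num_pages     # physical page contents
        self.free = list(range(num_pages))  # erased pages ready to write
        self.mapping = {}                   # logical block -> physical page
        self.stale = set()                  # pages holding outdated data

    def write(self, lba: int, data: bytes) -> None:
        """Out-of-place write: never overwrite a programmed page."""
        if not self.free:
            self._garbage_collect()
        page = self.free.pop(0)
        self.pages[page] = data
        old = self.mapping.get(lba)
        if old is not None:
            self.stale.add(old)             # the old copy is now garbage
        self.mapping[lba] = page

    def read(self, lba: int) -> bytes:
        """The host sees an ordinary block device: read by logical block."""
        return self.pages[self.mapping[lba]]

    def _garbage_collect(self) -> None:
        # Toy GC: "erase" each stale page individually. Real devices erase
        # whole blocks and must first copy any live pages out of them.
        if not self.stale:
            raise RuntimeError("device full")
        for page in sorted(self.stale):
            self.pages[page] = None
            self.free.append(page)
        self.stale.clear()


ftl = TinyFTL(num_pages=2)
ftl.write(0, b"a")
ftl.write(0, b"b")   # lands on a new page; the first page becomes stale
ftl.write(0, b"c")   # no free pages left, so the toy GC reclaims the stale one
print(ftl.read(0))   # b'c'
```

The host only ever calls `write` and `read` with logical block numbers, which is exactly the kind of narrow, opaque interface described above.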
What's "SMN"?

and you can also
use the facilities of the main cores (e.g., debugging features that
may be absent from the I/O cores) during development.
This is interesting, but we've found it more useful going the
other way around. We do most of our debugging via the SP.
Since the SP is also responsible for system initialization and
holding x86 in reset until we're ready for it to start
running, it's the obvious nexus for debugging the system
holistically.
Sure, for debugging on the core-dump level that's useful. I was
thinking about watchpoint and breakpoint registers and performance
counters that one may not want to implement on the DMA-replacement
core, but that are implemented on the OS/application cores.

Marking the binaries that should be able to run on the IO service
processors with some flag, and letting the component of the OS that
assigns processes to cores heed this flag is not rocket science.
I agree, that's easy. And yet, mistakes will be made, and there
will be tension between wanting to dedicate those CPUs to IO
services and wanting to use them for GP programs: I can easily
imagine a paper where someone modifies a scheduler to move IO
bound programs to those cores. Using a different ISA obviates
most of that, and provides an (admittedly modest) security benefit.
If there really is such tension, that indicates that such cores would
be useful for general-purpose use. That makes the case for using the
same ISA even stronger.
As for "mistakes will be made", that also goes the other way: With a
separate toolchain for the DMA-replacement ISA, there is lots of
opportunity for mistakes.
As for "security benefit", where is that supposed to come from? What
attack scenario do you have in mind where that "security benefit"
could materialize?
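The flag-based placement discussed above ("marking the binaries that should be able to run on the IO service processors with some flag") can be sketched as a tiny placement function. This is a hypothetical Python model, not any real OS scheduler: `CoreKind`, `Binary.io_service_ok`, and `pick_core` are invented names. It assumes all cores share one ISA, so the flag only restricts where unflagged binaries may be placed.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CoreKind(Enum):
    MAIN = auto()        # general-purpose OS/application core
    IO_SERVICE = auto()  # small core that replaces a DMA engine


@dataclass
class Binary:
    name: str
    io_service_ok: bool  # hypothetical flag, e.g. set at link time


@dataclass
class Core:
    cid: int
    kind: CoreKind
    load: int = 0        # number of runnable tasks already placed here


def pick_core(binary: Binary, cores: list[Core]) -> Core:
    """Place a binary on the least-loaded core it is allowed to use.

    Unflagged binaries are restricted to MAIN cores; flagged binaries may
    run anywhere, since (by assumption) every core runs the same ISA.
    """
    allowed = [c for c in cores
               if c.kind is CoreKind.MAIN or binary.io_service_ok]
    chosen = min(allowed, key=lambda c: c.load)
    chosen.load += 1
    return chosen


cores = [Core(0, CoreKind.MAIN, load=5), Core(1, CoreKind.IO_SERVICE)]
print(pick_core(Binary("app", io_service_ok=False), cores).cid)   # 0
print(pick_core(Binary("nvme", io_service_ok=True), cores).cid)   # 1
```

The "tension" Dan Cross describes would show up here as pressure to set `io_service_ok` on ordinary IO-bound programs; with a different ISA on the IO cores, that option does not exist at all.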

And if I already have to modify or configure the OS to
accommodate the existence of these things in the first place,
then accommodating an ISA difference really isn't that much
extra work. The critical observation is that a typical SMP view
of the world no longer makes sense for the system architecture,
and trying to shoehorn that model onto the hardware reality is
just going to cause frustration.
The shared-memory multiprocessing view of the world is very
successful, while distributed-memory computers are limited to
supercomputing and other areas where hardware cost still dominates
over software cost (i.e., where the software crisis has not happened
yet); as an example of the lack of success of the distributed-memory
paradigm, take the PlayStation 3; programmers found it too hard to
work with, so they did not use the hardware well, and eventually Sony
decided to go for an SMP machine for the PlayStation 4 and 5.
OTOH, one can say that the way many peripherals work on
general-purpose computers is more along the lines of
distributed-memory; but that's probably due to the relative hardware
and software costs for that peripheral. Sure, the performance
characteristics are non-uniform (NUMA) in many cases, but 1) caches
tend to smooth over that, and 2) most of the code is not
performance-critical, so it just needs to run, which is easier to
achieve with SMP and harder with distributed memory.
Sure, people have argued for advantages of other models for decades,
like you do now, but SMP has usually won.
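To make the programming-effort argument above concrete, here is a small Python sketch contrasting the two models on the same task (summing a list). It is only an emulation: Python threads always share memory, so the "distributed-memory" version simply restricts itself to explicit messages over `queue.Queue`. The function names are made up for illustration.

```python
import queue
import threading

DATA = list(range(1, 101))


def shared_memory_sum(nworkers: int = 4) -> int:
    """SMP style: workers read the shared list directly and write into a
    shared result array; the only explicit coordination is join()."""
    partial = [0] * nworkers

    def work(i: int) -> None:
        partial[i] = sum(DATA[i::nworkers])  # direct access to shared data

    threads = [threading.Thread(target=work, args=(i,)) for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def message_passing_sum(nworkers: int = 4) -> int:
    """Distributed-memory style: each worker only sees the chunk it is
    sent, and its result must come back as an explicit message."""
    inboxes = [queue.Queue() for _ in range(nworkers)]
    results = queue.Queue()

    def work(i: int) -> None:
        chunk = inboxes[i].get()   # receive a private copy of the data
        results.put(sum(chunk))    # send the partial result back

    threads = [threading.Thread(target=work, args=(i,)) for i in range(nworkers)]
    for t in threads:
        t.start()
    for i in range(nworkers):
        inboxes[i].put(list(DATA[i::nworkers]))  # explicit scatter
    total = sum(results.get() for _ in range(nworkers))  # explicit gather
    for t in threads:
        t.join()
    return total


print(shared_memory_sum(), message_passing_sum())  # 5050 5050
```

Even in this trivial case, the message-passing version has to decide what data each worker receives and how results are collected; in the shared-memory version, the data is simply there.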

On the other hand, you buy a motherboard with said ASIC core,
and you can boot the MB without putting a big chip in the
socket--but you may have to deal with scant DRAM since the
big centralized chip contains the memory controller.
A neat hack for bragging rights, but not terribly practical?
Very practical for updating the firmware of the board to support the
big chip you want to put in the socket (called "BIOS FlashBack" in
connection with AMD big chips).
"BIOS", as loaded from the EFS by the ABL on the PSP on EPYC
class chips, is usually stored in a QSPI flash on the main
board (though starting with Turin you _can_ boot via eSPI).
Strictly speaking, you don't _need_ an x86 core to rewrite that.
On our machines, we do that from the SP, but we don't use AGESA
or UEFI: all of the platform enablement stuff done in PEI and
DXE we do directly in the host OS.
EFS? ABL? QSPI? eSPI? PEI? DXE?
Anyway, what you do in your special setup does not detract from the
fact that being able to flash the firmware without having a working
main core has turned out to be so useful that out of 218 AM5
motherboards offered in Austria <https://geizhals.at/?cat=mbam5>, 203
have that feature.

Also, on AMD machines, again considering EPYC, it's up to system
software running on x86 to direct either the SMU or MPIO to
configure DXIO and the rest of the fabric before PCIe link
training even begins (releasing PCIe from PERST is done by
either the SMU or MPIO, depending on the specific
microarchitecture). Where are these cores, again? If they're
close to the devices, are they in the root complex or on the far
side of a bridge? Can they even talk to the rest of the board?
The core that does the flashing obviously is on the board, not on the
CPU package (which may be absent). I do not know where on the board
it is. Typically only one USB port can be used for that, which may
indicate that a special path is used that does not require initializing
all the USB ports and the other hardware they depend on; I
think that some USB ports are directly connected to the CPU package,
so those would not work anyway.

In a case where we did not have that
feature, and the board did not support the CPU, we had to buy another
CPU to update the firmware
<https://www.complang.tuwien.ac.at/anton/asus-p10s-c4l.html>. That's
especially relevant for AM4 boards, because the support chips make it
hard to use more than 16MB Flash for firmware, but the firmware for
all supported big chips does not fit into 16MB. However, as the case
mentioned above shows, it's also relevant for Intel boards.
You shouldn't need to boot the host operating system to do that,
though I get on most consumer-grade machines you'll do it via
something that interfaces with AGESA or UEFI.
In the bad old days you had to boot into DOS and run a DOS program for
flashing the BIOS. Or worse, Windows; not very useful if you don't
have Windows installed on the computer (DOS at least could be booted
from a floppy disk). My last few experiences in that direction were
firmware flashing as a "BIOS" feature, and the flashback feature
(which has its own problems, because communication with the user is
limited).

Most server-grade
machines will have a BMC that can do this independently of the
main CPU,
And just in another posting you wrote "but not terribly practical?".
The board I mentioned above where we had to buy a separate CPU for
flashing mentioned a BMC on the feature list, but when we looked in
the manual, we found that the BMC is not delivered with the board, but
has to be bought separately. There was also no mention that one can
use the BMC for flashing the BIOS.

and I should be clear that I'm discounting use cases
for consumer grade boards, where I suspect something like this
is less interesting than on server hardware.
What makes you think so? And what do you mean by "something like
this"?
1) "BIOS flashback" is a mostly-standard feature in AM5 (i.e.,
consumer-grade) boards.
2) DMA has been a standard feature in various forms on consumer
hardware since the first IBM PC in 1981, and replacing the DMA engines
with cores running a general-purpose ISA accessible to OS designers
will not be limited to servers; if hardware designers and OS
developers put development time into that, there is no reason to
limit that effort to servers. The LPE-Cores on Meteor Lake (not a
server chip), the in-order ARM cores on various smartphone SoCs, the
P-Cores and E-Cores on Intel consumer-grade CPUs (whose server versions
have the E-Cores disabled), and the uniformity of cores on the
dedicated server CPUs all indicate that non-uniform cores seem to be
hard to sell in the server space.
- anton

My gut tells me that it would be better to have a “flat” design with all processors of the same type. It would likely save a lot of debugging headaches. But this is from the perspective of a single developer. However, I think it may not be true that there would be more debugging headaches if the control CPUs were different from the main CPU. The “peripheral processors” would likely be cut-down versions of the main CPU and have their own idiosyncratic bugs, so they end up being a bit different anyway. How are bugs rated? I am thinking bugs per LOC, regardless of the CPU used. Sure, there is a learning curve for a different processor, but that curve is likely short for an experienced person and long for a newbie.
I have been pondering how to add test facilities to my own CPU core and thinking of using a small co-processor. Possibly a stack machine or something like the OPC challenge processor.
