cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <5a77c46910dd2100886ce6fc44c4c460@www.novabbs.org>,
MitchAlsup1 <mitchalsup@aol.com> wrote:
Other than it being placed "away" from the centralized cores,
it runs the same ISA as the main cores, has longer latency to
coherent memory and shorter latency to device control registers
--which is why it is placed close to the device itself:: latency.
The big fast centralized core is going to get microsecond latency
from an MMI/O device whereas the ASIC version will have latencies of
a handful of nanoseconds. So the 5 GHz core sees ~1 microsecond while
the little ASIC sees 10 nanoseconds. ...
>
Yes, I get the argument for WHY you'd do it, I just want to make
sure that it's an ordinary core (albeit one that is far away
from the sockets with the main SoC complexes) that I interact
with in the usual manner.
Intel has put 2 Crestmont cores (their then-current E-core, not at all
tiny) on the SoC tile (not the compute tile) of Meteor Lake. The main
idea there seems to be to save power (Meteor Lake is a laptop CPU)
when doing low-load things like playing videos by keeping the compute
tile powered down.
core very close to the device could handle that swimmingly,
though I'm not sure it would be enough to do it at (say) line
rate for a 400Gbps NIC or Gen5 NVMe device.
>
I suspect the 400 Gbps NIC needs a rather BIG core to handle the
traffic loads.
Looking at
https://chipsandcheese.com/p/arms-cortex-a53-tiny-but-important, a
Cortex-A53 would not be up to it (at 1896MHz it can read <12GB/s and
write <18GB/s even to the L1 cache). However, Chester Lam notes: "A53
offers very low cache bandwidth compared to pretty much any other core
we’ve analyzed." I think, though, that a small in-order core like the
A53, but with enough load and store buffering and enough bandwidth to
I/O and the memory controller, should not have a problem shoveling data
from or to a 400Gb/s NIC. With 128 bits/cycle in each direction one
would need one transfer per cycle in each direction at 3125MHz to
achieve 400Gb/s, or maybe 4GHz for a dual-issue core to allow for loop
overhead. Given that the A53 typically runs at only 2GHz, supporting
256 bits/cycle of transfer width (for load and store instructions,
i.e., along the lines of 256-bit AVX) would be more appropriate.
Going for an OoO core (something like AMD's Bobcat or Intel's
Silvermont) would help achieve the bandwidth goals without excessive
fine-tuning of the software.
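The bandwidth arithmetic above can be checked with a few lines of
Python. This is only a back-of-the-envelope sketch: the function name
is made up for illustration, and it assumes one full-width transfer
completes every cycle, with no loop overhead.

```python
def required_clock_hz(link_bits_per_s: float, bits_per_cycle: int) -> float:
    """Clock needed to sustain one transfer of bits_per_cycle every cycle.

    Assumption: ideal pipelining, no loop or protocol overhead.
    """
    return link_bits_per_s / bits_per_cycle


LINK_BITS_PER_S = 400e9  # 400Gb/s NIC, per direction

# The same rate in GB/s, for comparison with measured cache bandwidths.
link_gbytes_per_s = LINK_BITS_PER_S / 8 / 1e9

if __name__ == "__main__":
    print(f"link rate: {link_gbytes_per_s:.0f} GB/s per direction")
    for width in (128, 256):
        mhz = required_clock_hz(LINK_BITS_PER_S, width) / 1e6
        print(f"{width} bits/cycle needs {mhz:.1f} MHz")
```

With 128 bits/cycle this gives the 3125MHz figure; with 256 bits/cycle
it drops to 1562.5MHz, which leaves headroom for loop overhead at 2GHz.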
Having the remote core run the same OS code as every other core
means the OS developers have fewer hoops to jump through. Bug-for-bug
compatibility means that clearing of those CRs just leaves
the core out in the periphery idling and bothering no one.
>
Eh...Having to jump through hoops here matters less to me for
this kind of use case than if I'm trying to use those cores for
general-purpose compute.
I think it's the same thing as Greenspun's tenth rule: First you find
that a classical DMA engine is too limiting, then you find that an A53
is too limiting, and eventually you find that it would be practical to
run the ISA of the main cores. In particular, it allows you to use
the toolchain of the main cores for developing software for the I/O
cores, and you can also use the facilities of the main cores (e.g.,
debugging features that may be absent from the I/O cores) during
development.
Having a separate ISA means I cannot
accidentally run a program meant only for the big cores on the
IO service processors.
Marking the binaries that should be able to run on the IO service
processors with some flag, and letting the component of the OS that
assigns processes to cores heed this flag is not rocket science. You
probably also don't want to run programs for the I/O processors on the
main cores; whether you use a separate flag for indicating that, or
whether one flag indicates both is an interesting question.
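A minimal sketch of such flag-based placement follows. All names here
(CoreKind, eligible_cores, the core names) are hypothetical and not
taken from any existing OS. It models the "separate flags" variant:
each binary carries a mask of the core kinds it may run on, and the
placement code only considers cores whose kind is in that mask.

```python
from enum import Flag, auto


class CoreKind(Flag):
    """Hypothetical core classes; a binary's mask may combine both."""
    MAIN = auto()
    IO = auto()


def eligible_cores(binary_mask: CoreKind, cores: dict[str, CoreKind]) -> list[str]:
    """Return the cores whose kind is allowed by the binary's mask."""
    return [name for name, kind in cores.items() if kind & binary_mask]


# Example machine: two main cores and one I/O service processor.
CORES = {"cpu0": CoreKind.MAIN, "cpu1": CoreKind.MAIN, "io0": CoreKind.IO}

if __name__ == "__main__":
    print(eligible_cores(CoreKind.IO, CORES))                  # I/O-only binary
    print(eligible_cores(CoreKind.MAIN, CORES))                # ordinary binary
    print(eligible_cores(CoreKind.MAIN | CoreKind.IO, CORES))  # runs anywhere
```

With a single flag instead, an unmarked binary would be confined to
the main cores and a marked one to the I/O processors, and the "runs
anywhere" case could not be expressed.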
On the other hand, you buy a motherboard with said ASIC core,
and you can boot the MB without putting a big chip in the
socket--but you may have to deal with scant DRAM since the
big centralized chip contains the memory controller.
>
A neat hack for bragging rights, but not terribly practical?
Very practical for updating the firmware of the board to support the
big chip you want to put in the socket (called "BIOS FlashBack" in
connection with AMD big chips). In a case where we did not have that
feature, and the board did not support the CPU, we had to buy another
CPU to update the firmware
<https://www.complang.tuwien.ac.at/anton/asus-p10s-c4l.html>. That's
especially relevant for AM4 boards, because the support chips make it
hard to use more than 16MB Flash for firmware, but the firmware for
all supported big chips does not fit into 16MB. However, as the case
mentioned above shows, it's also relevant for Intel boards.
- anton
-- 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>