(I'm snipping bits, because these posts are getting a bit long!)

On 10/09/2024 01:58, Waldek Hebisch wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> On 09/09/2024 16:36, Waldek Hebisch wrote:
>> Context is everything. That is why I have been using the term
>> "cpu cache" for the cache tied tightly to the cpu itself, which comes as
>> part of the core that ARM designs and delivers, along with parts such as
>> the NVIC.
>
> Logically, given that there was "tightly attached memory", this should
> be called "tightly attached cache" :)
>
> Logically, a "cpu cache" is a cache sitting on the path between the CPU
> and a memory device. It does not need to be tightly attached to the CPU;
> the L2 and L3 caches in PCs are not "tightly attached".

That is all true. But in the case of ARM Cortex-M microcontrollers, the
cpu cache is part of the "black box" delivered by ARM. Manufacturers get
some choices when they order the box, including some influence over the
cache sizes, but it is very much integrated in the core complex (along
with the NVIC and a number of other parts). It is completely irrelevant
that on a Pentium PC, the cache was a separate chip, or that on a PowerPC
microcontroller the interrupt controller is made by the microcontroller
manufacturer and not by the cpu core designers. On microcontrollers built
around ARM Cortex-M cores, ARM provides the cpu core, cpu caches
(depending on the core model and options chosen), the NVIC interrupt
controller, MPU, and a few other bits and pieces. The caches are called
"cpu caches" - "cpu data cache" and "cpu instruction cache" - because
they are attached to the cpu. The microcontroller manufacturer can put
whatever else they like on the chip.
>> And I have tried to use terms such as "buffer" or "flash
>> controller cache" for the memory buffers often provided as part of flash
>> controllers and memory interfaces on microcontrollers, because those are
>> terms used by the microcontroller manufacturers.
>
> "flash cache" looks reasonable. Concerning the difference between a buffer
> and a cache there is indeed some fuzziness here. AFAICS the word "buffer"
> is used for logically very simple devices; once the operation becomes a
> bit more interesting it is usually called a cache. Anyway, given the
> fuzziness, saying that something called a buffer is not a cache is risky:
> it may have all the features normally associated with caches, and in such
> a case it deserves to be called a cache.

OK, but it is not a "cpu cache".
> I would say that there is a tradeoff between cost and effect. And
> if you look at devices where the bus matrix runs at the same clock
> as the core, then it makes sense to put the cache on the other side.

No.

You put caches as close as possible to the prime user of the cache. If
the prime user is the cpu and you want to cache data from flash,
external memory, and other sources, you put the cache tight up against
the cpu - then you can have dedicated, wide, fast buses to the cpu.

> But there is the question of technical possibility. For example, the
> 386 was sold as a chip, and all that a system designer could do was to
> put a cache on the motherboard. An on-chip cache would be better, but
> was not possible.

There can certainly be such trade-offs. I don't remember the details of
the 386, but I /think/ the cache was connected separately on a dedicated
bus, rather than on the bus that went to the memory controller (which was
also off-chip, on the chipset). So it was logically close to the cpu even
though it was physically on a different chip. I think if these sorts of
details are of interest, a thread in comp.arch might make more sense than
comp.lang.c.
> IIUC in the case of Cortex-M0 or say M4, manufacturers get an
> ARM core with buses intended to be connected to the bus matrix.

Yes.

> Manufacturers could add an extra bus matrix or crossbar just to access a
> cache, but the bus width is specified by the ARM design.

I believe the bus standard is from ARM, but the implementation is by the
manufacturers (unlike the cpu core and immediately surrounding parts,
including the cpu caches for devices that support that).

> If the main bus matrix and RAM
> are clocked at the CPU frequency, the extra bus matrix and cache would
> only add extra latency for no gain (of course, this changes when the
> main bus matrix runs at a lower clock).

Correct, at least for static RAM (DRAM can have more latency than a cache
even if it is at the same base frequency). cpu caches are useful when the
onboard ram is slower than the cpu, and particularly when slower memory
such as flash or external ram are used.

> So putting a cache only
> at the flash interface makes sense: it helps there, and on lower end
> chips it is not needed elsewhere.

Yes.
> Also, concerning caches in MCUs, note
> that for writable memory there is the problem of cache coherency. In
> particular, several small MCUs have DMA channels. A non-coherent design
> would violate user expectations and would be hard to use.

That is correct. There are three main solutions to this in any system
with caches. One is to have cache snooping for the DMA controller so that
the cpu and the DMA have the same picture of real memory. Another is to
have some parts of the ram as being uncached (this is usually controlled
by the MMU), so that memory that is accessed by the DMA is never in
cache. And the third method is to use cache flush and invalidate
instructions appropriately so that software makes sure it has up-to-date
data. I've seen all three - and on some microcontrollers I have seen a
mixture in use. Obviously they have their advantages and disadvantages in
terms of hardware or software complexity.
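To make the third method a bit more concrete, here is a minimal sketch of
how the explicit clean/invalidate approach usually looks on a Cortex-M7
class device with a data cache and the CMSIS cache functions. The buffer
sizes, the helper functions and the device header name here are made up
for illustration; only the SCB_* calls are standard CMSIS.

    /* Sketch: keeping a DMA buffer coherent with the cpu data cache
       by hand.  Assumes a Cortex-M7 with CMSIS available. */

    #include <stdint.h>
    #include <stddef.h>
    #include "stm32f7xx.h"   /* or whichever CMSIS device header applies */

    /* Hypothetical helpers, declared only so the sketch is complete. */
    extern void fill_tx_buffer(uint8_t *buf, size_t len);
    extern void start_dma_tx(const uint8_t *buf, size_t len);
    extern void process_rx_buffer(const uint8_t *buf, size_t len);

    /* 32-byte alignment matches the Cortex-M7 cache line size, so the
       clean/invalidate operations do not touch neighbouring data. */
    static uint8_t tx_buf[256] __attribute__((aligned(32)));
    static uint8_t rx_buf[256] __attribute__((aligned(32)));

    void dma_send(void)
    {
        /* The cpu fills the buffer, possibly only in the D-cache... */
        fill_tx_buffer(tx_buf, sizeof tx_buf);

        /* ...so push it out to real memory before the DMA reads it. */
        SCB_CleanDCache_by_Addr((uint32_t *)tx_buf, sizeof tx_buf);
        start_dma_tx(tx_buf, sizeof tx_buf);
    }

    void dma_receive_done(void)
    {
        /* The DMA wrote to real memory behind the cache's back, so drop
           any stale cached copy before the cpu reads the buffer. */
        SCB_InvalidateDCache_by_Addr((uint32_t *)rx_buf, sizeof rx_buf);
        process_rx_buffer(rx_buf, sizeof rx_buf);
    }

Getting the alignment and the clean-before / invalidate-after ordering
wrong is exactly where the "hard to use" part comes from.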
> OTOH putting a
> coherent cache on the memory side means extra complication to the bus
> matrix (I do not know what ARM did with their bigger cores). Flash,
> being mainly read-only, does not have this problem.

Flash still has such issues during updates. I've seen badly made systems
where things like the flash status register got cached. Needless to say,
that did not work well! And if you have a bigger instruction cache, you
have to take care to flush things appropriately during software updates.
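As a rough sketch of what "flush things appropriately" means in practice,
assuming a Cortex-M7 with a cpu instruction cache and CMSIS (on
cache-less parts with only a vendor "flash accelerator", the equivalent
is usually a flush bit in that peripheral's control register):

    /* Sketch: after reprogramming a flash region that may already be
       cached, make sure the cpu does not keep executing stale code. */

    #include "stm32f7xx.h"   /* CMSIS device header (name is illustrative) */

    void flash_update_finished(void)
    {
        __DSB();                 /* make sure the flash programming is done   */
        SCB_InvalidateICache();  /* drop any stale instruction cache contents */
        __DSB();
        __ISB();                 /* force refetch from the updated flash      */
    }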
>> But it can also make sense to put small buffers as part of memory
>> interface controllers. These are not organized like data or instruction
>> caches, but are specific for the type of memory and the characteristics
>> of it.
>
> The point is that in many cases they are organized like classic caches.
> They cover only flash, but how is that different from the caches in PCs
> that covered only part of the possible RAM?

The main differences are the dimensions of the caches, their physical and
logical location, and the purpose for which they are optimised.
>> How this is done depends on details of the interface, details of
>> the internal buses, and how the manufacturer wants to implement it. For
>> example, on one microcontroller I am using there are queues to let it
>> accept multiple flash read/write commands from the AHB bus and the IPS
>> bus, but read-ahead is controlled by the burst length of read requests
>> from the cross-switch (which in turn will come from cache line fill
>> requests from the cpu caches). On a different microcontroller, the
>> read-ahead logic is in the flash controller itself as that chip has a
>> simpler internal bus where all read requests will be for 32 bits (it has
>> no cpu caches). An external DRAM controller, on the other hand, will
>> have queues and buffers optimised for multiple smaller transactions and
>> be able to hold writes in queues that get lower priority than read
>> requests.
>>
>> These sorts of queues and buffers are not generally referred to as
>> "caches", because they are specialised queues and buffers. Sometimes
>> you might have something that is in effect perhaps a two-way
>> single-entry 16 byte wide read-only cache, but using the term "cache"
>> here is often confusing. At best it is a "flash controller cache", and
>> very distinct from a "cpu cache".
>
> From the STM32F400 reference manual:
>
> : Instruction cache memory
> :
> : To limit the time lost due to jumps, it is possible to retain 64 lines
> : of 128 bits in an instruction cache memory.
>
> That is a 1 kB instruction cache. In most of their marketing material
> they say "flash accelerator", but the reference manual admits that this
> is a cache (OK, they also have a prefetch buffer, and possibly "flash
> accelerator = cache + buffer").

By the time you are talking about 1 KB and 64 lines, "cache" is a
reasonable term. Many "flash accelerators" have perhaps just two lines.
> Similarly, the documentation of the RP2040 says:
>
> : An internal cache remembers the contents of recently-accessed flash
> : locations, which accelerates the average bandwidth and latency of
> : the interface.
>
> Granted, the RP2040 is a rather big chip, but the same thing is used in
> smaller ones.

No, it is a small device - it is dual core Cortex-M0+. But it does have a
surprisingly large XIP flash cache at 16 KB. This is not a cpu cache,
since it is connected directly to the QSPI flash controller rather than
the cpu, but it /is/ a cache.
>> I agree that behaviour can vary significantly.
>>
>> When you have a "flash controller cache" - or read-ahead buffers - you
>> typically have something like a 60-80% hit ratio for sequential code and
>> nearly 100% for very short loops (like you'd have for a memcpy() loop).
>> You have close to 0% hit ratio for branches or calls, regardless of
>> whether they are virtual or not (with virtual function dispatch
>> generally having one extra indirection at 0% hit rate). This is the
>> kind of "cache" you often see in microcontrollers with internal flash
>> and clock speeds of up to perhaps 150 MHz, where the flash might be at a
>> quarter of the main cpu clock.
>
> Well, with 64 lines and 2-way set associativity the STM cache can give
> you quite a decent hit ratio on branchy code, as long as the working set
> is not too large. It does not need to be a simple loop. Already 3 lines
> can be enough if you have a single call to a simple function and the
> call is in a loop (and if you call via a function pointer the compiler
> cannot inline the function). More realistically, 8 lines will cover
> several cases where the code jumps between a small number of locations.

Yes, I would call 64 lines a "cache" rather than a "buffer".
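As an illustration of the kind of branchy-but-small code being described
above - a loop calling through a function pointer, which the compiler
cannot inline, so each iteration jumps between two small pieces of code
that still fit in a handful of 128-bit cache lines - consider something
like this (plain portable C, nothing target specific):

    #include <stddef.h>

    static int scale_by_3(int x)
    {
        return 3 * x;
    }

    /* 'op' is only known at run time, so the call in the loop stays a
       real indirect call rather than being inlined. */
    int apply_to_all(int *data, size_t n, int (*op)(int))
    {
        int sum = 0;
        for (size_t i = 0; i < n; i++) {
            data[i] = op(data[i]);   /* jump out to op() and back each pass */
            sum += data[i];
        }
        return sum;
    }

    int main(void)
    {
        int values[16] = { 1, 2, 3, 4 };
        return apply_to_all(values, 16, scale_by_3);
    }

The loop body and the callee together are only a few dozen bytes of code,
so even a very small flash cache keeps hitting despite the constant
back-and-forth branching.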