Subject : Re: Arm ldaxr / stxr loop question
From : robfi680 (at) *nospam* gmail.com (Robert Finch)
Groups : comp.arch
Date : 13. Nov 2024, 10:25:15
Organisation : A noiseless patient Spider
Message-ID : <vh1r9u$24r1e$1@dont-email.me>
User-Agent : Mozilla Thunderbird
>
> The execution time of each is the same, and the main cost is the fence
> synchronizing the Load Store Queue with the cache, flushing the cache
> comms queue and waiting for all outstanding cache ops to finish.
>
> One other thing to be aware of is that the StoreLoad barrier needed
> for sequential consistency is logically part of an LDAR, not part of a
> STLR. This is an optimization, because the purpose of a StoreLoad in
> that situation is to prevent you from seeing your own stores to a
> location before everyone else sees them.
>
> Andrew.
Hmm... that makes me think: does an atomic RMW have implied membars, or are they completely separate, akin to the SPARC membar instruction? LOCK'ed RMWs on Intel (the XCHG instruction aside, with its implied LOCK prefix) are full StoreLoad barriers. Shit.
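C11 makes the distinction explicit: the ordering is an argument to the RMW itself rather than a separate instruction. A minimal sketch, just to illustrate the two models (nothing here is specific to any bus):

#include <stdatomic.h>
#include <stdio.h>

atomic_int counter;

int main(void)
{
    /* Barrier bundled with the RMW: on x86 this compiles to a
       LOCK'ed add, which is also a full StoreLoad fence. */
    atomic_fetch_add_explicit(&counter, 1, memory_order_seq_cst);

    /* Separated, SPARC-membar style: a relaxed RMW is still atomic
       but unordered; the fence supplies the ordering by itself. */
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);

    printf("%d\n", atomic_load(&counter));
    return 0;
}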
I am not sure how atomic memory ops are implemented over AMBA / AXI. I think AMBA / AXI is a very good bus to use. It turns out I have been using a similar proprietary bus (FTA) for my project, and I have been working on an AXI bus bridge so that I can migrate my system to AXI.
The FTA bus has a command field that allows atomic memory ops to be specified. There does not seem to be an equivalent in AXI, unless perhaps the user tag field could be used.
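For illustration, a command field along FTA's lines might look something like this; the opcodes, widths, and names below are invented for the sketch and are not the actual FTA (or AXI) encoding:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical atomic-op command encoding, FTA-style. */
enum bus_cmd {
    CMD_READ  = 0x0,
    CMD_WRITE = 0x1,
    CMD_SWAP  = 0x8,   /* atomic exchange      */
    CMD_ADD   = 0x9,   /* atomic fetch-and-add */
    CMD_CAS   = 0xA    /* compare-and-swap     */
};

struct bus_req {
    enum bus_cmd cmd;  /* rides in the command field */
    uint8_t      tag;  /* transaction ID / user tag  */
    uint64_t     addr;
    uint64_t     data; /* write or swap data         */
};

int main(void)
{
    struct bus_req r = { CMD_CAS, 1, 0x1000, 42 };
    printf("cmd=%d tag=%u addr=%llx\n",
           r.cmd, r.tag, (unsigned long long)r.addr);
    return 0;
}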
One thing about the AXI bus I do not understand is how the CAS instruction is supported. On my bus, CAS is supported with double data on the bus: two data items, the compare value and the swap value, need to be supplied to the memory controller.
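The C11 API shows the same requirement at the language level: both the expected (compare) value and the desired (swap) value are inputs, so a bus-level CAS has to carry both somehow. A minimal sketch:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

atomic_int word;

/* Both 'expected' and 'desired' travel to wherever the atomic is
   resolved -- hence the double data on the bus. */
static bool cas(int expected, int desired)
{
    return atomic_compare_exchange_strong(&word, &expected, desired);
}

int main(void)
{
    atomic_store(&word, 5);
    printf("%d\n", cas(5, 7));  /* 1: succeeds, word becomes 7 */
    printf("%d\n", cas(5, 9));  /* 0: fails, word is already 7 */
    return 0;
}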
Another issue I ran into: FTA uses the response bus to send MSI interrupts. I am thinking of using the AXI read response bus for this purpose, sending an ERR response for interrupts with the read data containing the interrupt info. But I do not know whether AXI devices will get confused by a read response without a previously supplied read address. I am assuming devices will be able to filter bus transactions using transaction IDs.
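The filtering I have in mind would look roughly like this; the reserved ID, response code, and field names are all made up for the sketch, not taken from the AXI spec or FTA:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define IRQ_XID  0x3F   /* transaction ID reserved for MSIs (assumed) */
#define RESP_ERR 0x2    /* SLVERR-style error response code           */

struct read_resp {
    uint8_t  id;        /* transaction ID (RID-like)  */
    uint8_t  resp;      /* response code (RRESP-like) */
    uint64_t data;      /* read data / interrupt info */
};

/* Returns true if the response is really an MSI in disguise. */
static bool is_msi(const struct read_resp *r)
{
    return r->id == IRQ_XID && r->resp == RESP_ERR;
}

int main(void)
{
    struct read_resp msi = { IRQ_XID, RESP_ERR, 0x21 };
    if (is_msi(&msi))
        printf("MSI, info=%llx\n", (unsigned long long)msi.data);
    return 0;
}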
I have not been able to get the Q+ CPU to operate reliably in the FPGA, so I am stuck without a system CPU. I have given some thought to just using an FPGA with built-in (ARM) CPU cores. I want to get working on peripheral cores.
I have been using the MIG controller in native (non-AXI) mode, coupled with a multi-port memory controller, for access to DDR3 RAM. The MIG controller can supply a lot of bandwidth, but it has some latency: I measured it at 28 clocks IIRC, and I think the timing depends on the memory component too. That is at the MIG controller frequency, 200 MHz in my case; at the CPU frequency it is much less. While there is latency, a new request can be made almost every memory clock.

To get a lot of bandwidth, requestors like the frame buffer fetch an entire scan line of data with back-to-back accesses to the MIG controller. The frame buffer uses a burst of 50 accesses, so it takes around 80 memory clock cycles; in terms of a 50 MHz CPU that is only about 20 clocks. The frame buffer uses about 10% of the memory bandwidth and supports 800x600x16bpp mode. I think it could support 1920x1080x16bpp video, but I did not want to spend that much bandwidth on video.
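A quick back-of-envelope check of those numbers, under two assumptions not stated above (a 60 Hz refresh, and a 32-byte data beat, which is what makes an 800-pixel 16bpp scan line come out to 50 accesses):

#include <stdio.h>

int main(void)
{
    const double refresh_hz = 60.0;        /* assumed */
    const int beat_bytes    = 32;          /* assumed: 256-bit beat */

    int line_bytes = 800 * 2;              /* 800 px at 16 bpp = 1600 B */
    int beats = line_bytes / beat_bytes;   /* = 50, the burst length    */

    double fb_bw = 800.0 * 600 * 2 * refresh_hz;    /* ~57.6 MB/s  */
    double hd_bw = 1920.0 * 1080 * 2 * refresh_hz;  /* ~248.8 MB/s */

    printf("beats per line: %d\n", beats);
    printf("800x600x16bpp:   %.1f MB/s\n", fb_bw / 1e6);
    printf("1920x1080x16bpp: %.1f MB/s\n", hd_bw / 1e6);
    return 0;
}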
The biggest issue I found with the MIG controller was specifying the right memory component. The part is rated for 900 MHz but seems to run okay at 800 MHz, which was more conveniently matched to the other clocks in the system.