Subject: Re: Arm ldaxr / stxr loop question
From: aph (at) *nospam* littlepinkcloud.invalid
Newsgroups: comp.arch
Date: 12 Nov 2024, 13:14:47
Message-ID: <3lGdnVvGQIAq2676nZ2dnZfqnPGdnZ2d@supernews.com>
References: 1 2 3 4 5 6 7 8 9 10
User-Agent: tin/1.9.2-20070201 ("Dalaruan") (UNIX) (Linux/4.18.0-553.5.1.el8_10.x86_64 (x86_64))
EricP <ThatWouldBeTelling@thevillage.com> wrote:
> Any idea what is the advantage for them having all these various
> LDxxx and STxxx instructions that only seem to combine a LD or ST
> with a fence instruction? Why have
>
>   LDAPR  Load-Acquire RCpc Register
>   LDAR   Load-Acquire Register
>   LDLAR  Load LOAcquire Register
>
> plus all the variations for byte, half, word, and pair,
> instead of just the standard LDx and a general data fence instruction?
All this, and much more, can be discovered by reading the AMBA
specifications. However, the main point is that the content of the
target address does not have to be transferred to the local cache:
these are remote atomic operations. Quite nice for things like
fire-and-forget counters, for example.
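
As a rough illustration of the fire-and-forget case, here is a minimal
C++ sketch. The names (events, count_event, run_counters) are made up
for this example. The assumption is that, when built for a target with
the Armv8.1 LSE atomics, a compiler may lower a relaxed fetch_add whose
result is discarded to a no-return atomic such as STADD; whether the
interconnect then performs it remotely is up to the implementation.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical shared event counter (illustrative name only).
std::atomic<std::uint64_t> events{0};

// Fire-and-forget increment: the value returned by fetch_add is
// discarded, so nothing has to come back to the issuing core.
inline void count_event() {
    events.fetch_add(1, std::memory_order_relaxed);
}

// Start `threads` workers that each record `per_thread` events,
// wait for them all, and return the final total.
std::uint64_t run_counters(unsigned threads, std::uint64_t per_thread) {
    events.store(0, std::memory_order_relaxed);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                count_event();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return events.load(std::memory_order_relaxed);
}
```

Calling run_counters(4, 10000) should return 40000 on any conforming
implementation; only the generated instructions differ between targets.
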
> The execution time of each is the same, and the main cost is the fence
> synchronizing the Load Store Queue with the cache, flushing the cache
> comms queue and waiting for all outstanding cache ops to finish.
One other thing to be aware of is that the StoreLoad barrier needed
for sequential consistency is logically part of an LDAR, not part of an
STLR. This is an optimization, because the purpose of a StoreLoad in
that situation is to prevent you from seeing your own stores to a
location before everyone else sees them.
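
To make the StoreLoad point concrete, here is a small store-buffering
test in C++. It is only a sketch: the names (x, y, both_saw_zero,
count_violations) are invented for illustration, and the instruction
mapping in the comments is what AArch64 compilers typically emit for
seq_cst accesses, not something the language guarantees. Each thread
stores to one location and then loads the other; under sequential
consistency both loads cannot return 0.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared flags for the litmus test.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Run one round of the store-buffering test. Returns true only if
// both threads read 0, which seq_cst ordering forbids.
bool both_saw_zero() {
    x.store(0, std::memory_order_seq_cst);
    y.store(0, std::memory_order_seq_cst);
    int r1 = -1;
    int r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);   // typically STLR on AArch64
        r1 = y.load(std::memory_order_seq_cst);  // typically LDAR on AArch64
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);   // typically STLR
        r2 = x.load(std::memory_order_seq_cst);  // typically LDAR
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Repeat the test and count how many rounds saw the forbidden outcome.
int count_violations(int rounds) {
    int violations = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_saw_zero()) {
            ++violations;
        }
    }
    return violations;
}
```

With seq_cst on both sides the count stays at zero. Weakening the
accesses to release and acquire would make the both-zero outcome legal
in the C++ model, which is the reordering the StoreLoad ordering
between STLR and a later LDAR rules out.
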
Andrew.