mitchalsup@aol.com (MitchAlsup1) writes:
[...]
>On Mon, 29 Jul 2024 13:21:10 +0000, Anton Ertl wrote:
>> mitchalsup@aol.com (MitchAlsup1) writes:
>>> On Fri, 26 Jul 2024 17:00:07 +0000, Anton Ertl wrote:
>>>> Similarly, I expect that hardware that is designed for good TSO or
>>>> sequential consistency performance will run faster on code written for
>>>> this model than code written for weakly consistent hardware will run
>>>> on that hardware.
>>>
>>> According to Lamport; only the ATOMIC stuff needs sequential
>>> consistency.
>>> So, it is completely possible to have a causally consistent processor
>>> that switches to sequential consistency when doing ATOMIC stuff and gain
>>> performance when not doing ATOMIC stuff, and gain programmability when
>>> doing atomic stuff.
>>
>> That's not what I have in mind. What I have in mind is hardware that,
>> e.g., speculatively performs loads, predicting that no other core will
>> store there with an earlier time stamp. But if another core actually
>> performs such a store, the usual misprediction handling happens and
>> the code starting from that mispredicted load is reexecuted. So as
>> long as two cores do not access the same memory, they can run at full
>> speed, and there is only slowdown if there is actual (not potential)
>> communication between the cores.
>
> OK...
>
>> A problem with that approach is that this requires enough reorder
>> buffering (or something equivalent, there may be something cheaper for
>> this particular problem) to cover at least the shared-cache latency
>> (usually L3, more with multiple sockets).
>
> The depth of the execution window may be smaller than the time it takes
> to send the required information around and have this core recognize
> that it is out-of-order wrt memory.

So if we don't want to stall for memory accesses all the time, we need
a bigger execution window, either by making the reorder buffer larger,
or by using a different, cheaper mechanism.

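To put rough numbers on the window-size problem, here is a back-of-the-envelope sketch. It only applies Little's law (entries in flight = latency x issue rate). Every figure in it (latencies, issue width, reorder-buffer size) and the function name are assumptions picked for illustration, not measurements of any particular core.

```python
import math


def window_entries_needed(latency_cycles: int, instrs_per_cycle: float) -> int:
    """Instructions that must be in flight to keep issuing for latency_cycles.

    Little's law: occupancy = latency * throughput.
    """
    return math.ceil(latency_cycles * instrs_per_cycle)


# Hypothetical numbers, for illustration only.
ASSUMED_ISSUE_WIDTH = 4          # sustained instructions per cycle
ASSUMED_ROB_SIZE = 256           # reorder-buffer entries
ASSUMED_L3_LATENCY = 50          # cycles to the shared cache, same socket
ASSUMED_REMOTE_LATENCY = 300     # cycles to another socket

same_socket = window_entries_needed(ASSUMED_L3_LATENCY, ASSUMED_ISSUE_WIDTH)
cross_socket = window_entries_needed(ASSUMED_REMOTE_LATENCY, ASSUMED_ISSUE_WIDTH)

print(same_socket, same_socket <= ASSUMED_ROB_SIZE)
print(cross_socket, cross_socket <= ASSUMED_ROB_SIZE)
```

Under these assumed figures the reorder buffer covers the same-socket case but falls well short of the cross-socket one, which is the situation where a larger window or a cheaper mechanism would be needed.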
Concerning the cheaper mechanism, what I am thinking of is hardware
checkpointing every, say, 200 cycles or so (subject to fine-tuning).
The idea here is that communication between cores is very rare, so
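
A toy software model of the checkpoint-and-rollback idea may make it more concrete. This is only a sketch under loud assumptions: one "instruction" per cycle, a dictionary standing in for architectural state, and a per-core set of addresses loaded since the last checkpoint. The class `Core` and its methods are hypothetical names. It also ignores time stamps, conservatively treating any remote store to a speculatively loaded address as a conflict.

```python
CHECKPOINT_INTERVAL = 200  # cycles; the "say, 200 cycles or so" figure


class Core:
    """Toy core: takes a checkpoint every CHECKPOINT_INTERVAL cycles."""

    def __init__(self):
        self.cycle = 0
        self.regs = {}                 # stand-in for architectural state
        self.checkpoint = (0, {})      # (cycle, saved copy of regs)
        self.read_set = set()          # addresses loaded since the checkpoint
        self.rollbacks = 0

    def step(self, load_addr=None):
        """Execute one cycle, optionally performing a speculative load."""
        if self.cycle % CHECKPOINT_INTERVAL == 0:
            self.checkpoint = (self.cycle, dict(self.regs))
            self.read_set.clear()
        if load_addr is not None:
            self.read_set.add(load_addr)
        self.regs["last_cycle"] = self.cycle  # placeholder state update
        self.cycle += 1

    def remote_store(self, addr):
        """Another core stored to addr; roll back only on an actual conflict."""
        if addr not in self.read_set:
            return False               # no communication, no slowdown
        saved_cycle, saved_regs = self.checkpoint
        self.cycle = saved_cycle
        self.regs = dict(saved_regs)
        self.read_set.clear()
        self.rollbacks += 1
        return True
```

In this model a store to an address the core never loaded costs nothing, while a store to a loaded address throws away at most one checkpoint interval of work, so the cost scales with actual rather than potential communication.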