Subject: Re: MM instruction and the pipeline
From: sfuld (at) *nospam* alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Date: 17 Oct 2024, 16:49:08
Organization: A noiseless patient Spider
Message-ID: <verblj$2prlm$1@dont-email.me>
References: 1 2
User-Agent: Mozilla Thunderbird
On 10/16/2024 12:26 PM, MitchAlsup1 wrote:
On Wed, 16 Oct 2024 5:56:34 +0000, Stephen Fuld wrote:
>> Even though this is about the MM instruction, and the MM instruction is
>> mentioned in other threads, those threads have lots of other stuff
>> (thread drift), and this isn't related to C, standard or otherwise, so
>> I thought it best to start a new thread.
>>
>> My questions are about what happens to the instructions that
>> immediately follow the MM in the instruction stream while an MM
>> instruction is executing. Since an MM instruction may take quite a long
>> time (in computer time) to complete, I think it is useful to know what
>> else can happen while the MM is executing.
>>
>> I will phrase this as a series of questions.
>>
>> 1. I assume that subsequent non-memory-reference instructions can
>> proceed simultaneously with the MM. Is that correct?
>
> Yes, they may begin but they cannot retire.
>> 2. Can a load or store whose memory address is in neither the source
>> nor the destination of the MM proceed simultaneously with the MM?
>
> Yes, in higher-end implementations--after checking for no-conflict
> {and this is dependent on accessing DRAM, not MMI/O or config spaces}.
>> 3. Can a load whose memory address is within the source of the MM
>> proceed?
>
> It is just read data, so, yes--at least theoretically.
>> For the next questions, assume for exposition that the MM has completed
>> 1/3 of the move when the following instructions come up.
>>
>> 4. Can a load in the first third of the destination range proceed?
>>
>> 5. Can a store in the first third of the source range proceed?
>>
>> 6. Can a store in the first third of the destination range proceed?
> In all three of these cases, one must have a good way to determine what
> has already been MMed and what is still waiting to be MMed. A low-end
> implementation is unlikely to have such a mechanism; a high-end one will.
> On the other hand, MM is basically going to saturate the cache ports
> (if for no other reason than being as fast as it can be), so there
> may not be a lot of AGEN capability or cache-access port availability.
Yes, but. For a large transfer, say many hundreds to thousands of bytes, why run the "middle" bytes through the cache, especially the L1 (as you indicated in reply to Paul)? It would take some analysis of traces to know for sure, but I would expect the probability of reuse of those bytes to be low. If that is true, it would take far fewer resources (and avoid "sweeping" the cache) to direct at least the intermediate reads and writes to just the L3, or even to a dedicated very small buffer or two.

Furthermore, for the transfers after the first, unless there is a page crossing, why go through a full AGEN when a simple add to the previous address is all that is required, thus freeing AGEN resources?
> So, the faster one makes MM (and by extension MS), the less one needs
> of overlap and pipelining.
Certainly true for small transfers, but for larger ones I am not so sure. It may make more sense to delay the MM's completion slightly, by the time it takes a single load to complete, in order to allow the non-memory-reference instructions following that load to execute overlapped with the remainder of the MM. Needs trace analysis.
--
- Stephen Fuld (e-mail address disguised to prevent spam)