Subject: Re: MSI interrupts
From: cross (at) *nospam* spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.arch
Date: 29 Mar 2025, 21:58:44
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vs9mu4$79f$2@reader1.panix.com>
References: 1 2 3 4
User-Agent: trn 4.0-test77 (Sep 1, 2010)
In article <1b9a9644c3f5cbd2985b89443041e01a@www.novabbs.org>,
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Sat, 29 Mar 2025 14:28:02 +0000, Dan Cross wrote:
In article <cb049d5490b541878e264cedf95168e1@www.novabbs.org>,
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Fri, 28 Mar 2025 2:53:57 +0000, Dan Cross wrote:
[snip]
What I really wanted was an example that exceeded that limit as
an expository vehicle for understanding what happens when the
limit is exceeded. What does the hardware do in that case?
>
Raise OPERATION (instruction) Fault.
>
Ok, very good. The software implications could be profound.
>
[snip]
>
Doesn't this code now assume that `an->prev` is in the same
cache line as `an`, and similarly that `bn` is in the same line
as `bn->prev`? Absent an alignment constraint on the `Node`
type, that's not guaranteed given the definition posted earlier.
>
Everything you state is true; I just tried to move the esmLOCKxxxx
calls up to the setup phase to obey the ESM rules.
>
Yeah, that makes sense. Describing it as a rule, however,
raises the question of whether one must touch a line before a
store to a different line, or just before a store to that line?
>
Right from the spec::
>
"There is an order of instructions imposed upon software <i.e.,
compiler> where:
>
• all participating inbound memory reference instructions shall be
performed prior to manifestation;
• non-participating inbound memory reference instructions should be
performed prior to manifestation;
• the <optional> query instruction denotes the boundary between setup
and manifestation;
• the first Outbound to a participating cache line <also> begins
manifestation
• the only Outbound with Lock bit set (L ≡ 1) completes the event
>
The processor monitors the instruction sequence of the ATOMIC event and
will raise the OPERATION exception when the imposed order is violated."
>
Basically, all participating page-faults happen prior to any
modification attempts.
Ok.
That may be an odd and ill-advised thing for a programmer to do
if they want their list type to work with atomic events, but it
is possible.
>
The node pointers and their respective `next` pointers are okay,
so I wonder if perhaps this might have been written as:
>
void
swap_places(Node **head, Node *a, Node *b)
{
    Node *hp, *an, *ap, *bn, *bp;
>
    assert(head != NULL);
    assert(a != NULL);
    assert(b != NULL);
>
    if (a == b)
        return;
>
    hp = esmLOCKload(*head);
    esmLOCKprefetch(an = esmLOCKload(a->next));
    ap = esmLOCKload(a->prev);
    esmLOCKprefetch(bn = esmLOCKload(b->next));
    bp = esmLOCKload(b->prev);
>
    if (an != NULL)                 // I see what you did
        esmLOCKprefetch(an->prev);
    if (bn != NULL) {
        esmLOCKprefetch(bn->prev);
        bn->prev = a;
    }
>
    if (hp == a)
        *head = b;
    else if (hp == b)
        *head = a;
>
    b->next = an;
    if (an != NULL)
        an->prev = b;
    b->prev = ap;
    if (ap != NULL)
        ap->next = b;
    // illustrative code
    a->next = bn;                   // ST   Rbn,[Ra,#next]
    if (bp != NULL)                 // PNE0 Rbp,T
        bp->next = a;               // ST   Ra,[Rbp,#next]
>
    esmLOCKstore(a->prev, bp);
}
>
But now the conditional testing whether or not `an` is `NULL` is
repeated. Is the additional branch overhead worth it here?
>
In My 66000 ISA, a compare against zero (or NULL) is just a branch
instruction, so the CMP zero is performed twice, but each use is
but a single Branch-on-condition instruction (or Predicate-on-
Condition instruction).
>
In the case of using predicates, FETCH/DECODE will simply issue
both then and else clauses into the execution window (the else-clause
is empty here) and let the reservation stations handle execution
order. The condition latency is purely the register dependence
chain. A 6-wide machine should have no trouble inserting two
copies of the code commented "illustrative code" above--in
this case 6 instructions, or 2 sets of {ST, PNE0, ST}.
>
In the case of using a real branch, latency per use is likely to
be 2-cycles, moderated by typical branch prediction. The condition
will have resolved early, so we are simply FETCH/DECODE/TAKE bound.
>
{{That is: PRED should be faster in almost all conceivable cases.}}
>
Okay, that all makes sense. Thanks for the detailed
explanation. I agree it's very slick; is this documented
somewhere publicly? If not, why not?
>
Documented: Yes, 21 pages.
Public: I keep holding back as if I were to attempt to patent certain
aspects. But I don't seem to have the energy to do such. I as if you
wanted to see it.
(I assume you meant "ask if you want to see it"?)
Sure, I think it would be interesting. Should I send you an
email?
- Dan C.