Subject: Re: MSI interrupts
From: cross (at) *nospam* spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.arch
Date: 29 Mar 2025, 15:28:02
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vs901i$f7e$1@reader1.panix.com>
References: 1 2 3 4
User-Agent: trn 4.0-test77 (Sep 1, 2010)
In article <cb049d5490b541878e264cedf95168e1@www.novabbs.org>,
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Fri, 28 Mar 2025 2:53:57 +0000, Dan Cross wrote:
[snip]
What I really wanted was an example that exceeded that limit as
an expository vehicle for understanding what happens when the
limit is exceeded. What does the hardware do in that case?
>
Raise OPERATION (instruction) Fault.
Ok, very good. The software implications could be profound.
[snip]
// by placing all the touching before any manifestation, you put
// all the touch latency* in the code before it has tried to damage any
// participating memory location. (*) and TLB latency and 2nd-party
// observation of your event.
>
// this would be the point where you would insert if( esmINTERFERENCE() )
// if you wanted control at a known failure point rather than at the
// top of the event on failure.
>
if (Ehead == a)
*head = b;
else if (Ehead == b)
*head = a;
>
b->next = an;
if (an != NULL) {
an->prev = b;
}
b->prev = ap;
if (ap != NULL) {
ap->next = b;
}
>
a->next = bn;
if (bn != NULL) {
bn->prev = a;
}
if (bp != NULL) {
bp->next = a;
}
esmLOCKstore(a->prev, bp);
}
>
// now manifestation has lowest possible latency (as seen by this core alone)
>
Doesn't this code now assume that `an->prev` is in the same
cache line as `an`, and similarly that `bn` is in the same line
as `bn->prev`? Absent an alignment constraint on the `Node`
type, that's not guaranteed given the definition posted earlier.
>
Everything you state is true; I just tried to move the esmLOCKxxxx
up to the setup phase to obey ESM rules.
Yeah, that makes sense. Describing it as a rule, however,
raises the question of whether one must touch a line before a
store to a different line, or only before a store to that same line.
That may be an odd and ill-advised thing for a programmer to do
if they want their list type to work with atomic events, but it
is possible.
>
The node pointers and their respective `next` pointers are okay,
so I wonder if perhaps this might have been written as:
>
void
swap_places(Node **head, Node *a, Node *b)
{
Node *hp, *an, *ap, *bn, *bp;
>
assert(head != NULL);
assert(a != NULL);
assert(b != NULL);
>
if (a == b)
return;
>
hp = esmLOCKload(*head);
esmLOCKprefetch(an = esmLOCKload(a->next));
ap = esmLOCKload(a->prev);
esmLOCKprefetch(bn = esmLOCKload(b->next));
bp = esmLOCKload(b->prev);
>
if (an != NULL) // I see what you did
esmLOCKprefetch(an->prev);
if (bn != NULL) {
esmLOCKprefetch(bn->prev);
bn->prev = a;
}
>
if (hp == a)
*head = b;
else if (hp == b)
*head = a;
>
b->next = an;
if (an != NULL)
an->prev = b;
b->prev = ap;
if (ap != NULL)
ap->next = b;
// illustrative code
a->next = bn; // ST Rbn,[Ra,#next]
if (bp != NULL) // PNE0 Rbp,T
bp->next = a; // ST Ra,[Rbp,#next]
>
esmLOCKstore(a->prev, bp);
}
>
But now the conditional testing whether or not `an` is `NULL` is
repeated. Is the additional branch overhead worth it here?
>
In My 66000 ISA, a compare against zero (or NULL) is just a branch
instruction, so the CMP zero is performed twice, but each use is
but a single Branch-on-condition instruction (or Predicate-on-
Condition instruction).
>
In the case of using predicates, FETCH/DECODE will simply issue
both then and else clauses into the execution window (the else clause
is empty here) and let the reservation stations handle execution
order. And the condition latency is purely the register-dependence
chain. A 6-wide machine should have no trouble inserting two
copies of the code commented "illustrative code" above--in
this case 6 instructions, or 2 sets of {ST, PNE0, ST}.
>
In the case of using a real branch, latency per use is likely to
be 2 cycles, moderated by typical branch prediction. The condition
will have resolved early, so we are simply FETCH/DECODE/TAKE bound.
>
{{That is: PRED should be faster in almost all conceivable cases.}}
Okay, that all makes sense. Thanks for the detailed
explanation. I agree it's very slick; is this documented
somewhere publicly? If not, why not?
Generally when I write queue code, I use a dummy Node at front/rear
such that the checks for NULL are unnecessary (at the cost of
following one more ->next or ->prev). That is, Q->head and Q->tail
are never NULL, and when the queue is empty there is a Node which
carries the fact that the queue is empty (not using NULL as a pointer).
>
But that is just my style.
Ok.
- Dan C.