Subject: Re: MSI interrupts
From: mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Groups: comp.arch
Date: 28. Mar 2025, 16:24:50
Organization: Rocksolid Light
Message-ID: <cb049d5490b541878e264cedf95168e1@www.novabbs.org>
References: 1 2 3 4 5
User-Agent: Rocksolid Light
On Fri, 28 Mar 2025 2:53:57 +0000, Dan Cross wrote:
In article <34434320650f5844b18b1c0b684acf43@www.novabbs.org>,
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Thu, 27 Mar 2025 17:19:21 +0000, Dan Cross wrote:
In article <7a093bbb356e3bda3782c15ca27e98a7@www.novabbs.org>,
MitchAlsup1 <mitchalsup@aol.com> wrote:
-------------------
Or maybe even just swapping places between two elements in a
linked list. For example:
>
void
swap_places(Node **head, Node *a, Node *b)
{
Node *hp, *an, *ap, *bn, *bp;
>
assert(head != NULL);
assert(a != NULL);
assert(b != NULL);
>
if (a == b)
return;
>
esmLOCKprefetch(*head);
>
This should be a load not prefetch--you want the value of *head
>
I wondered about that. I ASSumed that `esmLOCKprefetch` was
enough to set up a cache line address monitor on `*head`, but if
you tell me I need `esmLOCKload`, then I need `esmLOCKload`:
>
hp = esmLOCKload(*head);
if (hp == a)
*head = b; // Note I need to set whatever head points to, not hp.
else if (hp == b)
*head = a;
>
(You can see a vestige of me trying that earlier by the `hp`
temporary that I left in when I posted that initially, but I
didn't like how it looked, so I changed it to what was posted.)
>
if (*head == a) // see !
*head = b;
else if (*head == b)
*head = a;
>
There is a ESM rule that states:: all participating cache lines
must be touched before any participating cache lines can be
modified.
>
Are these rules in a reference somewhere I can see? I kinda
reckoned `esmLOCKprefetch` would be enough to "touch" the line,
but I fully admit I was just guessing. Looks like `prefetch` is
mostly for the case where you'll store, but don't need to read?
>
Also note: Participating cache lines are checked for write permission
at touch time, and on cache miss, read with intent to modify.
>
Ok.
>
an = esmLOCKload(a->next);
ap = esmLOCKload(a->prev);
bn = esmLOCKload(b->next);
bp = esmLOCKload(b->prev);
>
b->next = an;
if (an != NULL) {
esmLOCKprefetch(an->prev);
an->prev = b;
}
b->prev = ap;
if (ap != NULL) {
esmLOCKprefetch(ap->next);
ap->next = b;
}
>
a->next = bn;
if (bn != NULL) {
esmLOCKprefetch(bn->prev);
bn->prev = a;
}
if (bp != NULL) {
esmLOCKprefetch(bp->next);
bp->next = a;
}
esmLOCKstore(a->prev, bp);
}
>
What I think you want:: (ignoring the 9 participants limit)
>
You mean 8 participants limit? That counting thing again. ;-}
>
What I really wanted was an example that exceeded that limit as
an expository vehicle for understanding what happens when the
limit is exceeded. What does the hardware do in that case?
Raise OPERATION (instruction) Fault.
void
swap_places(Node **head, Node *a, Node *b)
{
Node *hp, *an, *ap, *bn, *bp;
>
assert(head != NULL);
assert(a != NULL);
assert(b != NULL);
>
if (a == b)
return;
>
top_of_ATOMIC_event:
>
// this is the recovery point if you don't use esmINTERFERENCE()
// the very next instruction begins the event.
>
Yup. Tracking.
>
esmLOCKprefetch( an = esmLOCKload(a->next) );
esmLOCKprefetch( ap = esmLOCKload(a->prev) );
esmLOCKprefetch( bn = esmLOCKload(b->next) );
esmLOCKprefetch( bp = esmLOCKload(b->prev) );
>
What exactly does this do? This "touches" whatever lines the
things that `an`, `ap`, `bn`, and `bp` are all in, as well as
the lines that the `a` and `b` prev and next pointers point
into as well, right?
>
Node *Ehead = esmLOCKload(*head);
>
You shoulda just used the `hp` local I left for you for exactly
this. :-D
>
// by placing all the touching before any manifestation, you put
// all the touch latency* in the code before it has tried to damage any
// participating memory location. (*) and TLB latency and 2nd-party
// observation of your event.
>
// this would be the point where you would insert if( esmINTERFERENCE() )
// if you wanted control at a known failure point rather than at the
// top of the event on failure.
>
if (Ehead == a)
*head = b;
else if (Ehead == b)
*head = a;
>
b->next = an;
if (an != NULL) {
an->prev = b;
}
b->prev = ap;
if (ap != NULL) {
ap->next = b;
}
>
a->next = bn;
if (bn != NULL) {
bn->prev = a;
}
if (bp != NULL) {
bp->next = a;
}
esmLOCKstore(a->prev, bp);
}
>
// now manifestation has lowest possible latency (as seen by this core alone)
>
Doesn't this code now assume that `an->prev` is in the same
cache line as `an`, and similarly that `bn` is in the same line
as `bn->prev`? Absent an alignment constraint on the `Node`
type, that's not guaranteed given the definition posted earlier.
Everything you state is true, I just tried to move the esmLOCKxxxx
up to the setup phase to obey ESM rules.
That may be an odd and ill-advised thing for a programmer to do
if they want their list type to work with atomic events, but it
is possible.
>
The node pointers and their respective `next` pointers are okay,
so I wonder if perhaps this might have been written as:
>
void
swap_places(Node **head, Node *a, Node *b)
{
Node *hp, *an, *ap, *bn, *bp;
>
assert(head != NULL);
assert(a != NULL);
assert(b != NULL);
>
if (a == b)
return;
>
hp = esmLOCKload(*head);
esmLOCKprefetch(an = esmLOCKload(a->next));
ap = esmLOCKload(a->prev);
esmLOCKprefetch(bn = esmLOCKload(b->next));
bp = esmLOCKload(b->prev);
>
if (an != NULL) // I see what you did
esmLOCKprefetch(an->prev);
if (bn != NULL) {
esmLOCKprefetch(bn->prev);
bn->prev = a;
}
>
if (hp == a)
*head = b;
else if (hp == b)
*head = a;
>
b->next = an;
if (an != NULL)
an->prev = b;
b->prev = ap;
if (ap != NULL)
ap->next = b;
 // illustrative code
 a->next = bn; // ST Rbn,[Ra,#next]
 if (bp != NULL) // PNE0 Rbp,T
 bp->next = a; // ST Ra,[Rbp,#next]
>
esmLOCKstore(a->prev, bp);
}
>
But now the conditional testing whether or not `an` is `NULL` is
repeated. Is the additional branch overhead worth it here?
In My 66000 ISA, a compare against zero (or NULL) is folded into the
branch instruction, so the CMP-zero is performed twice, but each use is
just a single Branch-on-Condition (or Predicate-on-Condition)
instruction.
In the case of using predicates, FETCH/DECODE will simply issue
both then and else clauses into the execution window (the else-clause
is empty here) and let the reservation stations handle execution
order. The condition latency is purely the register dependence
chain. A 6-wide machine should have no trouble inserting two
copies of the code commented by "illustrative code" above--in
this case 6 instructions, or 2 sets of {ST, PNE0, ST}.
In the case of using a real branch, latency per use is likely to
be 2-cycles, moderated by typical branch prediction. The condition
will have resolved early, so we are simply FETCH/DECODE/TAKE bound.
{{That is: PRED should be faster in almost all conceivable cases.}}
- Dan C.
Generally when I write queue code, I use dummy front/rear Nodes
so that the checks for NULL are unnecessary (at the cost of
following 1 more ->next or ->prev). That is, Q->head and Q->tail
are never NULL, and when the queue is empty there is still a Node
which carries the fact that the queue is empty (rather than using
NULL as a pointer). But that is just my style.
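A minimal sketch of that sentinel style in plain C (no ESM primitives; the function and field names here are my own, not from the post). One dummy Node serves as both front and rear, so every ->next and ->prev is always non-NULL and the swap needs no NULL checks at all:

```c
#include <stddef.h>

// Circular doubly linked list with one sentinel Node: head is s->next,
// tail is s->prev, and an empty list is the sentinel pointing at itself.
typedef struct Node Node;
struct Node {
    Node *next;
    Node *prev;
    int   value;
};

static void list_init(Node *s) {
    s->next = s;
    s->prev = s;
}

static void list_push_back(Node *s, Node *n) {
    n->prev = s->prev;
    n->next = s;
    s->prev->next = n;
    s->prev = n;
}

// Swap the list positions of a and b. The sentinel guarantees every
// neighbor pointer is non-NULL, so only the adjacency cases need care.
static void swap_places(Node *a, Node *b) {
    if (a == b)
        return;
    Node *an = a->next, *ap = a->prev;
    Node *bn = b->next, *bp = b->prev;
    if (an == b) {              // a immediately precedes b
        ap->next = b; b->prev = ap;
        b->next = a;  a->prev = b;
        a->next = bn; bn->prev = a;
    } else if (bn == a) {       // b immediately precedes a
        swap_places(b, a);      // reduce to the case above
    } else {                    // not adjacent
        ap->next = b; b->prev = ap;
        b->next = an; an->prev = b;
        bp->next = a; a->prev = bp;
        a->next = bn; bn->prev = a;
    }
}
```

Counting participants for ESM, the sentinel version still touches the same number of cache lines in the worst case; what it buys is straight-line code with no PNE0/branch per neighbor.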