Subject : Re: MSI interrupts
From : monnier (at) *nospam* iro.umontreal.ca (Stefan Monnier)
Groups : comp.arch
Date : 26. Mar 2025, 14:55:36
Organisation : A noiseless patient Spider
Message-ID : <jwv4izfncov.fsf-monnier+comp.arch@gnu.org>
References : 1 2 3 4 5
User-Agent : Gnus/5.13 (Gnus v5.13)
> Leaving aside the unchecked boundary conditions I highlighted
> above, there are other questions that this raises. You
> mentioned that the ESM stuff can use up to 8 cache lines in a
> single atomic "event"; let's count how many this code may
> encounter:
>
> 1. `from` may point to a cache line
> 2. `to` may point to a different cache line.
> 3. Does your architecture use a stack? What sort of alignment
> criteria do you impose? At first blush, it seems to me that
> the pointers `from_next`, `from_prev`, and `to_next` could be
> on the stack and if so, will be on at least one cache line,
> and possibly 2, if the call frame for `MoveElement` spans
> more than one. Of course, it's impossible to say when this
> is compiled, but maybe you require that the stack is always
> aligned to a cache line boundary or something. Or maybe
> these are all in registers and it's fine.
AFAIK the stack accesses will use normal loads that won't be tracked for
conflicts, so they don't count towards the 8 cache-lines limit.
> 4. Depending on the size and alignment criteria of the
> `Element` structure and the placement of the `next` and
> `prev` elements in the struct (unknowable from this example),
> `to->next` may be on another cache line.
The coder may need to be careful with such slot placement and
alignment, indeed.
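For illustration, here is one way such a layout could be arranged in
C, assuming a 64-byte cache line; the real `Element` definition was
not shown in this subthread, so treat this purely as a sketch:

    #include <stdalign.h>

    #define CACHE_LINE 64   /* assumed line size, for the example only */

    typedef struct Element Element;
    struct Element {
        /* Start the struct on a cache-line boundary and keep the two
           link fields adjacent at the front, so `next` and `prev`
           always share a single line. */
        alignas(CACHE_LINE) Element *next;
        Element *prev;
        /* payload fields follow */
    };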
> So potentially this code _could_ touch 9 or 10 cache lines, more
> than you said are supported for a single atomic event. What
> happens in that case? Does the code just always return false?
I believe the code will indeed always return false in that case.
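For readers who don't have the upthread code handy, here is a rough
sketch of the kind of `MoveElement` routine under discussion,
reconstructed from the identifiers quoted in this thread.  The `esm*`
declarations are stand-ins with guessed signatures, not the actual
proposal, and the alignment question from the example above is left
out to keep the control flow clear:

    #include <stdbool.h>

    typedef struct Element Element;
    struct Element {
        Element *next;
        Element *prev;
        /* payload ... */
    };

    /* Stand-ins for the ESM intrinsics discussed upthread; the names
       come from the thread, the signatures are assumed. */
    Element *esmLOCKload(Element **p);          /* participating load */
    void     esmLOCKprefetch(Element *p);       /* claim a line for the event */
    bool     esmINTERFERENCE(void);             /* has the event been killed? */
    void     esmLOCKstore(Element **p, Element *v); /* final, committing store */

    /* Atomically unlink `from` from its list and insert it after `to`. */
    bool MoveElement(Element *from, Element *to)
    {
        Element *from_next = esmLOCKload(&from->next);
        Element *from_prev = esmLOCKload(&from->prev);
        Element *to_next   = esmLOCKload(&to->next);

        /* Bring the lines we intend to modify into the event. */
        esmLOCKprefetch(from_next);
        esmLOCKprefetch(from_prev);
        esmLOCKprefetch(to_next);

        if (!esmINTERFERENCE()) {
            /* These stores take effect only if the event commits. */
            from_prev->next = from_next;         /* unlink from old list */
            from_next->prev = from_prev;
            to->next        = from;              /* splice in after `to` */
            to_next->prev   = from;
            from->prev      = to;
            esmLOCKstore(&from->next, to_next);  /* commits the event */
            return true;
        }
        return false;
    }

How many distinct cache lines this actually touches depends on where
those pointers land in memory, which is what the enumeration above is
getting at.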
> If I understand this correctly, the `esmLOCKload`s are simply
> loads, but they make sure that the compiler emits instructions
> that tie into your atomic support framework. Earlier you wrote,
> "esmINTERFERENCE is a query on the state of the event and whether
> a 2nd party has performed a memory action that kills the event."
> Ok, I take this to mean _at the time of the store_, which
> appears later in the program.
No, I believe it is at the time when `esmINTERFERENCE` is called.
If it were simply a query at the time it was written in the code, then
it seems like it opens a TOCTOU bug (what if something interferes after
the check, but before the locked store at the end?).
IIUC the way this works is that in case there's a conflict later, the
other core will get a NACK (i.e. we'll win and they'll lose, even if
they have a higher priority) so we're guaranteed that we'll finish our
transaction successfully. Of course, that can be guaranteed only if we
already have "locked" the cache lines we'll need, which is what the
`esmLOCKprefetch` calls are about.
IOW the `esmINTERFERENCE` call is the time where the CPU decides whether
to "commit", and then the code between `esmINTERFERENCE` and
`esmLOCKstore` is the code which actually stores the changes we
decided to commit.
IIUC, the `esmINTERFERENCE` call is not strictly needed; without it,
the operation should simply retry as many times as needed to succeed.
> Assuming nothing has poisoned the event, the various pointer
> stores inside the conditional are conditionally performed; the
> `esmLOCKstore` to `from->next` commits the event, at which
> point all side effects are exposed and durable.
The `esmLOCKstore` marks the end of the transaction. IIUC it can also
mark the "commit" but only if `esmINTERFERENCE` isn't used.
> So now I, as a systems software person, have to ask design
> questions around how I want to use these architectural features,
> if at all, weighing various pros and cons. This is where I'm
> struggling with your proposal: given that I'm going to have to
> implement more general support for the things that the
> architecture gives me anyway, what do I gain by also supporting
> the specialized mechanisms?
I think the expectation is that you'd *use* those specialized mechanisms
but you wouldn't *support* them.
> Right now, I'm not seeing that it is. "It solves priority
> inversion for the size of problems that it's applicable to" is
> not enough to justify the additional cost of adoption. I have
> to solve that problem anyway, whether I use this or not.
The ESM stuff is what you'll use at the assembly-level instead of T&S,
C&S, LL/SC because that's what the CPU supports/provides. But if you
prefer to code against T&S, C&S, LL/SC, AFAIK it'll work as well,
because you can "simulate" them easily with ESM.
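As a small illustration of that last point, here is how a
compare-and-swap could be expressed on top of the same hypothetical
intrinsics used in the sketches above (names adapted from the thread,
signatures and failure-path details assumed):

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for the ESM intrinsics, specialized to 64-bit words
       for this sketch; the real proposal's spelling may differ. */
    uint64_t esmLOCKload_u64(const uint64_t *p);        /* opens the event  */
    bool     esmINTERFERENCE(void);                     /* event killed?    */
    void     esmLOCKstore_u64(uint64_t *p, uint64_t v); /* committing store */

    /* Compare-and-swap built from an ESM event: store `desired` into
       *addr and return true iff *addr still held `expected`. */
    bool cas_u64(uint64_t *addr, uint64_t expected, uint64_t desired)
    {
        uint64_t observed = esmLOCKload_u64(addr);
        if (observed == expected && !esmINTERFERENCE()) {
            esmLOCKstore_u64(addr, desired);   /* commits the event */
            return true;
        }
        /* On the failure path the event still has to be wound down;
           exactly how that is spelled is a detail of the proposal
           that this sketch leaves open. */
        return false;
    }

Test-and-set and LL/SC emulations would follow the same pattern.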
Stefan