Subject: Re: MSI interrupts
From: robfi680 (at) *nospam* gmail.com (Robert Finch)
Newsgroups: comp.arch
Date: 17 Mar 2025, 22:14:37
Organization: A noiseless patient Spider
Message-ID: <vra3bv$teuf$1@dont-email.me>
User-Agent: Mozilla Thunderbird
On 2025-03-17 2:33 p.m., EricP wrote:
Michael S wrote:
On Mon, 17 Mar 2025 13:38:12 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
Robert Finch <robfi680@gmail.com> writes:
<please trim posts>
Consider that you have a pool of 4 cores set up to receive interrupts,
that those 4 cores are running at differing priorities, and that the
interrupt is at still a different priority. You want the core operating
at the lowest priority (with the right software stack) to accept the
interrupt!!
Okay, I wrote a naive hardware filter for this, which should work
okay for small numbers of CPUs but does not scale up very well.
What happens if there is no core ready? Place the IRQ back into the
queue (regenerate the IRQ as an IRQ-miss IRQ)? Or just wait for an
available core? I do not like the idea of waiting as it could stall
the system.
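Roughly, the naive filter behaves like the C sketch below (the names,
the fixed pool of 4 cores, and the priority encoding are made up for
illustration; a linear scan stands in for what would be a comparator
tree in logic):

#include <stdint.h>
#include <stdbool.h>

#define NCORES  4           /* size of the receiving pool (placeholder) */
#define NO_CORE (-1)

/* Per-core state visible to the interrupt routing logic. */
typedef struct {
    bool    can_take_irq;   /* in the pool and not masked                */
    uint8_t run_priority;   /* priority the core is currently running at */
} core_state_t;

/* Naive filter: pick the core running at the lowest priority that the
 * interrupt would still preempt.  Returns NO_CORE if no core is ready. */
int pick_target(const core_state_t core[NCORES], uint8_t irq_priority)
{
    int best = NO_CORE;
    for (int i = 0; i < NCORES; i++) {
        if (!core[i].can_take_irq)
            continue;
        if (core[i].run_priority >= irq_priority)
            continue;       /* interrupt would not preempt this core */
        if (best == NO_CORE || core[i].run_priority < core[best].run_priority)
            best = i;
    }
    return best;
}

In hardware the scan becomes a comparator tree, which is the part that
stops scaling nicely as the pool grows.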
Use a FIFO to hold up to N pending IRQs. Define a signal that asserts
when the FIFO is non-empty. The CPU can mask the signal to prevent
interruption; when the signal is unmasked the CPU pops the
first IRQ from the FIFO. Or use a bitmap in flops or SRAM
(prioritization happens in the logic that asserts, to the CPU, the
fact that an interrupt is pending).
Choose whether you want a FIFO/bitmap per target CPU, or global
pending data with the logic to select the highest-priority pending
IRQ when the CPU signals that interrupts are not masked.
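As a rough C sketch of the global-pending variant (the 64-source limit,
the per-source priority table, and the names are all assumptions made
here; the loop stands in for a priority encoder):

#include <stdint.h>

#define NIRQ 64                     /* number of interrupt sources (assumed) */

static uint64_t pending;            /* bit i set => source i is pending */
static uint8_t  irq_priority[NIRQ]; /* priority assigned to each source */

/* A source raises an interrupt: set its pending bit. */
void post_irq(int irq)
{
    pending |= (uint64_t)1 << irq;
}

/* The CPU signals that interrupts are unmasked: hand it the
 * highest-priority pending IRQ, or -1 if nothing is pending.
 * In logic this loop is the priority encoder in front of the
 * CPU's interrupt input.                                       */
int pop_highest_pending(void)
{
    int best = -1;
    for (int i = 0; i < NIRQ; i++) {
        if (!(pending & ((uint64_t)1 << i)))
            continue;
        if (best < 0 || irq_priority[i] > irq_priority[best])
            best = i;
    }
    if (best >= 0)
        pending &= ~((uint64_t)1 << best);
    return best;
}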
The problem Robert is talking about arises when there are many
interrupt sources and many target CPUs.
The required routing/prioritization/acknowledgment logic (at least
the naive logic I have in mind) would be either non-scalable or
relatively complicated. The selection process in the second case will
take multiple cycles (I am thinking of a ring).
Another problem is what the core does with the in-flight instructions.
Method 1 is simplest: it injects the interrupt request at Retire
as that's where the state of everything is synchronized.
The consequence is that, like exceptions, the in-flight instructions all
get purged, and we save the committed RIP, RSP, and interrupt control word.
While that might be acceptable for a 5-stage in-order pipeline,
it could be pretty expensive for an OoO 200+ entry instruction queue,
potentially tossing hundreds of cycles of nearly finished work.
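In behavioural C terms, method 1 is just a check made when the request
reaches Retire; everything below (the struct fields and the names) is
invented for illustration:

#include <stdint.h>
#include <stdbool.h>

/* Architectural state saved when an interrupt is accepted at Retire. */
typedef struct {
    uint64_t rip;        /* committed instruction pointer */
    uint64_t rsp;        /* committed stack pointer       */
    uint32_t int_ctrl;   /* interrupt control word        */
} committed_state_t;

typedef struct {
    committed_state_t committed;  /* state as of the Retire stage        */
    committed_state_t saved;      /* copy taken when the IRQ is accepted */
    int      inflight;            /* instructions younger than Retire    */
    uint64_t next_fetch;          /* where Fetch is redirected to        */
    bool     irq_enabled;
} core_t;

/* Method 1: act on the interrupt only at Retire; the cost is that every
 * in-flight instruction is thrown away.                                 */
bool retire_take_irq(core_t *c, uint64_t vector)
{
    if (!c->irq_enabled)
        return false;
    c->saved       = c->committed; /* save RIP, RSP, interrupt ctrl word */
    c->inflight    = 0;            /* purge the in-flight instructions   */
    c->next_fetch  = vector;       /* redirect Fetch to the handler      */
    c->irq_enabled = false;        /* typically disabled on entry        */
    return true;
}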
Method 2 pipelines the switch by injecting the interrupt request at Fetch.
Decode converts the request to a special uOp that travels down the IQ
to Retire and allows all the older work to complete.
This is more complex as it requires a two-phase hand-off from the
Interrupt Control Unit (ICU) to the core, because a branch mispredict in
the in-flight instructions might cause a tentative interrupt acceptance
to later be withdrawn.
Oh my, I forgot.
Method 2 will be used, as it is desired to be able to insert either an
instruction or a vector address at the fetch stage.
The ICU believes the core is in a state to accept a higher-priority
interrupt. It sends a request to the core, which checks its current state
and sends back an immediate INT_ACK if it _might_ accept, stalling Fetch,
or a NAK.
When the special uOp reaches Retire, it sends a signal to Fetch which
then sends an INT_ACCEPT signal to the ICU to complete the handoff.
If a branch mispredict occurs that causes interrupts to be disabled,
then Fetch sends an INT_REJECT to the ICU, and unstalls its fetching.
(Yes, that is not optimal - make it work first, make it work well second.)
This also raises a question about what the ICU is doing during this
long-latency handoff. One wouldn't want the ICU to sit idle, so it might
have to manage the handoff of multiple interrupts to multiple cores
at the same time, each as its own little state machine.
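One way to picture those little state machines, as a C sketch (the
INT_ACK/NAK/INT_ACCEPT/INT_REJECT message names are taken from the
description above; the state names and everything else are invented;
the ICU would keep an array of these, one per outstanding offer):

#include <stdint.h>

/* Two-phase handoff tracked per outstanding interrupt offer. */
typedef enum {
    HS_IDLE,        /* nothing offered                              */
    HS_OFFERED,     /* request sent to a core, awaiting ACK/NAK     */
    HS_PENDING,     /* core ACKed: special uOp travelling to Retire */
    HS_DELIVERED    /* core sent INT_ACCEPT, handoff complete       */
} handoff_state_t;

typedef enum { MSG_INT_ACK, MSG_INT_NAK, MSG_INT_ACCEPT, MSG_INT_REJECT } core_msg_t;

typedef struct {
    handoff_state_t state;
    int irq;    /* which interrupt is being offered */
    int core;   /* which core it was offered to     */
} handoff_t;

/* Advance one handoff when a message comes back from its target core. */
void handoff_step(handoff_t *h, core_msg_t msg)
{
    switch (h->state) {
    case HS_OFFERED:
        /* Phase 1: core tentatively accepts (and stalls Fetch) or refuses. */
        h->state = (msg == MSG_INT_ACK) ? HS_PENDING : HS_IDLE;
        break;
    case HS_PENDING:
        /* Phase 2: the uOp reached Retire (INT_ACCEPT), or a mispredict
         * disabled interrupts and the offer was withdrawn (INT_REJECT),
         * in which case the IRQ goes back to pending.                    */
        h->state = (msg == MSG_INT_ACCEPT) ? HS_DELIVERED : HS_IDLE;
        break;
    default:
        break;
    }
}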
One should see that this decision on how the core handles the
handoff has a large impact on the design complexity of the ICU.
Personally, I would use method 1 first to get something working,
then if this is OoO think about getting fancy with method 2.
Would it not be rare for this to occur? If so, I think the pipeline could
just be flushed from the int-disabling instruction on, and the interrupted
address recorded as that of the int-disabling instruction. <- This requires
identification of an int-disabling instruction, though. It might just be a
store to a specific I/O address, therefore a special store opcode to make
it easy to detect?