mitchalsup@aol.com (MitchAlsup1) writes:
PCIe has an MSI-X interrupt 'capability' which consists of
a number (n) of interrupt descriptors and an associated Pending
Bit Array (PBA), where each bit in PBA has a corresponding 128-bit
descriptor. A descriptor contains a 64-bit address, a 32-bit
message, and a 32-bit vector control word.
>
There are 2 levels of enablement, one at the MSI-X configuration
control register and one in each interrupt descriptor at
vector control bit[31].
>
As the device raises an interrupt, it sets a bit in PBA.
>
When MSI-X is enabled and a bit in PBA is set (1) and the
vector control bit[31] is enabled, the device sends a
write of the message to the address in the descriptor,
and clears the bit in PBA.
Note that if the interrupt condition is asserted after the
global enable in the MSI-X capability and the vector enable
have both been set to allow delivery, the message will be sent to
the root complex and PBA will not be updated. (P is for
pending, and once the message is sent, it's no longer
pending). PBA is only updated when the interrupt is masked
(either function-wide in the capability or per-vector).
So, the interrupt only becomes pending in PBA if it cannot be sent immediately. Thanks for the clarification.
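The pending behaviour described above can be sketched as a small model. This is only an illustration under stated assumptions: the class and field names (`MsixVector`, `MsixFunction`, `sent`) are hypothetical, the per-vector mask is modelled as a plain boolean rather than a particular bit of the vector control word, and the address and data values in the usage note are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class MsixVector:
    address: int            # 64-bit message address from the descriptor
    data: int               # 32-bit message data from the descriptor
    masked: bool = True     # per-vector mask (assumed masked until software unmasks)
    pending: bool = False   # this vector's bit in the PBA


@dataclass
class MsixFunction:
    vectors: list
    enabled: bool = False         # MSI-X enable in the capability
    function_mask: bool = False   # function-wide mask in the capability
    sent: list = field(default_factory=list)  # (address, data) writes issued upstream

    def _deliverable(self, v):
        return self.enabled and not self.function_mask and not v.masked

    def raise_interrupt(self, n):
        v = self.vectors[n]
        if self._deliverable(v):
            # Sent immediately: the PBA bit is never set.
            self.sent.append((v.address, v.data))
        else:
            # Cannot be sent now: remember it in the PBA.
            v.pending = True

    def unmask(self, n):
        self.vectors[n].masked = False
        self._flush()

    def _flush(self):
        # Send every pending message that has become deliverable.
        for v in self.vectors:
            if v.pending and self._deliverable(v):
                v.pending = False
                self.sent.append((v.address, v.data))
```

With one vector and MSI-X enabled, raising the interrupt while the vector is masked only sets `pending`; calling `unmask(0)` then issues the write and clears the bit, and a later `raise_interrupt(0)` goes out without touching `pending`.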
>
I am assuming that the MSI-X enable bit is used to throttle
a device so that it sends bursts of interrupts to optimize
the caching behavior of the cores handling the interrupts:
run applications->handle k interrupts->run applications.
A home machine would not use this feature as the interrupt
load is small, but a GB server might want more control over when.
But does anybody know ??
In my experience the MSI-X function enable and vector enables
are not modified during runtime; rather, the device has control
registers which allow masking of the interrupt (e.g.
for AHCI, the MSI message will only be sent if the port
PxIE (Port n Interrupt Enable) bit corresponding to a
PxIS (Port n Interrupt Status) bit is set).
Granted, AHCI specifies MSI, not MSI-X, but every MSI-X
device I've worked with operates the same way, with
device-specific interrupt enables for a particular vector.
So, these degenerated into more masking levels that are not ...
Yes, we use MSI-X extensively. See above.
There are a number of mechanisms used for interrupt moderation,
but all generally are independent of the PCI message delivery.
(e.g. RSS spreads interrupts across multiple target cores,
or the interrupt moderation feature of the Intel 10GbE network adapters).
>
a) device command to interrupt descriptor mapping {
There is no mention of the mapping of commands to the device
and to these interrupt descriptors. Can anyone supply input
or pointers to this mapping?
Once the message leaves the device, is received by the
root complex port and is forwarded across the host bridge
to the system fabric, it's completely under control of
the host. On x86, the TLP for the upstream message is
received and forwarded to the specified address (which is
the IOAPIC on Intel and the GIC ITS on Arm64).
The interrupt controller may further mask the interrupt if
desired or if the interrupt priority is lower than the
current running priority.
{note to self:: that is why it's a local APIC--it has to be close ...
>
A single device (such as a SATA drive) might have a queue of
outstanding commands that it services in whatever order it
thinks best. Many of these commands want to inform some core
when the command is complete (or cannot be completed). To do
this, the device sends a stored interrupt message to the stored service port.
Each SATA port has a PxIS and PxIE register. The SATA (AHCI)
controller MSI configuration can provide one vector per port - the main
difference between MSI and MSI-X is that the interrupt numbers
for MSI must be consecutive and there is only one address,
while for MSI-X each vector has its own address and a programmable
data (interrupt number) field. The interpretation of the data
of the MSI-X or MSI upstream write is up to the interrupt controller
and may be virtualized in the interrupt controller.
Note that support for MSI in AHCI is optional (in which case the
legacy level-sensitive PCI INTA/B/C/D signals are used).
The AHCI standard specification (ahci_1_3.pdf) is publicly available.
I see (below) that you (they) migrated all the stuff I thought might ...
}
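The MSI versus MSI-X difference described above can be shown with two small helpers. This is a sketch, not a register-level model: the function names and the dictionary layout of a table entry are hypothetical, and the addresses in the usage note are arbitrary.

```python
def msi_messages(address, base_data, count):
    """MSI: one shared address, consecutive data values starting at base_data."""
    if count < 1:
        raise ValueError("at least one vector is required")
    return [(address, base_data + i) for i in range(count)]


def msix_messages(table):
    """MSI-X: every table entry carries its own (address, data) pair."""
    return [(entry["address"], entry["data"]) for entry in table]
```

For example, `msi_messages(0x1000, 8, 4)` yields four writes to the same address with data 8 through 11, while `msix_messages` simply returns whatever address and data software programmed into each entry.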
I don't really NEED to know this mapping, but knowing would
significantly enhance my understanding of what is supposed to be going on, and thus avoid making crippling errors.
>
b) address space of interrupt service port {
The address in the interrupt descriptor points at a service port
(APIC). Since a service port is "not like memory"*, I want to
mandate this address be in MMI/O space, and since My 66000 has a
full 64-bit address space for MMI/O there is no burden on the size
of MMI/O space--it is already as big as possible on a 64-bit
machine. Plus, MMI/O space has the property of being sequentially
consistent whereas DRAM is only cache consistent.
From the standpoint of the PCI Express root port, the upstream write
generated by the device to send the MSI message to the host
looks just like any other inbound DMA from the device to the
host. It is the responsibility of the host bridge and interconnect to
route the message to the appropriate destination (which generally
is an interrupt controller, but just as legally could be a
DRAM address which software polls periodically).
So the message arrive at the top of the PCIe tree is RAW, then ...
>
Most current architectures just partition a hunk of the physical address space as MMI/O address space.
The address field in the MSI-X vector (or MSI-X capability)
is opaque to hardware below the PCIe root port.
Our chips recognize the interrupt controller range of
addresses in the inbound message at the host bridge
and route the message to the interrupt translation service;
the destinations in the interrupt controller are simply
control and status registers in the MMIO space. The
ARM64 interrupt controller supports multiple destinations
with different semantics (SPI and xSPI have one target
register and LPI has a different target register, the address
of which is programmed into the MSI-X Vector address field).
What I am trying to do is to figure out a means to route the ...
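The routing decision at the host bridge can be sketched as a simple address-range check. Everything here is an assumption made for illustration: the `HostBridge` class, its fields, and the base and size values in the usage note are hypothetical and do not describe any particular chip.

```python
class HostBridge:
    def __init__(self, intc_base, intc_size):
        self.intc_base = intc_base
        self.intc_size = intc_size
        self.intc_writes = []   # (register offset, data) seen by the interrupt controller
        self.dram = {}          # address -> last data written

    def inbound_write(self, address, data):
        """Route an upstream write; an MSI-X message is just another inbound write."""
        offset = address - self.intc_base
        if 0 <= offset < self.intc_size:
            # Falls in the interrupt controller range: treat it as a register write.
            self.intc_writes.append((offset, data))
            return "interrupt-controller"
        # Anything else is ordinary DMA to memory, which software could poll.
        self.dram[address] = data
        return "dram"
```

A bridge built as `HostBridge(0x8000_0000, 0x1_0000)` sends a write to `0x8000_0040` to the interrupt controller at offset `0x40`, and a write to `0x1000` to memory.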
>
(*) memory has the property that a read will return the last
bit pattern written, a service port does not.
>
I assume that service port addresses map to different cores (or local APICs of a core).
The IOAPIC handles the message and has configuration registers
that determine which lAPIC should be signalled.
The GIC has configuration tables in memory that can remap
the interrupt to a different vector (e.g. for a guest VM).
GIC = Global Interrupt Controller ?
I want to directly support the
notion of a virtual core so while a 'chip' might have a large
number of physical cores, one would want a pool of thousands+ of virtual cores. I want said service ports to support raising interrupt directly to a physical or virtual core.
Take a look at IHI0069
(https://developer.arm.com/documentation/ihi0069/latest/)
}
>
Apparently, the message part of the MSI-X interrupt can be interpreted any way that both SW and HW agree.
Yes.
This works
for already defined architectures, and doing it like one
or more others, makes an OS port significantly easier.
However what these messages contain is difficult to find
via Google.
The message is a 32-bit field and it is fully interpreted by
the interrupt controller (The GIC can be configured to support
from 16 to 32-bits data payload in an upstream MSI-X write;
the interpretation of the data is host specific).
On Intel and ARM systems, the firmware knows the grungy details
and simply passes the desired payload value to the kernel
via the device tree (Linux) or ACPI tables (for Windows/Linux).
>
So, it seems to me, that the combination of the 64-bit address
and the 32-bit message must provide::
a) which level of the system to interrupt
{Secure Monitor, HyperVisor, SuperVisor, Application}
No. That's completely a function of the interrupt controller
and how the hardware handles the data payload.
b) which core should handle the interrupt
{physical[0..k], virtual[l..m]}
Again, a function of the interrupt controller.
c) what priority level is the interrupt.
{There are 64 unique priority levels}
Yep, a function of the interrupt controller.
d) something about why the interrupt was raised
The interrupt itself causes the operating system
device driver interrupt function to be invoked. The
device-specific interrupt handler determines both
why the interrupt was raised (e.g. via the PxIS
register in the AHCI/SATA controller) and takes
the appropriate action.
On ARM64, it is common for the data field for
the MSI-X interrupts to number starting at zero
on every device, and they're mapped to a system-wide
unique value by the interrupt controller (e.g.
the GICv4 ITS).
I was expecting that.
If interrupt remapping hardware is
not available then unique data payloads for each
device need to be used.
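The remapping described above, where every device numbers its data payloads from zero and the interrupt controller turns them into system-wide unique values, can be sketched as a lookup keyed by the device identity and the payload. This is not the GIC ITS table format: `InterruptRemapper`, its methods, and the starting number are hypothetical choices made for illustration.

```python
class InterruptRemapper:
    def __init__(self, first_free=8192):
        # first_free is an arbitrary starting point for system-wide numbers.
        self.next_id = first_free
        self.table = {}

    def map(self, stream_id, event_id):
        """Assign (or look up) a system-wide number for one device-local event."""
        key = (stream_id, event_id)
        if key not in self.table:
            self.table[key] = self.next_id
            self.next_id += 1
        return self.table[key]

    def translate(self, stream_id, data):
        """Translate an incoming message; returns None if the pair was never mapped."""
        return self.table.get((stream_id, data))
```

Two devices can both send data payload 0 and still end up with different system-wide numbers, because the device identity is part of the key. Without such a remapper the payloads themselves would have to be unique.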
Note that like any other inbound DMA, the address
in the MSI-X TLP that gets sent to the host bridge is subject
to translation by an IOMMU before getting to the
interrupt controller (or by the device itself if it
supports PCI-e Address Translation Services (ATS)).
Obviously.
{what remains of the message}
>
I suspect that (a) and (b) are parts of the address while (c)
and (d) are part of the message. Although nothing prevents
(c) from being part of the address.
>
Once MSI-X is sorted out MSI becomes a subset.
>
HostBridge has a service port that provides INT[A,B,C,D] to
MSI-X translation, so only MSI-X messages are used system-wide.
Note that INTA/B/C/D are level-sensitive. This requires
TWO MSI-X vectors - one that targets an "interrupt set"
register and the other targets an "interrupt clear"
register.
Gotcha.
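The set/clear pairing can be sketched as follows. The names `LevelLine` and `IntxToMessage` are hypothetical, and the data value written is arbitrary; the only point is that each edge of the level-sensitive signal becomes one message to the matching register.

```python
class LevelLine:
    """Interrupt controller side: one 'set' and one 'clear' target register."""

    def __init__(self):
        self.asserted = False
        self.log = []

    def write_set(self, data):
        self.asserted = True
        self.log.append(("set", data))

    def write_clear(self, data):
        self.asserted = False
        self.log.append(("clear", data))


class IntxToMessage:
    """Bridge side: turn level changes on an INTx wire into two kinds of message."""

    def __init__(self, line):
        self.line = line
        self.level = False

    def update(self, level):
        if level and not self.level:
            self.line.write_set(1)      # rising edge: message to the set register
        elif not level and self.level:
            self.line.write_clear(1)    # falling edge: message to the clear register
        self.level = level
```

Holding the wire high across several `update(True)` calls produces a single set message; the line stays asserted in the controller until the wire drops and the clear message arrives.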
>
------------------------------------------------------------
>
It seems to me that the interrupt address needs translation
via I/O MMU, but which of the 4 levels provides the
translation Root pointers ??
On Intel the IOMMU translation tables are not shared with the
AP.
I have seen in the past 3 days AP being used to point at a ...
The PCI address (aka Stream ID) is passed to the interrupt
controller and IOMMU and used as an index to determine the
page table root pointer.
The stream id format is
<2:0> PCI function number
<7:3> PCI device number
<15:8> PCI bus number
<xx:16> PCI segment (root complex) number.
This allows each device capable of inbound DMA to identify
itself uniquely to the interrupt controller and IOMMU.
Both Intel and AMD use this convention.
I use ChipID for the last field in case each chip has its own ...
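The field layout listed above can be packed and unpacked with a few shifts. The helper names are hypothetical, and because the upper bound of the segment field is left open in the post (`<xx:16>`), the sketch places no limit on it.

```python
def make_stream_id(segment, bus, device, function):
    """Pack segment/bus/device/function into a stream id using the layout above."""
    if not (0 <= function < 8 and 0 <= device < 32 and 0 <= bus < 256 and segment >= 0):
        raise ValueError("field out of range")
    return (segment << 16) | (bus << 8) | (device << 3) | function


def split_stream_id(stream_id):
    """Return (segment, bus, device, function) for a stream id."""
    return (
        stream_id >> 16,
        (stream_id >> 8) & 0xFF,
        (stream_id >> 3) & 0x1F,
        stream_id & 0x7,
    )
```

For instance, segment 1, bus 0x02, device 0x1F, function 3 packs to `0x102FB`, and `split_stream_id(0x102FB)` returns the same four fields.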
>
Am I allowed to use bits in Vector Control to provide this ??
But if I put it there then there is cross privilege leakage !
No, you're not allowed to do anything not explicitly allowed
in the PCI express specification. Remember, an MSI-X write
generated by the device is indistinguishable from any other
upstream DMA request initiated by the device.
Why did PCI committee specify a 32-bit container and define the ...
>
c) interrupt latency {
When "what is running on a core" is timesliced by a HyperVisor,
a core that launched a command to a device may not be running
at the instant the interrupt arrives back.
See again the document referenced above. The interrupt controller
is aware that the guest is not currently scheduled and maintains
a virtual pending state (and can optionally signal the hypervisor
that the guest should be scheduled ASAP).
Are you using the word 'signal' as LINUX signal delivery, or as ...
Most of this is done completely by the hardware, without any
intervention by the hypervisor for the vast majority of
interrupts.
That is the goal.
>
It seems to me, that the HyperVisor would want to perform ISR
processing of the interrupt (low latency) and then schedule
the softIRQs to the <sleeping> core so when it regains control
the pending I/O stack of "stuff" is properly cleaned up.
>
So, should all interrupts simply go to HyperVisor and let HV
sort it all out? Or can the <sleeping> virtual core just deal
with it when it is given a next time slice ??
The original GIC did something like this (the HV took all
interrupts and there was a hardware mechanism to inject them
into a guest as if they were a hardware interrupt). But
it was too much overhead going through the hypervisor, especially
when the endpoint device supports the SR-IOV capability. So the
GIC supports handling virtual interrupt delivery completely
in hardware unless the guest is not currently resident on any
virtual CPU.
Leave HV out of the loop unless something drastic happens.
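The delivery policy described above (deliver directly when the guest is resident, otherwise hold a virtual pending state and optionally ask the hypervisor to schedule the guest) can be sketched as a toy model. It does not reflect the actual GIC data structures: `VirtualCpu`, `VirtualInterruptController`, the `doorbell` flag, and the interrupt numbers in the usage note are all hypothetical.

```python
class VirtualCpu:
    def __init__(self):
        self.resident = False   # currently running on a physical core?
        self.pending = []       # virtual pending state held while not resident
        self.delivered = []     # interrupts the guest has actually taken


class VirtualInterruptController:
    def __init__(self):
        self.doorbells = []     # vCPUs the hypervisor was asked to schedule

    def deliver(self, vcpu, intid, doorbell=False):
        if vcpu.resident:
            # Guest is running: deliver with no hypervisor involvement.
            vcpu.delivered.append(intid)
        else:
            # Guest is not running: record it as pending.
            vcpu.pending.append(intid)
            if doorbell:
                # Optionally tell the hypervisor to schedule the guest soon.
                self.doorbells.append(vcpu)

    def make_resident(self, vcpu):
        # When the guest is scheduled, everything held pending is delivered.
        vcpu.resident = True
        vcpu.delivered.extend(vcpu.pending)
        vcpu.pending.clear()
```

An interrupt sent to a non-resident vCPU sits in `pending` until `make_resident` is called; once the vCPU is resident, later interrupts go straight to `delivered`.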