On 10/14/24 7:55 PM, Lawrence D'Oliveiro wrote:
[snip]
> On the other hand, some stubborn holdouts are still fond of microkernels
> -- you just have to say the whole idea is pointless, and they come out of
> the woodwork in a futile attempt to disagree ...
While the argument that only microkernels can provide modularity
with respect to software development seems highly flawed,
modularity with respect to privilege seems more challenging
(impossible?) for a monolithic kernel, and modularity with respect
to fault isolation seems to require substantially more discipline
and constraint than is typical for a monolithic design.
Data isolation seems possible in a monolithic kernel such that a
failure could be isolated to a specific subsystem and that
subsystem could be restarted into a known good state.
Microrebooting seems uncommon. I am guessing this comes from
extremely high availability not being that important and/or other
mechanisms being used for availability, especially at warehouse
scale.
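
To make the restart idea concrete, here is a minimal sketch, not taken
from any real kernel. The names Subsystem, call_isolated, and
record_packet are hypothetical, and the "known good state" is assumed
to be simply the initial state. Each subsystem owns its data, so a
fault in one can be repaired without touching the others.

```python
import copy


class Subsystem:
    """Hypothetical subsystem that owns its data privately."""

    def __init__(self, name, initial_state):
        self.name = name
        # Assumption: the initial state is the known good state.
        self._known_good = copy.deepcopy(initial_state)
        self.state = copy.deepcopy(initial_state)
        self.restarts = 0

    def restart(self):
        """Discard possibly corrupt state and return to the known good copy."""
        self.state = copy.deepcopy(self._known_good)
        self.restarts += 1


def call_isolated(subsystem, operation, *args):
    """Run an operation on one subsystem's state; restart only it on failure."""
    try:
        return True, operation(subsystem.state, *args)
    except Exception:
        subsystem.restart()
        return False, None


def record_packet(state, length):
    state["packets"] += 1  # partial update happens before the check
    if length < 0:
        raise ValueError("corrupt length")
    state["bytes"] += length
    return state["bytes"]


net = Subsystem("net", {"packets": 0, "bytes": 0})
fs = Subsystem("fs", {"open_files": 3})

ok1 = call_isolated(net, record_packet, 100)  # succeeds
ok2 = call_isolated(net, record_packet, -1)   # faults; only net is restarted
```

The second call leaves a half-updated counter behind, and the restart
throws away that state (along with the earlier good update) while the
fs subsystem is untouched. Losing the good work is the price of
restarting to a fixed known good state.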
Physical distribution of functionality may also be more foreign to
a monolithic kernel design. E.g., pinning functionality to a
particular core or kind of core may encourage message passing. In
theory, something like MWAIT could be used for a fast and targeted
inter-processor interrupt, but the limit of one wait condition per
active thread is a significant constraint.
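
A toy model of the one-wait-condition limit, for illustration only.
HardwareThread, store, and the addresses are hypothetical; real
MONITOR/MWAIT also wakes on interrupts and can wake spuriously, which
this ignores. The shared "doorbell" location at the end is my
assumption about one possible workaround, not something the hardware
provides.

```python
class HardwareThread:
    """Toy hardware thread with a single monitored address."""

    def __init__(self):
        self.monitored = None
        self.awake = True

    def monitor(self, address):
        # Arming again replaces the previous address: one condition only.
        self.monitored = address

    def mwait(self):
        # Sleep only if a monitor is armed.
        if self.monitored is not None:
            self.awake = False


def store(memory, threads, address, value):
    """Write memory and wake any sleeping thread monitoring that address."""
    memory[address] = value
    for thread in threads:
        if not thread.awake and thread.monitored == address:
            thread.awake = True
            thread.monitored = None


MAILBOX_A, MAILBOX_B, DOORBELL = 0x100, 0x140, 0x180
memory = {}
server = HardwareThread()

# Trying to wait on two mailboxes: only the last monitor survives.
server.monitor(MAILBOX_A)
server.monitor(MAILBOX_B)
server.mwait()
store(memory, [server], MAILBOX_A, 1)
missed_wakeup = not server.awake      # message in A did not wake the server
store(memory, [server], MAILBOX_B, 1)
woke_on_b = server.awake

# Assumed workaround: every sender also writes one shared doorbell.
server.monitor(DOORBELL)
server.mwait()
store(memory, [server], MAILBOX_A, 2)
store(memory, [server], DOORBELL, 1)
woke_on_doorbell = server.awake
```

The doorbell restores wakeups, but the woken thread then has to poll
every mailbox to find out who rang, which gives up some of the
"targeted" property.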
The primary argument against microkernels seems to be the poor
performance due to permission changes and more abstracted
communication. Most of the overhead for permission change is not
physically fundamental; the overhead can be nearly equal to that
of a function call. Since the overhead of indirect function calls
seems to be considered acceptable in a monolithic kernel, the
performance overhead argument seems limited to existing hardware
rather than implementable hardware.
(This also depends on permission metadata being present in a
nearby cache. If the code/data and permission caches have similar
persistence, this would mean the fast case would be nearly equal.
With hierarchical page tables, especially if nested, the slow
case for a permission change can be much worse.)
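
A rough sketch of why the nested slow case is so much worse, assuming
the permission metadata lives in page table entries so that a miss
means a full table walk. The function names are hypothetical, the
cycle counts are made-up placeholders, and page-walk caches are
ignored. The only firm part is the reference count: with g guest
levels and h host levels, each of the g guest table reads plus the
final guest physical address needs a host walk, giving
g + (g + 1) * h = (g + 1) * (h + 1) - 1 memory references.

```python
def walk_references(guest_levels, host_levels=0):
    """Memory references for one page walk, with no walk caching."""
    if host_levels == 0:
        return guest_levels
    return (guest_levels + 1) * (host_levels + 1) - 1


def expected_cost(hit_rate, hit_cycles, references, cycles_per_reference):
    """Average cycles per permission lookup for a given metadata hit rate."""
    miss_cycles = references * cycles_per_reference
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


native_refs = walk_references(4)      # 4-level table, no virtualization
nested_refs = walk_references(4, 4)   # 4-level guest over 4-level host

# Placeholder numbers, not measurements: 99% hits at 2 cycles,
# 30 cycles per memory reference on a miss.
native_cost = expected_cost(0.99, 2, native_refs, 30)
nested_cost = expected_cost(0.99, 2, nested_refs, 30)
```

With four levels on each side the walk grows from 4 references to 24,
so even a small miss rate on permission metadata can dominate the
average.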
Software like FUSE (Filesystem in Userspace) hints that some
microkernel aspects are desirable even in a monolithic kernel
system.
PA-RISC and Itanium had page groups, which could allow fast
permission removal (invalidating or removing some permissions from
a page group key could be fast). Fast de-privileging might be
useful. Scanning a binary to confirm that it never re-enables
permissions might be practical if the re-enabling operation is a
distinct instruction rather than simply a store, and this would
allow re-enabling permissions to be fast. However, actually
removing the permission to grant permissions seems better.
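
A simplified model of the page group idea, not the actual PA-RISC or
Itanium register layout. AddressSpace and its methods are
hypothetical names. Pages carry a group key, the thread holds
per-key permissions, and an access must pass both checks, so
revoking a right for a whole group is one key update regardless of
how many pages are in it.

```python
class AddressSpace:
    """Toy model: per-page permissions filtered by a per-key permission set."""

    def __init__(self):
        self.page_key = {}    # page number -> group key
        self.page_perms = {}  # page number -> set of "r", "w", "x"
        self.key_perms = {}   # group key -> rights the key currently grants

    def map_page(self, page, key, perms):
        self.page_key[page] = key
        self.page_perms[page] = set(perms)

    def load_key(self, key, perms):
        self.key_perms[key] = set(perms)

    def restrict_key(self, key, removed):
        """Drop rights from a key; no page table entry is touched."""
        self.key_perms[key] = self.key_perms.get(key, set()) - set(removed)

    def allowed(self, page, access):
        key = self.page_key[page]
        return (access in self.page_perms[page]
                and access in self.key_perms.get(key, set()))


space = AddressSpace()
for page in range(1000):
    space.map_page(page, key=7, perms="rw")
space.load_key(7, "rw")

write_before = space.allowed(5, "w")
space.restrict_key(7, "w")            # one update covers all 1000 pages
write_after = space.allowed(5, "w")
read_after = space.allowed(5, "r")
```

In this model, anything that can call load_key can grant the rights
back, which is why removing the permission to grant permissions is
the stronger form of de-privileging.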
Itanium's Enter Privileged Code (EPC) instruction was intended to
provide fast system calls, but it had some complications in
interacting with other Itanium features (I vaguely recall).
I know relatively little about OSes, but the arguments I have read
on both sides seem to have been very biased.