Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
https://arstechnica.com/security/2024/03/hackers-can-extract-secret-encryption-keys-from-apples-mac-chips/
That's a pretty bad article, but at least one can read it without
JavaScript, unlike the web page of the vulnerability
<https://gofetch.fail/>.
Classically, hardware prefetchers have based their predictions on the
addresses of earlier accesses. If they base their predictions only on
architectural accesses and the addresses (thanks to appropriately
written cryptographic code) carry no information about the secrets,
the prefetcher cannot reveal these secrets through a side
channel. Cryptographic code has been written that way for quite a
while.
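To illustrate what "coding appropriately" means here, the following
minimal sketch contrasts a table lookup whose address depends on a
secret index with one that touches every entry in a fixed order and
selects the wanted one with a mask. TracedTable and both lookup
functions are hypothetical names made up for this illustration, and
Python itself gives no timing guarantees; the sketch only shows the
access pattern, not production-quality constant-time code.

```python
class TracedTable:
    """List wrapper that records which indices are read.

    The recorded indices stand in for the addresses a prefetcher or
    cache would get to see.
    """

    def __init__(self, values):
        self.values = list(values)
        self.trace = []

    def __len__(self):
        return len(self.values)

    def __getitem__(self, i):
        self.trace.append(i)
        return self.values[i]


def lookup_leaky(table, secret_index):
    # The accessed address depends directly on the secret.
    return table[secret_index]


def lookup_constant_address(table, secret_index):
    # Every entry is read in the same order whatever the secret is;
    # the wanted entry is picked with a mask, not with an address.
    result = 0
    for i in range(len(table)):
        mask = -(i == secret_index)  # -1 (all ones) on a match, else 0
        result |= table[i] & mask
    return result
```

With the second variant, the recorded trace is the same for every
value of secret_index, so an address-based prefetcher has nothing
secret to work with.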
These prefetchers are not so great on pointer-chasing code (unless the
data happens to be allocated at regular distances), so apparently
Apple engineers and, according to this article, also Intel engineers
added a prefetcher that prefetches based on the contents of data that
it fetches. To anyone who knows how a cache side-channel works, it is
crystal-clear that this makes it possible to reveal the *contents* of
accessed memory through a cache side channel. Even if only
architectural accesses are used for that prediction, the possibility
is still there, because cryptographic code has to access the secrets.
This should be clear to anyone who understands Spectre and actually
anyone who understands classical (architectural) side-channel attacks;
but it should have been very much on the minds of the hardware
designers ever since Spectre was discovered.
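A toy model may make the point concrete. In the sketch below, the
victim only ever loads from one fixed address, so a classical
address-based prefetcher learns nothing; but a prefetcher that treats
loaded values as pointers changes the cache state depending on the
loaded contents. The rule "any loaded value that is a mapped address
gets prefetched", the 64-byte line size, and all names are assumptions
made for the illustration, not a description of Apple's or Intel's
actual hardware.

```python
LINE = 64  # assumed cache-line size in bytes


class ToyMachine:
    def __init__(self, memory, dmp_enabled):
        self.memory = memory            # address -> word
        self.dmp_enabled = dmp_enabled  # data-dependent prefetcher on/off
        self.cached_lines = set()

    def load(self, addr):
        """Architectural load; may trigger a data-dependent prefetch."""
        self.cached_lines.add(addr // LINE)
        value = self.memory.get(addr, 0)
        # Assumed heuristic: a loaded value that is itself a mapped
        # address is treated as a pointer and its line is fetched.
        if self.dmp_enabled and value in self.memory:
            self.cached_lines.add(value // LINE)
        return value

    def is_cached(self, addr):
        """Stand-in for the attacker's cache timing probe."""
        return addr // LINE in self.cached_lines


def probe_after_victim_load(secret, dmp_enabled):
    probe_addr = 0x8000
    memory = {0x1000: secret, probe_addr: 0}
    machine = ToyMachine(memory, dmp_enabled)
    machine.load(0x1000)  # the victim's address never depends on the secret
    return machine.is_cached(probe_addr)
```

In this model probe_after_victim_load(0x8000, True) differs from
probe_after_victim_load(0x1234, True), although the victim performs
exactly the same load in both cases; with the prefetcher disabled the
probe sees no difference.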
The contribution of the GoFetch researchers is that they demonstrate
that this is not just a theoretical possibility.
If Intel added this vulnerability in Raptor Lake (as the article
states), they have to take the full blame. On the positive side, the
GoFetch researchers have not found a way to exploit Raptor Lake's
data-dependent prefetcher. Yet. But I would not bet on there not
being a way to exploit this.
Apple's designers at least have the excuse that at the time when they
laid the groundwork for the M1, Spectre was not known, and when it
became known, it was too late to eliminate this prefetcher from the
design (but not too late to disable it through a chicken bit).
So, is there a way to fix this while maintaining the feature's
performance advantage?
What is the performance advantage? People who have tried to use
software prefetching have often been disappointed by the results. I
expect that a data-dependent prefetcher will usually not be
beneficial, either. There will be a few cases where it may help, but
the average performance advantage will be small.
On the GoFetch web page the researchers suggest using the chicken bit
in the cryptographic library. I would not be surprised if there was a
combination of speculation and data-dependent prefetching, or of
address-dependent prefetching and data-dependent prefetching that
allows all code (not just cryptographic code) to perform
data-dependent prefetches based on the secret data that only crypto
code accesses architecturally. But whether that's the case depends on
the hardware design; plus, if speculative accesses from other code to
this data are possible, the data can usually be revealed through a
speculative load even in the absence of a data-dependent prefetcher
(but there may be software mitigations for that scenario that the
data-dependent prefetcher would circumvent).
The web page also mentions "input blinding". I wonder how that can be
made to work reliably. If the attacker has access to all the loaded
data (through GoFetch) and knows how the blinded data is processed,
the attacker can do everything that the crypto code can do,
e.g. create a session key. Of course, if the attacker has to
reconstruct the key from several pieces of data, the attack becomes
more difficult, but relying on it being too difficult has not been a
recipe for success in the past (e.g., before Bernstein's cache-timing
attack on AES, cache timing was thought to be too difficult to exploit).
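For reference, here is a minimal sketch of one classical form of input
blinding, namely RSA blinding with a fresh random factor r. Whether
this is the scheme the GoFetch page has in mind is an assumption, the
key is a textbook-sized toy, and the function names are hypothetical.

```python
import secrets
from math import gcd

# Textbook-sized RSA key, for illustration only.
P, Q = 61, 53
N = P * Q   # 3233
E = 17
D = 2753    # E * D == 1 modulo (P - 1) * (Q - 1)


def decrypt_plain(c):
    return pow(c, D, N)


def decrypt_blinded(c):
    # Pick a fresh random r that is invertible modulo N.
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            break
    blinded = (c * pow(r, E, N)) % N   # the exponentiation sees this, not c
    m_blinded = pow(blinded, D, N)     # equals (c**D * r) mod N
    return (m_blinded * pow(r, -1, N)) % N  # strip the blinding factor
```

The private-key operation then works on a value the attacker cannot
choose. Note, however, that r and the blinded value are themselves
loaded data, which is exactly the concern raised above.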
So what other solutions might there be? The results of the
data-dependent prefetches could be loaded into a special cache so that
they don't evict entries of other caches. If a load architecturally
accesses this data, it is transferred to the regular cache. That
special cache should be fully associative, to avoid revealing bits of
the addresses of other accesses (i.e., data) through the accessed set.
That leaves the possibility of revealing something by evicting
something from this special cache just based on capacity, but I don't
see how that could be exploited.
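The idea can be sketched as a small simulation: a set-associative main
cache plus a separate, fully associative LRU buffer that receives only
data-dependent prefetches. All names, sizes and the LRU replacement
policy are assumptions chosen for the illustration; it is a sketch of
the proposal, not of any existing hardware.

```python
from collections import OrderedDict


class CacheWithPrefetchBuffer:
    def __init__(self, num_sets, ways, buffer_entries):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]  # LRU per set
        self.buffer = OrderedDict()  # fully associative, LRU
        self.buffer_entries = buffer_entries

    def in_main(self, line):
        return line in self.sets[line % self.num_sets]

    def _insert_main(self, line):
        s = self.sets[line % self.num_sets]
        s[line] = True
        s.move_to_end(line)
        if len(s) > self.ways:
            s.popitem(last=False)  # evict the least recently used way

    def demand_load(self, line):
        """Architectural access: the only way into the main cache."""
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)
            return "hit"
        if line in self.buffer:
            del self.buffer[line]    # transfer from buffer to main cache
            self._insert_main(line)
            return "buffer-hit"
        self._insert_main(line)
        return "miss"

    def data_dependent_prefetch(self, line):
        """Prefetched lines land in the buffer and never touch a set."""
        if self.in_main(line):
            return
        self.buffer[line] = True
        self.buffer.move_to_end(line)
        if len(self.buffer) > self.buffer_entries:
            self.buffer.popitem(last=False)  # capacity eviction only
```

In this model a prefetch of a line that maps to an occupied set leaves
that set untouched; the only state a prefetch can disturb is the
buffer itself, through capacity evictions.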
- anton
-- 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>