Brett <ggtgp@yahoo.com> writes:
With full access to constants, there is even less need to promote

Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>Brett <ggtgp@yahoo.com> writes:
>The lack of CPUs with 64 registers is what makes for a market, that 4%
>that could benefit have no options to pick from.
They had:
>
SPARC: Ok, only 32 GPRs available at a time, but more in hardware
through the Window mechanism.
>
AMD29K: IIRC a 128-register stack and 64 additional registers
>
IA-64: 128 GPRs and 128 FPRs with register stack and rotating register
files to make good use of them.
All antiques no longer available.
SPARC is still available: <https://en.wikipedia.org/wiki/SPARC> says:
>
|Fujitsu will also discontinue their SPARC production [...] end-of-sale
|in 2029, of UNIX servers and a year later for their mainframe.
>
No word of when Oracle will discontinue (or has discontinued) sales,
but both companies introduced their last SPARC CPUs in 2017.
>
In any case, my point still stands: these architectures were
available, and the large number of registers failed to give them a
decisive advantage. Maybe it even gave them a decisive disadvantage:
AMD29K and IA-64 never had OoO implementations, and SPARC got them
only with the Fujitsu SPARC64 V in 2002 and the Oracle SPARC T4 in
2011, years after Intel, MIPS, and HP switched to OoO in 1995/1996 and
Power and Alpha switched in 1998 (POWER3, 21264).
>>Where is your 4% number coming from?
>
The 4% number is poor memory and a guess.
Here is an antique paper on the issue:
>
https://www.eecs.umich.edu/techreports/cse/00/CSE-TR-434-00.pdf
Interesting. I only skimmed the paper, but I read a lot about
inlining and interprocedural register allocation. SPARC's register
windows and AMD29K's and IA-64's register stacks were intended to be
useful for that, but somehow the other architectures did not suffer a
big-enough disadvantage to make them adopt one of these concepts, and
that's despite register windows/stacks working even for indirect calls
(e.g., method calls in the general case), where interprocedural
register allocation or inlining don't help.
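
A minimal C sketch of the indirect-call case, for illustration only.
The names op_fn, twice, negate and apply_and_add are hypothetical and
not taken from the paper. The point is that the call target is known
only at run time, so the compiler cannot in general inline it or
allocate registers across the call; it has to follow the calling
convention for every value that is live across the call.

```c
/* Hypothetical function-pointer type standing in for a method call. */
typedef int (*op_fn)(int);

static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }

/* f is chosen by the caller at run time. 'keep' is live across the
   indirect call, so it must sit in a callee-saved register or in a
   stack slot while f runs. */
int apply_and_add(op_fn f, int a, int b)
{
    int keep = a + b;     /* computed before the call */
    return f(a) + keep;   /* needed again after the call */
}
```

A register window or register stack would preserve keep across the
call without the compiler knowing anything about f.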
>
It seems to me that with OoO the cycle cost of spilling and refilling
on call boundaries was lowered: the spills can be delayed until the
computation is complete, and the refills can start early because the
stack pointer tends to be available early.
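
As a rough illustration (a sketch with hypothetical names scale and
caller, and assuming the compiler does not inline the call), consider
a value that is live across a call:

```c
/* Hypothetical callee; assume it is not inlined. */
int scale(int x) { return 3 * x; }

int caller(int a, int b)
{
    int t = a * b;      /* t is live across the call below */
    int r = scale(a);   /* call boundary: t has to survive it, either in
                           a callee-saved register (saved and restored
                           by whoever uses that register) or in a stack
                           slot */
    return t + r;       /* t is needed again here */
}
```

On an OoO core the store that saves t can wait for the multiply to
finish without holding up the call itself, and the reload after the
call depends only on the stack pointer for its address, so it can
issue early.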
>
And recent OoO CPUs even have zero-cycle store-to-load forwarding, so
even if the called function is short, the spilling and refilling
around it (if any) does not increase the latency of the value that's
spilled and refilled. But that consideration is only relevant for
Intel APX; ARM A64 and RISC-V went for 32 registers several years
before zero-cycle store-to-load forwarding was implemented.
>
One other optimization that they use the additional registers for is
"register promotion", i.e., putting values from memory into registers
for a while (if absence of aliasing can be proven). One interesting
aspect here is that register promotion with 64 or 256 registers (RP-64
and RP-256) is usually not much better (if better at all) than
register promotion with 32 registers (RP-32); see Figure 1. So
register promotion does not make a strong case for more registers,
either, at least in this paper.
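
For readers unfamiliar with the term, here is a hand-written C sketch
of what register promotion does. The function names are hypothetical,
and the second function shows the transformation done by hand; it is
not the paper's algorithm. The restrict qualifiers stand in for the
compiler's proof that the pointers do not alias.

```c
#include <stddef.h>

/* Without aliasing information, *sum may overlap a[], so the load and
   store of *sum have to stay inside the loop. */
void sum_in_memory(const int *a, size_t n, int *sum)
{
    for (size_t i = 0; i < n; i++)
        *sum += a[i];
}

/* Promoted by hand: 'restrict' promises that a and sum do not alias,
   so the accumulator can live in a local (a register candidate) for
   the duration of the loop. */
void sum_promoted(const int *restrict a, size_t n, int *restrict sum)
{
    int acc = *sum;                 /* load once before the loop */
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    *sum = acc;                     /* store once after the loop */
}
```

Each promoted value occupies a register for as long as it is
promoted, which is why the number of available registers could matter
here in principle.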
>
- anton