Subject : Re: Why VAX Was the Ultimate CISC and Not RISC
From : cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups : comp.arch
Date : 02. Mar 2025, 00:45:35
Organization : A noiseless patient Spider
Message-ID : <vq0677$dq4s$2@dont-email.me>
References : 1 2 3
User-Agent : Mozilla Thunderbird
On 3/1/2025 5:19 PM, Lawrence D'Oliveiro wrote:
> On Sat, 01 Mar 2025 14:40:55 -0500, EricP wrote:
>> If you look at the VAX 8800 or NVAX uArch you see that even in 1990 it
>> was still taking multiple clocks to serially decode each instruction and
>> that basically stalls away any benefits a pipeline might have given.
> How many clocks did Alpha take to process each instruction? Because I
> recall the initial chips had clock speeds several times that of the RISC
> competition, but performance, while competitive, was not several times
> greater.
AFAIK, issue was partly a case of:
What clock speed giveth, large instruction counts taketh away...
If reading an 8- or 16-bit value from memory takes a multi-instruction sequence, and writing such a value back to memory requires another multi-instruction sequence (to fake the insertion in software)...
Well, this can't be good for performance...
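As a rough illustration of what "faking it in software" means, here is a minimal C sketch of sub-word access on a machine that can only load and store aligned 64-bit words. The function names and the little-endian byte-lane numbering are assumptions made for this example; the comments name the early-Alpha instructions each step loosely corresponds to, but this is a model of the idea, not a listing of actual compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: memory is visible only as aligned 64-bit words,
   as on a machine with no byte load/store instructions. */

/* Load one byte: fetch the enclosing 64-bit word, then shift the wanted
   byte lane down and truncate. Loosely an LDQ_U + EXTBL pair. */
static uint8_t load_byte(const uint64_t *mem, size_t addr)
{
    uint64_t q = mem[addr >> 3];                /* aligned 64-bit load  */
    unsigned shift = (unsigned)(addr & 7) * 8;  /* byte lane, in bits   */
    return (uint8_t)(q >> shift);               /* extract the byte     */
}

/* Store one byte: read-modify-write of the enclosing 64-bit word.
   Loosely LDQ_U, MSKBL, INSBL, BIS, STQ_U. */
static void store_byte(uint64_t *mem, size_t addr, uint8_t v)
{
    unsigned shift = (unsigned)(addr & 7) * 8;
    uint64_t q = mem[addr >> 3];                /* load the old word    */
    q &= ~((uint64_t)0xFF << shift);            /* clear the old byte   */
    q |= (uint64_t)v << shift;                  /* insert the new byte  */
    mem[addr >> 3] = q;                         /* store the word back  */
}
```

A single byte store thus becomes a load, two or three ALU operations, and a store, all in one dependent chain.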
Seemingly, it was designed to prioritize clock speed above all else, but a higher clock doesn't necessarily help if the overhead of the extra instructions exceeds the relative gain in clock speed.
Such is the issue. On the other side, one may get better performance at a lower clock speed if one needs fewer instructions for a given task, and those instructions have a lower average case latency.
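The tradeoff can be put in numbers with the usual relation: time = instructions x cycles per instruction / clock rate. A small sketch follows; the function name is made up for this example.

```c
#include <assert.h>

/* Execution time in seconds:
   instruction count * average cycles per instruction / clock rate (Hz). */
static double exec_time_s(double instructions, double cpi, double clock_hz)
{
    assert(clock_hz > 0.0);
    return instructions * cpi / clock_hz;
}
```

With purely hypothetical figures: `exec_time_s(1.5e6, 1.0, 200e6)` describes a 200 MHz machine that needs 1.5M instructions at 1.0 cycles each, and `exec_time_s(1.0e6, 0.7, 100e6)` a 100 MHz machine that needs 1.0M instructions at 0.7 cycles each. The first comes out at 7.5 ms, the second at 7.0 ms, so the machine with half the clock speed finishes slightly sooner.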
Though, admittedly, in my project, the typical 2- or 3-cycle latencies on most instructions are higher than the 1- or 2-cycle latencies on many other CPUs. But, at the same time, the latency hurts less if one has fewer and shorter chains of directly-dependent instructions.
Not a great situation for running RISC-V though, which has a more obvious detrimental impact from things like 2-cycle ADD or SLL (mostly as direct multi-instruction register-dependent chains are common and GCC emits a lot of them).
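To see why dependent chains matter so much here, consider a toy model (an assumption for illustration, not a description of any real core): in-order, one instruction issued per cycle at most, every instruction with the same result latency, and each instruction depending on at most one earlier one.

```c
#include <assert.h>

/* Toy in-order, single-issue model (hypothetical).
   dep[i] is the index of the earlier instruction whose result
   instruction i needs, or -1 if it has no dependency.
   Returns the cycle at which the last result becomes available. */
static int run_cycles(const int *dep, int n, int latency)
{
    int ready[64];          /* cycle at which each result is available */
    int issue = 0, last = 0;

    assert(n > 0 && n <= 64 && latency >= 1);
    for (int i = 0; i < n; i++) {
        if (i > 0)
            issue++;                    /* at most one issue per cycle */
        if (dep[i] >= 0) {
            assert(dep[i] < i);
            if (ready[dep[i]] > issue)
                issue = ready[dep[i]];  /* stall until the input is ready */
        }
        ready[i] = issue + latency;
        if (ready[i] > last)
            last = ready[i];
    }
    return last;
}
```

In this model, a chain of four ops where each depends on the previous one (`dep = {-1, 0, 1, 2}`) takes 4 cycles at 1-cycle latency and 8 cycles at 2-cycle latency, while four independent ops (`dep = {-1, -1, -1, -1}`) take 4 and 5 cycles respectively. The extra cycle of latency is nearly free for independent work and doubles the cost of a tight chain.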
Not sure what instruction scheduling was like on the Alpha; this would likely have a big impact on how it would perform in the face of multi-cycle ALU ops (and, off hand, I don't know if the Alpha had single or multi-cycle latency on ALU ops). Though, AFAIK, 2-cycle ALU was common on PowerPC and POWER variants.
...