Subject: Re: Instruction Tracing
From: mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups: comp.arch
Date: 12 Aug 2024, 18:32:09
Organization: Rocksolid Light
Message-ID: <9ceec7cd590edecc5c90a27aadeaf388@www.novabbs.org>
References: 1 2 3 4 5 6 7 8 9 10
User-Agent: Rocksolid Light
On Mon, 12 Aug 2024 15:14:53 +0000, Michael S wrote:
On Mon, 12 Aug 2024 08:42:51 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
>
On Mon, 12 Aug 2024 11:09:18 +0300, Michael S wrote:
>
On Mon, 12 Aug 2024 06:33:17 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
>
But in spite of having, say, 2½ times the clock speed of POWER,
Alpha was not 2½ times faster, was it?
>
Of course not.
>
That’s what I mean: it took several clock cycles per instruction,
contrary to just about every other RISC architecture.
>
On EV4, simple ALU instructions took 1 cycle, both for throughput and
for latency.
Shifts and conditional moves had a latency of 2 and a throughput of 1.
The integer multiplier was not pipelined, but a few other RISCs also had
non-pipelined multipliers.
Mc88100 had a pipelined multiplier: you could start an int mul
every cycle, or a single mul every cycle, or a double mul every 4
cycles.
Latency of integer multiplier was 19-21 cycles.
3 cycles for Mc88100
On the FP side, both FADD and FMUL were fully pipelined (T=1) and had
a latency of 6 cycles.
L1D cache hits were fully pipelined (T=1) and had latency of 3 cycles.
>
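[Editor's illustration, not part of the original post.] The latency
versus throughput distinction above can be made concrete with a small
sketch. It assumes an idealized, fully pipelined unit that accepts one
operation per cycle (T=1) and uses the 6-cycle FADD latency quoted
above; the function name cycles_for_chain is hypothetical, and real
EV4 issue rules are ignored.

```python
def cycles_for_chain(n_ops: int, latency: int, dependent: bool) -> int:
    """Cycles until the last result is ready on a fully pipelined unit (T=1).

    Idealized model: one operation may issue per cycle, and nothing
    else in the machine stalls.
    """
    if n_ops <= 0:
        return 0
    if dependent:
        # Each operation needs the previous result, so the full
        # latency is paid once per operation.
        return n_ops * latency
    # Independent operations issue back to back; the last one issues
    # after (n_ops - 1) cycles and completes `latency` cycles later.
    return (n_ops - 1) + latency


FADD_LATENCY = 6  # EV4 FADD/FMUL latency as quoted in the post

for n in (1, 8, 64):
    dep = cycles_for_chain(n, FADD_LATENCY, dependent=True)
    ind = cycles_for_chain(n, FADD_LATENCY, dependent=False)
    print(f"{n:3d} FADDs: dependent chain {dep:4d} cycles, independent {ind:4d} cycles")
```

Under these assumptions a long run of independent FADDs approaches one
result per cycle, while a dependent chain pays the full 6 cycles each
time, so the achieved IPC depends heavily on how much independent work
the code exposes.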
So, as long as code and data fit in the L1 cache, EV4 IPC was not
far behind the competition. Relative to the MIPS R4K, it may even have
been ahead.
>
Of course, cache misses were relatively more expensive than for much
lower-clocked competitors. DEC's solution to that was a wide and fast
system bus.
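
[Editor's illustration, not part of the original post.] A rough sketch
of why a miss costs more cycles at a higher clock, and why a 2.5x clock
advantage need not give a 2.5x speedup. Every number below (clock
rates, memory latency, base CPI, miss rate) is a made-up assumption
chosen only to give a 2.5:1 clock ratio; none is a measured figure for
Alpha, POWER, or any other real machine.

```python
def miss_penalty_cycles(mem_latency_ns: int, clock_mhz: int) -> int:
    """Cycles spent waiting on a memory access of fixed latency in ns."""
    # cycles = ns * MHz / 1000, rounded up using integer arithmetic
    return -(-mem_latency_ns * clock_mhz // 1000)


def ns_per_instruction(base_cpi: float, misses_per_instr: float,
                       mem_latency_ns: int, clock_mhz: int) -> float:
    """Average time per instruction under a simple stall-on-miss model."""
    penalty = miss_penalty_cycles(mem_latency_ns, clock_mhz)
    effective_cpi = base_cpi + misses_per_instr * penalty
    return effective_cpi * 1000.0 / clock_mhz


# Hypothetical parameters, for illustration only.
MEM_LATENCY_NS = 200      # assumed identical for both machines
FAST_MHZ, SLOW_MHZ = 200, 80   # 2.5x clock ratio
BASE_CPI = 1.0            # assumed CPI when everything hits in L1
MISSES_PER_INSTR = 0.02   # assumed miss rate

fast = ns_per_instruction(BASE_CPI, MISSES_PER_INSTR, MEM_LATENCY_NS, FAST_MHZ)
slow = ns_per_instruction(BASE_CPI, MISSES_PER_INSTR, MEM_LATENCY_NS, SLOW_MHZ)
speedup = slow / fast

print("miss penalty (fast clock):", miss_penalty_cycles(MEM_LATENCY_NS, FAST_MHZ), "cycles")
print("miss penalty (slow clock):", miss_penalty_cycles(MEM_LATENCY_NS, SLOW_MHZ), "cycles")
print(f"speedup from 2.5x clock: {speedup:.2f}x")
```

In this model the same memory latency costs 2.5 times as many cycles on
the faster-clocked machine, so the overall speedup lands well below the
clock ratio. A wider, faster bus attacks the memory latency and
bandwidth term directly.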