Subject : Re: OoO execution
From : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups : comp.arch
Date : 29 May 2025, 21:06:21
Organisation : Rocksolid Light
Message-ID : <6e113c5312a5933bd51ff549a99b6869@www.novabbs.org>
References : 1 2 3 4 5 6 7 8 9
User-Agent : Rocksolid Light
On Thu, 29 May 2025 19:02:11 +0000, Thomas Koenig wrote:
> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> quadibloc <quadibloc@gmail.com> writes:
>>> Eventually, IBM caught up with the Control
>>> Data 6600 by perfecting pipelining in the IBM 360/91,
At the cost of about 3× the number of gates and power, along with
a roughly 67% increase in the clock rate (60 ns versus 100 ns cycle
time). This advantage vanished about the time of the first /91
deliveries, with the CDC 7600 going to a ~27 ns clock along with
pipelining and concurrent calculation.
>>> and then combining it with cache in the 360/195.
A last gasp for IBM's leadership in big number crunching.
>>> From the Pentium II onwards, that's the
>>> way computers are made nowadays.
Once everyone can afford the gates to make pipeline staging latches,
it is the natural way to design. Prior to this point, designers
were more focused on "getting it all on a single die" than on getting
the highest possible performance, which was often limited by the speed
of the external interface more than by the calculations inside.
>> Pipelining and caches are already used on the MIPS R2000 in 1986, and
>> the 486 in 1989.
>
> Or the 801. That may have been the first machine to have
> separate I- and D-caches (was it?)
Without disagreeing with the above::
MIPS R2000 (and R3000) had a unified cache--read twice per cycle on
clock high and clock low. R3000 was faster in writing (STs) to the
cache than R2000. Tablewalks in SW via a big hash table.
MC68010 had a "loop buffer" of a handful or two of instructions.
MC68020 had a 256B instruction cache and no TLB.
MC68030 had a 256B I$, a 256B D$, and a ~32-entry TLB, with tablewalks in HW.
MC88100 had a 16KB I$ with a 64-entry TLB and a 16KB D$ with a 64-entry
TLB, with tablewalks in HW.
The CDC 6600 had a multi-word instruction stack, and the 7600 had a
significantly larger instruction stack with backward branch prediction.
The 6600 used base+bounds memory protection and could context switch
in ~16 cycles by writing out the current state while reading in the
new state.
Many machines overlapped FETCH-DECODE with EXECUTE-WRITEBACK as a
2-stage pipeline, all the way back to the beginning. This alone makes
the point where pipelining "took over" difficult to judge.
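
To put a rough number on that 2-stage overlap, here is a minimal Python
sketch. It assumes each of the two stages takes exactly one cycle and
that there are no stalls or branches, which no real machine guarantees.
The function names are made up for illustration and do not model any
particular machine mentioned above.

```python
def unpipelined_cycles(n_instructions: int) -> int:
    """Fetch-decode, then execute-writeback, with no overlap: 2 cycles each."""
    return 2 * n_instructions


def two_stage_cycles(n_instructions: int) -> int:
    """Fetch-decode of instruction i+1 overlaps execute-writeback of instruction i."""
    if n_instructions == 0:
        return 0
    # One cycle to fill the pipeline, then one instruction completes per cycle.
    return n_instructions + 1


def two_stage_schedule(n_instructions: int) -> list[tuple[int, str, str]]:
    """Return (cycle, fetch-decode occupant, execute-writeback occupant) rows."""
    rows = []
    for cycle in range(two_stage_cycles(n_instructions)):
        fd = f"i{cycle}" if cycle < n_instructions else "-"
        ex = f"i{cycle - 1}" if cycle >= 1 else "-"
        rows.append((cycle, fd, ex))
    return rows


if __name__ == "__main__":
    n = 4
    print("unpipelined:", unpipelined_cycles(n), "cycles")
    print("two-stage:  ", two_stage_cycles(n), "cycles")
    for cycle, fd, ex in two_stage_schedule(n):
        print(f"cycle {cycle}: FD={fd:3} EX={ex}")
```

Under those assumptions the overlapped machine approaches twice the
throughput of the non-overlapped one as the instruction count grows,
which is why even this simple arrangement already counts as pipelining.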