quadibloc <quadibloc@gmail.com> writes:
> Eventually, IBM caught up with the Control
> Data 6600 by perfecting pipelining in the IBM 360/91, and then combining
> it with cache in the 360/195. From the Pentium II onwards, that's the
> way computers are made nowadays.
Pipelining and caches were already used in the MIPS R2000 in 1986 and
the 486 in 1989.
You are probably thinking of OoO execution, where people usually write
as if the Tomasulo algorithm of the 360/91 had implemented the modern
concept of OoO execution. But the 360/91 only did OoO for FP, did not
support branch prediction, had imprecise exceptions, and the Tomasulo
algorithm was used primarily as a workaround for the dearth of FP
registers in the S/360.
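To illustrate why renaming helps with a small register file, here is a
minimal sketch in Python. It is not the 360/91's actual mechanism (there
the tags name reservation stations and results travel over a common data
bus); the function name `rename` and the tag names `t0`, `t1`, ... are
made up for illustration.

```python
def rename(program):
    """Rename destinations so that reusing a register creates no false dependence.

    program: list of (dest, src1, src2) tuples of architectural register names.
    Returns a list of (tag, src1, src2), where each source is either the tag of
    the most recent earlier producer or, if there is none, the original name.
    """
    latest = {}   # architectural register -> tag of its most recent producer
    renamed = []
    for i, (dest, src1, src2) in enumerate(program):
        # Look up sources before updating the mapping for the destination.
        srcs = tuple(latest.get(s, s) for s in (src1, src2))
        tag = f"t{i}"
        latest[dest] = tag
        renamed.append((tag,) + srcs)
    return renamed


# Four instructions squeezed into the four S/360 FP registers F0, F2, F4, F6.
program = [
    ("F0", "F2", "F4"),
    ("F6", "F0", "F2"),   # true dependence on the first instruction
    ("F0", "F4", "F4"),   # reuses F0: WAW with the 1st, WAR with the 2nd
    ("F2", "F0", "F6"),   # must see the second F0, not the first
]
for line in rename(program):
    print(line)
```

After renaming, the third instruction no longer conflicts with the first
two over F0, so it can start as soon as F4 is available.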
The innovation that made OoO execution generally usable rather than a
publicity stunt like the 360/91 is the reorder buffer (ROB), which allows
the instructions to be retired in order, and speculatively "executed"
instructions to be canceled after an exception or branch misprediction.
The Pentium Pro (introduced 1995-11-01), HP PA-8000 (introduced
1995-11-02), and MIPS R10000 (introduced 1996-01) were the first
microprocessors with full-blown OoO execution.
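The role of the ROB described above can be sketched in a few lines of
Python. This is a toy model under my own assumptions, not a description
of any of these processors: the names `Entry`, `ReorderBuffer`,
`dispatch`, `complete` and `retire` are hypothetical, and a real ROB also
deals with exceptions, stores and register renaming.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    dest: Optional[str]         # register to write at retirement; None for a branch
    value: int = 0
    done: bool = False          # set when execution finishes, in any order
    mispredicted: bool = False  # set for a branch that was predicted wrongly


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # program order, oldest entry on the left
        self.regs = {}          # architectural state, changed only at retirement

    def dispatch(self, dest):
        entry = Entry(dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=0, mispredicted=False):
        entry.value = value
        entry.done = True
        entry.mispredicted = mispredicted

    def retire(self):
        """Retire finished entries from the head; return how many retired."""
        count = 0
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            count += 1
            if entry.dest is not None:
                self.regs[entry.dest] = entry.value
            if entry.mispredicted:
                self.entries.clear()  # cancel everything younger than the branch
                break
        return count


rob = ReorderBuffer()
load = rob.dispatch("r1")
branch = rob.dispatch(None)
spec = rob.dispatch("r2")       # fetched down the predicted path

rob.complete(spec, 99)          # finishes first, but is not at the head
print(rob.retire(), rob.regs)   # nothing retires yet
rob.complete(load, 7)
rob.complete(branch, mispredicted=True)
print(rob.retire(), rob.regs)   # load and branch retire, spec is cancelled
```

The point of the sketch is that `spec` finished executing early, yet its
result never reached `regs`, because results become architecturally
visible only at in-order retirement.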
But as someone pointed out to me, IBM implemented OoO execution
between the 370/195 and the Pentium Pro: the ES/9000 models 900 and
820 (shipping from September 1991) "were the first models with
out-of-order execution since the System/370-195 of 1973. However
unlike the old S/360-91-derived systems, the models 900 and 820 had
full out-of-order execution for both integer and floating-point units,
with precise exception handling, and a fully superscalar pipeline."
<https://en.wikipedia.org/wiki/IBM_System/390#ES/9000>. So apparently
they had a ROB, and AFAIK were the first machines to have one. These
models also had a branch target buffer; the article does not mention
branch prediction proper, but given a ROB and a branch target buffer,
it would be surprising if they did not predict branches.
So who came up with the concept of the ROB? I recently looked at one
of the HPS papers (Hwu, Patt, Shebanow on a High Performance Substrate
for the VAX from the mid-late 80s) again, and there was no ROB in that
paper. I did not check whether their later papers had
it. So apparently ROBs were not known in the mid-1980s in
academia, and in 1991 there was hardware with a ROB commercially
available, and a few years later it appeared in microprocessors.
I wonder how early and how much IBM talked about their ES/9000 OoO
implementation and features; that may have inspired the teams at
Intel, HP and SGI; or maybe there was an earlier source that inspired
them all, but only in 1995/1996 was the number of transistors on a chip
enough to do OoO on a microprocessor.
Ironically, in the transition to CMOS (i.e., microprocessors) IBM
mainframe processors regressed to in-order (and, I think,
single-issue) execution (but with higher clock rates), and in the early
2000s they looked pretty outdated to me. In the meantime they have
returned to OoO, AFAIK.
Back to OoO: it's interesting that Tomasulo and the 360/91 are
mentioned often, but the ROB and its inventor(s?), which are at least
as important for the success of OoO execution, aren't.
- anton
-- 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>