Subject: Re: Instruction Tracing
From: mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Groups: comp.arch
Date: 10. Aug 2024, 19:25:39
Organization: Rocksolid Light
Message-ID: <68fb6b1949a23186ef7d6f9d9f65d02c@www.novabbs.org>
References: 1
User-Agent: Rocksolid Light
On Sat, 10 Aug 2024 6:20:51 +0000, Lawrence D'Oliveiro wrote:
> In the early days of the spread of RISC (i.e. the 1980s), much was made
> of the analysis of the dynamic execution profiles of actual compiled
> programs to see what machine instructions they most frequently used.
> This then became the rationale for optimizing common instructions, and
> even omitting ones that were not so often used.
>
> One thing these instruction traces would frequently report is that
> integer multiply and divide instructions were not so common, and so
> could be omitted and emulated in software, with minimal impact on
> overall performance. We saw this design decision taken in the early
> versions of Sun’s SPARC for example, and also IBM’s ROMP as used in
> the RT PC.
One of the reasons I like unified register files is that one HAS TO
implement FMUL, and if one has FMUL then one has easy access to IMUL.
Same for FDIV. FCMP is only 12 gates different from ICMP, and one need
not consume OpCode space with FP LDs and STs.
{{I give MIPS (the company) a pass, here, because their FPU was
on a different chip than the integer stuff.}}
> Later, it seems, the CPU designers realized that instruction traces
> were not the final word on performance measurements, and started to
> include hardware integer multiply and divide instructions.
AMD had a HW trace unit which would spew instruction and data addresses
and branch directions to ½ of main memory. This would trace across
user<->OS boundaries so simulation could include everything the chip
was doing--including the damage OS excursions did to caches and TLBs.
While there, I extended this to the interconnect and DRAM bank control.
We had over 1,000 files, 4 GB each, using about 12 bits per average
instruction, capturing SPECint, SPECfp, database, TCP/IP, and server
workloads,... Using the "server farm" we could run all of them
"overnight".
What we can all agree upon is that user-level instruction tracing is
insufficient to capture what the chip will be doing, but that capturing
all the instructions across all the privilege levels is.
> (ROMP was also one of those RISC architectures that had delayed
> branches, along with MIPS, HP-PA and I think SPARC as well.)
Everybody makes mistakes.
>
> I have heard it said that the RT PC was a poor advertisement for the
> benefits of RISC, and the joke was made that “RT” stood for “Reduced
> Technology”.
People who make mistakes are laughed at--and deservedly so.
>
> Later, of course, IBM more than made good this deficiency with its
> second take on RISC, in the form of the POWER architecture, which is
> still a performance leader to this day.
Think about how much more market share POWER would have if they had
not crippled it the first go-around?!?!!