Subject: Performance monitoring (was: Efficiency of in-order vs. OoO)
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 26. Mar 2024, 17:47:02
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Mar26.174702@mips.complang.tuwien.ac.at>
References: 1 2 3 4 5
User-Agent: xrn 10.11
scott@slp53.sl.home (Scott Lurndal) writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>scott@slp53.sl.home (Scott Lurndal) writes:
>>>The biggest demand is from the OS vendors. Hardware folks have
>>>simulation and emulators.
>>
>>You don't want to use a full-blown microarchitectural emulator for a
>>long-running program.
>
>Generally hardware folks don't run 'long-running programs' when
>analyzing performance, they use the emulator for determining latencies,
>bandwidths and efficacy of cache coherency algorithms and
>cache prefetchers.
>
>Their target is not application analysis.

This sounds like hardware folks that are only concerned with
memory-bound programs.
I OTOH expect that designers of out-of-order (and in-order) cores
analyse the performance of various programs to find out where the
bottlenecks of their microarchitectures are in benchmarks and
applications that people look at to determine which CPU to buy. And
that's why we have PMCs not just for memory accesses, but also for
branch prediction accuracy, functional unit utilization, scheduler
utilization, etc.
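
To make the point about non-memory PMCs concrete, here is a minimal sketch of how raw counter values are typically turned into the ratios people compare across cores. Everything in it is an assumption for illustration: the field names are hypothetical and do not correspond to any particular PMU's event names, and the sample numbers are placeholders picked for easy arithmetic, not measurements.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event counts for one run (hypothetical field names)."""
    cycles: int
    instructions: int
    branches: int
    branch_misses: int
    mem_accesses: int
    cache_misses: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or 0.0 if the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(s: CounterSample) -> dict[str, float]:
    """Compute common per-run ratios from raw counts."""
    return {
        # Instructions retired per cycle.
        "ipc": ratio(s.instructions, s.cycles),
        # Fraction of branches that were mispredicted.
        "branch_mispredict_rate": ratio(s.branch_misses, s.branches),
        # Branch mispredictions per 1000 instructions.
        "branch_mpki": 1000.0 * ratio(s.branch_misses, s.instructions),
        # Fraction of memory accesses that missed in the cache.
        "cache_miss_rate": ratio(s.cache_misses, s.mem_accesses),
        # Cache misses per 1000 instructions.
        "cache_mpki": 1000.0 * ratio(s.cache_misses, s.instructions),
    }


if __name__ == "__main__":
    # Placeholder numbers, not measurements.
    sample = CounterSample(
        cycles=2_000_000,
        instructions=4_000_000,
        branches=800_000,
        branch_misses=40_000,
        mem_accesses=1_000_000,
        cache_misses=10_000,
    )
    for name, value in derived_metrics(sample).items():
        print(f"{name}: {value:.3f}")
```

A run with low cache MPKI but high branch MPKI would point at the branch predictor rather than the memory system, which is the kind of distinction memory-only counters cannot make.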

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>