Probably not feasible. The polling frequency wouldn't be high enough.

On 10/3/24 10:00, Anton Ertl wrote:
> Two weeks ago Rene Mueller presented the paper "The Cost of Profiling
> in the HotSpot Virtual Machine" at MPLR 2024. He reported that for
> some programs the counters used for profiling the program result in
> cache contention due to true or false sharing among threads.
>
> The traditional software mitigation for that problem is to split the
> counters into per-thread or per-core instances. But for heavily
> multi-threaded programs running on machines with many cores the cost
> of this mitigation is substantial.
>

For profiling, do we really need accurate counters? They just need to
be statistically accurate I would think.

Instead of incrementing a counter, just store a non-zero immediate into
a zero-initialized byte array at a per-"counter" index. There's no
read-modify-write data dependency, just a store, so it should have
little impact on the pipeline.

A profiling thread loops through the byte array, incrementing an actual
counter when it sees a non-zero byte, and resets the byte to zero. You
could use vector ops to process the array.

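To make the flag-byte idea concrete, here is a minimal single-threaded simulation in Python. It is only a sketch of the logic, not a performance demonstration: the names `NUM_COUNTERS`, `hit`, and `sweep` are made up for illustration, and a real version would use plain stores from many threads and a separate profiling thread.

```python
# Hypothetical sketch: single-threaded simulation of the flag-byte scheme.
NUM_COUNTERS = 8  # assumed number of profiled sites

flags = bytearray(NUM_COUNTERS)  # zero-initialized, one byte per "counter"
counts = [0] * NUM_COUNTERS      # real counters, touched only by the profiler


def hit(index: int) -> None:
    """Instrumented code path: a plain store, no read-modify-write."""
    flags[index] = 1


def sweep() -> None:
    """One pass of the profiling thread: count and clear each non-zero flag."""
    for i, flag in enumerate(flags):
        if flag:
            counts[i] += 1
            flags[i] = 0


# Site 3 fires three times before the first sweep, site 5 once.
for _ in range(3):
    hit(3)
hit(5)
sweep()

# Site 3 fires once more before the second sweep.
hit(3)
sweep()

print(counts)  # [0, 0, 0, 2, 0, 1, 0, 0]
```

Site 3 was hit four times but is counted twice, because all hits between two sweeps collapse into one flag. So the count is a sample whose resolution is bounded by how often the profiling thread polls.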
If the stores were fast enough, you could do 2 or more stores at
hashed indices, different hash for each store. Sort of a counting
Bloom filter. The effective count would be the minimum of the
hashed counts.
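
A rough sketch of the hashed variant, again as a single-threaded Python simulation. The slot count, the two toy hash functions in `HASH_PARAMS`, and the function names are assumptions chosen so the example is easy to trace by hand. Taking the minimum over hashed slots is the same idea a count-min sketch uses.

```python
# Hypothetical sketch: each site stores to two hashed slots; the profiler
# counts per slot, and a site's estimate is the minimum over its slots.
SLOTS = 8
HASH_PARAMS = [(1, 0), (3, 1)]  # assumed (multiplier, offset) per hash

hashed_flags = bytearray(SLOTS)  # zero-initialized flag bytes
slot_counts = [0] * SLOTS        # per-slot counters kept by the profiler


def slots_for(site: int) -> list[int]:
    """Slot indices for a site, one per hash function."""
    return [(a * site + b) % SLOTS for a, b in HASH_PARAMS]


def hit_hashed(site: int) -> None:
    """Instrumented code path: one plain store per hash function."""
    for s in slots_for(site):
        hashed_flags[s] = 1


def sweep_hashed() -> None:
    """One pass of the profiling thread over the flag array."""
    for i in range(SLOTS):
        if hashed_flags[i]:
            slot_counts[i] += 1
            hashed_flags[i] = 0


def estimate(site: int) -> int:
    """Effective count: the minimum of the site's hashed slot counts."""
    return min(slot_counts[s] for s in slots_for(site))


# Site 1 maps to slots [1, 4], site 4 to slots [4, 5]; they collide on slot 4.
for round_no in range(5):
    if round_no < 3:
        hit_hashed(1)  # site 1 is active in the first 3 polling intervals
    hit_hashed(4)      # site 4 is active in all 5
    sweep_hashed()

print(estimate(1), estimate(4))  # 3 5
```

Slot 4 ends up at 5 because both sites store to it, but site 1's other slot only reaches 3, so the minimum filters out the collision. The estimate can still be too high when every slot of a site collides with a busier site.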
No idea how feasible this would be though.