On 6/13/2024 11:52 AM, Stefan Monnier wrote:

This is a late reply, but optimal static ordering for N-wide may be very non-optimal for N-1 (or N-2, etc.). As an example, assume a perfectly

AFAICT Terje was talking about scheduling for OoO CPUs, and wasn't talking about the possible worst case situations, but about how things usually turn out in practice.
For statically-scheduled or in-order CPUs, it can indeed be more difficult to generate code that will run (almost) optimally on a variety of CPUs.
Yeah, you need to know the specifics of the pipeline for either optimal
machine code (in-order superscalar) or potentially to be able to run at
all (LIW / VLIW).
That said, on some OoO CPUs, such as the Piledriver-based core I was running, it did seem that if things were scheduled to assume an in-order CPU (such as putting other instructions between memory loads and the instructions using the results, etc.), the code did perform better (seemingly implying there are limits to the OoO magic).

When doing both Mc 88120 and K9 we found lots of sequences of code
Though, OTOH, a lot of the sorts of optimization tricks I found for the Piledriver were ineffective on the Ryzen, albeit mostly because the more generic stuff caught up.
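
To illustrate the kind of "schedule as if in-order" trick mentioned above (putting other work between a load and its first use), here is a minimal sketch in C. The function names and the multiply-accumulate workload are made up for illustration and are not from the original code. An optimizing compiler is free to reorder either version, so this only shows the intent at the source level.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive form: each loaded value is consumed immediately after the load. */
uint32_t sum_scaled_naive(const uint32_t *src, size_t n, uint32_t k)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += src[i] * k;          /* load, then use right away */
    return acc;
}

/* Hand-scheduled form: the load for the next element is issued before
 * the current element is used, so independent work sits between each
 * load and its first use. */
uint32_t sum_scaled_sched(const uint32_t *src, size_t n, uint32_t k)
{
    uint32_t acc = 0;
    if (n == 0)
        return 0;

    uint32_t cur = src[0];          /* prime the pipeline */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = src[i];     /* issue the next load early */
        acc += cur * k;             /* work on the previously loaded value */
        cur = next;
    }
    acc += cur * k;                 /* drain the last element */
    return acc;
}
```

Both functions compute the same result; whether the second is actually faster depends on the core and on what the compiler does with it.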
For example, I had an LZ compressor that was faster than LZ4 on that CPU (it was based around doing everything in terms of aligned 32-bit dwords, gaining speed at the cost of worse compression), but then when going over to the Ryzen, LZ4 got faster...

It is the continuous nature of having to reschedule code every
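
As a rough idea of what "everything in terms of aligned 32-bit dwords" can look like for the LZ compressor described above, here is a hypothetical match-copy routine for a decoder. The post gives no details of the actual format, so the function name, the parameters, and the convention that distances and lengths are counted in dwords are all assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LZ match copy working only in whole 32-bit dwords.
 *
 * out        : output buffer, viewed as an array of dwords
 * pos        : current write position, in dwords
 * dist_words : match distance behind pos, in dwords (1 <= dist_words <= pos)
 * len_words  : match length, in dwords
 *
 * The copy runs forward one dword at a time, so an overlapping match
 * (dist_words < len_words) repeats the pattern, as in ordinary LZ77.
 * The caller must ensure the buffer holds pos + len_words dwords.
 * Returns the new write position. */
size_t copy_match_dwords(uint32_t *out, size_t pos,
                         size_t dist_words, size_t len_words)
{
    const uint32_t *src = out + pos - dist_words;
    uint32_t *dst = out + pos;

    for (size_t i = 0; i < len_words; i++)
        dst[i] = src[i];            /* always aligned, no byte fix-ups */

    return pos + len_words;
}
```

Restricting matches to dword granularity removes unaligned accesses and byte-level tail handling, which is where the speed comes from, but it also means matches that start or end off a dword boundary cannot be encoded, hence the worse compression.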
Like, seemingly all my efforts in "aggressively optimizing" some things became moot simply by upgrading my PC.

I want to compile once and then use forever (in a dynamic library).
....