Subject : Re: Computer architects leaving Intel...
From : anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups : comp.arch
Date : 09. Sep 2024, 17:02:51
Organisation : Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID : <2024Sep9.170251@mips.complang.tuwien.ac.at>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13
User-Agent : xrn 10.11
Michael S <already5chosen@yahoo.com> writes:
> On Mon, 09 Sep 2024 12:28:13 GMT
> anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>>
>> But when changing the length to 63:
> ...
> An interesting question is which code I want in this case.
> In the absence of -march options and with -O1|2|3 I want something like
> this:
>
> foo2:
>         movups (%rsi), %xmm0
>         movups 16(%rsi), %xmm1
>         movups 32(%rsi), %xmm2
>         movups 47(%rsi), %xmm3
>         movups %xmm0, (%rsi)
>         movups %xmm1, 16(%rsi)
>         movups %xmm2, 32(%rsi)
>         movups %xmm3, 47(%rsi)
>         ret

Yes.

> Without deep thinking I don't see why I would want anything
> different for foo1().

I don't think that deep thinking helps here. One could try to measure
microbenchmarks, but do they actually represent application use?
Given that the code is inlined, you can reduce register pressure (and
potential spilling and refilling cost) with:

foo1:
        movups (%rsi), %xmm0
        movups %xmm0, (%rsi)
        movups 16(%rsi), %xmm0
        movups %xmm0, 16(%rsi)
        movups 32(%rsi), %xmm0
        movups %xmm0, 32(%rsi)
        movups 47(%rsi), %xmm0
        movups %xmm0, 47(%rsi)

Interestingly, gcc uses this kind of scheduling, but with different
registers rather than reusing one, squandering the register-pressure
advantage of that scheduling. However, I did not test this in a
situation where register pressure plays a role.
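
For readers who prefer C to assembly, the two orderings discussed above can be sketched roughly as follows. This is only an illustration, not code from the thread or gcc output: the names copy63_load_all_first, copy63_interleaved, copy63_selfcheck and chunk_off are made up here, and the sketch assumes a separate destination pointer dst. Each 16-byte memcpy stands in for one movups; a compiler may or may not lower it that way.

```c
#include <assert.h>
#include <string.h>

enum { LEN = 63, CHUNK = 16, NCHUNKS = 4 };

/* Start offsets of the four 16-byte chunks. The last chunk starts at
   LEN - CHUNK = 47, so it overlaps the third chunk in byte 47 only. */
static const size_t chunk_off[NCHUNKS] = { 0, 16, 32, LEN - CHUNK };

/* All loads first, then all stores (the ordering of the foo2 listing).
   Needs four temporaries that are live at once, like %xmm0..%xmm3. */
static void copy63_load_all_first(unsigned char *dst, const unsigned char *src)
{
    unsigned char t[NCHUNKS][CHUNK];
    for (int i = 0; i < NCHUNKS; i++)
        memcpy(t[i], src + chunk_off[i], CHUNK);
    for (int i = 0; i < NCHUNKS; i++)
        memcpy(dst + chunk_off[i], t[i], CHUNK);
}

/* Each load followed directly by its store (the ordering of the foo1
   listing). One temporary is enough, like reusing %xmm0. */
static void copy63_interleaved(unsigned char *dst, const unsigned char *src)
{
    unsigned char t[CHUNK];
    for (int i = 0; i < NCHUNKS; i++) {
        memcpy(t, src + chunk_off[i], CHUNK);
        memcpy(dst + chunk_off[i], t, CHUNK);
    }
}

/* Hypothetical self-check: both variants must copy exactly bytes 0..62
   and leave the guard byte behind the destination untouched. */
static void copy63_selfcheck(void)
{
    unsigned char src[LEN], a[LEN + 1], b[LEN + 1];
    for (int i = 0; i < LEN; i++)
        src[i] = (unsigned char)(i * 7 + 1);
    memset(a, 0xAA, sizeof a);
    memset(b, 0xAA, sizeof b);
    copy63_load_all_first(a, src);
    copy63_interleaved(b, src);
    assert(memcmp(a, src, LEN) == 0 && a[LEN] == 0xAA);
    assert(memcmp(b, src, LEN) == 0 && b[LEN] == 0xAA);
}
```

Calling copy63_selfcheck() exercises both variants on the same input. The only difference between them is how many 16-byte temporaries are live at the same time (four versus one), which is the register-pressure point made above; whether that matters in practice is exactly what would have to be measured.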
- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>