Subject: Re: Cost of handling misaligned access
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 04. Feb 2025, 11:09:09
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2025Feb4.110909@mips.complang.tuwien.ac.at>
References: 1 2 3 4 5 6
User-Agent: xrn 10.11

Thomas Koenig <tkoenig@netcologne.de> writes:
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> [fedora-starfive:/tmp:111378] cat x.c
>> #include <string.h>
>>
>> long uload(long *p)
>> {
>>   long x;
>>   memcpy(&x,p,sizeof(long));
>>   return x;
>> }
>> [fedora-starfive:/tmp:111379] gcc -O -S x.c
>> [fedora-starfive:/tmp:111380] cat x.s
>>         .file   "x.c"
>>         .option nopic
>>         .text
>>         .align  1
>>         .globl  uload
>>         .type   uload, @function
>> uload:
>>         addi    sp,sp,-16
>>         lbu     t1,0(a0)
>>
>> [...]
>
>With RISC-V, nobody ever knows what architecture he is compiling for...

The compiler knew very well that it was generating code for RV64GC,
and I knew that too.
And in this particular case the exact variant does not matter. RISC-V
is generally specified to support unaligned accesses, see below.

>Did you tell gcc specifically that unaligned access was supported in
>the architecture you were using?

What specific option would that be, and why would I want to look it up
and tell that to gcc? After all, that's specified in the RISC-V
specification. Even a very old one
<https://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-62.pdf>
says:
|The base ISA supports misaligned accesses
It continues
|but these might run extremely slowly depending on the implementation.
but that does not change the architecture (and you referred to the
architecture).
It's also true of other architectures, and not just in theory, but
also in practice <http://al.howardknight.net/?ID=143135464800>
<https://www.complang.tuwien.ac.at/anton/unaligned-stores/>; should I
check for such an option for every architecture?
But even assuming that I want to generate code tuned for RISC-V
implementations where unaligned accesses are implemented so slowly
that I would prefer that code containing only aligned accesses is
generated, I would expect a compiler for which the memcpy workaround
is recommended (such as gcc) to do better, much better than gcc
actually does, e.g., something along the lines of:
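uload:
        addi    a5,a0,7
        andi    a4,a0,-8        # a4 = p rounded down to a multiple of 8
        andi    a5,a5,-8        # a5 = p+7 rounded down to a multiple of 8
        ld      a2,0(a5)        # high aligned word
        ld      a3,0(a4)        # low aligned word
        neg     a4,a0
        andi    a4,a4,7
        andi    a0,a0,7
        slliw   a4,a4,3         # a4 = 8*((-p)&7)
        slliw   a5,a0,3         # a5 = 8*(p&7)
        srl     a5,a3,a5        # little-endian: drop the low word's bytes below p
        sll     a0,a2,a4        # move the high word's bytes into the upper part
        or      a0,a0,a5
        ret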
Fewer instructions, and also a better distribution between various
functional units.
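
For readers who prefer C, the aligned-loads-only idea can be sketched as
follows. This is only a sketch under stated assumptions, not gcc output:
the names uload_aligned and uload_memcpy are hypothetical, the target is
assumed to be little-endian with 8-byte words (so the low word is shifted
right and the high word left), and reading the whole aligned words that
surround the datum is assumed to be acceptable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: the memcpy idiom from the quoted x.c. */
static uint64_t uload_memcpy(const void *p)
{
    uint64_t x;
    memcpy(&x, p, sizeof x);
    return x;
}

/* Hypothetical helper: fetch 8 bytes from a possibly misaligned
 * address using only two aligned 64-bit loads.
 * Assumptions: little-endian byte order; the aligned words that
 * contain the datum may be read in full. */
static uint64_t uload_aligned(const void *p)
{
    assert(p != NULL);
    uintptr_t a = (uintptr_t)p;

    /* Aligned word holding the first byte, and the one holding the last.
     * For an aligned p both pointers are equal. */
    const uint64_t *lo_p = (const uint64_t *)(a & ~(uintptr_t)7);
    const uint64_t *hi_p = (const uint64_t *)((a + 7) & ~(uintptr_t)7);
    uint64_t lo = *lo_p;
    uint64_t hi = *hi_p;

    /* Shift counts in bits; both are 0 when p is aligned, so the
     * result is then lo | lo == lo and no shift by 64 ever occurs. */
    unsigned rsh = (unsigned)(a & 7) * 8;
    unsigned lsh = (unsigned)((0 - a) & 7) * 8;

    return (lo >> rsh) | (hi << lsh);
}
```

Called on any byte offset into an array of uint64_t, uload_aligned should
return the same value as uload_memcpy on a little-endian machine; a
big-endian variant would swap the two shift directions.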
IIRC it's only three instructions on MIPS and five instructions on
Alpha, but they have special instructions for this case, because they
were designed for it, whereas RISC-V was designed to have unaligned
accesses.
- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>