Subject: Re: Cost of handling misaligned access
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 03 Feb 2025, 09:34:13
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2025Feb3.093413@mips.complang.tuwien.ac.at>
References: 1 2 3 4
User-Agent: xrn 10.11
BGB <cr88192@gmail.com> writes:
>On 2/3/2025 12:55 AM, Anton Ertl wrote:
[...]
>Rather, have something like an explicit "__unaligned" keyword or
>similar, and then use the runtime call for these pointers.
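With current GCC and Clang, much the same effect can be had without a
new keyword: wrapping the access in a packed single-member struct tells
the compiler the pointer may be unaligned. A minimal sketch (my
illustration; the type name is made up):

/* packed forces alignment 1, so the compiler must assume *p may be
   unaligned; may_alias sidesteps strict-aliasing complaints */
typedef struct { long v; } __attribute__((packed, may_alias)) ua_long;

long uload_pk(const void *p)
{
    return ((const ua_long *)p)->v;
}

On targets where unaligned loads are cheap this typically compiles to a
plain load; on strict-alignment targets the compiler emits byte-wise
code.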
There are people who think that it is ok to compile *p to anything if
p is not aligned, even on architectures that support unaligned
accesses. At least one of those people recommended the use of
memcpy(..., ..., sizeof(...)). Let's see what gcc produces on
rv64gc (where unaligned accesses are guaranteed to work):
[fedora-starfive:/tmp:111378] cat x.c
#include <string.h>
long uload(long *p)
{
long x;
memcpy(&x,p,sizeof(long));
return x;
}
[fedora-starfive:/tmp:111379] gcc -O -S x.c
[fedora-starfive:/tmp:111380] cat x.s
        .file   "x.c"
        .option nopic
        .text
        .align  1
        .globl  uload
        .type   uload, @function
uload:
        addi    sp,sp,-16
        lbu     t1,0(a0)
        lbu     a7,1(a0)
        lbu     a6,2(a0)
        lbu     a1,3(a0)
        lbu     a2,4(a0)
        lbu     a3,5(a0)
        lbu     a4,6(a0)
        lbu     a5,7(a0)
        sb      t1,8(sp)
        sb      a7,9(sp)
        sb      a6,10(sp)
        sb      a1,11(sp)
        sb      a2,12(sp)
        sb      a3,13(sp)
        sb      a4,14(sp)
        sb      a5,15(sp)
        ld      a0,8(sp)
        addi    sp,sp,16
        jr      ra
        .size   uload, .-uload
        .ident  "GCC: (GNU) 10.3.1 20210422 (Red Hat 10.3.1-1)"
        .section        .note.GNU-stack,"",@progbits
Oh boy. Godbolt tells me that gcc-14.2.0 still does it the same way,
whereas clang 9.0.0 and following produce
[fedora-starfive:/tmp:111383] clang -O -S x.c
[fedora-starfive:/tmp:111384] cat x.s
        .text
        .attribute      4, 16
        .attribute      5, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0"
        .file   "x.c"
        .globl  uload                           # -- Begin function uload
        .p2align        1
        .type   uload,@function
uload:                                          # @uload
        .cfi_startproc
# %bb.0:
        ld      a0, 0(a0)
        ret
.Lfunc_end0:
        .size   uload, .Lfunc_end0-uload
        .cfi_endproc
                                                # -- End function
        .ident  "clang version 11.0.0 (Fedora 11.0.0-2.0.riscv64.fc33)"
        .section        ".note.GNU-stack","",@progbits
        .addrsig
If that is frequently used for unaligned p, this will be slow on the
U74 and P550. Maybe SiFive should get around to implementing
unaligned accesses more efficiently.
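Note that the byte-wise approach itself does not require gcc's
store-to-stack-and-reload sequence (which also risks a store-to-load
forwarding stall on many cores); assembling the value with shifts keeps
everything in registers. A sketch of that lowering in C (my
illustration, little-endian, not from the post):

long uload_bytes(const unsigned char *p)
{
    unsigned long x = 0;
    /* little-endian: byte i contributes bits 8*i .. 8*i+7 */
    for (int i = 0; i < 8; i++)
        x |= (unsigned long)p[i] << (8 * i);
    return (long)x;
}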
Though "memcpy()" is usually a "simple to fix up" scenario.
General memcpy, where the two operands may be unaligned in different
ways, is not particularly simple. This also shows up in the fact that
Intel and AMD had failed to make REP MOVSB faster than software
approaches in many cases when I last looked. Supposedly Intel has had
another go at it; I should measure it again.
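To illustrate why it is not simple: when source and destination are
misaligned relative to each other, a word-at-a-time loop has to fetch
two adjacent aligned source words and shift-and-merge them for every
word stored. A rough sketch of that inner loop (my code, assuming
64-bit little-endian words and a source offset of 1..7 bytes):

#include <stddef.h>
#include <stdint.h>

/* dst is 8-byte aligned; src_al points to the aligned word containing
   the first source byte, which sits off bytes into it (1 <= off <= 7).
   Note: reads up to 7 bytes past the last source byte copied. */
void copy_shift_merge(uint64_t *dst, const uint64_t *src_al,
                      unsigned off, size_t nwords)
{
    unsigned rsh = 8 * off;        /* bits to discard from the low word */
    unsigned lsh = 64 - rsh;       /* bits to take from the high word */
    uint64_t lo = *src_al++;

    for (size_t i = 0; i < nwords; i++) {
        uint64_t hi = *src_al++;
        dst[i] = (lo >> rsh) | (hi << lsh);
        lo = hi;
    }
}

The head and tail fixups and the aligned fast path come on top of this,
which is part of why a fully general memcpy is hard to beat with
microcode.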
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>