Subject: Re: Computer architects leaving Intel...
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 12 Sep 2024, 15:20:42
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Sep12.162042@mips.complang.tuwien.ac.at>
User-Agent: xrn 10.11

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>
>> [considering which way to copy with memmove()]
>>
>> If the two memory blocks don't overlap, memmove() can use the
>> fastest stride. [...]
>>
>> The way to go for memmove() is:
>>
>> On hardware where positive stride is faster:
>>
>> if (((uintptr_t)(dest-src)) >= len)
>>   return memcpy_posstride(dest,src,len);
>> else
>>   return memcpy_negstride(dest,src,len);
>>
>> On hardware where the negative stride is faster:
>>
>> if (((uintptr_t)(src-dest)) >= len)
>>   return memcpy_negstride(dest,src,len);
>> else
>>   return memcpy_posstride(dest,src,len);
>>
>> And I expect that my test is undefined behaviour, but most people
>> except the UB advocates should understand what I mean.
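
The quoted test works because of unsigned wrap-around. The derivation below assumes two things. First, that addresses behave like N-bit unsigned integers, which holds on mainstream targets but is not promised by the C standard. Second, that the source block fits in the address space. Under those assumptions, the difference delta computed in unsigned arithmetic splits into three cases:

```latex
\begin{aligned}
&\delta = (\mathit{dest} - \mathit{src}) \bmod 2^N, \qquad \mathit{src} + \mathit{len} \le 2^N \\
&\mathit{dest} < \mathit{src}: && \delta = 2^N - (\mathit{src} - \mathit{dest}) \ge 2^N - \mathit{src} \ge \mathit{len} \\
&\mathit{src} \le \mathit{dest} < \mathit{src} + \mathit{len}: && \delta = \mathit{dest} - \mathit{src} < \mathit{len} \\
&\mathit{dest} \ge \mathit{src} + \mathit{len}: && \delta = \mathit{dest} - \mathit{src} \ge \mathit{len} \\
&\text{hence} \quad \delta \ge \mathit{len} \iff \mathit{dest} \notin [\mathit{src}, \mathit{src} + \mathit{len})
\end{aligned}
```

An ascending copy clobbers unread source bytes only when src < dest < src + len, so the test never selects it there. For dest = src the test picks the descending copy, which is equally harmless. Swapping the roles of dest and src gives the test for the negative-stride variant.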
...
>Last but not least, having two different code blocks for the
>different preferences is clunky. The two blocks can be
>combined by fusing the two test expressions into a single
>expression, as for example
>
>#ifndef PREFER_UPWARDS
>#define PREFER_UPWARDS 1
>#endif/*PREFER_UPWARDS*/
>
>extern void* ascending_copy( void*, const void*, size_t );
>extern void* descending_copy( void*, const void*, size_t );
>
>void *
>good_memmove( void *vd, const void *vs, size_t n ){
>    const char *d = vd;
>    const char *s = vs;
>    _Bool upwards = PREFER_UPWARDS ? d-s +0ull >= n : s-d +0ull < n;
>
>    return
>        upwards
>          ? ascending_copy( vd, vs, n )
>          : descending_copy( vd, vs, n );
>}
>
>Using the preprocessor symbol PREFER_UPWARDS to select between
>the two preferences (ascending or descending) allows the choice
>to be made by a -D compiler option, and we can expect the compiler
>to optimize away the part of the test that is never used.
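
To try the fused version, the two extern helpers need definitions. The sketch below supplies hypothetical byte-at-a-time versions; real ones would copy in larger units. The post's test expression is kept unchanged. Note that d-s is itself only defined when both pointers point into the same object, which is the same caveat as the UB remark quoted earlier.

```c
#include <stddef.h>

#ifndef PREFER_UPWARDS
#define PREFER_UPWARDS 1
#endif

/* Hypothetical stand-in for the extern helper: copies lowest address first. */
void *ascending_copy(void *vd, const void *vs, size_t n)
{
    unsigned char *d = vd;
    const unsigned char *s = vs;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return vd;
}

/* Hypothetical stand-in for the extern helper: copies highest address first. */
void *descending_copy(void *vd, const void *vs, size_t n)
{
    unsigned char *d = vd;
    const unsigned char *s = vs;
    while (n > 0) {
        n--;
        d[n] = s[n];
    }
    return vd;
}

/* Fused test from the post. Adding 0ull converts the ptrdiff_t difference
   to unsigned long long, so a negative difference wraps to a huge value.
   With a constant PREFER_UPWARDS only one arm of ?: is ever evaluated. */
void *good_memmove(void *vd, const void *vs, size_t n)
{
    const char *d = vd;
    const char *s = vs;
    _Bool upwards = PREFER_UPWARDS ? d - s + 0ull >= n : s - d + 0ull < n;

    return upwards ? ascending_copy(vd, vs, n) : descending_copy(vd, vs, n);
}
```

Compiling with -DPREFER_UPWARDS=0 selects the other arm. For overlapping moves, both settings should give the same result as memmove().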

That's clever, but for usage in glibc or the like, the clunky version
is the preferred one: memmove() is usually called through the dynamic
linking mechanism, and which implementation is actually called is
selected based on the hardware that it runs on (what does it do when
the program is linked statically?). There seem to be quite a few
memmove() (and __memmove_chk()) implementations in glibc-2.36 on
AMD64:
__memmove_chk
__memmove_sse2_unaligned_erms
__memmove_chk
__memmove_chk_erms
__memmove_chk_evex_unaligned
__memmove_chk_avx_unaligned
__memmove_chk_ssse3
__memmove_chk_sse2_unaligned
__memmove_erms
__memmove_avx512_unaligned
__memmove_evex_unaligned
__memmove_evex_unaligned_erms
__memmove_avx_unaligned
__memmove_avx_unaligned_erms
__memmove_avx_unaligned_rtm
__memmove_ssse3
__memmove_sse2_unaligned
__memmove_chk_sse2_unaligned_erms
__memmove_chk_avx512_no_vzeroupper
__memmove_chk_avx512_unaligned
__memmove_chk_avx512_unaligned_erms
__memmove_chk_evex_unaligned_erms
__memmove_chk_avx_unaligned_erms
__memmove_chk_avx_unaligned_rtm
__memmove_chk_avx_unaligned_erms_rtm
__memmove_avx512_no_vzeroupper
__memmove_avx512_unaligned_erms
__memmove_avx_unaligned_erms_rtm
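
glibc makes this per-hardware choice with GNU indirect functions (IFUNC): a resolver runs when the symbol is bound and returns the address of the implementation to use. The sketch below imitates that portably with an explicit resolver that returns a function pointer. It keeps the two "clunky" variants as separate functions, each with its stride preference fixed. All names are hypothetical. hardware_prefers_descending() is a placeholder for a real CPU feature check. The test also assumes that converting a pointer to uintptr_t yields its linear address.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by all memmove() variants. */
typedef void *(*memmove_fn)(void *, const void *, size_t);

/* Hypothetical byte-at-a-time copies; real ones would move larger units. */
static void *copy_up(void *vd, const void *vs, size_t n)
{
    unsigned char *d = vd;
    const unsigned char *s = vs;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return vd;
}

static void *copy_down(void *vd, const void *vs, size_t n)
{
    unsigned char *d = vd;
    const unsigned char *s = vs;
    while (n > 0) {
        n--;
        d[n] = s[n];
    }
    return vd;
}

/* Variant for hardware where the positive stride is faster. */
static void *memmove_prefer_up(void *dest, const void *src, size_t len)
{
    if ((uintptr_t)dest - (uintptr_t)src >= len)   /* wraps if dest < src */
        return copy_up(dest, src, len);
    return copy_down(dest, src, len);
}

/* Variant for hardware where the negative stride is faster. */
static void *memmove_prefer_down(void *dest, const void *src, size_t len)
{
    if ((uintptr_t)src - (uintptr_t)dest >= len)   /* wraps if src < dest */
        return copy_down(dest, src, len);
    return copy_up(dest, src, len);
}

/* Placeholder probe: a real library would inspect the CPU here. */
static int hardware_prefers_descending(void)
{
    return 0;
}

/* Resolver in the spirit of an IFUNC resolver: called once at startup,
   and its result is used for all later calls. */
memmove_fn resolve_memmove(void)
{
    return hardware_prefers_descending() ? memmove_prefer_down
                                         : memmove_prefer_up;
}
```

Because each variant is its own function, it contains only the test for its own preference. Selecting at load time therefore needs no preprocessor trickery.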

From what I read, __memmove_chk() (which has an additional destlen
parameter) is apparently not intended to be called explicitly from the
source code, so I guess that some compilers generate calls to it.
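
In glibc, such calls typically come from building with _FORTIFY_SOURCE. In that mode the compiler passes the destination object size it can infer as destlen. If the size is unknown, __builtin_object_size yields (size_t)-1 and the check cannot fire. Below is a sketch of the check. It uses the hypothetical name checked_memmove so as not to suggest it is glibc's actual source.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked memmove: same arguments as memmove() plus destlen,
   the size of the destination object as known at compile time. */
void *checked_memmove(void *dest, const void *src, size_t len, size_t destlen)
{
    if (destlen < len) {
        /* glibc likewise reports a buffer overflow and terminates here. */
        fputs("buffer overflow detected\n", stderr);
        abort();
    }
    return memmove(dest, src, len);
}
```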
- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>