Subject: Re: Cost of handling misaligned access
From: terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Date: 08. Feb 2025, 13:36:29
Organization: A noiseless patient Spider
Message-ID: <vo7j4f$20te$1@dont-email.me>
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20

MitchAlsup1 wrote:
> On Fri, 7 Feb 2025 15:04:23 +0000, Michael S wrote:
>>     res += _mm256_extract_epi32(res0, 0);
>>     res += _mm256_extract_epi32(res0, 4);
>>     return res;
>
> Simple question:: how would you port this code to a machine
> with a different SIMD instruction set ??

Years ago I solved this problem for an optimized Ogg Vorbis decoder:
I wrote a set of #defines which wrapped MMX/SSE intrinsics on the x86 side and Motorola's more capable AltiVec instructions on the Apple side.
I had to limit myself a tiny bit in a couple of places, and to expand a Motorola operation into a pair of SSE intrinsics, but the resulting code still ran faster than all commercially available libraries on both platforms.
If/when those intrinsics diverge more, the problem becomes significantly harder, but back then both AltiVec and SSE used 128-bit registers.
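
To make the idea concrete, here is a minimal sketch of such a wrapper layer. It is not the code from the decoder: the V4F_* macro names, the v4f type and the scale_and_bias() function are hypothetical, and multiply-add is only an assumed example of an operation that is a single AltiVec call (vec_madd) but a pair of SSE intrinsics (_mm_mul_ps followed by _mm_add_ps). A scalar fallback is included only so the sketch builds on other targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper layer: one 4 x float vector type, three operations. */
#if defined(__ALTIVEC__)
  #include <altivec.h>
  typedef __vector float v4f;
  #define V4F_LOAD(p)       vec_ld(0, (p))            /* needs 16-byte alignment */
  #define V4F_STORE(p, v)   vec_st((v), 0, (p))
  #define V4F_MADD(a, b, c) vec_madd((a), (b), (c))   /* single AltiVec operation */
#elif defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  typedef __m128 v4f;
  #define V4F_LOAD(p)       _mm_loadu_ps(p)
  #define V4F_STORE(p, v)   _mm_storeu_ps((p), (v))
  /* No multiply-add in plain SSE: expand into a pair of intrinsics. */
  #define V4F_MADD(a, b, c) _mm_add_ps(_mm_mul_ps((a), (b)), (c))
#else
  /* Scalar fallback so the sketch compiles without either instruction set. */
  typedef struct { float f[4]; } v4f;
  static inline v4f v4f_load(const float *p)
  {
      v4f r;
      for (int i = 0; i < 4; i++) r.f[i] = p[i];
      return r;
  }
  static inline void v4f_store(float *p, v4f v)
  {
      for (int i = 0; i < 4; i++) p[i] = v.f[i];
  }
  static inline v4f v4f_madd(v4f a, v4f b, v4f c)
  {
      v4f r;
      for (int i = 0; i < 4; i++) r.f[i] = a.f[i] * b.f[i] + c.f[i];
      return r;
  }
  #define V4F_LOAD(p)       v4f_load(p)
  #define V4F_STORE(p, v)   v4f_store((p), (v))
  #define V4F_MADD(a, b, c) v4f_madd((a), (b), (c))
#endif

/* Platform-neutral kernel: out[i] = in[i] * gain[i] + bias[i].
 * n must be a multiple of 4; buffers should be 16-byte aligned,
 * because the AltiVec path above uses aligned loads and stores. */
void scale_and_bias(float *out, const float *in,
                    const float *gain, const float *bias, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4f x = V4F_LOAD(in + i);
        v4f g = V4F_LOAD(gain + i);
        v4f b = V4F_LOAD(bias + i);
        V4F_STORE(out + i, V4F_MADD(x, g, b));
    }
}
```

The kernel itself never names an intrinsic, so porting means writing one more #if branch. This only stays cheap while the vector width matches on every target; the two multiply-add variants may also round slightly differently.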
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"