Subject: Re: Cost of handling misaligned access
From: terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Date: 06 Feb 2025, 13:47:56
Organization: A noiseless patient Spider
Message-ID: <vo2b1u$2v33n$1@dont-email.me>
References: 1 2 3 4 5
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20
Waldek Hebisch wrote:
> EricP <ThatWouldBeTelling@thevillage.com> wrote:
>>
>> While the Linux kernel may not use many misaligned values,
>> I'd guess there is a lot of application code that does.
>
> I guess that much of that happens simply "by accident": without
> alignment checks in hardware, misalignment may happen and nobody
> notices that there is a small performance problem.
>
> I worked on a low-level program and fairly recently got a bunch
> of alignment errors. On AMD64 they were due to SSE instructions
> used by 'memcpy'; on 32-bit ARM, to the use of double-precision
> floating point in 'memcpy'. It took some time to find them, simply
> because most things worked even without alignment and the
> offending cases were hard to trigger.
>
> My personal feeling is that the best machine would have aligned
> access with checks by default, but also special instructions for
> unaligned access. That way code that does not need unaligned
> access gets extra error checking, while code that uses unaligned
> access pays a modest, essentially unavoidable penalty.
>
> Of course, once an architecture officially supports unaligned
> access, there will be binaries depending on it, and backward
> compatibility will prevent a change to require alignment.
>
> Concerning SIMD: the trouble here is the increasing vector length
> and, consequently, increasing alignment requirements. A lot of SIMD
> code is memory-bound, and the current way of doing misaligned
> access leads to worse performance. So there is really no good way
> to solve this. In principle, a set of buffers holding 2 cache lines
> each, plus appropriate shifters, could give optimal throughput,
> but would probably lead to increased latency.
SIMD absolutely requires, as a minimum, the ability to handle data that is aligned only to the size of its elements: an array of double can start on any address that is 0 mod 8, and similarly for float/u32 etc. This way you can go from 128-bit via 256-bit to 512-bit SIMD registers with no change in data alignment.
From this, and the need to also handle byte arrays, you end up with unaligned as the default. The less overhead for handling straddling inputs, the better.
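As a rough illustration of why straddling inputs stop being the exception, the following C sketch counts how many full-width vector loads cross a cache-line boundary when a double array is only element-aligned. The helper name `count_line_splits` is hypothetical, and it assumes 64-byte cache lines and a plain stream of back-to-back vector loads with no alignment peeling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many of `count` consecutive loads of `vec_bytes` each, starting
   at byte address `start`, straddle a `line_bytes` boundary. */
static size_t count_line_splits(uintptr_t start, size_t vec_bytes,
                                size_t count, size_t line_bytes)
{
    assert(vec_bytes > 0 && line_bytes > 0);
    size_t splits = 0;
    for (size_t k = 0; k < count; k++) {
        uintptr_t first = start + k * vec_bytes; /* first byte of load k */
        uintptr_t last = first + vec_bytes - 1;  /* last byte of load k */
        if (first / line_bytes != last / line_bytes)
            splits++; /* the load touches two cache lines */
    }
    return splits;
}
```

For 1 KiB of doubles starting at address 8 (0 mod 8, but not 0 mod 16), each of the 16 line boundaries inside the data is straddled by exactly one load, so `count_line_splits(8, 16, 64, 64)`, `count_line_splits(8, 32, 32, 64)` and `count_line_splits(8, 64, 16, 64)` all return 16. That is one load in four at 128 bits, one in two at 256 bits, and every single load at 512 bits, while a start address of 0 gives no splits at all. Under these assumptions, once only element alignment is guaranteed, the straddling case becomes the common case as vectors widen.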
Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"