EricP <ThatWouldBeTelling@thevillage.com> writes:
Anton Ertl wrote:
There are lots of potentially unaligned loads and stores. There are
very few actually unaligned loads and stores: On Linux-Alpha every
unaligned access is logged by default, and the number of
unaligned-access entries in the logs of our machines was relatively
small (on average a few per day). So trapping actual unaligned
accesses was faster than replacing potential unaligned accesses with
code sequences that synthesize the unaligned access from aligned
accesses.
Of course, if the cost of unaligned accesses is that high, you will
avoid them in cases like block copies where cheap unaligned accesses
would otherwise be beneficial.
- anton
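For readers who have not seen such a code sequence: the following is a minimal sketch of synthesizing one unaligned 64-bit load from two aligned loads plus shifts. The helper name load64_from_aligned is made up for illustration, and the sketch assumes a little-endian machine (as Alpha was under Linux) and that the word after the one containing the offset is readable. It is not the actual sequence any particular compiler emitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the 64-bit value that starts at byte
   offset `off` inside the array `words`, using only aligned 64-bit
   loads.  Assumes little-endian byte order; when off is not a
   multiple of 8, words[off / 8 + 1] must be readable. */
uint64_t load64_from_aligned(const uint64_t *words, size_t off)
{
    size_t idx = off / 8;                       /* aligned word holding the first byte */
    unsigned shift = (unsigned)(off % 8) * 8;   /* misalignment in bits */
    uint64_t lo = words[idx];

    if (shift == 0)
        return lo;                              /* already aligned: one load suffices */

    uint64_t hi = words[idx + 1];               /* second aligned load */
    /* Low bytes come from the top of lo, high bytes from the bottom of hi. */
    return (lo >> shift) | (hi << (64 - shift));
}
```

Every potentially unaligned access pays for the extra load and the shifts, whether or not the address turns out to be unaligned at run time.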
That is fine for code that is being actively maintained and backward
data structure compatibility is not required (like those inside a kernel).
That is the experience on Linux-Alpha, which ran user-level code which
had, for the most part, already been ported to, e.g., SPARC with
trapping on actual unaligned access. These days, with basically all
available hardware of the last decade supporting unaligned accesses,
the experience might be different.
However for x86 there were a few billion lines of legacy code that likely
assumed 2-byte alignment, or followed the fp64 aligned to 32-bits advice,
That's not advice, that's the Intel IA-32 ABI. If you lay out your
structures differently, they will not work with the libraries.
and a C language that mandates structs be laid out in memory exactly as
specified (no automatic struct optimization).
The C language mandates that the order of the fields is as specified,
and that the same sequence of field types leads to the same layout,
but otherwise does not mandate a layout. In particular, competent
ABIs (i.e., not Intel's IA-32 ABI) mandate layouts that result in
natural alignment of basic types.
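A small sketch of what C does and does not pin down, using offsetof. The struct and function names here are made up for illustration. The field order is guaranteed by the language; the actual offsets come from the ABI.

```c
#include <stddef.h>

/* Two hypothetical structs with the same sequence of field types. */
struct record      { char tag; double value; int count; };
struct record_twin { char t;   double v;     int n;     };

/* Fields appear in memory in declaration order; the first is at offset 0. */
int fields_in_declared_order(void)
{
    return offsetof(struct record, tag) == 0
        && offsetof(struct record, tag) < offsetof(struct record, value)
        && offsetof(struct record, value) < offsetof(struct record, count);
}

/* The same sequence of field types gets the same layout. */
int twin_layouts_match(void)
{
    return offsetof(struct record, value) == offsetof(struct record_twin, v)
        && offsetof(struct record, count) == offsetof(struct record_twin, n)
        && sizeof(struct record) == sizeof(struct record_twin);
}

/* The padding after `tag` is chosen by the ABI, not by the C standard. */
size_t value_offset(void)
{
    return offsetof(struct record, value);
}
```

As far as I know, value_offset() gives 4 when compiling for the IA-32 System V ABI (the double is only 4-byte aligned inside a struct) and 8 for the x86-64 ABI (natural alignment).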
Also I seem to recall some
amount of squawking about SIMD when it required naturally aligned buffers.
SSE does not require natural alignment wrt. basic types, but the
load-and-op instructions require 16-byte alignment. That's another
idiocy on Intel's part. If you have
  for (i=0; i<n; i++)
    a[i] = b[i] + c[i];
that's easy to vectorize if you have support for basic-type-aligned or
unaligned accesses. But a, b, and c may all have different start
addresses mod 16, so you cannot use Intel's 16-byte-aligned memory
accesses for vectorizing that. Fortunately, they were not completely
stupid and included unaligned-load and unaligned-store instructions,
so if you use those, and forget about the load-and-operate
instructions, SSE is usable.
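A sketch of that loop written with the unaligned-load and unaligned-store intrinsics, assuming float elements (the element type is my assumption, and the function name add_floats is made up). The #if guard is only there so that the code still compiles, as a plain scalar loop, on machines without SSE.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_loadu_ps, _mm_storeu_ps, _mm_add_ps */
#endif

/* a[i] = b[i] + c[i] for i in [0, n); a, b, c may each have a
   different start address mod 16. */
void add_floats(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per iteration with unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(a + i, _mm_add_ps(vb, vc));
    }
#endif
    /* Scalar tail (or the whole loop when SSE is not available). */
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Using the aligned variants (_mm_load_ps, _mm_store_ps) here would fault as soon as one of the three arrays does not start on a 16-byte boundary.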
AMD has added a flag that turns off this Intel stupidity (if the flag
is set, all SSE memory accesses support unaligned accesses), but Intel
is stubborn and does not support this flag to this day; and they are
the manufacturer that sells CPUs without AVX/AVX2 to this day (unlike
AMD, which has supported AVX2 on all CPUs they sell for a long time).
As SIMD no longer requires alignment, presumably code no longer does so.
Yes, if you use AVX/AVX2, you don't encounter this particular Intel
stupidity.
Also in going from 32 to 64 bits, data structures that contain pointers
now could find those 8-byte pointers aligned on 4-byte boundaries.
What you write does not make sense. RAM data structures are laid out
according to the ABI, which is different for different architectures,
and typically requires natural alignment for basic data types; no
unaligned accesses from backwards compatibility here. Wire or on-disk
data structures are laid out according to the specification of the
protocol or file system, which may include basic data types that are
not aligned according to natural alignment (e.g., because there is a
prefix on the wire); these do not contain pointers, and even if they
contain some kind of reference (e.g., block numbers or inode numbers),
the sizes are fixed across architectures.
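As an illustration of the wire case: a minimal sketch of reading a 32-bit big-endian field that sits behind a 1-byte prefix and is therefore not naturally aligned. The record layout and the name read_be32 are hypothetical, not taken from any particular protocol.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from p, which may have any alignment.
   Assembling the value byte by byte works regardless of the host's
   alignment rules and byte order. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

For a hypothetical record consisting of one type byte followed by a 32-bit length, the length would be read with read_be32(buf + 1). The field is 4 bytes wide on every architecture, so nothing about it changes in the transition from 32 to 64 bits.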
While the Linux kernel may not use many misaligned values,
I'd guess there is a lot of application code that does.
The reports about unaligned accesses in the logs were associated with
user-level code (I dimly remember gs occurring in the log), not kernel
code.
- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>