Newsportal USENET - Re: Cost of handling misaligned access

EricP <ThatWouldBeTelling@thevillage.com> writes:

> The incremental cost is in a sequencer in the AGU for handling cache
> line and possibly virtual page straddles, and a small byte shifter to
> left shift the high order bytes. The AGU sequencer needs to know if the
> line straddles a page boundary, if not then increment the 6-bit physical
> line number within the 4 kB physical frame number, if yes then increment
> virtual page number and TLB lookup again and access the first line.
> (Slightly more if multiple page sizes are supported, but same idea.)
> For a load AGU merges the low and high fragments and forwards.
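
A minimal sketch of the straddle handling described above, as a
software model rather than a description of any real AGU. It assumes
64-byte lines (hence the 6-bit line index within a 4 kB page),
little-endian byte order, and a flat byte string standing in for
memory; the names split_access and load_unaligned are made up for
illustration. A real sequencer would also redo the TLB lookup when
crosses_page is true, which this model only reports.

```python
LINE_BYTES = 64    # assumed line size: 4096 / 64 = 64 lines, a 6-bit index
PAGE_BYTES = 4096  # 4 kB pages, as in the quoted text


def split_access(addr, size):
    """Return (fragments, crosses_page); each fragment is (address, length)."""
    assert 0 < size <= LINE_BYTES
    # Bytes that still fit in the line containing the first byte.
    first_len = min(size, LINE_BYTES - addr % LINE_BYTES)
    fragments = [(addr, first_len)]
    if first_len < size:
        # The rest starts at offset 0 of the next line.
        fragments.append((addr + first_len, size - first_len))
    last = addr + size - 1
    crosses_page = addr // PAGE_BYTES != last // PAGE_BYTES
    return fragments, crosses_page


def load_unaligned(memory, addr, size):
    """Little-endian load assembled from the per-line fragments."""
    fragments, _ = split_access(addr, size)
    value = 0
    shift = 0
    for frag_addr, frag_len in fragments:
        chunk = int.from_bytes(memory[frag_addr:frag_addr + frag_len], "little")
        value |= chunk << shift  # left-shift the high-order bytes into place
        shift += 8 * frag_len
    return value


memory = bytes(range(256)) * 32          # 8192 bytes, two pages
print(split_access(60, 8))               # line straddle within one page
print(split_access(4092, 8))             # line and page straddle
print(hex(load_unaligned(memory, 60, 8)))
```

An aligned access, or a misaligned one that stays within a line,
yields a single fragment and takes the ordinary path; only the
two-fragment case needs the extra sequencing.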

...

> The hardware cost appears trivial, especially within an OoO core.
> So there doesn't appear to be any reason to not handle this.
> Am I missing something?

The OS must also be able to keep both pages in physical memory until
the access is complete, or there will be no progress. Should not be a
problem these days, but the 48 pages or so potentially needed by VAX
complicated the OS.

Yes, hardware is not hard, there is software that benefits, and as a
result, modern architectures (including RISC-V) now support unaligned
accesses (except for atomic accesses).

https://old.chipsandcheese.com/2025/01/26/inside-sifives-p550-microarchitecture/

...

> This terrible unaligned access behavior is atypical even for low power
> cores. Arm's Cortex A75 only takes 15 cycles in the worst case of
> dependent accesses that are both misaligned.
>
> Digging deeper with performance counters reveals executing each unaligned
> load instruction results in ~505 executed instructions.

This is similar to what I measured on a U74 core from SiFive
<2024May14.073553@mips.complang.tuwien.ac.at>, so they probably use
the same solution.

"P550 almost certainly doesn't have hardware support for unaligned
accesses. Rather, it's likely raising a fault and letting an operating
system handler emulate it in software."

The architecture guarantees that unaligned accesses work, so the OS
might not have support for such emulation. Another option would be to
trap into some kind of firmware-supplied fixup code, along the lines
of Alpha's PALcode.
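
Whichever layer takes the trap, the core of the fixup is the same:
redo the access as a series of accesses that are always aligned. The
following is only a sketch of that idea, assuming little-endian byte
order and a core model that refuses any load that is not naturally
aligned; MisalignedTrap, hw_load, emulate_load and load are names
invented for the illustration, not part of any real firmware or
kernel interface. A real handler would additionally have to decode
the trapping instruction, check access permissions, write the
destination register and return from the trap.

```python
class MisalignedTrap(Exception):
    """Stands in for the misaligned-address exception raised by the core."""


def hw_load(memory, addr, size):
    """Model of a core that only performs naturally aligned loads."""
    if addr % size != 0:
        raise MisalignedTrap(addr)
    return int.from_bytes(memory[addr:addr + size], "little")


def emulate_load(memory, addr, size):
    """Fixup path: rebuild the value from byte loads, which never trap."""
    value = 0
    for i in range(size):
        value |= hw_load(memory, addr + i, 1) << (8 * i)
    return value


def load(memory, addr, size):
    """What the program observes: the access works either way."""
    try:
        return hw_load(memory, addr, size)      # fast path
    except MisalignedTrap:
        return emulate_load(memory, addr, size)  # slow path via the handler


memory = bytes(range(16))
print(hex(load(memory, 4, 4)))  # aligned, handled directly
print(hex(load(memory, 1, 4)))  # misaligned, goes through the fixup
```

The program sees the same result on both paths; only the cost
differs, since the slow path adds the trap entry and exit on top of
the byte-by-byte reassembly.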

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Date	Subject	#	Author
2 Feb 25	Re: Cost of handling misaligned access	19	Anton Ertl