On Sat, 21 Dec 2024 23:22:35 +0000, Jonathan Thornburg wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
FORTRAN COMMON blocks require misaligned accesses to double precision
data.
R E Q U I R E in that it is neither optional nor wise to emulate with
exceptions. It is just barely tolerable using LD/ST Left/Right instructions out of the compiler.
>
I, personally, went through enough PAIN with misalignment that over
time my mood swung from "aligned only" to "completely misaligned"::
a) because there is no performant* SW workaround
b) it is SO easy to fix in HW.
c) once fixed in HW, any SW burden is so small as to be barely measurable.
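As a point of reference, the byte-at-a-time workaround that (a) alludes to looks something like the following C sketch (the function name is mine, not from the post); eight loads plus seven shift/OR pairs stand in for what misalignment-capable hardware does in one access:

```c
#include <stddef.h>
#include <stdint.h>

/* Software workaround for a misaligned 64-bit load on an aligned-only
 * ISA: assemble the value one byte at a time (little-endian order).
 * 8 loads + 7 shifts + 7 ORs, versus a single load in hardware. */
static uint64_t load_u64_unaligned(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)b[i] << (8 * i);
    return v;
}
```

(Modern compilers often recognize this pattern, or an 8-byte `memcpy`, and emit a single load on targets where misaligned loads are legal; on aligned-only targets the byte soup remains.)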
>
I'm not so sure (b) is true. Some cases are moderately easy to handle
in hardware (e.g., misaligned loads that stay within a single L1 D-cache
line), but some cases are harder (e.g., misaligned writes that cross L1
D-cache line boundaries) and might need a microcode trap (awkward if the
design wasn't otherwise using microcode). And some cases are even harder
While there is no concept of Millicode or Microcode::
There are several sequencing components::
a) determining if the access is misaligned:: This takes 8 gates and
2 gates of delay from an adder already comprising 2,000 gates. The
misaligned assertion comes 5-6 gates BEFORE the higher-order 32 bits
come out of the adder:: I consider this part ignorable.
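In C terms, that misaligned assertion is just a mask of the low address bits (a sketch of mine; assumes power-of-two access sizes):

```c
#include <stdbool.h>
#include <stdint.h>

/* An access of power-of-two 'size' bytes is misaligned iff any of the
 * low log2(size) bits of the address are set.  Only the adder's low
 * bits are needed, which is why the assertion can be ready well before
 * the high-order bits of the address come out of the adder. */
static bool is_misaligned(uint64_t addr, unsigned size)
{
    return (addr & (uint64_t)(size - 1)) != 0;
}
```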
b) accessing the cache optimally in the presence of misaligned accesses.
b.1) if the access does not cross a cache port boundary, then all the
problems are confined to the alignment of the data.
b.2) if the access crosses a port boundary but not a line boundary,
access 2 successive ports, and allow Aligner to sort out the problem.
b.3) if the access crosses a line boundary but not a page boundary,
access 2 successive ports incrementing the line address of the second
port.
b.4) if the access crosses a page boundary, you are going to have to
access the cache twice, once for the first page, once for the second.
So, only page crossing REQUIRES 2 accesses; and 99% (made-up number)
of misaligned accesses are performed in a single cycle. {{Try that
with some kind of SW workaround}}
{{Oh, and BTW; this is a good place to check that the access rights
to both pages are compatible with the rights in both PTEs.}}
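The four sequencing cases can be sketched as a classification on the access's first and last byte addresses. The port width (16 B), line size (64 B), and page size (4 KiB) below are illustrative assumptions of mine, not numbers from the post:

```c
#include <stdint.h>

enum seq_path { SAME_PORT, PORT_CROSS, LINE_CROSS, PAGE_CROSS };

/* Classify an access of 'size' bytes at 'addr' into cases b.1-b.4.
 * Assumed geometry: 16-byte cache ports, 64-byte lines, 4 KiB pages. */
static enum seq_path classify(uint64_t addr, unsigned size)
{
    uint64_t last = addr + size - 1;      /* address of the final byte */
    if ((addr >> 12) != (last >> 12))     /* b.4: two cache accesses   */
        return PAGE_CROSS;
    if ((addr >> 6) != (last >> 6))       /* b.3: 2nd port at line+1   */
        return LINE_CROSS;
    if ((addr >> 4) != (last >> 4))       /* b.2: 2 ports + Aligner    */
        return PORT_CROSS;
    return SAME_PORT;                     /* b.1: data alignment only  */
}
```

Only the PAGE_CROSS case forces a second trip through the cache; the other three resolve in a single pass.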
So,
AGEN adder is 8 gates bigger out of 2,000 total gates
Cache port control logic is 2× as big out of 90 gates
Cache staging flip-flops in stage ALIGN is 2× as big
LD Aligner is bigger ~1.75×
Tag, TLB, DATA RAMs are exactly the same size, and are ~9× larger
than all the other cache pipeline logic area
{{And you add 25-odd gates in the Miss Buffers}}
x86 has been doing this for 3 decades. It is well-worn logic at this point.
It was at AMD where I saw how easy this was for HW to simply "make the
problem disappear" compared to all the ways SW uses to work around "not
being able to access misaligned data". Once you have done it once, you
have the logic and test vectors to ensure you don't shoot yourself in
the foot.
Any competent programmer will ALIGN his data to the extent possible;
there is no reason to penalize {Compiler, assembler, linker, ld.so,...}
just because you want to take 5 days out of design.
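And the SW side of keeping data aligned really is nearly free: in C, natural alignment is what the toolchain already provides, and over-alignment is a single keyword (a minimal C11 sketch; the names are mine):

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* The compiler pads 'rec' so that 'value' sits on its natural 8-byte
 * boundary; no programmer effort required. */
struct rec {
    char   tag;
    double value;
};

/* Asking for more than natural alignment (e.g. a cache-line boundary)
 * is one C11 keyword. */
static alignas(64) uint8_t buffer[256];
```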
So, My design:
Aligned data is always best, Misaligned data comes at very low cost.
SW overhead = 0
Your design:
Aligned data works just fine, Misaligned data is a complete nightmare
throughout the entire SW stack, and causes large uncertainty in result
delivery time. SW overhead = significant.
How many days of SW development are required to make up for the 5 days
of HW design that simply eradicate the problem?
You would not buy a car without anti-lock brakes--even though you will
only use the feature once or twice in your ownership of the vehicle!?!
Why would you buy a CPU that is not similar?
(e.g., misaligned writes crossing L1 D-cache line boundaries where the
two lines are owned by different CPUs in a cache-coherent
multiprocessor)
and might need a millicode trap. And some cases may require going all
the way up to the OS (e.g., misaligned writes that cross
virtual-memory-page boundaries where one page is ok but the other is
non-resident).
Millicode is so DEC ALPHA. Fixing the problem in HW does not require
anything but the 5 sequences I illustrated above--this amount of
sequencing is invisible in the cache pipeline as a whole.
So, allowing this in the architecture has several costs:
* extra hardware implementation effort to make sure the "hardware" cases
don't cost an extra gate delay or two on some critical path
AMD had done all of this by 1997. {don't know about when Intel had it
licked}
But, yes, if you have a balls-to-the-wall pipeline (R2000) adding a
gate of delay would degrade performance by ~5%. This has only been
shown to be an issue when the cache pipeline is 2 stages and one is
trying to get:: Forward->AGEN->RAMS->ALIGN->resultbus in 2 cycles.
MIPS had to use direct mapped caches to meet this timing, and had to
sample SRAM chips on its own test head to measure if the SRAMs had
pin timings appropriate to R2000 timings.
Once you have set-associativity or allow for 3 cycles {note current
Intel cores are using 5 cycles.} your argument fails.
While your argument might succeed in 2µ-through-90nm, wires have become
so slow that in many cases adding a gate of pure delay does not slow
anything down because the cache pipeline has been engineered O F F
the/any critical path. So, while RISC-V persists with the 2-cycle cache
pipeline, the big boys have migrated to longer pipelines and build
execution windows to absorb the added latencies.
* extra complexity and debugging time in hardware and in system software
(think about writing and *debugging* and *verifying* microcode/millicode
trap handlers for all those messy write-crossing-cache/page-boundary
cases, especially their interactions with multiprocessor cache
coherency)
There is N O M I L L I C O D E. There is a sequencer that can take
1 of 5 paths over the AGEN-CACHE-ALIGN stages of the pipeline. SW has
to do nothing to enable this, or to overcome poor/bad use of ISA.
* this extra effort means a longer design time and/or greater design
cost, and hence (so long as the state-of-the-art of competing systems
is still steadily improving with time) that means a net lower
price/performance relative to competing systems
So does IEEE 754 floating point!! It is significantly more logic
intensive than IBM or CRAY or Univac floating point. Yet, currently,
some larger cores contain 4-8 of these floating point units (8-16 if
you separate FMUL/FDIV from FADD/FSUB/FCMP).
And, because of the traps
There are N O T R A P S, no exceptions (other than expected), no
interrupt dependencies, no mispredict repair dependencies, no
coherence dependencies, ...
and their overheads (which will likely differ significantly across
different implementations of the same architecture, e.g., different
multiprocessor cache-coherency protocols), any code that actually
*uses* unaligned accesses -- especially unaligned writes -- isn't
performance-portable unless the actual dynamic frequency of unaligned
operations is very low.
UnTrue.
So yes, allowing unaligned access does help "dusty deck" Fortran code...
but it comes at a significant cost.
Less than 0.1% is a significant cost?!?!?