Re: unaligned load/store

Liste des GroupesRevenir à c arch 
Sujet : Re: unaligned load/store
De : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Groupes : comp.arch
Date : 22. Dec 2024, 02:27:27
Autres entêtes
Organisation : Rocksolid Light
Message-ID : <cadda24092db49d26e62096c589fbf9c@www.novabbs.org>
References : 1 2 3 4 5 6 7 8
User-Agent : Rocksolid Light
On Sat, 21 Dec 2024 23:22:35 +0000, Jonathan Thornburg wrote:

MitchAlsup1 <mitchalsup@aol.com> wrote:
FORTRAN COMMON blocks require misaligned accesses to double precision
data.
R E Q U I R E in that it is neither optional nor wise to emulate with
exceptions. It is just barely tolerable using LD/ST Left/Right
instructions
out of the compiler.
>
I, personally, went through enough PAIN with misalignment, that over
time my mood swung from "aligned only" to "completely misaligned"::
a) because there is no performant* SW workaround
b) it is SO easy to fix in HW.
c) once fixed in HW, any SW burden is so small as to be barely
..measurable.
>
I'm not so sure (b) is true.  Some cases are moderately easy to handle
in hardware (e.g., misaligned loads that stay within a single L1 D-cache
line), but some cases are harder (e.g., misaligned writes that cross L1
D-cache line boundaries) and might need a microcode trap (awkward if the
design wasn't otherwise using microcode).  And some cases are even
harder
While there is no concept of Millicode or Microcode::
There are several sequencing components::
a) determining if the access is misaligned:: This takes 8 gates and 2
gates
of delay from an adder already comprising 2000 gates. The misaligned
assertion
comes 5-6 gates BEFORE the higher order 32-bits come out of the adder::
I consider this part ignorable.
b) accessing the cache optimally in the presence of misaligned accesses.
b.1) if the access does not cross a cache port boundary, then all the
problems are confined to the alignment of the data.
b.2) if the access crosses a port boundary but not a line boundary,
access 2 successive ports, and allow Aligner to sort out the problem.
b.3) if the access crosses a line boundary but not a page boundary,
access 2 successive ports incrementing the line address of the second
port.
b.4) if the access crosses a page boundary, you are going to have to
access the cache twice, once for the first page, once for the second.
So, only page crossing REQUIRES 2 accesses; and 99% (Made up number)
are performed in a single cycle. {{Try that with some kind of SW
workaround}}
{{Oh, and BTW; this is a good place to check that the access rights
to both pages are compatible with the rights in both PTEs.}}
So,
AGEN adder is 8 gates bigger out of 2,000 total gates
Cache port control logic is 2× as big out of 90 gates
Cache staging flip-flops in stage ALIGN is 2× as big
LD Aligner is bigger ~1.75×
Tag, TLB, DATA RAMs are exactly the same size and ~9× larger
....than of the other cache pipeline logic area
{{And you add 25-odd gates in the Miss Buffers}}
x86 has been doing this for 3 decades. It is well worn logic at this
point.
It was at AMD where I saw how easy this was for HW to simply "make the
problem disappear" compared to all the ways SW uses to work around "not
being able to access misaligned data". Once you have done it once, you
have the logic and test vectors to insure you don't shoot yourself in
the foot.
Any competent programmer will ALIGN his data to the extend possible
there is no reason to penalize {Compiler, assembler, linker, ld.so,...}
just because you want to take 5 days out of design.
So, My design:
Aligned data is always best, Misaligned data comes at very low cost.
SW overhead = 0
Your design:
Aligned data works just fine, Misaligned data is a complete nightmare
throughout the entire SW stack, and causes large uncertainty in result
deliver time. SW overhead = significant.
How many days of SW development are required to make up for the 5 days
of HW design to simply eradicate the problem.
You would not buy a car without anti-lock brakes--even though you will
only use the feature once or twice in your ownership of the vehicle !?!
Why would you buy a CPU that is not similar?

(e.g., misaligned writes crossing L1 D-cache line boundaries where the
two lines are owned by different CPUs in a cache-coherent
multiprocessor)
and might need a millicode trap.  And some cases may require going all
the
way up to the OS (e.g., misaligned writes that cross virtual-memory-page
boundaries where one page is ok but the other is non-resident).
Millicode is so DEC ALPHA. Fixing the problem in HW does not require
anything but the 5 sequences I illustrated above--this amount of
sequencers are invisible in the cache pipeline as a whole.

So, allowing this in the architecture has several costs:
* extra hardware implementation effort to make sure the "hardware" cases
  don't cost an extra gate delay or two on some critical path
AMD had done all of this by 1997. {don't know about when Intel had it
licked}
But, yes, if you have a balls-to-the-wall pipeline (R2000) adding a gate
of
delay would degrade performance by ~5%. This has only been shown to be
an
issue when the cache pipeline is 2 stages and one is trying to get::
     Forward->AGEN->RAMS->ALIGN->resultbus in 2 cycles.
MIPS had to use direct mapped caches to meet this timing, and had to
sample SRAM chips on its own test head to measure if the SRAMs had
pin timings appropriate to R2000 timings.
Once you have set-associativity or allow for 3 cycles {note current
Intel cores are using 5 cycles.} your argument fails.
While your argument might succeed in 2µ-through-90nm, wires have become
so slow that in many cases adding a gat of pure delay does not slow
anything because the cache pipeline has been engineered O F F the/any
critical path. So, while RISC-V persists with the 2 cycle cache pipe-
line, the big boys have migrated to longer pipelines and build execution
windows to absorb the added latencies.

* extra complexity and debugging time in hardware and in system software
  (think about writing and *debugging* and *verifying*
microcode/millicode
  trap handlers for all those messy write-crossing-cache/page-boundary
  cases, especially their interactions with multiprocessor cache
coherency)
There is N O   M I L L I C O D E. There is a sequencer that can take 1
of
5 paths over the AGEN-CACHE-ALIGN stages of the pipeline. SW has to do
nothing to enable this, or overcome poor/bad use of ISA.

* this extra effort means a longer design time and/or greater design
cost,
  and hence (so long as the state-of-the-art of competing systems is
still
  steadily improving with time) that means a net lower price/performance
  relative to competing systems
So does IEEE 754 floating point !! It is significantly more logic
intensive
than IBM or CRAY or Univac floating points. Yet, currently, some larger
cores contain 4-8 of these floating point units (8-16 is you separate
FMUL/FDIV from FADD/FSUB/FCMP).

And, because of the traps
There are N O   T R A P S, no exceptions (other than expected), no
interrupt dependencies, no mispredict repair dependencies, no
coherence dependencies, ...

                          and their overheads (which will likely differ
significantly across different implementations of the same architecture,
e.g., different multiprocessor cache-coherency protocols), any code that
actually *uses* unaligned accesses -- especially unaligned writes --
isn't
performance-portable unless the actual dynamic frequency of unaligned
operations is very low.
UnTrue.

So yes, allowing unaligned access does help "dusty deck" Fortran code...
but it comes at a significant cost.
Less than 0.1% is a significant cost ?!?!?

Date Sujet#  Auteur
28 Nov 24 * What is an N-bit machine?89John Dallman
28 Nov 24 +* Re: What is an N-bit machine?66Michael S
30 Nov 24 i`* Re: What is an N-bit machine?65John Levine
30 Nov 24 i +* Re: What is an N-bit machine?11Stephen Fuld
30 Nov 24 i i`* Re: What is an N-bit machine?10John Levine
30 Nov 24 i i +* Re: What is an N-bit machine?2Michael S
1 Dec 24 i i i`- Re: What is an N-bit machine?1Paul A. Clayton
1 Dec 24 i i `* Re: What is an N-bit machine?7MitchAlsup1
1 Dec 24 i i  +* Re: What is an N-bit machine?2Thomas Koenig
1 Dec 24 i i  i`- Re: What is an N-bit machine?1Anton Ertl
2 Dec 24 i i  `* Re: What is an N-bit machine?4Terje Mathisen
3 Dec 24 i i   +* Re: What is an N-bit machine?2Brian G. Lucas
3 Dec 24 i i   i`- Re: What is an N-bit machine?1Stephen Fuld
15 Dec 24 i i   `- Re: What is an N-bit machine?1Waldek Hebisch
30 Nov 24 i `* Keeping other stuff with addresses (was: What is an N-bit machine?)53Anton Ertl
30 Nov 24 i  +* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)22Anton Ertl
30 Nov 24 i  i`* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)21Thomas Koenig
30 Nov 24 i  i `* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)20Anton Ertl
30 Nov 24 i  i  `* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)19Michael S
30 Nov 24 i  i   `* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)18Anton Ertl
30 Nov 24 i  i    +* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)12Michael S
1 Dec 24 i  i    i`* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)11Anton Ertl
1 Dec 24 i  i    i `* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)10Thomas Koenig
1 Dec 24 i  i    i  +* Re: Keeping other stuff with addresses4David Schultz
1 Dec 24 i  i    i  i`* Re: Keeping other stuff with addresses3Thomas Koenig
4 Dec 24 i  i    i  i `* Re: Keeping other stuff with addresses2MitchAlsup1
4 Dec 24 i  i    i  i  `- Re: Keeping other stuff with addresses1John Levine
1 Dec 24 i  i    i  `* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)5Tim Rentsch
1 Dec 24 i  i    i   +* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)3Thomas Koenig
1 Dec 24 i  i    i   i+- Re: Keeping other stuff with addresses (was: What is an N-bit machine?)1Michael S
1 Dec 24 i  i    i   i`- Re: Keeping other stuff with addresses (was: What is an N-bit machine?)1Brett
2 Dec 24 i  i    i   `- Re: Keeping other stuff with addresses1Terje Mathisen
30 Nov 24 i  i    +* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)3John Levine
1 Dec 24 i  i    i`* What is an N-bit machine?2Anton Ertl
16 Dec23:39 i  i    i `- Re: What is an N-bit machine?1Waldek Hebisch
1 Dec 24 i  i    `* Re: Keeping other stuff with addresses (was: What is an N-bit machine?)2Thomas Koenig
1 Dec 24 i  i     `- Re: Keeping other stuff with addresses (was: What is an N-bit machine?)1Anton Ertl
1 Dec 24 i  +- Re: Keeping other stuff with addresses (was: What is an N-bit machine?)1Anton Ertl
2 Dec 24 i  +* Re: Keeping other stuff with addresses27Chris M. Thomasson
2 Dec 24 i  i`* Re: Keeping other stuff with addresses26MitchAlsup1
2 Dec 24 i  i `* Re: Keeping other stuff with addresses25Chris M. Thomasson
3 Dec 24 i  i  `* Re: Keeping other stuff with addresses24Chris M. Thomasson
3 Dec 24 i  i   `* Re: Keeping other stuff with addresses23Stefan Monnier
3 Dec 24 i  i    +* Re: Keeping other stuff with addresses20John Levine
3 Dec 24 i  i    i+* Re: Keeping other stuff with addresses17Stefan Monnier
4 Dec 24 i  i    ii`* Re: Keeping other stuff with addresses16John Levine
4 Dec 24 i  i    ii `* Re: Keeping other stuff with addresses15Stefan Monnier
4 Dec 24 i  i    ii  +* Re: Keeping other stuff with addresses12MitchAlsup1
4 Dec 24 i  i    ii  i+- Re: Keeping other stuff with addresses1Thomas Koenig
4 Dec 24 i  i    ii  i+* Re: Keeping other stuff with addresses3Stefan Monnier
4 Dec 24 i  i    ii  ii+- Re: Keeping other stuff with addresses1MitchAlsup1
5 Dec 24 i  i    ii  ii`- Re: Keeping other stuff with addresses1Keith Thompson
5 Dec 24 i  i    ii  i+- Re: bytes, Keeping other stuff with addresses1John Levine
22 Dec00:22 i  i    ii  i`* unaligned load/store (was: Re: Keeping other stuff with addresses)6Jonathan Thornburg
22 Dec02:27 i  i    ii  i +* Re: unaligned load/store4MitchAlsup1
22 Dec11:01 i  i    ii  i i`* Re: unaligned load/store3Thomas Koenig
22 Dec12:06 i  i    ii  i i +- Re: unaligned load/store1Anton Ertl
22 Dec12:42 i  i    ii  i i `- Re: unaligned load/store1John Dallman
22 Dec11:33 i  i    ii  i `- Re: unaligned load/store (was: Re: Keeping other stuff with addresses)1Anton Ertl
4 Dec 24 i  i    ii  `* Re: bits and bytes, Keeping other stuff with addresses2John Levine
4 Dec 24 i  i    ii   `- Re: bits and bytes, Keeping other stuff with addresses1Stefan Monnier
3 Dec 24 i  i    i`* Re: Keeping other stuff with addresses2MitchAlsup1
4 Dec 24 i  i    i `- Re: Keeping other stuff with addresses1Chris M. Thomasson
4 Dec 24 i  i    `* Re: Keeping other stuff with addresses2Chris M. Thomasson
4 Dec 24 i  i     `- Re: Keeping other stuff with addresses1Stefan Monnier
4 Dec 24 i  `* Re: Keeping other stuff with addresses2Keith Thompson
4 Dec 24 i   `- Re: Keeping other stuff with addresses1MitchAlsup1
28 Nov 24 +* Re: What is an N-bit machine?18Thomas Koenig
28 Nov 24 i+* Re: What is an N-bit machine?2MitchAlsup1
28 Nov 24 ii`- Re: What is an N-bit machine?1Brett
28 Nov 24 i`* Re: What is an N-bit machine?15Lawrence D'Oliveiro
29 Nov 24 i `* Re: What is an N-bit machine?14John Dallman
29 Nov 24 i  +* Re: What is an N-bit machine?9Lynn Wheeler
29 Nov 24 i  i+- Re: What is an N-bit machine?1John Dallman
29 Nov 24 i  i`* IBM and Amdahl history (Re: What is an N-bit machine?)7Anton Ertl
29 Nov 24 i  i +- Re: IBM and Amdahl history (Re: What is an N-bit machine?)1Lynn Wheeler
29 Nov 24 i  i +- Re: IBM and Amdahl history (Re: What is an N-bit machine?)1Lynn Wheeler
29 Nov 24 i  i `* Re: IBM and Amdahl history (Re: What is an N-bit machine?)4Lawrence D'Oliveiro
30 Nov 24 i  i  `* Re: IBM and Amdahl history (Re: What is an N-bit machine?)3Anton Ertl
1 Dec 24 i  i   +- Re: IBM and Amdahl history (Re: What is an N-bit machine?)1John Dallman
1 Dec 24 i  i   `- Re: IBM and Amdahl history (Re: What is an N-bit machine?)1Lawrence D'Oliveiro
29 Nov 24 i  `* Re: What is an N-bit machine?4Lawrence D'Oliveiro
30 Nov 24 i   +- Re: What is an N-bit machine?1Brett
30 Nov 24 i   +- Re: market power, What is an N-bit machine?1John Levine
30 Nov 24 i   `- Re: What is an N-bit machine?1Lynn Wheeler
28 Nov 24 +- Re: What is an N-bit machine?1MitchAlsup1
28 Nov 24 +- Re: What is an N-bit machine?1Lynn Wheeler
29 Nov 24 `* Re: What is an N-bit machine?2Anton Ertl
29 Nov 24  `- Re: What is an N-bit machine?1Thomas Koenig

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal