Re: Arguments for a sane ISA 6-years later

Subject: Re: Arguments for a sane ISA 6-years later
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 29 Jul 2024, 17:43:39
Organization: A noiseless patient Spider
Message-ID: <v88gru$ij11$1@dont-email.me>
User-Agent: Mozilla Thunderbird

On 7/29/2024 7:59 AM, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
>> On 7/26/2024 12:00 PM, Anton Ertl wrote:
>>> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>>>> and it's more efficient
>>>
>>> That depends on the hardware.
>>>
>>> Yes, the Alpha 21164 with its imprecise exceptions was "more
>>> efficient" than other hardware for a while, then the Pentium Pro came
>>> along and gave us precise exceptions and more efficiency.  And
>>> eventually the Alpha people learned the trick, too, and 21264 provided
>>> precise exceptions (although they did not admit this) and more
>>> efficiency.
>>>
>>> Similarly, I expect that hardware that is designed for good TSO or
>>> sequential consistency performance will run faster on code written for
>>> this model than code written for weakly consistent hardware will run
>>> on that hardware.  That's because software written for weakly
>>> consistent hardware often has to insert barriers or atomic operations
>>> just in case, and these operations are slow on hardware optimized for
>>> weak consistency.
>>
>> TSO requires more significant hardware complexity though.
>
> An efficient implementation of TSO or sequential consistency requires
> more hardware, yes.
>
> Floating point requires more hardware than fixed point.  Precise
> exceptions require more hardware than imprecise exceptions.  Caches
> require more hardware than the local memory of Cell's SPEs.  OoO
> requires more hardware than in-order; in this case the IA-64
> implementations demonstrated that you could then spend the area budget
> on more in-order resources (and big caches) and still fail to keep up
> on SPECint with the smaller OoO competition.  In all these cases we
> decided that the benefit is worth the additional hardware.  I think
> that's the case for strong memory ordering, too.
 
As noted, I had needed to cut corners in a lot of areas:
   Caches are direct-mapped;
   In-order;
   Floating point is not exact;
   ...
Otherwise, stuff isn't going to fit into the FPGAs.
Something like TSO is a lot of complexity for not much gain.
In contrast, floating point and precise exceptions are a lot more relevant to software.
Floating point: "float" and "double" are not exactly rare, and having them perform like crap isn't ideal.
Precise exceptions: without them, one can't do instruction-emulation traps or a software-managed TLB (granted, a software-managed TLB is itself a form of corner cutting).
As noted, I had found associative caches mostly not worthwhile.
   So, L1 caches ended up as direct mapped;
   L2 is also direct-mapped.
Did end up adding a smaller 4-way cache (the VCA cache) between the L1 and L2 caches. It mostly keeps track of lines stored from the L1 and lines fetched from the L2, and absorbs a lot of the L1 conflict misses. The main effect (seemingly) is a notable reduction in the number of L2 misses. Had experimented with 8-way, but 8-way was too expensive. Also, it is write-through rather than write-back.
This cache is 64x 4-way, or ~ 4K. So, say:
   L1 D$,  32K DM WB VIVT
   L1 I$,  16K DM WB VIVT
   VCA  ,   4K 4W WT PIPT
   L2   , 256K DM WB PIPT
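
As a rough check on those sizes: 64 sets x 4 ways is 256 lines, so ~4K works out to 16 bytes per line. The sketch below assumes that 16-byte line size for every level (it is inferred from the VCA figures only, not stated for the L1 or L2), and the helper name `geometry` is made up for illustration.

```python
LINE_BYTES = 16  # assumption: inferred from 4K / (64 sets * 4 ways)

def geometry(size_bytes, ways, line_bytes=LINE_BYTES):
    """Return (total lines, sets, index bits) for a cache of the given size."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    index_bits = sets.bit_length() - 1  # exact for power-of-two set counts
    return lines, sets, index_bits

# ways=1 is direct-mapped (DM); the VCA is the only 4-way level.
for name, size, ways in [("L1 D$", 32 * 1024, 1),
                         ("L1 I$", 16 * 1024, 1),
                         ("VCA", 4 * 1024, 4),
                         ("L2", 256 * 1024, 1)]:
    print(name, geometry(size, ways))
```

Under that assumption, two addresses collide in the L1 D$ whenever they are a multiple of 32K apart, while the VCA can hold four such lines in one set.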
With the VCA, associativity mattered more than total size, but 32 or 64 rows did notably better than merely having 4 or 8 cache lines (fully associative), without much difference in cost (the associativity costs a lot more than the LUTRAM in this case). But, unlike with purely DM caches, "cache knocking" (doing trickery with addresses to knock things out of the cache, which only works effectively with direct-mapped caches) is no longer particularly effective; so there are pros and cons here.
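
The conflict-miss effect can be illustrated with a toy simulation. This is only a sketch under loud assumptions: 16-byte lines, read-only accesses, LRU replacement in the 4-way cache (the replacement policy is not stated above), and no modeling of write-through or of the L2 itself. All class and function names are hypothetical. It counts how many requests get past the L1 and the small 4-way cache when two addresses that alias in a 32K direct-mapped L1 are accessed alternately.

```python
from collections import OrderedDict

LINE = 16  # bytes per line (assumption)

class DirectMapped:
    """Direct-mapped cache that tracks only which line each slot holds."""
    def __init__(self, size_bytes):
        self.nslots = size_bytes // LINE
        self.slots = [None] * self.nslots

    def access(self, addr):
        """Return True on a hit; on a miss, install the line and return False."""
        line = addr // LINE
        idx = line % self.nslots
        if self.slots[idx] == line:
            return True
        self.slots[idx] = line
        return False

class SetAssocLRU:
    """Small set-associative cache with LRU replacement (policy is assumed)."""
    def __init__(self, nsets, ways):
        self.nsets, self.ways = nsets, ways
        self.sets = [OrderedDict() for _ in range(nsets)]

    def lookup(self, line):
        s = self.sets[line % self.nsets]
        if line in s:
            s.move_to_end(line)  # mark as most recently used
            return True
        return False

    def insert(self, line):
        s = self.sets[line % self.nsets]
        s[line] = True
        s.move_to_end(line)
        if len(s) > self.ways:
            s.popitem(last=False)  # evict the least recently used line

def count_l2_requests(addrs, use_vca):
    """Count accesses that miss the L1 and (if present) the 4-way cache."""
    l1 = DirectMapped(32 * 1024)
    vca = SetAssocLRU(64, 4) if use_vca else None
    requests = 0
    for addr in addrs:
        if l1.access(addr):
            continue
        line = addr // LINE
        if vca is not None and vca.lookup(line):
            continue  # conflict miss absorbed before reaching the L2
        requests += 1
        if vca is not None:
            vca.insert(line)  # remember the line fetched from the L2
    return requests

# Two addresses 32K apart share one L1 slot and keep evicting each other.
pattern = [0, 32 * 1024] * 1000
print(count_l2_requests(pattern, use_vca=False))
print(count_l2_requests(pattern, use_vca=True))
```

In this toy model every access in the alternating pattern reaches the L2 without the 4-way cache, while with it only the first access to each line does. Real workloads will not be this clean.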
But, can note that FPGAs have relatively expensive logic and relatively cheap SRAM (or, apparently, the inverse of ASICs).
Though, in contrast to my initial estimates, I did manage to figure out a way to bank-switch the GPRs without blowing out the resource budget or timing.
But, as for whether it would also be viable for an ASIC core, dunno.
   Would require around 2kB of SRAM for the mechanism as it exists.
   Granted, this is smaller than the typical L1 caches.
Still TBD whether it "actually makes sense"...

>> Seems like it would be harder to debug the hardware since:
>>   There is more that has to go on in the hardware for TSO to work;
>>   Software will have higher expectations that it actually work.
>
> Possible.  Delivering working hardware is the job of hardware
> engineers.  Intel and AMD apparently have no problems getting the TSO
> parts of their architectures right.  However, it seems that they don't
> go for "really efficient" TSO, or they would just upgrade the parts of
> their architecture with weaker consistency to have TSO.
 
Yeah, but for a hobbyist this will be more of an issue...
Likely similar for microcontrollers (if relevant), embedded CPUs, manycore systems, or systems with high-latency links (such as over Ethernet and TCP/IP). The cost of TSO likely isn't worth it there.
Seemingly (looking at charts), ARM and POWER didn't find it worthwhile.
For RISC-V, it is an optional extension (weak is the assumed default).
To some extent, it is mostly an x86 and x86-64 thing...
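
For reference, the kind of reordering at stake in the barrier discussion can be shown with the classic message-passing pattern: one thread stores `data` and then `flag`, and another loads `flag` and then `data`. The sketch below is a toy enumeration, not a model of any real CPU. The `may_reorder` knob is hypothetical: `False` keeps each thread in program order (which TSO guarantees for this particular store-store and load-load pattern), while `True` lets each thread's two independent accesses swap, as a weak model permits when no barrier is present.

```python
from itertools import permutations

# Message-passing litmus test; all memory starts at 0.
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r_flag"), ("load", "data", "r_data")]

def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def orders(thread, may_reorder):
    """Orders in which one thread's accesses may become visible."""
    return list(permutations(thread)) if may_reorder else [tuple(thread)]

def outcomes(may_reorder):
    """Return the set of reachable (r_flag, r_data) results."""
    seen = set()
    for w in orders(WRITER, may_reorder):
        for r in orders(READER, may_reorder):
            for trace in interleavings(list(w), list(r)):
                mem = {"data": 0, "flag": 0}
                regs = {}
                for op, addr, arg in trace:
                    if op == "store":
                        mem[addr] = arg
                    else:
                        regs[arg] = mem[addr]
                seen.add((regs["r_flag"], regs["r_data"]))
    return seen

print(sorted(outcomes(may_reorder=False)))
print(sorted(outcomes(may_reorder=True)))
```

The result `(1, 0)`, where the reader sees the flag set but stale data, only shows up when reordering is allowed. Ruling it out on weakly ordered hardware is what the barriers are for.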

> - anton
