Newsportal USENET - Re: Arguments for a sane ISA 6-years later

On 7/29/2024 7:59 AM, Anton Ertl wrote:

BGB <cr88192@gmail.com> writes:
On 7/26/2024 12:00 PM, Anton Ertl wrote:
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
and it's more efficient
>
That depends on the hardware.
>
Yes, the Alpha 21164 with its imprecise exceptions was "more
efficient" than other hardware for a while, then the Pentium Pro came
along and gave us precise exceptions and more efficiency. And
eventually the Alpha people learned the trick, too, and 21264 provided
precise exceptions (although they did not admit this) and more
efficiency.
>
Similarly, I expect that hardware that is designed for good TSO or
sequential consistency performance will run faster on code written for
this model than code written for weakly consistent hardware will run
on that hardware. That's because software written for weakly
consistent hardware often has to insert barriers or atomic operations
just in case, and these operations are slow on hardware optimized for
weak consistency.
>
>
TSO requires more significant hardware complexity though.
An efficient implementation of TSO or sequential consistency requires
more hardware, yes.
Floating point requires more hardware than fixed point. Precise
exceptions require more hardware than imprecise exceptions. Caches
require more hardware than the local memory of Cell's SPEs. OoO
requires more hardware than in-order; in this case the IA-64
implementations demonstrated that you could then spend the area budget
on more in-order resources (and big caches) and still fail to keep up
on SPECint with the smaller OoO competition. In all these cases we
decided that the benefit is worth the additional hardware. I think
that's the case for strong memory ordering, too.

As noted, I had needed to cut corners in a lot of areas:
   Caches are direct-mapped;
   In-order;
   Floating point is not exact;
   ...
Otherwise, stuff isn't going to fit into the FPGAs.
Something like TSO is a lot of complexity for not much gain.
In contrast, floating point and precise exceptions are a lot more relevant to software.
Floating point: "float" and "double" are not exactly rare, and performing like crap isn't ideal.
Precise exceptions: otherwise one can't do instruction emulation traps or a software-managed TLB (granted, a software-managed TLB is itself a form of corner cutting).
As noted, I had found associative caches mostly not worthwhile.
   So, L1 caches ended up as direct mapped;
   L2 is also direct-mapped.
Did end up adding a smaller 4-way cache (VCA Cache) between the L1 and L2 caches. It mostly keeps track of lines stored from the L1 and lines fetched from the L2, and absorbs a lot of the L1 conflict misses. The main effect (seemingly) is a notable reduction in the number of L2 misses. Had experimented with 8-way, but 8-way was too expensive. Also, it is write-through rather than write-back.
This cache is 64x 4-way, or ~ 4K. So, say:
   L1 D$, 32K DM WB VIVT
   L1 I$, 16K DM WB VIVT
   VCA ,   4K 4W WT PIPT
   L2   , 256K DM WB PIPT
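For reference, the line and set counts implied by these sizes can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: the 16-byte line size is not stated above, it is inferred from "64x 4-way, or ~ 4K" (4096 / (64 * 4)), and it is an assumption that all four caches share that line size. The helper name cache_geometry is made up for illustration.

```python
# Assumption: line size inferred from "64x 4-way, or ~ 4K", not stated in the post.
LINE_BYTES = 4096 // (64 * 4)

def cache_geometry(total_bytes, ways=1, line_bytes=LINE_BYTES):
    """Return (lines, sets) for a cache of the given size and associativity."""
    lines = total_bytes // line_bytes
    sets = lines // ways  # direct-mapped is the ways == 1 case
    return lines, sets

geometry = {
    "L1 D$": cache_geometry(32 * 1024),         # direct-mapped
    "L1 I$": cache_geometry(16 * 1024),         # direct-mapped
    "VCA":   cache_geometry(4 * 1024, ways=4),  # 4-way
    "L2":    cache_geometry(256 * 1024),        # direct-mapped
}

for name, (lines, sets) in geometry.items():
    print(f"{name}: {lines} lines in {sets} sets")
```

Under that assumption the VCA comes out at 256 lines in 64 sets, which matches the "64x 4-way" figure.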
With VCA, associativity mattered more than total size, but 32 or 64 rows did notably better than merely having 4 or 8 fully associative cache lines, without much difference in cost (the associativity costs a lot more than the LUTRAM in this case). But, unlike with purely DM caches, "cache knocking" is no longer particularly effective (using address trickery to knock things out of the cache, which only works well with direct-mapped caches), so there are pros and cons here.
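The conflict-miss effect can be shown with a toy model. This is a minimal sketch, not the actual VCA logic: it assumes 16-byte lines, LRU replacement in the 4-way cache, and a VCA that simply catches whatever misses in the L1 (the real one tracks lines stored from the L1 and fetched from the L2). All class and function names here are made up for illustration.

```python
class DirectMapped:
    """Direct-mapped cache: each line address maps to exactly one slot."""
    def __init__(self, n_lines, line_bytes=16):
        self.n_lines = n_lines
        self.line_bytes = line_bytes
        self.slots = [None] * n_lines

    def access(self, addr):
        line = addr // self.line_bytes
        idx = line % self.n_lines
        hit = self.slots[idx] == line
        self.slots[idx] = line  # on a miss, the old occupant is evicted
        return hit


class SetAssociative:
    """Small set-associative cache with LRU replacement (an assumption)."""
    def __init__(self, n_sets, ways, line_bytes=16):
        self.n_sets = n_sets
        self.ways = ways
        self.line_bytes = line_bytes
        self.sets = [[] for _ in range(n_sets)]  # each list: oldest first

    def access(self, addr):
        line = addr // self.line_bytes
        entries = self.sets[line % self.n_sets]
        hit = line in entries
        if hit:
            entries.remove(line)
        elif len(entries) == self.ways:
            entries.pop(0)  # evict the least recently used line
        entries.append(line)
        return hit


def l2_requests(addrs, l1, vca=None):
    """Count accesses that miss the L1 (and the VCA, if present)."""
    count = 0
    for addr in addrs:
        if l1.access(addr):
            continue
        if vca is not None and vca.access(addr):
            continue
        count += 1
    return count


# Two addresses exactly 32K apart collide in a 32K direct-mapped L1.
pattern = [0, 32 * 1024] * 100
without_vca = l2_requests(pattern, DirectMapped(2048))
with_vca = l2_requests(pattern, DirectMapped(2048), SetAssociative(64, 4))
print(without_vca, with_vca)
```

In this model the two addresses keep knocking each other out of the L1, so every access goes on to the L2. With the small 4-way cache in between, only the first touch of each line does.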
But, can note that FPGAs have relatively expensive logic and relatively cheap SRAM (or, apparently, the inverse of ASICs).
Though, in contrast to my initial estimates, I did manage to figure out a way to add bank-switched GPRs without blowing out the resource budget or timing.
But, as for whether it would also be viable for an ASIC core, dunno.
   Would require around 2kB of SRAM for the mechanism as it exists.
   Granted, this is smaller than the typical L1 caches.
Still TBD whether it "actually makes sense"...

Seems like it would be harder to debug the hardware since:
There is more that has to go on in the hardware for TSO to work;
Software will have higher expectations that it actually work.
Possible. Delivering working hardware is the job of hardware
engineers. Intel and AMD apparently have no problems getting the TSO
parts of their architectures right. However, it seems that they don't
go for "really efficient" TSO, or they would just upgrade the parts of
their architecture with weaker consistency to have TSO.

Yeah, but for a hobbyist this will be more of an issue...
Similar likely for microcontrollers (if relevant), embedded CPUs, manycore systems, or systems with high-latency links (such as over Ethernet and TCP/IP). The cost of TSO likely isn't worth it.
Seemingly (looking at charts), ARM and POWER didn't find it worthwhile.
For RISC-V, it is an optional extension (weak is the assumed default).
To some extent, it is mostly an x86 and x86-64 thing...
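For anyone wanting something concrete on what TSO does and does not promise relative to sequential consistency, the classic store-buffering litmus test can be enumerated by brute force. This is a toy sketch, not a model of any real core: it assumes two threads, one FIFO store buffer per thread that drains at arbitrary times, and loads that check their own buffer first. The function name sb_outcomes is made up for illustration.

```python
def sb_outcomes(store_buffers):
    """Enumerate every (r0, r1) result of the store-buffering litmus test.

    Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
    store_buffers=False models sequential consistency,
    store_buffers=True models TSO-style per-thread FIFO store buffers.
    """
    program = (
        (("store", "x"), ("load", "y")),  # thread 0
        (("store", "y"), ("load", "x")),  # thread 1
    )
    seen = set()

    def explore(pcs, bufs, mem, regs):
        done = all(pcs[t] == len(program[t]) for t in (0, 1))
        if done and not bufs[0] and not bufs[1]:
            seen.add(regs)
            return
        for t in (0, 1):
            # Option 1: thread t executes its next instruction.
            if pcs[t] < len(program[t]):
                op, var = program[t][pcs[t]]
                new_pcs = tuple(pc + 1 if i == t else pc
                                for i, pc in enumerate(pcs))
                if op == "store" and store_buffers:
                    new_bufs = tuple(b + (var,) if i == t else b
                                     for i, b in enumerate(bufs))
                    explore(new_pcs, new_bufs, mem, regs)
                elif op == "store":
                    explore(new_pcs, bufs, {**mem, var: 1}, regs)
                else:
                    # A load sees the thread's own buffered store, else memory.
                    value = 1 if var in bufs[t] else mem[var]
                    new_regs = tuple(value if i == t else r
                                     for i, r in enumerate(regs))
                    explore(new_pcs, bufs, mem, new_regs)
            # Option 2: thread t's oldest buffered store drains to memory.
            if bufs[t]:
                var = bufs[t][0]
                new_bufs = tuple(b[1:] if i == t else b
                                 for i, b in enumerate(bufs))
                explore(pcs, new_bufs, {**mem, var: 1}, regs)

    explore((0, 0), ((), ()), {"x": 0, "y": 0}, (None, None))
    return seen


sc = sb_outcomes(store_buffers=False)
tso = sb_outcomes(store_buffers=True)
print("SC :", sorted(sc))
print("TSO:", sorted(tso))
```

In this model the only difference is the (0, 0) result, which the store buffers allow and sequential consistency forbids. Weaker models allow more reorderings than this one, which is where the extra barriers in portable code come from.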

- anton
