On 11/15/2024 11:27 AM, Anton Ertl wrote:
> jseigh <jseigh_es00@xemaps.com> writes:
>> Anybody doing that sort of programming, i.e. lock-free or distributed
>> algorithms, who can't handle weakly consistent memory models, shouldn't
>> be doing that sort of programming in the first place.
> Do you have any argument that supports this claim.
>> Strongly consistent memory won't help incompetence.
> Strong words to hide lack of arguments?
In my case, as I see it:
The tradeoff is more about implementation cost, performance, etc.
Weak model:
Cheaper (and simpler) to implement;
Performs better when there is no need to synchronize memory;
Performs worse when there is need to synchronize memory;
...
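Say, as a minimal C11 sketch of where that tradeoff tends to land in code (illustrative only, not from any particular implementation): the relaxed/plain accesses stay cheap on a weak model, and the cost concentrates in the release/acquire pair; on a strong model the pair is nearly free, but every access pays for the stronger ordering in the memory system.

  #include <stdatomic.h>

  static int payload;              /* plain data, no ordering of its own */
  static atomic_int ready;         /* synchronization flag */

  void producer(void)
  {
      payload = 42;                                /* plain store */
      atomic_store_explicit(&ready, 1,
          memory_order_release);                   /* ordering paid here */
  }

  int consumer(void)
  {
      while (!atomic_load_explicit(&ready,
          memory_order_acquire))                   /* ordering paid here */
          ;                                        /* spin until flagged */
      return payload;                              /* guaranteed to see 42 */
  }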
However, local to the CPU core:
Not respecting things like RAW hazards does not seem well-advised.
Like, if we store to a location and then immediately read it back, one expects to see the most recently written value, not the previous value. Or, if one stores to two adjacent memory locations, one expects both stores to write their data correctly.
Granted, it is a tradeoff:
  Not bothering:
    Fast, cheap, but may break expected behavior;
    Could naively use NOPs if aliasing is possible, but this is bad.
  Add an interlock check, stall the pipeline if it happens:
    Works, but can add a noticeable performance penalty;
    My attempts at 75 and 100 MHz cores had often done this;
    Sadly, memory RAW and WAW hazards are not exactly rare.
  Use internal forwarding, so written data is used directly the next cycle
  (a rough C sketch of the idea follows the footnote below):
    Better performance;
    But, has a fairly high cost for the FPGA (*1).
*1: This factor (along with L1 cache sizes) weighs heavily in why I continue to use 50 MHz. Otherwise, I could use 75 MHz, but this internal forwarding logic, and L1 caches with 32K of BRAM (excluding metadata) and 1-cycle access, are not really viable at 75 MHz.
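Roughly, the forwarding idea, sketched conceptually in C (not the actual Verilog; all names are made up): the load checks a pending-store buffer before going to the memory array, which is exactly the case where a non-forwarding design would have to stall on the RAW hazard instead.

  /* Conceptual model of a 1-entry store buffer with store-to-load
     forwarding; a real core does this per byte lane inside the L1
     pipeline. Names are made up for illustration. */
  #include <stdint.h>
  #include <stdbool.h>

  #define RAM_WORDS (1 << 20)

  typedef struct {
      bool     valid;   /* pending store present? */
      uint32_t addr;    /* word index of the pending store */
      uint32_t data;    /* data of the pending store */
  } store_buf_t;

  static store_buf_t sb;
  static uint32_t    ram[RAM_WORDS];   /* stand-in for the memory array */

  void do_store(uint32_t addr, uint32_t data)
  {
      addr &= RAM_WORDS - 1;
      if (sb.valid)                    /* retire the older pending store */
          ram[sb.addr] = sb.data;
      sb.valid = true;                 /* hold the new store for a cycle */
      sb.addr  = addr;
      sb.data  = data;
  }

  uint32_t do_load(uint32_t addr)
  {
      addr &= RAM_WORDS - 1;
      if (sb.valid && sb.addr == addr) /* RAW hit: forward from the buffer */
          return sb.data;              /* (a no-forwarding core stalls here) */
      return ram[addr];                /* otherwise read the array */
  }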
For the L2 cache, which is much bigger, one can use a few extra pad-cycles to access the Block-RAM array. Though, 5-cycle latency for Load/Store operations would not be good.
Can note that with Block-RAM, the usual behavior seems to be that if one reads from one port while writing to another port on the same clock edge, and both are at the same location, the prior contents will be returned. This may be a general Verilog behavior though, rather than a Block-RAM thing (it also seems to apply to LUTRAM if accessed in the same pattern; though LUTRAM also allows reading the value via combinatorial logic rather than on a clock edge, which seems to always return the value from the most recent clock edge).
As I can note, a 4K or 8K L1 cache with stall on RAW or WAW, at 75 MHz, tends IME to perform worse than a 32K cache running at 50 MHz with no RAW/WAW stall.
Also, trying to increase MHz by increasing instruction latency was, in many cases, not ideal for performance either.
Granted, if I were to do things the "DEC Alpha" way, I probably could run stuff at 75 MHz, but then would likely need the compiler to insert a bunch of strategic NOPs so that the program doesn't break.
For memory ordering, a case could possibly be made (in my case) for an "order-respecting DRAM cache" via the MMIO interface, say:
F000_01000000..F000_3FFFFFFF
Could be defined to alias with the main RAM map, but with strictly sequential ordering for every memory access across all cores (at the expense of performance).
Where:
0000_00000000..7FFF_FFFFFFFF: Virtual Address Space
8000_00000000..BFFF_FFFFFFFF: Supervisor-Only Virtual Address Space
C000_00000000..CFFF_FFFFFFFF: Physical Address Space, Default Caching
D000_00000000..DFFF_FFFFFFFF: Physical Address Space, Volatile/NoCache
E000_00000000..EFFF_FFFFFFFF: Reserved
F000_00000000..FFFF_FFFFFFFF: MMIO Space
MMIO space is currently fully independent of RAM space.
However, at present:
FFFF_F0000000..FFFF_FFFFFFFF: MMIO Space, as Used for MMIO devices.
So, in theory, remerging RAM IO space into MMIO Space would be possible (well, except that trying to access HW MMIO address ranges via RAM-space access would likely be disallowed).
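For illustration, the relation between the different views of the same physical offset could be expressed something like this (a sketch; the helper names are mine, only the base values come from the map above):

  #include <stdint.h>

  #define PA_CACHED_BASE   0xC00000000000ULL  /* C000_...: default caching  */
  #define PA_NOCACHE_BASE  0xD00000000000ULL  /* D000_...: volatile/nocache */
  #define PA_MMIO_BASE     0xF00000000000ULL  /* F000_...: MMIO space       */

  /* Same physical offset, viewed through the default-cached window. */
  static inline uint64_t pa_cached(uint64_t pa_off)
  {
      return PA_CACHED_BASE + pa_off;
  }

  /* Same offset, viewed through the no-cache window. */
  static inline uint64_t pa_nocache(uint64_t pa_off)
  {
      return PA_NOCACHE_BASE + pa_off;
  }

  /* Same offset, through the proposed strictly ordered alias in MMIO
     space (so RAM at offset 0x01000000 would appear at F000_01000000). */
  static inline uint64_t pa_ordered(uint64_t pa_off)
  {
      return PA_MMIO_BASE + pa_off;
  }

So, in principle, a lock-free structure that really wants sequential ordering could just be accessed through the F000_01xxxxxx view, at the cost of every such access taking the slow path.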
Can note, MMU disabled:
0000_00000000..0FFF_FFFFFFFF: Same as C000..CFFF space.
1000_00000000..7FFF_FFFFFFFF: Invalid
...
Granted, current scheme does set a limit of 16TB of RAM.
But, biggest FPGA boards I have only have 256MB, so, ...
And, current VA map within TestKern (from memory):
0000_00000000..0000_00FFFFFF: NULL Space
0000_01000000..0000_3FFFFFFF: RAM Range (Identity Mapped)
0000_40000000..0000_BFFFFFFF: Direct Page Mapping (no swap)
0001_00000000..3FFF_FFFFFFFF: Mapped to swapfile, Global
4000_00000000..7FFF_FFFFFFFF: Process Local
Note that, within the RAM-range, the RAM will wrap around. The specifics of the wraparound are used to detect RAM size (this would set an effective limit at 512MB, after which no wraparound would be detected).
Specifics here would need to change if larger RAM sizes were supported.
Not sure how RAM size is detected with DIMM modules. IIRC, with PCs, it was more a matter of probing along linearly until one finds an address that no longer returns valid data (say, if one hits the 1GB mark and gets back 00000000 or FFFFFFFF or similar, assume end of RAM at 1GB).
One does need to make sure the caches (including the L2 cache) are flushed during all this, as the caches doing their usual cache thing may otherwise cause one to detect more RAM than actually exists.
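As a sketch of the probe-with-wraparound idea (illustrative only; flush_caches() and the constants are placeholders, not the actual TestKern code):

  #include <stdint.h>

  #define RAM_BASE   0x01000000UL   /* start of the RAM range             */
  #define RAM_LIMIT  0x20000000UL   /* 512MB: past this, no wraparound
                                       is visible within the RAM window   */
  #define MARKER     0x55AA1234UL

  extern void flush_caches(void);   /* must flush L1 and L2 (placeholder) */

  uint32_t detect_ram_size(void)
  {
      volatile uint32_t *base = (volatile uint32_t *)RAM_BASE;
      uint32_t step;

      for (step = 0x00100000; step <= RAM_LIMIT; step <<= 1) {
          volatile uint32_t *probe =
              (volatile uint32_t *)(RAM_BASE + step);

          *probe = 0;               /* clear the would-be alias           */
          *base  = MARKER;          /* write the marker at the bottom     */
          flush_caches();           /* make sure reads hit actual DRAM    */

          if (*probe == MARKER)     /* marker visible at base+step:       */
              return step;          /* RAM wrapped, so size is 'step'     */
      }
      return RAM_LIMIT;             /* no wrap seen: at least 512MB       */
  }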
...