Re: Memory ordering

Liste des GroupesRevenir à c arch 
Sujet : Re: Memory ordering
De : cr88192 (at) *nospam* gmail.com (BGB)
Groupes : comp.arch
Date : 15. Nov 2024, 21:53:00
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <vh8cbo$3j8c5$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11
User-Agent : Mozilla Thunderbird
On 11/15/2024 11:27 AM, Anton Ertl wrote:
jseigh <jseigh_es00@xemaps.com> writes:
Anybody doing that sort of programming, i.e. lock-free or distributed
algorithms, who can't handle weakly consistent memory models, shouldn't
be doing that sort of programming in the first place.
 Do you have any argument that supports this claim.
 
Strongly consistent memory won't help incompetence.
 Strong words to hide lack of arguments?
 
In my case, as I see it:
   The tradeoff is more about implementation cost, performance, etc.
Weak model:
   Cheaper (and simpler) to implement;
   Performs better when there is no need to synchronize memory;
   Performs worse when there is need to synchronize memory;
   ...
However, local to the CPU core:
   Not respecting things like RAW hazards does not seem well advised.
Like, if we store to a location, and then immediately read back from it, one can expect to see the most recently written value, not the previous value. Or, if one stores to two adjacent memory locations, one expects that both stores write the data correctly.
Granted, it is a tradeoff:
   Not bothering: Fast, Cheap, but may break expected behavior;
     Could naively use NOPs if aliasing is possible, but this is bad.
   Add an interlock check, stall the pipeline if it happens:
     Works, but can add a noticeable performance penalty;
     My attempts at 75 and 100 MHz cores had often done this;
     Sadly, memory RAW and WAW hazards are not exactly rare.
   Use internal forwarding, so written data is used directly next cycle.
     Better performance;
     But, has a fairly high cost for the FPGA (*1).
*1: This factor (along with L1 cache sizes) weighs in heavily to why I continue to use 50MHz. Otherwise, I could use 75 MHz, but this internal forwarding logic, and L1 caches with 32K of BRAM (excluding metadata) and 1-cycle access, are not really viable at 75 MHz.
For the L2 cache, which is much bigger, one can use a few extra pad-cycles to access the Block-RAM array. Though, 5 cycle latency for Load/Store operations would be, not good.
Can note that with Block-RAM, usual behavior seems to be that if one tries to read from one port while writing to another port on the same clock edge, if both are at the same location, the prior contents will be returned. This may be a general behavior in Verilog though, rather than a Block-RAM thing (also seems to apply to LUTRAM if accessed in the same pattern; though LUTRAM allows also reading the value via combinatorial logic rather than a clock-edge, which seems to always return the value from the most recent clock-edge).
As I can note, a 4K or 8K L1 cache with stall on RAW or WAW, at 75 MHz, tends to perform worse IME, than a 32K cache running at 50 MHz with no RAW/WAW stall.
Also, trying to increase MHz by increasing instruction latency in many cases was also not ideal for performance.
Granted, if I were to do things the "DEC Alpha" way, I probably could run stuff at 75MHz, but then would likely need the compiler to insert a bunch of strategic NOPs so that the program doesn't break.
For memory ordering, possibly, in my case a case could be made for an "order respecting DRAM cache" via the MMIO interface, say:
   F000_01000000..F000_3FFFFFFF
Could be defined to alias with the main RAM map, but with strictly sequential ordering for every memory access across all cores (at the expense of performance).
Where:
   0000_00000000..7FFF_FFFFFFFF: Virtual Address Space
   8000_00000000..BFFF_FFFFFFFF: Supervisor-Only Virtual Address Space
   C000_00000000..CFFF_FFFFFFFF: Physical Address Space, Default Caching
   D000_00000000..DFFF_FFFFFFFF: Physical Address Space, Volatile/NoCache
   E000_00000000..EFFF_FFFFFFFF: Reserved
   F000_00000000..FFFF_FFFFFFFF: MMIO Space
MMIO space is currently fully independent of RAM space.
However, at present:
   FFFF_F0000000..FFFF_FFFFFFFF: MMIO Space, as Used for MMIO devices.
So, in theory, remerging RAM IO space into MMIO Space would be possible (well, except that trying to access HW MMIO address ranges via RAM-space access would likely be disallowed).
Can note, MMU disabled:
   0000_00000000..0FFF_FFFFFFFF: Same as C000..CFFF space.
   1000_00000000..7FFF_FFFFFFFF: Invalid
...
Granted, current scheme does set a limit of 16TB of RAM.
   But, biggest FPGA boards I have only have 256MB, so, ...
And, current VA map within TestKern (from memory):
   0000_00000000..0000_00FFFFFF: NULL Space
   0000_01000000..0000_3FFFFFFF: RAM Range (Identity Mapped)
   0000_40000000..0000_BFFFFFFF: Direct Page Mapping (no swap)
   0001_00000000..3FFF_FFFFFFFF: Mapped to swapfile, Global
   4000_00000000..7FFF_FFFFFFFF: Process Local
Note that, within the RAM-range, the RAM will wrap around. The specifics of the wraparound are used to detect RAM size (this would set an effective limit at 512MB, after which no wraparound would be detected).
Specifics here would need to change if larger RAM sizes were supported.
Not sure how RAM size is detected with DIMM modules. IIRC, with PCs, it was more probe along linearly until one finds an address that no longer returns valid data (say, if one hits the 1GB mark, and gets back 000000 or FFFFFFF or similar, assume end of RAM at 1GB).
One does need to make sure caches (including L2 cache) are flushed during all this, as the caches doing their usual cache thing, may incorrectly detect larger RAM than actually exists.
...

- anton

Date Sujet#  Auteur
28 Oct 24 * Arm ldaxr / stxr loop question135jseigh
31 Oct 24 +- Re: Arm ldaxr / stxr loop question1MitchAlsup1
31 Oct 24 +- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson
1 Nov 24 +* Re: Arm ldaxr / stxr loop question123aph
2 Nov 24 i`* Re: Arm ldaxr / stxr loop question122Chris M. Thomasson
8 Nov 24 i `* Re: Arm ldaxr / stxr loop question121Chris M. Thomasson
9 Nov 24 i  `* Re: Arm ldaxr / stxr loop question120Chris M. Thomasson
9 Nov 24 i   +* Re: Arm ldaxr / stxr loop question117Chris M. Thomasson
9 Nov 24 i   i+- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson
11 Nov 24 i   i+* Re: Arm ldaxr / stxr loop question5MitchAlsup1
11 Nov 24 i   ii+- Re: Arm ldaxr / stxr loop question1Michael S
11 Nov 24 i   ii`* Re: Arm ldaxr / stxr loop question3jseigh
11 Nov 24 i   ii `* Re: Arm ldaxr / stxr loop question2Chris M. Thomasson
13 Nov 24 i   ii  `- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson
11 Nov 24 i   i+- Re: Arm ldaxr / stxr loop question1Michael S
12 Nov 24 i   i+- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson
12 Nov 24 i   i+* Re: Arm ldaxr / stxr loop question22aph
13 Nov 24 i   ii+* Re: Arm ldaxr / stxr loop question18Chris M. Thomasson
13 Nov 24 i   iii`* Re: Arm ldaxr / stxr loop question17aph
13 Nov 24 i   iii +* Re: Arm ldaxr / stxr loop question3jseigh
13 Nov 24 i   iii i`* Re: Arm ldaxr / stxr loop question2aph
13 Nov 24 i   iii i `- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson
13 Nov 24 i   iii +- Re: Arm ldaxr / stxr loop question1MitchAlsup1
13 Nov 24 i   iii +* Re: Arm ldaxr / stxr loop question2Chris M. Thomasson
13 Nov 24 i   iii i`- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson
13 Nov 24 i   iii +* Re: Arm ldaxr / stxr loop question2Chris M. Thomasson
13 Nov 24 i   iii i`- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson
13 Nov 24 i   iii `* Re: Arm ldaxr / stxr loop question8Terje Mathisen
13 Nov 24 i   iii  +* Brilliance (was: Arm ldaxr / stxr loop question)4Anton Ertl
13 Nov 24 i   iii  i+- Re: Brilliance1BGB
14 Nov 24 i   iii  i`* Re: Brilliance2Terje Mathisen
17 Nov 24 i   iii  i `- Re: Brilliance1Thomas Koenig
13 Nov 24 i   iii  `* Re: Arm ldaxr / stxr loop question3aph
14 Nov 24 i   iii   `* Re: Arm ldaxr / stxr loop question2Terje Mathisen
15 Nov 24 i   iii    `- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson
13 Nov 24 i   ii`* Re: Arm ldaxr / stxr loop question3BGB
13 Nov 24 i   ii `* Re: Arm ldaxr / stxr loop question2Chris M. Thomasson
13 Nov 24 i   ii  `- Re: Arm ldaxr / stxr loop question1Robert Finch
14 Nov 24 i   i`* Re: Arm ldaxr / stxr loop question86Kent Dickey
14 Nov 24 i   i `* Re: Arm ldaxr / stxr loop question85aph
15 Nov 24 i   i  +* Re: Arm ldaxr / stxr loop question81Chris M. Thomasson
15 Nov 24 i   i  i`* Re: Arm ldaxr / stxr loop question80aph
15 Nov 24 i   i  i +- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson
15 Nov 24 i   i  i `* Memory ordering (was: Arm ldaxr / stxr loop question)78Anton Ertl
15 Nov 24 i   i  i  +* Re: Memory ordering44Chris M. Thomasson
15 Nov 24 i   i  i  i`* Re: Memory ordering43Michael S
15 Nov 24 i   i  i  i `* Re: Memory ordering42Chris M. Thomasson
16 Nov 24 i   i  i  i  `* Re: Memory ordering41Chris M. Thomasson
16 Nov 24 i   i  i  i   +- Re: Memory ordering1Chris M. Thomasson
17 Nov 24 i   i  i  i   `* Re: Memory ordering39jseigh
17 Nov 24 i   i  i  i    +* Re: Memory ordering33Anton Ertl
19 Nov 24 i   i  i  i    i`* Re: Memory ordering32Chris M. Thomasson
3 Dec 24 i   i  i  i    i `* Re: Memory ordering31Anton Ertl
3 Dec 24 i   i  i  i    i  `* Re: Memory ordering30jseigh
3 Dec 24 i   i  i  i    i   `* Re: Memory ordering29MitchAlsup1
4 Dec 24 i   i  i  i    i    +* Re: Memory ordering22Stefan Monnier
4 Dec 24 i   i  i  i    i    i+* Re: Memory ordering3MitchAlsup1
4 Dec 24 i   i  i  i    i    ii`* Re: Memory ordering2Stefan Monnier
4 Dec 24 i   i  i  i    i    ii `- Re: Memory ordering1MitchAlsup1
4 Dec 24 i   i  i  i    i    i`* Re: Memory ordering18jseigh
5 Dec 24 i   i  i  i    i    i `* Re: Memory ordering17Chris M. Thomasson
5 Dec 24 i   i  i  i    i    i  +* Re: Memory ordering8jseigh
16 Dec22:48 i   i  i  i    i    i  i`* Re: Memory ordering7Chris M. Thomasson
17 Dec13:33 i   i  i  i    i    i  i `* Re: Memory ordering6jseigh
17 Dec21:38 i   i  i  i    i    i  i  +- Re: Memory ordering1aph
17 Dec21:41 i   i  i  i    i    i  i  `* Re: Memory ordering4Chris M. Thomasson
17 Dec22:45 i   i  i  i    i    i  i   +- Re: Memory ordering1MitchAlsup1
18 Dec12:43 i   i  i  i    i    i  i   `* Re: Memory ordering2jseigh
19 Dec03:48 i   i  i  i    i    i  i    `- Re: Memory ordering1Chris M. Thomasson
19 Dec19:33 i   i  i  i    i    i  `* Re: Memory ordering8MitchAlsup1
19 Dec22:19 i   i  i  i    i    i   `* Re: Memory ordering7Chris M. Thomasson
20 Dec00:59 i   i  i  i    i    i    +* Re: Memory ordering5MitchAlsup1
20 Dec01:21 i   i  i  i    i    i    i+* Re: Memory ordering2Chris M. Thomasson
20 Dec01:25 i   i  i  i    i    i    ii`- Re: Memory ordering1Chris M. Thomasson
20 Dec01:48 i   i  i  i    i    i    i`* Re: Memory ordering2Chris M. Thomasson
20 Dec01:58 i   i  i  i    i    i    i `- Re: Memory ordering1Chris M. Thomasson
20 Dec21:17 i   i  i  i    i    i    `- Re: Memory ordering1Chris M. Thomasson
4 Dec 24 i   i  i  i    i    +- Re: Memory ordering1Chris M. Thomasson
4 Dec 24 i   i  i  i    i    +- Re: Memory ordering1MitchAlsup1
5 Dec 24 i   i  i  i    i    `* Re: Memory ordering4Tim Rentsch
6 Dec 24 i   i  i  i    i     +* Re: Memory ordering2Terje Mathisen
6 Dec 24 i   i  i  i    i     i`- Re: Memory ordering1Tim Rentsch
20 Dec06:08 i   i  i  i    i     `- Re: Memory ordering1Chris M. Thomasson
17 Nov 24 i   i  i  i    +* Re: Memory ordering2Chris M. Thomasson
19 Nov 24 i   i  i  i    i`- Re: Memory ordering1Chris M. Thomasson
18 Nov 24 i   i  i  i    +- Re: Memory ordering1aph
21 Nov 24 i   i  i  i    +- Re: Memory ordering1Chris M. Thomasson
21 Nov 24 i   i  i  i    `- Re: Memory ordering1Chris M. Thomasson
15 Nov 24 i   i  i  +* Re: Memory ordering (was: Arm ldaxr / stxr loop question)2Michael S
15 Nov 24 i   i  i  i`- Re: Memory ordering (was: Arm ldaxr / stxr loop question)1Anton Ertl
15 Nov 24 i   i  i  +* Re: Memory ordering28jseigh
15 Nov 24 i   i  i  i`* Re: Memory ordering27Anton Ertl
15 Nov 24 i   i  i  i +* Re: Memory ordering18Chris M. Thomasson
16 Nov 24 i   i  i  i i`* Re: Memory ordering17Anton Ertl
17 Nov 24 i   i  i  i i `* Re: Memory ordering16Chris M. Thomasson
17 Nov 24 i   i  i  i i  `* Re: Memory ordering15Anton Ertl
18 Nov 24 i   i  i  i i   `* Re: Memory ordering14Chris M. Thomasson
18 Nov 24 i   i  i  i i    `* Re: Memory ordering13Anton Ertl
19 Nov 24 i   i  i  i i     `* Re: Memory ordering12Chris M. Thomasson
19 Nov 24 i   i  i  i i      `* Re: Memory ordering11Chris M. Thomasson
26 Nov 24 i   i  i  i i       +* Re: Memory ordering4Chris M. Thomasson
3 Dec 24 i   i  i  i i       `* Re: Memory ordering6Anton Ertl
15 Nov 24 i   i  i  i +* Re: Memory ordering7BGB
17 Nov 24 i   i  i  i `- Re: Memory ordering1Tim Rentsch
16 Nov 24 i   i  i  +- Re: Memory ordering (was: Arm ldaxr / stxr loop question)1Anton Ertl
16 Nov 24 i   i  i  +- Re: Memory ordering (was: Arm ldaxr / stxr loop question)1Lawrence D'Oliveiro
18 Nov 24 i   i  i  `- Re: Memory ordering1aph
21 Nov 24 i   i  `* Re: Arm ldaxr / stxr loop question3Kent Dickey
9 Nov 24 i   `* Re: Arm ldaxr / stxr loop question2jseigh
8 Nov 24 +* Re: Arm ldaxr / stxr loop question8Lawrence D'Oliveiro
20 Dec10:11 `- Re: Arm ldaxr / stxr loop question1Chris M. Thomasson

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal