On 7/29/2024 1:55 PM, Chris M. Thomasson wrote:
Well, ideally there would be a build command (macro or something) to try to "detect" the system it's being compiled for and use the "appropriate" code... Ahhh, this was in the past. C++11 has atomics and membars. Wrt relaxed, well, that is pretty nice wrt being "portable", so to speak. C++ should take care of the membars wrt using the right instructions on a target system for us, right? Wrt using its APIs for atomics, threads, membars, etc... Fwiw, check out my old code:
On 7/29/2024 12:25 AM, BGB wrote:
The issue is that if one takes some kinds of naive lock-free algorithms (say, written for x86 or similar) and throws them unchanged on something running a weak model, they will not work correctly.
On 7/28/2024 10:32 PM, Chris M. Thomasson wrote:
>On 7/26/2024 10:00 AM, Anton Ertl wrote:
>"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>On 7/25/2024 1:09 PM, BGB wrote:
>At least with a weak model, software knows that if it doesn't go through
the rituals, the memory will be stale.
There is no guarantee of staleness, only a lack of stronger ordering
guarantees.
>The weak model is ideal for me. I know how to program for it>
And the fact that this model is so hard to use that few others know
how to program for it makes it ideal for you.
>and it's more efficient>
That depends on the hardware.
>
Yes, the Alpha 21164 with its imprecise exceptions was "more
efficient" than other hardware for a while, then the Pentium Pro came
along and gave us precise exceptions and more efficiency. And
eventually the Alpha people learned the trick, too, and 21264 provided
precise exceptions (although they did not admit this) and more
efficiency.
>
Similarly, I expect that hardware that is designed for good TSO or
sequential consistency performance will run faster on code written for
this model than code written for weakly consistent hardware will run
on that hardware. That's because software written for weakly
consistent hardware often has to insert barriers or atomic operations
just in case, and these operations are slow on hardware optimized for
weak consistency.
>
By contrast, one can design hardware for strong ordering such that the
slowness occurs only in those cases when actual (not potential)
communication between the cores happens, i.e., much less frequently.
>and sometimes use cases do not care if they encounter "stale" data.>
Great. Unless these "sometimes" cases are more often than the cases
where you perform some atomic operation or barrier because of
potential, but not actual communication between cores, the weak model
is still slower than a well-implemented strong model.
A strong model? You mean I don't have to use any memory barriers at all? Tell that to SPARC in RMO mode... How strong? Even the x86 requires a membar when a store followed by a load to another location shall be respected wrt order. Store-Load. #StoreLoad over on SPARC. ;^)
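To make the Store-Load point concrete, here is a minimal sketch in C++11 of the classic store-buffering (Dekker-style) test. The function name store_load_trial and the variables x, y, r1, r2 are made up for illustration and do not come from anyone's posted code. Each thread stores to one location and then loads the other; the seq_cst fence between the two plays the role of #StoreLoad on SPARC (on x86 a compiler would typically emit MFENCE or a locked instruction for it, though that is up to the implementation). With both fences present, the C++ memory model rules out the outcome where both threads read 0.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Runs one store-buffering trial and returns what each thread loaded.
// With the two seq_cst fences in place, {0, 0} must not occur.
std::pair<int, int> store_load_trial() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        // Store-Load barrier: the store above may not be reordered
        // past the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });

    t1.join();  // join() makes r1 and r2 safely visible to the caller
    t2.join();
    return {r1, r2};
}
```

Dropping either fence makes {0, 0} a permitted result, even on x86, which is the case being pointed at above.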
>
If you can force everything to be #StoreLoad (*) and make it faster than a handcrafted algo on a very weak memory system, well, hats off! I thought it was easier for a HW guy to implement weak consistency? At the cost of the increased complexity wrt programming the sucker! ;^)
>
Programming for a weak model isn't that hard...
>
Well, unless the program is built around a "naive lock free" strategy (where the threads manipulate members in a data-structure or similar and assume that the other threads will see the updates in a more-or-less consistent way).
lock/wait-free algorithms are very nice. Yes they can be fairly hard, but can be done for sure; stable and working in 100% correct order. The good ones are hard to beat using all locking logic. Try to beat RCU using a read write lock? I have some interesting algorithms that work like a charm.
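As a rough illustration of the "naive lock free" trap and its usual fix, here is a small C++11 sketch of publishing data through a flag. The names publish_and_consume, payload and ready are hypothetical. The naive version would write payload and then set a plain flag, assuming the other thread sees both in that order; on a weak model that assumption can fail. Pairing a release store with an acquire load is one standard way to get the ordering, and the compiler picks whatever barrier instructions the target needs.

```cpp
#include <atomic>
#include <thread>

// One producer publishes a value, one consumer waits for it.
// Returns the value the consumer observed.
int publish_and_consume() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload is ordered before the flag store.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: once the flag reads true, the payload write is visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With memory_order_relaxed on both sides of the flag, the read of payload would be a data race and the consumer would have no guarantee of seeing 42.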
>
Previously, this could be made to work using "knocking" by adding extra memory loads to the mix.
You need to get it right. A memory barrier bug is a devious little shit!
At present (with the associative "VCA cache"), one would also need to use "INVDC" instructions to flush cache lines.
[...]