Sujet : Re: OT: Re: Sieve of Erastosthenes optimized to the max
De : tr.17687 (at) *nospam* z991.linuxsc.com (Tim Rentsch)
Groupes : comp.lang.c++Date : 26. Aug 2024, 20:47:56
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <86r0ab3w2b.fsf@linuxsc.com>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
User-Agent : Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Vir Campestris <
vir.campestris@invalid.invalid> writes:
On 16/08/2024 18:35, Bonita Montero wrote:
>
But basically I don't think it is a good idea to skip numbers
exept multiples of two. [...]
>
I've been running some experiments.
>
Skipping evens only is nice and simple on computation; that's
good.
>
Skipping something else requires table lookups. [...]
It doesn't have to. One point of the code I've been posting is
that table lookups, including mask productions, are eliminated;
everything gets optimized away so what's left is all compile-time
constants.
I'm happy with my algorithm:
- Work out a bitmask for some small primes, only as far as is
needed to get repeats (the repeat length is the product of the
primes)
- Work out bitmasks for the next half a dozen. Each of those will
have a repeat the length of the prime.
- Combine those together into the output bitmap. That requires
fewer operations than doing each one into the output bitmap
- Process the larger primes. Do the output bitmap in blocks for
cache friendliness
My experience:
* Special casing primes between 7 and 29 helps (and because it's
the first step the results can initialize the map);
* Special purpose code where primes are taken in pairs, for the
set of primes more than 29 and less than 360, with each pair
being folded into the initialized map, helps (incidentally,
that is 31 pairs, or 62 primes total);
* The combination of those two steps gives a saving of about 35%;
* Doing striping (the term I use for working in cache-sized
blocks) seems not to help much, so I don't use striping except
incidentally in the prime-pairs step;
* Forcing the function for the main second-level loop to be
inlined saves another 15% or so;
* I haven't tried to measure how the speed-up tricks scale with
overall number of primes to calculate. My guess is that they
are close to linear but I don't have any actual measurements.