Re: OT: Re: Sieve of Erastosthenes optimized to the max

Liste des GroupesRevenir à cl c++ 
Sujet : Re: OT: Re: Sieve of Erastosthenes optimized to the max
De : vir.campestris (at) *nospam* invalid.invalid (Vir Campestris)
Groupes : comp.lang.c++
Date : 15. Aug 2024, 18:52:10
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <v9lbns$11alj$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
User-Agent : Mozilla Thunderbird
I'm going to come back to this slightly earlier version, and add comments inline.
On 10/08/2024 15:07, Tim Rentsch wrote:
Vir Campestris <vir.campestris@invalid.invalid> writes:
 
On 23/07/2024 15:34, Tim Rentsch wrote:
>
Vir Campestris <vir.campestris@invalid.invalid> writes:
>
[...]
>
I was hoping for some feedback, even if very brief,
before I continue.
>
Sorry, I was slow.  It's really odd, I'm retired, and I ought to
have lots of time for things like this... and yet I seem to be
busy all the time.
 No worries, I understand.
 I'm sorry to be so long in responding.  Among other things
there was some sort of snafu with my news server, resulting
in hundreds of lost and/or duplicated messages, and that
took a while to sort out.
 
I think that was Eternal September - he had an outage.

On 20/07/2024 15:41, Tim Rentsch wrote:
>
Vir Campestris <vir.campestris@invalid.invalid> writes:
>
On 02/07/2024 07:20, Tim Rentsch wrote:
[...]
Both divide and mod can be simplified if we use a different
representation for numbers.  The idea is to separate the
number into a mod 30 part and divided by 30 part.  An
arbitrary number N can be split into an ordered pair
>
     N/30 , N%30
>
stored in, let's say, an 8 byte quantity with seven bytes
for N/30 and one byte for N%30.  Let me call these two
parts i and a for one number N, and j and b for a second
number M.  Then two form N*M we need to find
>
I suspect the cost of extracting the divided value from the 7
bytes will be prohibitive.  You may find it's best to have one
table for the N/30 parts (big entries, word aligned) and another
for the N%30 part (small entries, byte aligned).  BICBW.
 Two tables, absolutely.  What does BICBW stand for?
 
     (i*30+a) * (j*30+b)
>
which is
>
     (i*30*j*30) + (i*30*b) + (j*30*a) + a*b
>
Of course we want to take the product and express it in
the form (product/30, product%30), which can be done by
>
     30*i*j + i*b + j*a + (a*b)/30,   (a*b)%30
>
The two numbers a and b are both under 30, so a*b/30 and
a*b%30 can be done by lookup in a small array.
>
     30*i*j + i*b + j*a + divide_30[a][b],   modulo_30[a][b]
>
Now all operations can be done using just multiplies and
table lookups.  Dividing by 30 is just taking the first
component;  remainder 30 is just taking the second component.
>
One further refinement:  for numbers relatively prime to 2,
3, and 5, there are only 8 possibles, so the modulo 30
component can be reduced to just three bits.  That makes the
tables smaller, at a cost of needing a table lookup to find
and 'a' or 'b' part.
>
Does this all make sense?
>
Yes, it does.  It's not a direction I was thinking of at all,
although I too had table lookups in mind.
>
What I had considered is an extrapolation from my original code.
 I read the comments below in conjunction with getting a better
understanding of your original code, which is part of what took
me so long before responding.  Before I get to the stuff below
(which I think I will save for a second reply), here is some code
expanding on what I said above.  It should compile cleanly as is.
However, its purpose is to illustrate what I said above.
    typedef unsigned long NN, Index, Count, Mask;
   typedef unsigned char Byte;
 
I've been using uint64_t, not unsigned long. When I started C unsigned long was 32 bits; I'm never quite sure what you'll get.

   static  void    remove_multiples( Index, Count, Index );
    static  Byte    bytes[1000000000];
   static  NN      mods[] = { 1, 7, 11, 13, 17, 19, 23, 29, };
    Count
   sieve( NN limit ){
       Count r = 0;
       Index i;
       Count shift;
       Index i_limit = (limit + 29)/30;
        bytes[0] = 1;
       for(  i = 0;  i < i_limit;  i++  ){
           Byte bits = bytes[i];
           for(  shift = 0;  shift < 8;  shift++  ){
               NN p = i*30 + mods[shift];
               if(  bits & 1<<shift  )  continue;
               if(  p >= limit  )  continue;
               r++;        // count how many primes found
                           // could print the prime 'p' here if desired
               if(  p*p < limit  )  remove_multiples( i, shift, i_limit );
           }
       }
        return  r;
   }
    static  Byte    shift_for[30] = {
       9,0,9,9,9,9,  9,1,9,9,9,2,  9,3,9,9,9,4,  9,5,9,9,9,6,  9,9,9,9,9,7,
   };
    static  Byte  ab_div_30[8][8] = {
    {  1*1/30, 1*7/30, 1*11/30, 1*13/30, 1*17/30, 1*19/30, 1*23/30, 1*29/30 },
    {  7*1/30, 7*7/30, 7*11/30, 7*13/30, 7*17/30, 7*19/30, 7*23/30, 7*29/30 },
    { 11*1/30,11*7/30,11*11/30,11*13/30,11*17/30,11*19/30,11*23/30,11*29/30 },
    { 13*1/30,13*7/30,13*11/30,13*13/30,13*17/30,13*19/30,13*23/30,13*29/30 },
    { 17*1/30,17*7/30,17*11/30,17*13/30,17*17/30,17*19/30,17*23/30,17*29/30 },
    { 19*1/30,19*7/30,19*11/30,19*13/30,19*17/30,19*19/30,19*23/30,19*29/30 },
    { 23*1/30,23*7/30,23*11/30,23*13/30,23*17/30,23*19/30,23*23/30,23*29/30 },
    { 29*1/30,29*7/30,29*11/30,29*13/30,29*17/30,29*19/30,29*23/30,29*29/30 },
   };
    static  Byte  ab_mod_30[8][8] = {
    {  1*1%30, 1*7%30, 1*11%30, 1*13%30, 1*17%30, 1*19%30, 1*23%30, 1*29%30 },
    {  7*1%30, 7*7%30, 7*11%30, 7*13%30, 7*17%30, 7*19%30, 7*23%30, 7*29%30 },
    { 11*1%30,11*7%30,11*11%30,11*13%30,11*17%30,11*19%30,11*23%30,11*29%30 },
    { 13*1%30,13*7%30,13*11%30,13*13%30,13*17%30,13*19%30,13*23%30,13*29%30 },
    { 17*1%30,17*7%30,17*11%30,17*13%30,17*17%30,17*19%30,17*23%30,17*29%30 },
    { 19*1%30,19*7%30,19*11%30,19*13%30,19*17%30,19*19%30,19*23%30,19*29%30 },
    { 23*1%30,23*7%30,23*11%30,23*13%30,23*17%30,23*19%30,23*23%30,23*29%30 },
    { 29*1%30,29*7%30,29*11%30,29*13%30,29*17%30,29*19%30,29*23%30,29*29%30 },
   };
 
constexpr on those tables might give the optimiser something else to play with.

   void
   remove_multiples( Index i, Count shift_a, Index i_limit ){
So i is (prime/30) and shift_a is the bit in the mask byte - corresponding to (prime%30)?

       Index j, k;
       NN a = mods[shift_a], b, c;
       Count shift;
        for(  shift = shift_a;  shift < 8;  shift++  ){
           j = i;
           b = mods[shift];
           k = i*j*30 + i*b + j*a + ab_div_30[ shift_a ][ shift ];
           c = ab_mod_30[ shift_a ][ shift ];
           if(  k >= i_limit  )  continue;
           bytes[k] |= 1 << shift_for[c];
       }
        for(  j = i+1;  (30UL*i+a)*(30UL*j+1) < i_limit * 30;  j++  ){
           for(  shift = 0;  shift < 8;  shift++  ){
               b = mods[shift];
               k = i*j*30 + i*b + j*a + ab_div_30[ shift_a ][ shift ];
               c = ab_mod_30[ shift_a ][ shift ];
               if(  k >= i_limit  )  continue;
               bytes[k] |= 1 << shift_for[c];
           }
       }
   }
 In both sieve() and remove_multiples(), there is an inner loop
that ranges over the possible shift values.  (There is also an
unnested loop in remove_multiples() that ranges over a subset of
shift values, but for now please ignore that.)
 Since these loops always perform exactly eight iterations, they
can be completely unrolled without expanding the code too much.
Here is a revised version of the outer loop in sieve() (actually
the code below is not yet ready for compilation, because some
things have changed):
      for(  i = 0;  i < i_limit;  i++  ){
         Byte bits = bytes[i] ^ 0xFF;
         if(  bits & 0001  )  remove_multiples( i_limit, i,  1 ),  r++;
         if(  bits & 0002  )  remove_multiples( i_limit, i,  7 ),  r++;
         if(  bits & 0004  )  remove_multiples( i_limit, i, 11 ),  r++;
         if(  bits & 0010  )  remove_multiples( i_limit, i, 13 ),  r++;
         if(  bits & 0020  )  remove_multiples( i_limit, i, 17 ),  r++;
         if(  bits & 0040  )  remove_multiples( i_limit, i, 19 ),  r++;
         if(  bits & 0100  )  remove_multiples( i_limit, i, 23 ),  r++;
         if(  bits & 0200  )  remove_multiples( i_limit, i, 29 ),  r++;
     }
Before unrolling always benchmark. (Modern compilers are pretty smart, and sometimes will unroll for you. Modern processors with branch prediction are pretty good about working out when to jump and when not to. And with speculative execution they sometimes run both paths from a branch... I could go on!)
 We changed the interface to remove_multiples() to pass along the
index value 'i' and also the value for 'a' rather than the value
calculated for the prime 'p'.  Since the loop has been unrolled,
the values for the 'a' arguments are all constants (and of course
all the mask values are constants).
 After making a similar change to the remove_multiples() loop, the
values for the 'b's are likewise all constants.  If the optimizer
is smart enough, all the 'a*b/30' values and 'a*b%30' values turn
into constants.  Also the expressions 'i*b' and 'j*a' all turn
into multiplication by a constant.  Furthermore, to determine the
appropriate bit value to 'or' in, an eight-choice conditional
expression that tests each of the eight possible residues can
similarly be optimized away so the masks to 'or' in are all
constants.  Do you see how that all works?
 The point of all this unrolling is to get rid of all the shifting
and table lookups.  The only variables involved are the loop
variables 'i' and 'j';  everything else has been turned into a
constant.  Granted, the generated code has been expanded a great
deal (basically a factor of 8*8 == 64), but the pieces are now
much simpler so the result is still manageable.
 
And now a few minutes later as I read that I understand the reason for the unrolling better. But my point still stands - benchmark it.
Of course the decision on whether to unroll or not might depend on the target CPU!

I wanted to be sure you understand how the original idea is
transformed into a better performing version before posting
something more refined.  I get that your time is at a premium,
still you might trying going through the exercise of unrolling
the loops and writing a set_product() routine (or macro) to get a
sense of where I'm going here.
 Hope to send response to send part sometime soon.
You seem to have spotted a pattern in the steps that I didn't - I looked at the mask values, and decided it wasn't worth working out which of the 8 possible sequences to use - remembering which was too expensive. But it's also (I think, I haven't coded it) to work out whether you need to step 2, 4, 8 times the prime. And that would save more memory.
Another thought BTW - whether it's worth storing with a larger modulo - say 2*3*5*7 - not just 2*3*5 is a different decision from deciding whether the steps between primes should take that into consideration.
Andy

Date Sujet#  Auteur
23 Mar 24 * Re: Sieve of Erastosthenes optimized to the max68Chris M. Thomasson
23 Mar 24 `* Re: Sieve of Erastosthenes optimized to the max67Bonita Montero
23 Mar 24  `* Re: Sieve of Erastosthenes optimized to the max66Chris M. Thomasson
24 Mar 24   `* Re: Sieve of Erastosthenes optimized to the max65Bonita Montero
24 Mar 24    `* Re: Sieve of Erastosthenes optimized to the max64Chris M. Thomasson
24 Mar 24     `* Re: Sieve of Erastosthenes optimized to the max63Bonita Montero
24 Mar 24      `* Re: Sieve of Erastosthenes optimized to the max62Chris M. Thomasson
16 May 24       `* Re: Sieve of Erastosthenes optimized to the max61Vir Campestris
16 May 24        +- Re: Sieve of Erastosthenes optimized to the max1Ben Bacarisse
22 May 24        `* Re: Sieve of Erastosthenes optimized to the max59Tim Rentsch
30 May 24         `* Re: Sieve of Erastosthenes optimized to the max58Vir Campestris
30 May 24          +- Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
30 May 24          +* Re: Sieve of Erastosthenes optimized to the max3Paavo Helde
31 May 24          i`* Re: Sieve of Erastosthenes optimized to the max2Bonita Montero
31 May 24          i `- Re: Sieve of Erastosthenes optimized to the max1Paavo Helde
31 May 24          `* Re: Sieve of Erastosthenes optimized to the max53Tim Rentsch
1 Jun 24           `* Re: Sieve of Erastosthenes optimized to the max52Vir Campestris
2 Jun 24            +- Re: Sieve of Erastosthenes optimized to the max1Richard Damon
2 Jun 24            `* Re: Sieve of Erastosthenes optimized to the max50Tim Rentsch
3 Jun 24             `* Re: Sieve of Erastosthenes optimized to the max49Tim Rentsch
18 Jun 24              `* Re: Sieve of Erastosthenes optimized to the max48Vir Campestris
19 Jun 24               `* Re: Sieve of Erastosthenes optimized to the max47Tim Rentsch
30 Jun 24                `* Re: Sieve of Erastosthenes optimized to the max46Vir Campestris
2 Jul 24                 `* Re: Sieve of Erastosthenes optimized to the max45Tim Rentsch
2 Jul 24                  +- Re: Sieve of Erastosthenes optimized to the max1Vir Campestris
3 Jul 24                  `* Re: Sieve of Erastosthenes optimized to the max43Vir Campestris
15 Jul 24                   +- Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
20 Jul 24                   +* Re: Sieve of Erastosthenes optimized to the max40Tim Rentsch
25 Jul 24                   i`* OT: Re: Sieve of Erastosthenes optimized to the max39Vir Campestris
10 Aug 24                   i +* Re: OT: Re: Sieve of Erastosthenes optimized to the max36Tim Rentsch
12 Aug 24                   i i+* Re: OT: Re: Sieve of Erastosthenes optimized to the max2Vir Campestris
16 Aug 24                   i ii`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
15 Aug 24                   i i`* Re: OT: Re: Sieve of Erastosthenes optimized to the max33Vir Campestris
16 Aug 24                   i i `* Re: OT: Re: Sieve of Erastosthenes optimized to the max32Tim Rentsch
16 Aug 24                   i i  +* Re: OT: Re: Sieve of Erastosthenes optimized to the max30Bonita Montero
16 Aug 24                   i i  i+- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
19 Aug 24                   i i  i+* Re: OT: Re: Sieve of Erastosthenes optimized to the max12Vir Campestris
20 Aug 24                   i i  ii`* Re: OT: Re: Sieve of Erastosthenes optimized to the max11Bonita Montero
20 Aug 24                   i i  ii `* Re: OT: Re: Sieve of Erastosthenes optimized to the max10Bonita Montero
20 Aug 24                   i i  ii  +- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
20 Aug 24                   i i  ii  `* Re: OT: Re: Sieve of Erastosthenes optimized to the max8Vir Campestris
20 Aug 24                   i i  ii   +- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
26 Aug 24                   i i  ii   `* Re: OT: Re: Sieve of Erastosthenes optimized to the max6Tim Rentsch
27 Aug 24                   i i  ii    `* Re: OT: Re: Sieve of Erastosthenes optimized to the max5Bonita Montero
1 Sep 24                   i i  ii     `* Re: OT: Re: Sieve of Erastosthenes optimized to the max4Vir Campestris
2 Sep 24                   i i  ii      `* Re: OT: Re: Sieve of Erastosthenes optimized to the max3Tim Rentsch
2 Sep 24                   i i  ii       +- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
3 Sep 24                   i i  ii       `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Vir Campestris
19 Aug 24                   i i  i+* Re: OT: Re: Sieve of Erastosthenes optimized to the max12Vir Campestris
20 Aug 24                   i i  ii+* Re: OT: Re: Sieve of Erastosthenes optimized to the max4red floyd
20 Aug 24                   i i  iii+* Re: OT: Re: Sieve of Erastosthenes optimized to the max2Vir Campestris
26 Aug 24                   i i  iiii`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
26 Aug 24                   i i  iii`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
20 Aug 24                   i i  ii+* Re: OT: Re: Sieve of Erastosthenes optimized to the max3Bonita Montero
20 Aug 24                   i i  iii+- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
20 Aug 24                   i i  iii`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
20 Aug 24                   i i  ii`* Re: OT: Re: Sieve of Erastosthenes optimized to the max4Bonita Montero
22 Aug 24                   i i  ii `* Re: OT: Re: Sieve of Erastosthenes optimized to the max3Vir Campestris
22 Aug 24                   i i  ii  `* Re: OT: Re: Sieve of Erastosthenes optimized to the max2Bonita Montero
22 Aug 24                   i i  ii   `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Vir Campestris
22 Aug 24                   i i  i`* Re: OT: Re: Sieve of Erastosthenes optimized to the max4Vir Campestris
23 Aug 24                   i i  i +* Re: OT: Re: Sieve of Erastosthenes optimized to the max2red floyd
26 Aug 24                   i i  i i`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
26 Aug 24                   i i  i `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
19 Aug 24                   i i  `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
11 Aug 24                   i +- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
11 Aug 24                   i `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
23 Jul 24                   `- Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal