Re: OT: Re: Sieve of Erastosthenes optimized to the max

Liste des GroupesRevenir à cl c++ 
Sujet : Re: OT: Re: Sieve of Erastosthenes optimized to the max
De : tr.17687 (at) *nospam* z991.linuxsc.com (Tim Rentsch)
Groupes : comp.lang.c++
Date : 16. Aug 2024, 17:40:37
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <86r0aojx1m.fsf@linuxsc.com>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
User-Agent : Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Vir Campestris <vir.campestris@invalid.invalid> writes:

I'm going to come back to this slightly earlier version, and add
comments inline.

Okay good deal.  I'm taking out some parts that look like they
are no longer relevant.

On 10/08/2024 15:07, Tim Rentsch wrote:
>
Vir Campestris <vir.campestris@invalid.invalid> writes:
>
On 20/07/2024 15:41, Tim Rentsch wrote:
>
Vir Campestris <vir.campestris@invalid.invalid> writes:
>
On 02/07/2024 07:20, Tim Rentsch wrote:
>
[...]
>
Both divide and mod can be simplified if we use a different
representation for numbers.  The idea is to separate the
number into a mod 30 part and divided by 30 part.  An
arbitrary number N can be split into an ordered pair
>
     N/30 , N%30
>
stored in, let's say, an 8 byte quantity with seven bytes
for N/30 and one byte for N%30.  Let me call these two
parts i and a for one number N, and j and b for a second
number M.  Then two form N*M we need to find
>
     (i*30+a) * (j*30+b)
>
which is
>
     (i*30*j*30) + (i*30*b) + (j*30*a) + a*b
>
Of course we want to take the product and express it in
the form (product/30, product%30), which can be done by
>
     30*i*j + i*b + j*a + (a*b)/30,   (a*b)%30
>
The two numbers a and b are both under 30, so a*b/30 and
a*b%30 can be done by lookup in a small array.
>
     30*i*j + i*b + j*a + divide_30[a][b],   modulo_30[a][b]
>
Now all operations can be done using just multiplies and
table lookups.  Dividing by 30 is just taking the first
component;  remainder 30 is just taking the second component.
>
One further refinement:  for numbers relatively prime to 2,
3, and 5, there are only 8 possibles, so the modulo 30
component can be reduced to just three bits.  That makes the
tables smaller, at a cost of needing a table lookup to find
and 'a' or 'b' part.
>
Does this all make sense?
>
Yes, it does.  It's not a direction I was thinking of at all,
although I too had table lookups in mind.
>
What I had considered is an extrapolation from my original code.
>
I read the comments below in conjunction with getting a better
understanding of your original code, which is part of what took
me so long before responding.  Before I get to the stuff below
(which I think I will save for a second reply), here is some code
expanding on what I said above.  It should compile cleanly as is.
However, its purpose is to illustrate what I said above.
>
   typedef unsigned long NN, Index, Count, Mask;
   typedef unsigned char Byte;
>
I've been using uint64_t, not unsigned long.  When I started C
unsigned long was 32 bits;  I'm never quite sure what you'll get.

I'm coming around to the point of view that size_t is the best
choice for these types.  In any case though the point of using a
typedef is so the more meaningful names can be used but still
changed easily where they are defined.

   static  void    remove_multiples( Index, Count, Index );
>
   static  Byte    bytes[1000000000];
   static  NN      mods[] = { 1, 7, 11, 13, 17, 19, 23, 29, };
>
   Count
   sieve( NN limit ){
       Count r = 0;
       Index i;
       Count shift;
       Index i_limit = (limit + 29)/30;
>
       bytes[0] = 1;
       for(  i = 0;  i < i_limit;  i++  ){
           Byte bits = bytes[i];
           for(  shift = 0;  shift < 8;  shift++  ){
               NN p = i*30 + mods[shift];
               if(  bits & 1<<shift  )  continue;
               if(  p >= limit  )  continue;
               r++;        // count how many primes found
                           // could print the prime 'p' here if desired
               if(  p*p < limit  )  remove_multiples( i, shift, i_limit );
           }
       }
>
       return  r;
   }
>
   static  Byte    shift_for[30] = {
       9,0,9,9,9,9,  9,1,9,9,9,2,  9,3,9,9,9,4,  9,5,9,9,9,6,  9,9,9,9,9,7,
   };
>
 static  Byte  ab_div_30[8][8] = {
  {  1*1/30, 1*7/30, 1*11/30, 1*13/30, 1*17/30, 1*19/30, 1*23/30, 1*29/30 },
  {  7*1/30, 7*7/30, 7*11/30, 7*13/30, 7*17/30, 7*19/30, 7*23/30, 7*29/30 },
  { 11*1/30,11*7/30,11*11/30,11*13/30,11*17/30,11*19/30,11*23/30,11*29/30 },
  { 13*1/30,13*7/30,13*11/30,13*13/30,13*17/30,13*19/30,13*23/30,13*29/30 },
  { 17*1/30,17*7/30,17*11/30,17*13/30,17*17/30,17*19/30,17*23/30,17*29/30 },
  { 19*1/30,19*7/30,19*11/30,19*13/30,19*17/30,19*19/30,19*23/30,19*29/30 },
  { 23*1/30,23*7/30,23*11/30,23*13/30,23*17/30,23*19/30,23*23/30,23*29/30 },
  { 29*1/30,29*7/30,29*11/30,29*13/30,29*17/30,29*19/30,29*23/30,29*29/30 },
 };
>
 static  Byte  ab_mod_30[8][8] = {
  {  1*1%30, 1*7%30, 1*11%30, 1*13%30, 1*17%30, 1*19%30, 1*23%30, 1*29%30 },
  {  7*1%30, 7*7%30, 7*11%30, 7*13%30, 7*17%30, 7*19%30, 7*23%30, 7*29%30 },
  { 11*1%30,11*7%30,11*11%30,11*13%30,11*17%30,11*19%30,11*23%30,11*29%30 },
  { 13*1%30,13*7%30,13*11%30,13*13%30,13*17%30,13*19%30,13*23%30,13*29%30 },
  { 17*1%30,17*7%30,17*11%30,17*13%30,17*17%30,17*19%30,17*23%30,17*29%30 },
  { 19*1%30,19*7%30,19*11%30,19*13%30,19*17%30,19*19%30,19*23%30,19*29%30 },
  { 23*1%30,23*7%30,23*11%30,23*13%30,23*17%30,23*19%30,23*23%30,23*29%30 },
  { 29*1%30,29*7%30,29*11%30,29*13%30,29*17%30,29*19%30,29*23%30,29*29%30 },
 };
>
constexpr on those tables might give the optimiser something else to
play with.

All of these tables are going to disappear in the next revision.

   void
   remove_multiples( Index i, Count shift_a, Index i_limit ){
>
So i is (prime/30) and shift_a is the bit in the mask byte -
corresponding to (prime%30)?

Right.  The value 'a' is 'mods[shift_a]', and the value 'shift_a' is
'shift_for[a]'.  Again though we will be dispensing with shift_for[]
and put in code that can be compiled to give constant values.

       Index j, k;
       NN a = mods[shift_a], b, c;
       Count shift;
>
       for(  shift = shift_a;  shift < 8;  shift++  ){
           j = i;
           b = mods[shift];
           k = i*j*30 + i*b + j*a + ab_div_30[ shift_a ][ shift ];
           c = ab_mod_30[ shift_a ][ shift ];
           if(  k >= i_limit  )  continue;
           bytes[k] |= 1 << shift_for[c];
       }
>
       for(  j = i+1;  (30UL*i+a)*(30UL*j+1) < i_limit * 30;  j++  ){
           for(  shift = 0;  shift < 8;  shift++  ){
               b = mods[shift];
               k = i*j*30 + i*b + j*a + ab_div_30[ shift_a ][ shift ];
               c = ab_mod_30[ shift_a ][ shift ];
               if(  k >= i_limit  )  continue;
               bytes[k] |= 1 << shift_for[c];
           }
       }
   }
>
In both sieve() and remove_multiples(), there is an inner loop
that ranges over the possible shift values.  (There is also an
unnested loop in remove_multiples() that ranges over a subset of
shift values, but for now please ignore that.)
>
Since these loops always perform exactly eight iterations, they
can be completely unrolled without expanding the code too much.
Here is a revised version of the outer loop in sieve() (actually
the code below is not yet ready for compilation, because some
things have changed):
>
     for(  i = 0;  i < i_limit;  i++  ){
         Byte bits = bytes[i] ^ 0xFF;
         if(  bits & 0001  )  remove_multiples( i_limit, i,  1 ),  r++;
         if(  bits & 0002  )  remove_multiples( i_limit, i,  7 ),  r++;
         if(  bits & 0004  )  remove_multiples( i_limit, i, 11 ),  r++;
         if(  bits & 0010  )  remove_multiples( i_limit, i, 13 ),  r++;
         if(  bits & 0020  )  remove_multiples( i_limit, i, 17 ),  r++;
         if(  bits & 0040  )  remove_multiples( i_limit, i, 19 ),  r++;
         if(  bits & 0100  )  remove_multiples( i_limit, i, 23 ),  r++;
         if(  bits & 0200  )  remove_multiples( i_limit, i, 29 ),  r++;
     }
>
Before unrolling always benchmark.  (Modern compilers are pretty smart,
and sometimes will unroll for you.  Modern processors with branch
prediction are pretty good about working out when to jump and when not
to.  And with speculative execution they sometimes run both paths from
a branch... I could go on!)

I understand the advice.  Keep in mind though that the code I'm
showing (and also true in the next version) is presented as a way of
explaining how I got to where I got.  I didn't start with the loop
version and unroll it;  everything was unrolled right from the get
go.  And there are more revisions to come, even after the next one
(to be shown below).

We changed the interface to remove_multiples() to pass along the
index value 'i' and also the value for 'a' rather than the value
calculated for the prime 'p'.  Since the loop has been unrolled,
the values for the 'a' arguments are all constants (and of course
all the mask values are constants).
>
After making a similar change to the remove_multiples() loop, the
values for the 'b's are likewise all constants.  If the optimizer
is smart enough, all the 'a*b/30' values and 'a*b%30' values turn
into constants.  Also the expressions 'i*b' and 'j*a' all turn
into multiplication by a constant.  Furthermore, to determine the
appropriate bit value to 'or' in, an eight-choice conditional
expression that tests each of the eight possible residues can
similarly be optimized away so the masks to 'or' in are all
constants.  Do you see how that all works?
>
The point of all this unrolling is to get rid of all the shifting
and table lookups.  The only variables involved are the loop
variables 'i' and 'j';  everything else has been turned into a
constant.  Granted, the generated code has been expanded a great
deal (basically a factor of 8*8 == 64), but the pieces are now
much simpler so the result is still manageable.
>
And now a few minutes later as I read that I understand the reason
for the unrolling better.  But my point still stands - benchmark
it.
>
Of course the decision on whether to unroll or not might depend on
the target CPU!

For this approach to work it's crucial to unroll completely to make
sure all the divides and remainders, and also the mask choices, get
optimized out.  I suppose it's possible that some architecture out
there could be faster without all constant folding, but it seems
unlikely, and in any case I made an early decision to write the code
using this approach, which has worked out well.

I wanted to be sure you understand how the original idea is
transformed into a better performing version before posting
something more refined.  I get that your time is at a premium,
still you might trying going through the exercise of unrolling
the loops and writing a set_product() routine (or macro) to get a
sense of where I'm going here.
>
Hope to send response to send part sometime soon.
>
You seem to have spotted a pattern in the steps that I didn't - I
looked at the mask values, and decided it wasn't worth working out
which of the 8 possible sequences to use - remembering which was too
expensive.  But it's also (I think, I haven't coded it) to work out
whether you need to step 2, 4, 8 times the prime.  And that would save
more memory.

The possible steps for mod30 are 2 or 4 or 6 times the prime.

Another thought BTW - whether it's worth storing with a larger modulo
- say 2*3*5*7 - not just 2*3*5 is a different decision from deciding
whether the steps between primes should take that into consideration.

My guess is that the possible steps are still just 2 or 4 or 6
(times the prime), but I haven't checked that.

Now here is revised code where the unrolling has been done, and also
uses optimizable mask production:

void
mod30_sieve_x( Index N, NN limit, Index i_limit ){
  Index  i;

    bytes[0] = 1;
    for(  i = 0;  i < i_limit;  i++  ){
Bits cbits = bytes[i];
Bits pbits = cbits ^ -1;
if(  pbits & 0001  )  remove_multiples_x( N, i,  1, pbits );
if(  pbits & 0002  )  remove_multiples_x( N, i,  7, pbits );
if(  pbits & 0004  )  remove_multiples_x( N, i, 11, pbits );
if(  pbits & 0010  )  remove_multiples_x( N, i, 13, pbits );
if(  pbits & 0020  )  remove_multiples_x( N, i, 17, pbits );
if(  pbits & 0040  )  remove_multiples_x( N, i, 19, pbits );
if(  pbits & 0100  )  remove_multiples_x( N, i, 23, pbits );
if(  pbits & 0200  )  remove_multiples_x( N, i, 29, pbits );
    }
}


#define set_product_x( i, a, j, b ) ( \
    bytes[ i*j*30 + i*b + j*a + a*b/30 ] |= BIT_FOR( (a*b%30) ) \
)

#define BIT_FOR( m ) (      \
  m ==  1 ? 0001  :  m ==  7 ? 0002  :  m == 11 ? 0004  :  m == 13 ? 0010  : \
  m == 17 ? 0020  :  m == 19 ? 0040  :  m == 23 ? 0100  :  m == 29 ? 0200  : \
  0      \
)

void
remove_multiples_x( Index N, Index i, NN a, Bits ibits ){
  NN p =  30*i + a;
  Index j;
  Index jn =  (N*30/p + 29)/30;

    switch(  a  ){
      case  1: set_product_x( i, a, i,  1 );
      case  7: set_product_x( i, a, i,  7 );
      case 11: set_product_x( i, a, i, 11 );
      case 13: set_product_x( i, a, i, 13 );
      case 17: set_product_x( i, a, i, 17 );
      case 19: set_product_x( i, a, i, 19 );
      case 23: set_product_x( i, a, i, 23 );
      case 29: set_product_x( i, a, i, 29 );
    }

    for(  j = i+1;  j < jn;  j++  ){
set_product_x( i, a, j,  1 );
set_product_x( i, a, j,  7 );
set_product_x( i, a, j, 11 );
set_product_x( i, a, j, 13 );
set_product_x( i, a, j, 17 );
set_product_x( i, a, j, 19 );
set_product_x( i, a, j, 23 );
set_product_x( i, a, j, 29 );
    }
}


Some comments about the upper bounds of the loops:

First, assume that 'i_limit' has been computed in conjunction with
the determination of the parameter N so that it is exactly right.

Second, the initializing expression for 'jn' is safe but may miss
part of one more value of j.  There are different ways to take care
of that, but it's a minor detail so I'm skipping over it for now.
Of course feel free to think about how that could be done if you
want to, and if not then no worries, I expect to say more about that
in a later followup.

Date Sujet#  Auteur
23 Mar 24 * Re: Sieve of Erastosthenes optimized to the max68Chris M. Thomasson
23 Mar 24 `* Re: Sieve of Erastosthenes optimized to the max67Bonita Montero
23 Mar 24  `* Re: Sieve of Erastosthenes optimized to the max66Chris M. Thomasson
24 Mar 24   `* Re: Sieve of Erastosthenes optimized to the max65Bonita Montero
24 Mar 24    `* Re: Sieve of Erastosthenes optimized to the max64Chris M. Thomasson
24 Mar 24     `* Re: Sieve of Erastosthenes optimized to the max63Bonita Montero
24 Mar 24      `* Re: Sieve of Erastosthenes optimized to the max62Chris M. Thomasson
16 May 24       `* Re: Sieve of Erastosthenes optimized to the max61Vir Campestris
16 May 24        +- Re: Sieve of Erastosthenes optimized to the max1Ben Bacarisse
22 May 24        `* Re: Sieve of Erastosthenes optimized to the max59Tim Rentsch
30 May 24         `* Re: Sieve of Erastosthenes optimized to the max58Vir Campestris
30 May 24          +- Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
30 May 24          +* Re: Sieve of Erastosthenes optimized to the max3Paavo Helde
31 May 24          i`* Re: Sieve of Erastosthenes optimized to the max2Bonita Montero
31 May 24          i `- Re: Sieve of Erastosthenes optimized to the max1Paavo Helde
31 May 24          `* Re: Sieve of Erastosthenes optimized to the max53Tim Rentsch
1 Jun 24           `* Re: Sieve of Erastosthenes optimized to the max52Vir Campestris
2 Jun 24            +- Re: Sieve of Erastosthenes optimized to the max1Richard Damon
2 Jun 24            `* Re: Sieve of Erastosthenes optimized to the max50Tim Rentsch
3 Jun 24             `* Re: Sieve of Erastosthenes optimized to the max49Tim Rentsch
18 Jun 24              `* Re: Sieve of Erastosthenes optimized to the max48Vir Campestris
19 Jun 24               `* Re: Sieve of Erastosthenes optimized to the max47Tim Rentsch
30 Jun 24                `* Re: Sieve of Erastosthenes optimized to the max46Vir Campestris
2 Jul 24                 `* Re: Sieve of Erastosthenes optimized to the max45Tim Rentsch
2 Jul 24                  +- Re: Sieve of Erastosthenes optimized to the max1Vir Campestris
3 Jul 24                  `* Re: Sieve of Erastosthenes optimized to the max43Vir Campestris
15 Jul 24                   +- Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
20 Jul 24                   +* Re: Sieve of Erastosthenes optimized to the max40Tim Rentsch
25 Jul 24                   i`* OT: Re: Sieve of Erastosthenes optimized to the max39Vir Campestris
10 Aug 24                   i +* Re: OT: Re: Sieve of Erastosthenes optimized to the max36Tim Rentsch
12 Aug 24                   i i+* Re: OT: Re: Sieve of Erastosthenes optimized to the max2Vir Campestris
16 Aug 24                   i ii`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
15 Aug 24                   i i`* Re: OT: Re: Sieve of Erastosthenes optimized to the max33Vir Campestris
16 Aug 24                   i i `* Re: OT: Re: Sieve of Erastosthenes optimized to the max32Tim Rentsch
16 Aug 24                   i i  +* Re: OT: Re: Sieve of Erastosthenes optimized to the max30Bonita Montero
16 Aug 24                   i i  i+- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
19 Aug 24                   i i  i+* Re: OT: Re: Sieve of Erastosthenes optimized to the max12Vir Campestris
20 Aug 24                   i i  ii`* Re: OT: Re: Sieve of Erastosthenes optimized to the max11Bonita Montero
20 Aug 24                   i i  ii `* Re: OT: Re: Sieve of Erastosthenes optimized to the max10Bonita Montero
20 Aug 24                   i i  ii  +- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
20 Aug 24                   i i  ii  `* Re: OT: Re: Sieve of Erastosthenes optimized to the max8Vir Campestris
20 Aug 24                   i i  ii   +- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
26 Aug 24                   i i  ii   `* Re: OT: Re: Sieve of Erastosthenes optimized to the max6Tim Rentsch
27 Aug 24                   i i  ii    `* Re: OT: Re: Sieve of Erastosthenes optimized to the max5Bonita Montero
1 Sep 24                   i i  ii     `* Re: OT: Re: Sieve of Erastosthenes optimized to the max4Vir Campestris
2 Sep 24                   i i  ii      `* Re: OT: Re: Sieve of Erastosthenes optimized to the max3Tim Rentsch
2 Sep 24                   i i  ii       +- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
3 Sep 24                   i i  ii       `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Vir Campestris
19 Aug 24                   i i  i+* Re: OT: Re: Sieve of Erastosthenes optimized to the max12Vir Campestris
20 Aug 24                   i i  ii+* Re: OT: Re: Sieve of Erastosthenes optimized to the max4red floyd
20 Aug 24                   i i  iii+* Re: OT: Re: Sieve of Erastosthenes optimized to the max2Vir Campestris
26 Aug 24                   i i  iiii`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
26 Aug 24                   i i  iii`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
20 Aug 24                   i i  ii+* Re: OT: Re: Sieve of Erastosthenes optimized to the max3Bonita Montero
20 Aug 24                   i i  iii+- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
20 Aug 24                   i i  iii`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Bonita Montero
20 Aug 24                   i i  ii`* Re: OT: Re: Sieve of Erastosthenes optimized to the max4Bonita Montero
22 Aug 24                   i i  ii `* Re: OT: Re: Sieve of Erastosthenes optimized to the max3Vir Campestris
22 Aug 24                   i i  ii  `* Re: OT: Re: Sieve of Erastosthenes optimized to the max2Bonita Montero
22 Aug 24                   i i  ii   `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Vir Campestris
22 Aug 24                   i i  i`* Re: OT: Re: Sieve of Erastosthenes optimized to the max4Vir Campestris
23 Aug 24                   i i  i +* Re: OT: Re: Sieve of Erastosthenes optimized to the max2red floyd
26 Aug 24                   i i  i i`- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
26 Aug 24                   i i  i `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
19 Aug 24                   i i  `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
11 Aug 24                   i +- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
11 Aug 24                   i `- Re: OT: Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch
23 Jul 24                   `- Re: Sieve of Erastosthenes optimized to the max1Tim Rentsch

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal