Sujet : Re: OT: Re: A very slow program
De : vir.campestris (at) *nospam* invalid.invalid (Vir Campestris)
Groupes : alt.comp.lang.c comp.lang.cDate : 26. Oct 2024, 16:28:24
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <vfiua8$3obkb$2@dont-email.me>
References : 1 2 3 4 5 6
User-Agent : Mozilla Thunderbird
On 21/10/2024 16:36, Bonita Montero wrote:
Am 20.10.2024 um 12:49 schrieb Vir Campestris:
On 04/10/2024 05:53, Bonita Montero wrote:
Am 03.10.2024 um 18:06 schrieb Vir Campestris:
>
for 4e9 primes my machine take nearly 10 seconds for 4294967295
primes. The one I posted before over there takes 1.5 seconds.
>
Show me your code. I'm pretty sure you never posted faster code.
>
>
Try this.
>
Still about half the speed of my code for 2 ^ 32.
Fascinating.
But after my latest experience perhaps not surprising.
You're on Windows aren't you? I do know you have a far more powerful machine than I do. I'm running Ubuntu with Gnu compiler. (I tried Clang, but it's slower)
When I try your code on my system it takes 10.4s eleapsed, and 12.1s of CPU to calculate 4294967296. That code I supplied up there takes only 1.47s elapsed, and a little less CPU - it's single threaded.
So on _my_ system my code is about 7 times faster!
But my recent experience - I've been playing with using a modulo30 store, not just odds.
My code makes bitmasks for 2 sets of small primes (these can be small, there's a repeat at the product of all the primes) then combines the 2 to make the initial mask for the desired number of primes. It then masks out the bigger primes.
With a mask of odds-only the mask words can be 64 bit, and combining them is way faster than Mod30 - where the mask is a byte. So I put together some code so that it would put the two small masks together with 64 bit operations. That bit of the code got twice as fast. Then I realised that the next part, for reasons I have been unable to establish, runs three times more slowly.
But only when I'm using GCC (not Clang) and only when -Ofast optimisation is turned on. That loop is the same speed in all 0other cases.
I then tried putting some debug timing statements in, and I could get the same effect by inserting some in in various places.
I looked at the output assembly, and there seem to be random changes (instructions that are not dependent on each other being re-ordered) so it's not obvious why this is happening.
I think I'm bored enough with this project to just leave it. I've been more interested in correctness, not speed, in recent years.
Andy