Subject : Re: filling area by color atack safety
From : already5chosen (at) *nospam* yahoo.com (Michael S)
Groups : comp.lang.c
Date : 30 Mar 2024, 19:15:06
Organisation : A noiseless patient Spider
Message-ID : <20240330211506.00000b86@yahoo.com>
User-Agent : Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
On Fri, 29 Mar 2024 23:58:26 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
I did program in FORTRAN briefly but don't remember ever using
computed GO TO. And yes, I found that missing semicolon and put it
back. Is there some reason you don't always use -pedantic? I
pretty much always do.
Just a habit.
In "real" work, as opposed to hobby, I use gcc almost exclusively for
small embedded targets, quite often with third-party libraries in
source form. In such an environment, raising the warning level above
-Wall would be counterproductive, because it would be hard to see the
relevant warnings behind walls of false alarms.
Maybe for hobby code, where I have full control over everything,
switching to -Wpedantic is not a bad idea.
An alternate idea is to use a 64-bit integer for 32 "top of stack"
elements, or up to 32 I should say, and a stack with 64-bit values.
Just an idea, it may not turn out to be useful.
That's just a detail of how to do the pack/unpack with minimal
overhead. It does not change the principle that the 'packed' version
would be less memory-hungry, but on a modern PC with GBs of RAM it
will not be faster than the original.
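
[For illustration only, a minimal sketch of what such a packed stack
could look like in C. The type and function names are hypothetical,
as is the assumption that each entry is a 2-bit code; nothing here is
taken from either program.]

#include <stdint.h>
#include <stddef.h>

/* Hypothetical packed stack: each 64-bit word holds up to 32
   two-bit entries, so pack/unpack is just shifting and masking. */
typedef struct {
    uint64_t *words;   /* backing storage, one word per 32 entries */
    size_t    n;       /* number of 2-bit entries currently stored */
} packed_stack_t;

static void ps_push(packed_stack_t *s, unsigned code2)  /* code2 in 0..3 */
{
    size_t   w     = s->n / 32;                   /* which 64-bit word   */
    unsigned shift = (unsigned)(s->n % 32) * 2;   /* position in word    */
    if (shift == 0)
        s->words[w] = 0;                          /* starting fresh word */
    s->words[w] |= (uint64_t)(code2 & 3u) << shift;
    s->n++;
}

static unsigned ps_pop(packed_stack_t *s)
{
    s->n--;
    size_t   w     = s->n / 32;
    unsigned shift = (unsigned)(s->n % 32) * 2;
    return (unsigned)((s->words[w] >> shift) & 3u);
}

[The per-push/pop overhead in this sketch is a shift and a mask, which
is the "minimal overhead" part of the argument.]
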
Memory footprint can directly affect speed when access patterns have
poor locality or when the rate of access exceeds 10-20 GB/s. In our
case the locality of stack access is very good, and the rate of stack
access, even on an ultra-fast processor, is less than 1 GB/s.
The few measurements I have done don't show a big difference in
performance between the two methods. But I admit I wasn't paying
close attention, and like I said only a few patterns of filling were
exercised.
After implementing the first enhancement I noticed that at 4K size
the timing (per pixel) for a few of my test cases is significantly
worse than for smaller images. So I added another enhancement aiming
to minimize cache thrashing effects by never looking back at the
immediate parent of the current block. The info about the location
of the parent fitted nicely into the remaining 2 bits of the stack
octet.
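
[A rough sketch of how such an encoding could be used. The direction
codes, masks, and names below are my guesses for illustration, not
the actual code.]

#include <stdint.h>

/* Hypothetical 2-bit direction codes; XOR with 1 gives the opposite. */
enum { DIR_LEFT = 0, DIR_RIGHT = 1, DIR_UP = 2, DIR_DOWN = 3 };

/* One stack octet: assume the low 6 bits hold block state and the
   top 2 bits record in which direction the parent block lies. */
#define PARENT_SHIFT 6
#define PARENT_MASK  (3u << PARENT_SHIFT)

static uint8_t make_entry(unsigned state6, unsigned parent_dir)
{
    return (uint8_t)((state6 & 0x3Fu) | (parent_dir << PARENT_SHIFT));
}

/* When expanding a block, visit only the three neighbors that are not
   the parent; going back to the parent would only drag an already
   processed cache line back in. */
static void expand(uint8_t entry)
{
    unsigned parent = (entry & PARENT_MASK) >> PARENT_SHIFT;
    for (unsigned dir = 0; dir < 4; dir++) {
        if (dir == parent)
            continue;                    /* never look back at the parent */
        unsigned child_parent = dir ^ 1u;  /* parent as seen from child   */
        (void)child_parent;    /* actual neighbor test and push omitted   */
    }
}

[With this encoding the opposite direction is a single XOR, so
recording the child's parent costs nothing extra on the push.]
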
The idea of not going back to the originator (what you call the
parent) is something I developed independently before looking at
your latest code (and mostly I still haven't). Seems like a good
idea.
>
I call it the principle of Lot's wife.
That is yet another reason not to grow blocks above 2x2; for bigger
blocks this principle does not apply.
Two further comments.
One, the new code is a lot more complicated than the previous
code. I'm not sure the performance gain is worth the cost
in complexity. What kind of speed improvements do you see,
in terms of percent?
>
On my 11-year-old home PC (not top-of-the-line even then), for a 4K
image (3840 x 2160) with the cross-in-cross shape that I took from
one of your previous posts, it is 2.43 times faster.
I don't remember how it compares on more modern systems. Anyway,
right now I have no test systems more modern than a 3-year-old Zen3.
Two, and more important, the new algorithm still uses O(NxN) memory
for an N by N pixel field. We really would like to get that down to
O(N) memory (and of course run just as fast). Have you looked into
how that might be done?
Using this particular principle of not saving (x,y) in auxiliary
storage, I don't believe that it is possible to have a footprint
smaller than O(W*H).
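
[To put numbers on that, here is the footprint arithmetic as I read
the scheme: one stack octet per 2x2 block, versus a conventional
stack of explicit (x,y) pairs whose worst-case depth is also
proportional to the pixel count. The figures are illustrative, not
measurements.]

#include <stdio.h>

int main(void)
{
    unsigned long W = 3840, H = 2160;   /* the 4K frame discussed above */

    /* One octet per 2x2 block, indexed by position instead of
       holding (x,y), so the auxiliary storage scales with W*H: */
    unsigned long packed_bytes = (W / 2) * (H / 2);

    /* A stack of explicit (x,y) pairs (two 32-bit ints each) whose
       worst case is likewise proportional to the pixel count: */
    unsigned long xy_bytes = W * H * 8;

    printf("per-block octets       : %lu bytes (~%.1f MB)\n",
           packed_bytes, packed_bytes / (1024.0 * 1024.0));
    printf("worst-case (x,y) stack : %lu bytes (~%.1f MB)\n",
           xy_bytes, xy_bytes / (1024.0 * 1024.0));
    return 0;
}

[Either way the auxiliary storage is O(W*H); only the constant factor
changes, which is the point of the remark above.]
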