On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <
malcolm.arthur.mclean@gmail.com> wrote:
On 19/03/2024 11:18, Michael S wrote:
On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>
[...]
>
Here is the refinement that uses a resizing rather than
fixed-size buffer.
>
>
typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;
typedef unsigned int Index;
>
static _Bool change_it( UI w, UI h, Color [w][h], Point, Color,
Color );
>
void
fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old,
Color new ){ static const Point deltas[4] = { {1,0}, {0,1},
{-1,0}, {0,-1}, }; UI k = 0;
UI n = 17;
Point *todo = malloc( n * sizeof *todo );
>
if( todo && change_it( w, h, pixels, p0, old, new ) )
todo[k++] = p0;
>
while( k > 0 ){
Index j = n-k;
memmove( todo + j, todo, k * sizeof *todo );
k = 0;
>
while( j < n ){
Point p = todo[ j++ ];
for( Index i = 0; i < 4; i++ ){
Point q = { p.x + deltas[i].x, p.y + deltas[i].y
}; if( ! change_it( w, h, pixels, q, old, new ) )
continue; todo[ k++ ] = q;
}
>
if( j-k < 3 ){
Index new_n = n+n/4;
Index new_j = new_n - (n-j);
Point *t = realloc( todo, new_n * sizeof *t );
if( !t ){ k = 0; break; }
memmove( t + new_j, t + j, (n-j) * sizeof *t );
todo = t, n = new_n, j = new_j;
}
}
}
>
free( todo );
}
>
_Bool
change_it( UI w, UI h, Color pixels[w][h], Point p, Color old,
Color new ){ if( p.x >= w || p.y >= h || pixels[p.x][p.y] !=
old ) return 0; return pixels[p.x][p.y] = new, 1;
}
This variant is significantly slower than Malcolm's.
2x slower for solid rectangle, 6x slower for snake shape.
Is it the same algorithm?
No. Mine takes horizontal scan lines and extends them, then places
the pixels above and below in a queue to be considered as seeds for
the next scan line. (It's not mine, but I don't know who invented it.
It wasn't me.)
Tim, now what does it do? Essentially it's the recursive fill
algorithm but with the data only on the stack instead of the call and
the data. And todo is actually a queue rather than a stack.
Now why would it be slower? Probaby because you usually only hit a
pixel three times with mine - once below, once above, and once for
the scan line itself, whilst you consider it 5 times for Tim's - once
for each neighbour and once for itself. Then horizontally adjacent
pixels are more likely to be in the same cache line than vertically
adjacent pixels, so processing images in scan lines tends to be a bit
faster.
I did a little more investigation gradually modifying Tim's code for
improved performance without changing the basic principle of the
algorithm. Yes, micro-optimization. Yes, I said earlier that doing so
in c.l.c it is bad sportsmanship. So what? I never claimed to be an
ideal sportsman.
The point is that after optimizations it's actually faster than the
best implementations of original recursive algorithm, including
implementation that uses explicit stack and is quite economical in its
memory consumption. Tim's algorithm is 8 times less economical (8 bytes
per saved node vs 1 byte in explicit stack) and nevertheless almost
twice faster for both shapes that I was testing.
So far, this algorithm is fastest among all "local" algorithms that I
tried. By "local" I mean algorithms that don't try to recolor more than
one pixel at time.
"Non-local" algorithms i.e. yours and my recursive algorithm that
recolors St. George cross, are somewhat faster, but I suspect that
it's because all shapes that I use for testing have either long
columns or long rows or both.
The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost nothing
else.
The second nice thing is that it is easy to understand. Not as easy as
original recursive method, but easier than the rest of them.
If you or somebody else is interested, here is [micro]optimized variant:
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
typedef unsigned char Color;
typedef int UI;
typedef struct { UI x, y; } Point;
static inline
Point* circularIncr(Point* p, Point* beg, Point* end) {
return p + 1 == end ? beg : p + 1;
}
static inline
Point mk_point(int x, int y) {
Point pt={x,y};
return pt;
}
int floodfill_r(
Color *pixels,
int w, int h,
int pt0_x, int pt0_y,
Color old, Color new)
{
if (pt0_x < 0 || pt0_x >= w || pt0_y < 0 || pt0_y >= h)
return 0;
if (pixels[pt0_y*w+pt0_x] != old)
return 0;
pixels[pt0_y*w+pt0_x] = new;
const ptrdiff_t INITIAL_TODO_SIZE = 125;
Point *todo = malloc( (INITIAL_TODO_SIZE+3) * sizeof *todo );
// +3 is extra size to assist wrap-around of wr
if (!todo)
return -1;
Point* todo_end = &todo[INITIAL_TODO_SIZE];
todo[0] = mk_point(pt0_x, pt0_y);
Point* wr = &todo[1];
Point* rd = todo;
ptrdiff_t free_space = INITIAL_TODO_SIZE - 1;
do {
Point pt = *rd;
rd = circularIncr(rd, todo, todo_end);
Point* prev_wr = wr;
if (pt.x > 0 && pixels[pt.y*w+pt.x-1] == old) {
pixels[pt.y*w+pt.x-1] = new;
*wr++ = mk_point(pt.x-1, pt.y);
}
if (pt.y > 0 && pixels[pt.y*w+pt.x-w] == old) {
pixels[pt.y*w+pt.x-w] = new;
*wr++ = mk_point(pt.x, pt.y-1);
}
if (pt.x+1 < w && pixels[pt.y*w+pt.x+1] == old) {
pixels[pt.y*w+pt.x+1] = new;
*wr++ = mk_point(pt.x+1, pt.y);
}
if (pt.y+1 < h && pixels[pt.y*w+pt.x+w] == old) {
pixels[pt.y*w+pt.x+w] = new;
*wr++ = mk_point(pt.x, pt.y+1);
}
free_space += 1 - (wr - prev_wr);
if (wr >= todo_end) {
memcpy(todo, todo_end, (wr - todo_end)*sizeof(*wr));
wr += todo - todo_end;
}
if (free_space < 4) {
ptrdiff_t rdi = rd-todo;
ptrdiff_t wri = wr-todo;
ptrdiff_t sz = todo_end - todo;
ptrdiff_t incr = sz/4;
Point* new_todo = realloc(todo, (sz+incr+3) * sizeof *todo );
// +3 is extra size to assist wrap-around of wr
if(!new_todo) {
free(todo);
return -1;
}
free_space += incr;
rd = &new_todo[rdi];
wr = &new_todo[wri];
todo = new_todo;
todo_end = &todo[sz+incr];
if (rd >= wr) {
memmove(&rd[incr], rd, (sz-rdi) * sizeof *todo );
rd = &rd[incr];
}
}
} while (rd != wr);
free( todo );
return 1;
}