On Tue, 19 Mar 2024 21:40:22 -0700
Tim Rentsch <
tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>
[...]
>
Here is the refinement that uses a resizing rather than
fixed-size buffer.
>
>
typedef unsigned char Color;
typedef unsigned int UI;
typedef struct { UI x, y; } Point;
typedef unsigned int Index;
>
static _Bool change_it( UI w, UI h, Color [w][h], Point, Color,
Color );
>
void
fill_area( UI w, UI h, Color pixels[w][h], Point p0, Color old,
Color new ){ static const Point deltas[4] = { {1,0}, {0,1},
{-1,0}, {0,-1}, }; UI k = 0;
UI n = 17;
Point *todo = malloc( n * sizeof *todo );
>
if( todo && change_it( w, h, pixels, p0, old, new ) )
todo[k++] = p0;
>
while( k > 0 ){
Index j = n-k;
memmove( todo + j, todo, k * sizeof *todo );
k = 0;
>
while( j < n ){
Point p = todo[ j++ ];
for( Index i = 0; i < 4; i++ ){
Point q = { p.x + deltas[i].x, p.y + deltas[i].y };
if( ! change_it( w, h, pixels, q, old, new ) )
continue; todo[ k++ ] = q;
}
>
if( j-k < 3 ){
Index new_n = n+n/4;
Index new_j = new_n - (n-j);
Point *t = realloc( todo, new_n * sizeof *t );
if( !t ){ k = 0; break; }
memmove( t + new_j, t + j, (n-j) * sizeof *t );
todo = t, n = new_n, j = new_j;
}
}
}
>
free( todo );
}
>
_Bool
change_it( UI w, UI h, Color pixels[w][h], Point p, Color old,
Color new ){ if( p.x >= w || p.y >= h || pixels[p.x][p.y] !> >> old ) return 0; return pixels[p.x][p.y] = new, 1;
}
>
This variant is significantly slower than Malcolm's.
2x slower for solid rectangle, 6x slower for snake shape.
Slower with some shapes, faster in others.
In my small test suit I found no cases where this specific code is
measurably faster than code of Malcolm.
I did find one case in which they are approximately equal. I call it
"slalom shape" and it's more or less designed to be the worst case for
algorithms that are trying to speed themselves by take advantage of
straight lines.
The slalom shape is generated by following code:
static
void make_slalom(
unsigned char *image,
int width, int height,
unsigned char background_c,
unsigned char pen_c)
{
const int n_col = width/3;
const int n_row = (height-3)/4;
// top row
// P B B P P P
for (int col = 0; col < n_col; ++col) {
unsigned char c = (col & 1)==0 ? background_c : pen_c;
image[col*3] = pen_c; image[col*3+1] = c; image[col*3+2] = c;
}
// main image: consists of 3x4 blocks filled by following pattern
// P B B
// P P B
// B P B
// P P B
for (int row = 0; row < n_row; ++row) {
for (int col = 0; col < n_col; ++col) {
unsigned char* p = &image[(row*4+1)*width+col*3];
p[0] = pen_c; p[1] = background_c; p[2] = background_c;
p += width;
p[0] = pen_c; p[1] = pen_c; p[2] = background_c;
p += width;
p[0] = background_c; p[1] = pen_c; p[2] = background_c;
p += width;
p[0] = pen_c; p[1] = pen_c; p[2] = background_c;
}
}
// near-bottom rows
// P B B
for (int y = n_row*4+1; y < height-1; ++y) {
for (int col = 0; col < n_col; ++col) {
unsigned char* p = &image[y*width+col*3];
p[0] = pen_c; p[1] = background_c; p[2] = background_c;
}
}
// bottom row - all P
memset(&image[(height-1)*width], pen_c, width);
// rightmost columns
for (int x = n_col*3; x < width; ++x) {
for (int y = 0; y < height-1; ++y)
image[y*width+x] = background_c;
}
}
In any case
the code was written for clarity of presentation, with
no attention paid to low-level performance.
>
Yes, your code is easy to understand. Could have been easier still if
persistent indices had more meaningful names.
In other post I showed optimized variant of your algorithm:
- 4-neighbors loop unrolled. Majority of the speed up come not from
unrolling itself, but from specialization of in-rectangle check
enabled by unroll.
- Todo queue implemented as circular buffer.
- Initial size of queue increased.
This optimized variant is more competitive with 'line-grabby'
algorithms in filling solid shapes and faster than them in 'slalom'
case.
Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.
Is it the same algorithm?
Sorry, the same algorithm as what? The same as Malcolm's?
Yes, that what I meant.
Still didn't find guts to try to understand what Malcolm's code is
doing.
Definitely not. The same as my other posting that does
not do dynamic reallocation? Yes in the sense that if the
allocated array is large enough to begin with then no
reallocations are needed.
>
Besides, I don't think that use of VLA in library code is a good
idea. VLA is optional in latest C standards. And incompatible with
C++.
The code uses a variably modified type, not a variable length
array.
I am not sufficiently versed in C Standard terminology to see a
difference.
Aren't they both introduced in C99 and made optional in later standards?
Again, the choice is for clarity of presentation. If
someone wants to get rid of the variably modified types, it's
very easy to do, literally a five minute task.
Yes, that's what it took for me.
But I knew that variably modified types exist, even if I didn't know
that they are called such.
OTOH, many (majority?) of C programmers never heard about them.
Anyway the
interface is poorly designed to start with, there are bigger
problems than just whether a variably modified type is used.
(I chose the interface I did to approximate the interface
used in Malcolm's code.)
That's true.
The biggest problem of Malcolm's interface is that logical width of the
image is the same as physical width (a.k.a. line pitch, in LAPACK
it is called the first dimension). These parameters should be separate.
If someone wants to use the functionality from C++, it's
easy enough to write a C wrapper function to do that.
IMO C++ has diverged sufficiently from C so that there
is little to be gained by trying to make code interoperable
between the two languages.
From the practical perspective, the biggest obstacle is that your code
can't be compiled with popular Microsoft compilers.