On 4/6/2024 11:41 AM, Scott Lurndal wrote:
David LaRue <huey.dll@tampabay.rr.com> writes:
fir <fir@grunge.pl> wrote in news:uurefe$8i04$1@i2pn2.org:
>
fir wrote:
<snip>
>
if someone want to talk on compression adn how to code it i could like
to read it (as reading net articles on this may be seem to hard for my
old sick head and posts are much are easier to get into it)
>
There are several good books on search and compression methods that provide
examples of complexity and discussions about the complexity and performance.
https://en.wikipedia.org/wiki/Lossless_compression
One of the earliest:
https://en.wikipedia.org/wiki/Huffman_coding
Former colleague wrote this one:
https://en.wikipedia.org/wiki/Gzip
This was once under patent to a former employer:
https://en.wikipedia.org/wiki/LZ77_and_LZ78
Also LZ4 is a moderately simple format, and works reasonably OK.
I have another (similar category but different design) format that gets slightly better compression to LZ4 at a similar decode speed.
I guess, if I were to try to quickly design another LZ format that was simple to decode, say:
Tag is a 16 bit word (may fetch more bits though);
LSB=0: Short Run
(15:8): Match Distance (if non 0)
( 7:5): Raw Length (0..7)
( 4:1): Match Length (4..19)
LSB=01: Longer Run
(31:16): Match Distance (up to 64K)
(15:11): Raw Length (0..31)
(10: 2): Match Length (4..515)
With a few special cases:
tag=0x0000: EOB
if md==0:
if(ml!=4)
rl+=(ml-3)*X; //longer run of literal bytes
So, main decoder might look something like (pseudo C):
cs=src;
ct=dst;
while(1)
{
tag=*(u32 *)cs; //assume misaligned safe and little endian
if(!(tag&1))
{
rl=(tag>>5)&7;
ml=((tag>>1)&15)+4;
md=(tag>>8)&255;
cs+=2;
if(!md)
{
if(!tag)
break; //EOB
if(ml!=4)
rl+=(ml-3)*8;
}
}else if(!(tag&2))
{
rl=(tag>>11)&31;
ml=((tag>>2)&511)+4;
md=(tag>>16)&65535;
cs+=4;
if(!md)
{
if(ml!=4)
rl+=(ml-3)*32;
}
}else
{
//maybe more cases
}
if(rl)
{
memcpy(ct, cs, rl);
cs+=rl; ct+=rl;
}
if(md)
{
matchcopy(ct, ct-md, ml);
ct+=ml;
}
}
The matchcopy function would resemble memcpy, but have different semantics in the case of self-overlap.
Say, a simple version (not tuned for performance):
void matchcopy(byte *dst, byte *src, int len)
{
if((src+len)<dst)
{
//no overlap
memcpy(dst, src, len);
return;
}
while(len--)
*dst++=*src++;
}
The match copy operation tends to bigger and more complicated if one tries to make it faster (where generally byte-for-byte copies are rather slow, and the case of long repeating RLE like runs isn't particularly rare).
Designing an LZ encoder is more of a challenge though.
Where typically, speed, compression, and code simplicity, are all mutually opposed (a simple LZ encoder will be either slow or give crappy compression, though speed is a matter of perspective).
Note that comparably, Deflate is a fairly complicated format.
Huffman coding is effective, but as noted, comes at a cost in terms of both speed and code complexity (but is still much faster than something like range coding).
Though, partial ways to make Huffman decoding a little faster (in the design of a compressor):
Limit maximum symbol length to around 12 or 13 bits, which allows lookup tables to fit in the L1 caches of typical CPUs;
Decode blocks of symbols in advance, then fetch symbols as-needed from these blocks (this allows overlapping the Huffman symbol decoding, making more effective use of the CPU pipeline).
These designs had sort of ended up resembling an entropy-coded version of LZ4 (usually with Huffman coded blocks for tags, literal bytes, and length/distance prefixes; using a scheme similar to Deflate's distance coding scheme for longer match or raw lengths, as well as for match distance). Note that after decoding the symbol blocks, the bitstream would only contain "extra bits" for the distance coding.
But that said, on a modern PC style CPU, it is rather difficult to get Huffman coded designs much over about 600 or 700 MB/sec for decode speed, even with these tricks (if one skips an entropy-coding stage, typically several GB/sec is possible for LZ decoding).
Though, one possibility could be a pseudo-entropy scheme, say:
Sort all the symbols into descending frequency order;
Stored directly as a table of 128 bytes (skip uncommon bytes).
Encode the symbol blocks with bytes encoding table indices, say:
00..78: Symbol Pair (0..10)
7F: Escaped Symbol (encoded as a byte)
80..FF: Direct Index (0..127)
With the block being encoded as a blob of raw bytes if the pseudo-entropy scheme didn't save enough space (say, if there isn't a significant probability skew). Idea being that this would be faster to decode than blobs of Huffman-coded symbols.
Then, say, block is encoded as a 16-bit prefix, say:
0000..3FFF: Up to 16K, raw byte symbols
4000..7FFF: Up to 16K, pseudo-entropy.
8000..FFFF: Escape other block types.
Though, efficiently dealing with the dual-symbol case would be harder, most obvious answers being one of:
Lookup index pair in a lookup table;
itmp=indexpairlut[relidx];
i0=itmp&255;
i1=itmp>>8;
Or, try to split it up directly.
Naive:
i1=relidx/11; //can be turned into multiply by reciprocal
i1=(relidx*0x175)>>12; //alternate
i0=relidx-(i1*11);
Though, unclear if this would be fast enough to be worthwhile.
Could maybe also abandon the use of a bitstream for the extra-bits, to using nybbles or bytes instead (where a nybble or byte stream is faster than a bitstream). Decided not to go into the specifics of length/distance coding here.
But, say, in this case, each LZ coded block would consist of 4 symbol blocks:
Tag Symbols
Literal Symbols
Distance Symbols
Distance Extra (probably always raw bytes)
TBD if it could be performance competitive with the purely byte-oriented alternatives though, which would still have the advantage of being simpler in any case.
Well, among any number of possibilities.
Could probably fiddle with it...
But, all this is getting a bit more advanced...