Subject: Re: ASCII to ASCII compression.
From: ben (at) *nospam* bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Date: 10 Jun 2024, 18:55:34
Organisation : A noiseless patient Spider
Message-ID : <87tti03co9.fsf@bsb.me.uk>
References : 1 2 3
User-Agent : Gnus/5.13 (Gnus v5.13)
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> We have a fixed Huffman tree which is part of the algorithm and
> optimised for ASCII. And we take each line of text, and compress it to
> a binary string, using the Huffman table. Then we code the binary
> string six bits at a time using a 64 character subset of ASCII. And
> then we append a special character which is chosen to be visually
> distinctive.
>
> So the input is
>
> Mary had a little lamb,
> its fleece was white as snow,
> and everywhere that Mary went,
> the lamb was sure to go.
>
> And we get the output.
>
> CVbGNh£-H$£*MMH&-VVdsE3w2as3-vv$G^&ggf-
It would be more like
pOHcDdz8v3cz5Nl7WP2gno5krTqU6g/ZynQYlawju8rxyhMT6B30nDusHrWaE+TZf1KdKmJ9Fb6orB
(That's an actual example using an optimal Huffman encoding for that
input and the conventional base 64 encoding. I can post the code table,
if you like.)
> And whether it is shorter or not depends on whether the fixed Huffman
> table is any good.
If I use a bigger corpus of English text to derive the Huffman codes,
the encoding becomes less efficient (of course) so those 110 characters
need more like 83 base 64 encoded bytes to represent them. Is 75% of
the size worth it?
What is the use-case where there is so much English text that a little
compression is worthwhile?
-- 
Ben.