On 10/06/2024 18:55, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>> We have a fixed Huffman tree which is part of the algorithm and optimised
>> for ASCII. And we take each line of text, and compress it to a binary string,
>> using the Huffman table. Then we code the binary string six bits at a time
>> using a 64 character subset of ASCII. And then we append a special character
>> which is chosen to be visually distinctive.
>>
>> So the input is
>>
>> Mary had a little lamb,
>> it's fleece was white as snow,
>> and eveywhere that Mary went,
>> the lamb was sure to. go.
>>
>> And we get the output.
>>
>> CVbGNh£-H$£*MMH&-VVdsE3w2as3-vv$G^&ggf-
>
> It would be more like
>
> pOHcDdz8v3cz5Nl7WP2gno5krTqU6g/ZynQYlawju8rxyhMT6B30nDusHrWaE+TZf1KdKmJ9Fb6orB
>
> (That's an actual example using an optimal Huffman encoding for that
> input and the conventional base 64 encoding. I can post the code table,
> if you like.)
>
>> And whether it is shorter or not depends on whether the fixed Huffman
>> table is any good.
>
> If I use a bigger corpus of English text to derive the Huffman codes,
> the encoding becomes less efficient (of course) so those 110 characters
> need more like 83 base 64 encoded bytes to represent them. Is 75% of
> the size worth it?
>
> What is the use-case where there is so much English text that a little
> compression is worthwhile?

The FileSystem XML files. They are uncompressed, and as you can take in
entire folders, they can be very large.
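
For anyone who wants to try the numbers, here is a minimal Python sketch of the scheme discussed above: Huffman-code the text, then emit the bit string six bits at a time using the conventional base 64 alphabet. It is not the code either poster used. Several things in it are assumptions: the code table is derived from the input itself (as in the "optimal" example) rather than being a fixed table, the verse is joined with three line breaks (which gives the 110 characters mentioned), the special terminating character is left out, and the bit count is carried alongside the output so the zero padding can be dropped on decoding. All function names are made up for illustration.

```python
import heapq
from collections import Counter

# Conventional base 64 alphabet; each character carries six bits.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

# The sample input, typos included, joined with three line breaks (assumption).
TEXT = (
    "Mary had a little lamb,\n"
    "it's fleece was white as snow,\n"
    "and eveywhere that Mary went,\n"
    "the lamb was sure to. go."
)


def huffman_codes(text):
    """Return {character: bit string} for a Huffman code built from text."""
    counts = Counter(text)
    if not counts:
        return {}
    # Heap entries are (weight, unique id, {char: partial code}); the unique
    # id breaks ties so that the dictionaries are never compared.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {ch: "0" for ch in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code in them.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def pack_six_bits(bits):
    """Map a bit string to base 64 characters, zero-padding the last group."""
    padded = bits + "0" * (-len(bits) % 6)
    return "".join(
        ALPHABET[int(padded[i:i + 6], 2)] for i in range(0, len(padded), 6)
    )


def unpack_six_bits(chars, bit_count):
    """Inverse of pack_six_bits; bit_count says how many bits are real."""
    bits = "".join(format(ALPHABET.index(c), "06b") for c in chars)
    return bits[:bit_count]


def encode(text, codes):
    """Return (base 64 characters, number of Huffman bits before padding)."""
    bits = "".join(codes[ch] for ch in text)
    return pack_six_bits(bits), len(bits)


def decode(chars, bit_count, codes):
    """Recover the text; works because Huffman codes are prefix-free."""
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in unpack_six_bits(chars, bit_count):
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    codes = huffman_codes(TEXT)
    encoded, bit_count = encode(TEXT, codes)
    print(len(TEXT), "characters in,", len(encoded), "base 64 characters out")
    print(encoded)
    assert decode(encoded, bit_count, codes) == TEXT
```

The exact output string depends on how ties are broken when the tree is built, so it need not match the one posted, although any optimal tree gives the same total bit count for the same input. Swapping in a table derived from a larger corpus only means passing a different `codes` dictionary to `encode`, provided it covers every character in the input.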