Re: ASCII to ASCII compression.

Subject: Re: ASCII to ASCII compression.
From: nospam (at) *nospam* needed.invalid (Paul)
Groups: comp.lang.c
Date: 07. Jun 2024, 17:22:12
Organization: A noiseless patient Spider
Message-ID: <v3v8j5$249sg$1@dont-email.me>
References: 1 2 3 4
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
On 6/6/2024 5:49 PM, bart wrote:
On 06/06/2024 22:26, Malcolm McLean wrote:
On 06/06/2024 20:23, Paul wrote:
On 6/6/2024 12:25 PM, Malcolm McLean wrote:
>
Not strictly a C programming question, but smart people will see the relevance to the topicality, which is portability.
>
Is there a compression algorithm which converts human language ASCII text to compressed ASCII, preferably only "isgraph" characters?
>
So "Mary had a little lamb, its fleece was white as snow".
>
Would become
>
QWE£$543GtT£$"||x|VVBB?
>
>
The purpose of doing this, is to satisfy transmission through a 7 bit channel.
In the history of networking, not all channels were eight-bit transparent.
(On the equipment in question, this was called "robbed-bit signaling".)
For example, BASE64 is valued for its 7 bit channel properties, the ability
to pass through a pipe which is not 8 bit transparent. Even to this day,
your email attachments may traverse the network in BASE64 format.
>
That is one reason email and USENET clients, to this day, have
both 7 bit and 8 bit content encoding methods. It's to handle the
unlikely possibility that 7 bit transmission channels still exist.
They likely do exist.
>
Yes. If you store data as 8 bit binaries then it's inherently risky. There's usually no recovery from a single bit getting corrupted.
>
Whilst if you store as ASCII, the data can usually be recovered very easily if something goes wrong with the physical storage. An "And God said"
becomes "And G$d said", and even with this tiny text, you can still read
it perfectly well.
 
But you are suggesting storing the compression data as meaningless ASCII such as:
 
QWE£$543GtT£$"||x|VVBB?
 
If one bit gets flipped, then it will just be slightly different meaningless ASCII; there's no way to detect it except checksums, CRCs and the like.
 
In any case, the error detection won't be done by a human, but machine.
 
Possibly a human might detect, when back in plain text that 'Mary hid a little lamb' should have been 'had', but now this is getting silly, needing to rely on knowledge of nursery rhymes.
 
Trillions of bytes of binary data must be transmitted every day (perhaps every minute; I've no idea); how often have you encountered a transmission error?
 
Compression schemes tend to have error-detection built-in; I'm sure comms do as well, as well as storage device controllers and drivers. People have this sort of thing in hand already!
 
 

ZIP (of WinZIP fame), has a CRC computed per file. The decompression
step, will tell you if a file is corrupted. The column of CRC values
is shown in some of the unpacking software (and if you run a CRC check
separately on the file at a later date, you can compare).

    [Picture]

     https://i.postimg.cc/DwQgPQP3/ZIP-CRC-field.gif
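
For reference, the per-file check that ZIP stores is a CRC-32. Here is a minimal
bitwise sketch in C. Real tools normally use a table-driven version for speed,
and the function name crc32_buf is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320,
 * initial value 0xFFFFFFFF and a final inversion.
 * crc32_buf is a hypothetical name chosen for this sketch. */
uint32_t crc32_buf(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; fold in the polynomial if the dropped bit was 1. */
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

With this, crc32_buf("123456789", 9) gives 0xCBF43926, the usual check value
for this CRC variant. A single flipped bit in the input always changes the
result, so 'Mary had' and 'Mary hid' get different CRCs, but the CRC alone
cannot tell you which bit to put back.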

True repair capability requires a better code. The Reed-Solomon code David Brown
mentions is an example of such a code. A three dimensional version on CDs
makes the CD very resistant to errors. By the time Reed-Solomon cannot
repair a CD, the CD surface is so bad that the laser can no longer lock to the groove.
Rather than Reed-Solomon complaining that it cannot correct the data, it is
the optical drive reporting that it cannot find the groove using the laser.
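
Reed-Solomon is too long to show in a post, but the difference between
detecting and repairing can be seen with a much simpler code. The sketch below
is a Hamming(7,4) code, not Reed-Solomon: it stores 4 data bits in 7 and can
repair any single flipped bit. The function names and the bit layout are
choices made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hamming(7,4): codeword positions 1..7, parity bits at positions 1, 2, 4,
 * data bits at positions 3, 5, 6, 7. Bit (k-1) of the returned byte holds
 * codeword position k. Names and layout are illustrative assumptions. */
uint8_t hamming74_encode(uint8_t nibble)
{
    assert(nibble < 16);

    unsigned d0 = nibble & 1u;
    unsigned d1 = (nibble >> 1) & 1u;
    unsigned d2 = (nibble >> 2) & 1u;
    unsigned d3 = (nibble >> 3) & 1u;

    unsigned p1 = d0 ^ d1 ^ d3;   /* covers positions 1, 3, 5, 7 */
    unsigned p2 = d0 ^ d2 ^ d3;   /* covers positions 2, 3, 6, 7 */
    unsigned p4 = d1 ^ d2 ^ d3;   /* covers positions 4, 5, 6, 7 */

    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Repairs up to one flipped bit, then returns the 4 data bits. */
uint8_t hamming74_decode(uint8_t code)
{
    unsigned syndrome = 0;

    /* XOR together the position numbers of all set bits. A valid codeword
     * gives 0; a single flipped bit gives the position of that bit. */
    for (unsigned pos = 1; pos <= 7; pos++)
        if (code & (1u << (pos - 1)))
            syndrome ^= pos;

    if (syndrome != 0)
        code ^= (uint8_t)(1u << (syndrome - 1));

    return (uint8_t)(((code >> 2) & 1u) |
                     (((code >> 4) & 1u) << 1) |
                     (((code >> 5) & 1u) << 2) |
                     (((code >> 6) & 1u) << 3));
}
```

Two flipped bits in the same codeword will be "repaired" to the wrong value,
which is one reason real media use much stronger codes.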

Storage media also has repair capability. A typical SSD (NAND flash storage device),
has 10% overhead for corrections. A 512 byte sector, has an extra 51 bytes set
aside for error correction. When your SSD slows down to 300MB/sec from 530MB/sec,
that means that every sector being read had errors, and is being corrected by a
processor inside the SSD drive. This is a "normal" state of affairs for TLC
or QLC based drives. Some 2.5" flash devices, have a three core ARM processor,
and at least one of the cores does error correction.

But on an archival format with extreme compression, finding that "someone had
wasted an extra 10% on error correction capability" would of course
annoy a user expecting the extreme compression to save them money (for storage).

When selecting a "scheme", you have to decide what kind of error-type you
are protecting against.

For example, on hard drives, someone postulated they were protecting against
single-bit (independent, does not correlate with other single-bits) errors.
The Fire codes (polynomial) were the result. There is some small probability
of multiple bits (perhaps an error multiplication effect in the DSP-based
data recovery on read). At the time, no one considered that a heavy-weight method
was necessary.

When you expect to be losing whole sectors, whole files, whole pieces of media,
there are PAR codes for that. But these were determined to be not mathematically
sound, so serious archival use might not use them. The idea would be, if an
archive spanned ten CDs, you would burn one or two more CDs (generated by PAR),
and if any of the twelve CDs total was bad, PAR could regenerate the
missing information (if any). Of the 12 CDs, any two could go missing, and they
could then be regenerated.
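
The idea of regenerating a missing piece can be sketched with plain XOR parity.
This is a simplification and not the PAR algorithm: one XOR parity block can
rebuild only one missing block, whereas surviving the loss of any two needs a
stronger code. The function name xor_parity is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* XOR together every block except the one at index `skip`, writing the
 * result to `out`. Pass skip == nblocks to include all blocks.
 * xor_parity is a hypothetical helper for this sketch. */
void xor_parity(unsigned char *out, const unsigned char *const *blocks,
                size_t nblocks, size_t block_len, size_t skip)
{
    assert(out != NULL && blocks != NULL);

    for (size_t i = 0; i < block_len; i++) {
        unsigned char acc = 0;
        for (size_t b = 0; b < nblocks; b++)
            if (b != skip)          /* the skipped slot is never read */
                acc ^= blocks[b][i];
        out[i] = acc;
    }
}
```

To build the parity block, call it over the data blocks with skip set to
nblocks. To rebuild a lost block, put the parity block in the array as one more
block and call it again with skip set to the index of the missing one: the XOR
of all the survivors is the lost data.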

A simpler-to-understand scheme is to burn duplicate CD copies of the same information.
If you lose a CD, or if the media surface degrades completely, you have the
second CD. And that does not involve any complex PAR method :-) It's easier
for the human to understand.

   Paul


