Subject: Re: How to convert <binaryGlowMixedWithASCII> to pure ASCII
From: gazelle (at) *nospam* shell.xmission.com (Kenny McCormack)
Newsgroups: comp.unix.shell
Date: 12 May 2025, 15:03:32
Organization: The official candy of the new Millennium
Message-ID: <vvsv3k$33vk8$1@news.xmission.com>
User-Agent: trn 4.0-test77 (Sep 1, 2010)
In article <vvssf0$13ls6$1@dont-email.me>,
Nuno Silva <nunojsilva@invalid.invalid> wrote:
...
> My guess is that this isn't an apostrophe, but a "right single quotation
> mark", which is sadly a common sight in such a context, and Emacs tells
> me that this (UCS codepoint 0x2019) is represented as E2 80 99 in UTF-8.
Correct, but as far as I am concerned, they are all single quotes, just
mangled versions of same. The goal is to convert them all back into
regular single quotes. And, as you will see below, similar comments apply
for double quotes.
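The following Python sketch shows which bytes these characters produce. Python is used only for illustration, since nothing in the thread uses it. The sketch prints the UTF-8 bytes of the four usual "smart" quotes. It also prints the escape sequence each one becomes, assuming the mangled text is quoted-printable encoded, as the `=..` patterns in the AWK code suggest.

```python
import binascii
import unicodedata

# The usual "smart" quotes that stand in for ASCII ' and ".
SMART_QUOTES = ["\u2018", "\u2019", "\u201C", "\u201D"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (Unicode name, UTF-8 bytes in hex, quoted-printable form)."""
    utf8 = ch.encode("utf-8")
    hex_bytes = utf8.hex(" ").upper()            # e.g. "E2 80 99"
    qp = binascii.b2a_qp(utf8).decode("ascii")   # e.g. "=E2=80=99"
    return unicodedata.name(ch), hex_bytes, qp


for ch in SMART_QUOTES:
    name, hex_bytes, qp = describe(ch)
    print(f"U+{ord(ch):04X} {name}: {hex_bytes} -> {qp}")
```

Only the last byte differs (98/99 for the single quotes, 9C/9D for the double quotes). That is why a pattern keyed on the final escape is enough to tell the two kinds apart.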
The AWK code that I am currently using to clean this problem contains these
lines:

    gsub(/=..=..=9[CD]/,"\"")
    gsub(/=..=..=../,"'")

which is good enough for me.
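For readers who prefer Python, the two `gsub()` calls translate roughly as follows. This is a hypothetical port, not the author's code. The names `clean_quotes`, `DOUBLE_QUOTE_SEQ` and `ANY_THREE_BYTE_SEQ` are made up for illustration. Like the AWK version, the sketch assumes the input contains quoted-printable escapes such as `=E2=80=99`.

```python
import re

# Hypothetical port of the two AWK gsub() calls.
DOUBLE_QUOTE_SEQ = re.compile(r"=..=..=9[CD]")   # =E2=80=9C / =E2=80=9D
ANY_THREE_BYTE_SEQ = re.compile(r"=..=..=..")     # anything else, e.g. =E2=80=99


def clean_quotes(line: str) -> str:
    # Order matters: replace the double quotes first, otherwise the
    # broader second pattern would turn them into single quotes too.
    line = DOUBLE_QUOTE_SEQ.sub('"', line)
    return ANY_THREE_BYTE_SEQ.sub("'", line)


print(clean_quotes("It=E2=80=99s =E2=80=9Cfine=E2=80=9D"))  # prints: It's "fine"
```

The second pattern matches any three consecutive `=XX` escapes. It would therefore also turn other encoded three-byte characters (for instance `=E2=80=94`, an encoded em dash) into single quotes. That fits the goal of treating every such sequence as a mangled quote, but it is worth knowing.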
> Are there good ways to convert such chars to something more reasonable?
> The only thing that occurs to me right now is passing it through iconv
> to a more limited charset using transliteration (e.g. "iconv -f utf8 -t
> iso8859-1//TRANSLIT -c") and then back to the desired encoding and
> charset.
As mentioned in the OP, I have never been successful in getting "iconv" to do
much of anything. No, this is not a plea for help or for man pages to be
read out loud.
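If `iconv` does not cooperate, a similar result can be had by decoding the text properly and mapping the characters explicitly. The sketch below uses only the Python standard library. It substitutes a different technique for iconv's `//TRANSLIT`: an explicit translation table plus NFKD normalization. The name `qp_to_ascii` is hypothetical. The sketch again assumes quoted-printable UTF-8 input.

```python
import quopri
import unicodedata

# Hand-written mapping for the usual "smart" punctuation; extend as needed.
ASCII_PUNCT = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',  # RIGHT DOUBLE QUOTATION MARK
})


def qp_to_ascii(qp_text: str) -> str:
    """Decode quoted-printable UTF-8 and reduce the result to plain ASCII."""
    # Undo the =XX escapes to recover the raw UTF-8 bytes.
    raw = quopri.decodestring(qp_text.encode("ascii"))
    # Decode to str; malformed bytes become U+FFFD and are dropped below.
    text = raw.decode("utf-8", errors="replace")
    # Map smart quotes to their ASCII counterparts.
    text = text.translate(ASCII_PUNCT)
    # NFKD splits accented letters into base letter + combining mark,
    # so the ASCII encode below keeps the base letter.
    text = unicodedata.normalize("NFKD", text)
    # Drop whatever is still non-ASCII, similar in spirit to iconv -c.
    return text.encode("ascii", errors="ignore").decode("ascii")


print(qp_to_ascii("It=E2=80=99s =E2=80=9Cfine=E2=80=9D, caf=C3=A9"))  # prints: It's "fine", cafe
```

Unlike a blanket regex, this approach rewrites only the characters listed in the table. Accented letters lose their accents through NFKD, and anything else without an ASCII equivalent is dropped, much as `iconv -c` would drop it.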
> (But I suppose if this is already involving perl, then perhaps such a
> modification can be done through perl too.)
Probably, but I'm not much into Perl. I do appreciate the solution given
here by Chuck, but don't intend to do any real deconstruction of it.
-- 
"I have a simple philosophy. Fill what's empty. Empty what's full. And
scratch where it itches." Alice Roosevelt Longworth