In article <vvssf0$13ls6$1@dont-email.me>,
Nuno Silva <nunojsilva@invalid.invalid> wrote:
...
> My guess is that this isn't an apostrophe, but a "right single quotation
> mark", which is sadly a common sight in such a context, and Emacs tells
> me that this (UCS codepoint 0x2019) is represented as E2 80 99 in UTF-8.

Correct, but as far as I am concerned, they are all single quotes, just
mangled versions of same. The goal is to convert them all back into
regular single quotes. And, as you will see below, similar comments apply
for double quotes.

The AWK code that I am currently using to clean up this problem contains these
lines:

    gsub(/=..=..=9[CD]/,"\"")
    gsub(/=..=..=../,"'")

which is good enough for me.
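
For anyone who wants to try the same idea, here is a minimal sketch of those
two gsub calls wrapped in a shell function. The patterns appear to target
quoted-printable text, where U+2019 shows up as =E2=80=99 and the curly double
quotes as =E2=80=9C and =E2=80=9D. That reading, the function name fix_quotes,
and the added print are assumptions, not part of the original script.

```shell
# fix_quotes: hypothetical helper name; filters quoted-printable text on stdin.
fix_quotes() {
    awk '{
        # =XX=XX=9C and =XX=XX=9D (curly double quotes) become a plain "
        gsub(/=..=..=9[CD]/, "\"")
        # Any remaining run of three =XX groups becomes a plain single quote.
        # \047 is the octal escape for the quote character, used here because
        # the awk program itself sits inside shell single quotes.
        gsub(/=..=..=../, "\047")
        print
    }'
}

# Example: printf 'It=E2=80=99s =E2=80=9Cfine=E2=80=9D\n' | fix_quotes
```

Note that the second pattern is deliberately broad: any three encoded bytes in
a row are turned into a single quote, so an encoded em dash (=E2=80=94), for
instance, would be converted as well.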

> Are there good ways to convert such chars to something more reasonable?
> The only thing that occurs to me right now is passing it through iconv
> to a more limited charset using transliteration (e.g. "iconv -f utf8 -t
> iso8859-1//TRANSLIT -c") and then back to the desired encoding and
> charset.

As mentioned in the OP, I have never been successful in getting "iconv" to do
much of anything. No, this is not a plea for help or for man pages to be
read out loud.

> (But I suppose if this is already involving perl, then perhaps such a
> modification can be done through perl too.)

Probably, but I'm not much into Perl. I do appreciate the solution given
here by Chuck, but don't intend on doing any real deconstruction on it.