Subject : Re: [gawk] Handling variants of CSV input data formats
From : janis_papanagnou+ng (at) *nospam* hotmail.com (Janis Papanagnou)
Newsgroups : comp.lang.awk
Date : 27. Aug 2024, 02:39:07
Organization : A noiseless patient Spider
Message-ID : <vajant$2m8em$1@dont-email.me>
References : 1 2 3 4
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
On 27.08.2024 02:49, Ed Morton wrote:
On 8/26/2024 7:54 AM, Janis Papanagnou wrote:
<snip>
I'd have liked to provide more concrete information here, but at
the moment I can't even reproduce Awk's behavior as documented in
its manual. I've tried the following command with various locales:
$ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
-| 5,321
but always got just 5 as the result.
You need to specifically TELL gawk to use your locale to read input
numbers:
$ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
5
$ echo 4,321 | POSIXLY_CORRECT=1 LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
5,321
$ echo 4,321 | LC_ALL=en_DK.utf-8 gawk -N '{ print $1 + 1 }'
5,321
See
https://www.gnu.org/software/gawk/manual/gawk.html#Locale-influences-conversions
for more info on that.
Thanks. That's actually where I got the above example from.
I missed that there was an explicit
$ export POSIXLY_CORRECT=1
set at the very top of these examples. Gee!
Still, it feels strange that an explicit LC_* setting is ineffective
without the additional POSIXLY_CORRECT variable. And the page also
says: "The POSIX standard says that awk always uses the period as
the decimal point when reading the awk program source code".
So despite POSIX saying that, you have to use a variable named
POSIXLY_CORRECT. - Do I need some more coffee to understand that?
And I see there's an additional GNU Awk option '--use-lc-numeric'.
What a mess!
(I suppose the current status can only be explained by the mentioned
back-and-forth over the history of the various GNU Awk versions.)
What are the LC_* variables worth if they are ignored (or maybe not)?
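For what it's worth, one way to sidestep the locale question for this
kind of input might be to normalize the decimal comma textually before
doing any arithmetic. A minimal sketch, assuming the field contains at
most one comma and that this comma is a decimal separator (not a
thousands separator); add_one is a made-up helper name, and any POSIX
awk should do:

```shell
# add_one: hypothetical helper for illustration, not a gawk feature.
# Forces the C locale, turns the first comma of the first field into a
# period, and only then does the arithmetic, so neither POSIXLY_CORRECT
# nor --use-lc-numeric is involved.
add_one() {
    printf '%s\n' "$1" |
        LC_ALL=C awk '{ x = $1; sub(/,/, ".", x); print x + 1 }'
}

add_one 4,321    # prints 5.321 (with a period, since output uses the C locale)
```

The result is printed with a period as well; if it has to go back out
with a comma, that would need another substitution on the output side.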
Janis
Regards,
Ed