Re: Sorting problem with Unix sort(1) with UTF-8 punctuation characters - locale issue

Subject : Re: Sorting problem with Unix sort(1) with UTF-8 punctuation characters - locale issue
From : Lem (at) *nospam* none.invalid (Lem Novantotto)
Newsgroups : comp.unix.shell
Date : 20. Feb 2025, 12:14:42
Organization : A noiseless patient Spider
Message-ID : <vp72r2$2pift$1@dont-email.me>
References : 1
User-Agent : Pan/0.160 (Toresk; )
On Wed, 19 Feb 2025 12:27:18 +0100, Janis Papanagnou wrote:

> I've been sorting punctuation characters on one Unix system and it did
> not produce the expected result. Switching to another system did it as
> expected.

> The second system (not working "properly") is treating all dots as equal,
> so it sorts just the letters.

My system doesn't sort properly either. On my system:

$ locale
LANG=it_IT.UTF-8
LANGUAGE=it_IT
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=
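
As far as I know, locale(1) puts a value in double quotes when it is only implied (here by LANG) rather than set explicitly. The precedence POSIX describes is LC_ALL first, then the specific LC_* variable, then LANG. A minimal sketch of that lookup for collation, where effective_collate is just a name I made up and the final fallback to POSIX is an assumption (the default is implementation-defined):

```sh
#!/bin/sh
# Print the locale name that governs collation, following the POSIX
# precedence: LC_ALL, then LC_COLLATE, then LANG. Empty counts as unset.
# effective_collate is a hypothetical helper, not a standard utility.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Implementation-defined default; assumed here to be POSIX.
        printf '%s\n' 'POSIX'
    fi
}

effective_collate
```

With the environment shown above (LC_ALL empty, LANG=it_IT.UTF-8 and, if I read the quotes right, no explicit LC_COLLATE) this should print it_IT.UTF-8.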

Let's see. In my /usr/share/i18n/locales/it_IT, I have this section:

LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE

On your second system, you have LC_COLLATE=en_US or de_DE. It's the same
there: the corresponding locale files always contain the same section:
LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE
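
If you want to check this on your own machine, something like the following should pull the section out of a locale source file. It's only a sketch: collate_section is a made-up helper name, and the directory is the one from my system, so it may be elsewhere (or not installed at all) on yours.

```sh
#!/bin/sh
# Print the LC_COLLATE section of a locale definition source file.
# collate_section is a hypothetical helper name.
collate_section() {
    # Print everything from the LC_COLLATE line to the END LC_COLLATE line.
    sed -n '/^LC_COLLATE/,/^END LC_COLLATE/p' "$1"
}

# Path as on my system; it may differ or be absent on other systems.
locale_src=/usr/share/i18n/locales/it_IT
if [ -r "$locale_src" ]; then
    collate_section "$locale_src"
fi
```

Swap it_IT for en_US or de_DE to compare the files.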

But in /usr/share/i18n/locales/C there is:

LC_COLLATE
% The keyword 'codepoint_collation' in any part of any LC_COLLATE
% immediately discards all collation information and causes the
% locale to use strcmp/wcscmp for collation comparison.  This is
% exactly what is needed for C (ASCII) or C.UTF-8.
codepoint_collation
END LC_COLLATE

And here it is:

$ LC_COLLATE=C sort yada yada

gives the correct sorting.
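
A small way to see the difference, with sample lines I invented for the purpose (they are not your data). The sketch uses LC_ALL=C rather than LC_COLLATE=C because LC_ALL, when set, overrides LC_COLLATE; with LC_ALL empty, as in my locale output, LC_COLLATE=C is enough.

```sh
#!/bin/sh
# Hypothetical sample input: letters mixed with dots.
sample_lines() {
    printf '%s\n' 'b.' 'a..' '.c' 'a.'
}

# Sorted with whatever locale is in effect; the result depends on the
# collation data installed on the system, so no output is claimed here.
sorted_locale=$(sample_lines | sort)

# Sorted by byte value, which is what the C locale does.
sorted_c=$(sample_lines | LC_ALL=C sort)

printf 'current locale:\n%s\n\n' "$sorted_locale"
printf 'C locale:\n%s\n' "$sorted_c"
# The C locale part lists: .c  a.  a..  b.
# ('.' is byte 0x2E, which is lower than any ASCII letter)
```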
--
Bye, Lem
Talis erit dies qualem egeris

