Sorting problem with Unix sort(1) with UTF-8 punctuation characters - locale issue

Subject: Sorting problem with Unix sort(1) with UTF-8 punctuation characters - locale issue
From: janis_papanagnou+ng (at) *nospam* hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Date: 19 Feb 2025, 12:27:18
Organization: A noiseless patient Spider
Message-ID: <vp4f6o$288ui$1@dont-email.me>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

I've been sorting lines of punctuation characters on one Unix system
and it did not produce the expected result. On another system the same
input was sorted as expected.

The test program (it contains non-ASCII middle-dot characters) was

sort -t $'\t' <<EOT
····**·······**················< abc1
···········**······**··········< efg2
·**·························**·< hij3
············**·················< klm4
···**····················**····< nop5
···**···················**·**··< qrs6
··**··········**·········**····< tuv7
**·····························< wxy8
EOT


Run on an older system - with sort (GNU coreutils) 8.13 - produced

**·····························< wxy8
·**·························**·< hij3
··**··········**·········**····< tuv7
···**···················**·**··< qrs6
···**····················**····< nop5
····**·······**················< abc1
···········**······**··········< efg2
············**·················< klm4


On a newer system - with sort (GNU coreutils) 8.28 - it did not sort
at all (at least not these lines[*]):

····**·······**················< abc1
···········**······**··········< efg2
·**·························**·< hij3
············**·················< klm4
···**····················**····< nop5
···**···················**·**··< qrs6
··**··········**·········**····< tuv7
**·····························< wxy8


One hypothesis was that it's a locale issue. So I copied the LC_*
settings from the older system to the newer one and disabled them one
by one. Strangely, the one responsible for the effect was LC_TIME!
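
The one-by-one elimination can be scripted. A minimal sketch, assuming a
POSIX shell; the sample lines, the variable list, and the function name
probe_locale_vars are illustrative and not the commands actually run:

```shell
#!/bin/sh
# Sketch: unset one locale variable at a time and report whether the
# sort order of a small sample changes. Sample and variable list are
# illustrative assumptions.
sample=$(printf '%s\n' '··**< tuv7' '**··< wxy8' '·**·< hij3')

# Order produced with the environment exactly as it is now.
baseline=$(printf '%s\n' "$sample" | sort)

probe_locale_vars() {
    for var in LANG LC_ALL LC_COLLATE LC_CTYPE LC_TIME; do
        # The subshell keeps the unset local to this single sort run.
        result=$(printf '%s\n' "$sample" | (unset "$var"; sort))
        if [ "$result" = "$baseline" ]; then
            echo "$var: same order"
        else
            echo "$var: order changed"
        fi
    done
}

probe_locale_vars
```

Unsetting a variable can only show an effect if it is actually set and
not overridden by LC_ALL, so the output of locale(1) is worth checking
alongside.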

On the system that sorts correctly it was defined as
  LC_TIME=de_DE.UTF-8@isodate
and the one that sorts incorrectly had
  LC_TIME=de_DE.UTF-8
Now I'm puzzled in many ways...
If anything, I'd have expected LC_COLLATE to have an effect on sorting.
Also, there's no locale with @isodate installed on the system that
sorts incorrectly. And neither clearing LC_TIME nor removing the
"@isodate" part changed anything; LC_TIME has to name a non-existent
locale for sort to work correctly on that system.
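
Whether a given locale name is actually installed can be checked against
the output of locale -a. A rough sketch; the function name locale_exists
is hypothetical, and the normalization (lower-casing, dropping hyphens)
assumes that locale -a lists names in a form like de_DE.utf8, as glibc
systems usually do. One possible reading of the behavior above, not
confirmed here, is that with glibc a single unavailable category makes
the program's setlocale(LC_ALL, "") call fail as a whole, leaving sort
in the C locale.

```shell
#!/bin/sh
# Sketch: test whether a locale name appears in the output of `locale -a`.
# Both sides are lower-cased and stripped of hyphens so that
# "de_DE.UTF-8" matches a listing of "de_DE.utf8" (an assumption about
# how the system prints its locale names).
locale_exists() {
    want=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-' | grep -qFx -- "$want"
}

for name in de_DE.UTF-8 de_DE.UTF-8@isodate; do
    if locale_exists "$name"; then
        echo "$name: installed"
    else
        echo "$name: not installed"
    fi
done
```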

Does anyone have an idea what's going on here?

I'm reluctant to globally set LC_TIME=de_DE.UTF-8@isodate
(since there is no file with that name in the locale directories).
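
If the order shown for the older system is the desired one, it can be
requested per command instead of globally. That order matches a plain
byte comparison ('*' is byte 0x2A, and the UTF-8 encoding of the middle
dot starts with byte 0xC2), which LC_ALL=C selects. A minimal sketch;
the function name bytesort and the shortened sample lines are
illustrative:

```shell
#!/bin/sh
# Sketch: sort by raw byte values for this one command only, leaving
# the global locale settings untouched.
bytesort() {
    LC_ALL=C sort "$@"
}

printf '%s\n' '··**< tuv7' '**··< wxy8' '·**·< hij3' | bytesort
# prints:
# **··< wxy8
# ·**·< hij3
# ··**< tuv7
```

LC_ALL is used rather than LC_COLLATE because LC_ALL, when set, takes
precedence over the individual LC_* variables.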

Thanks.

Janis

[*] Lines with content other than the payload shown above were sorted
correctly.
