Subject : Re: Unicode in strings
From : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups : comp.arch
Date : 14 May 2024, 18:43:43
Organization : Rocksolid Light
Message-ID : <6124140226e28fd4afec0b435bdbeca1@www.novabbs.org>
References : 1 2 3 4 5 6 7 8 9 10 11
User-Agent : Rocksolid Light
Anton Ertl wrote:
Thomas Koenig <tkoenig@netcologne.de> writes:
E.g., consider the following Gforth code (others can tell you how to
do it in Python):
"Ko\u0308nig" cr type
The output is:
König
That is, the second character consists of two Unicode code points, the
"o" and the "\u0308" (Combining Diaeresis).
(I think that somewhere along the way from the Forth system to the
xterm through copying and pasting into Emacs the second character has
become precomposed, but that's probably just as well, so you can see
what I see).
If I replace the third code point with an e, I get "Koenig". So by
overwriting one code point, I insert a character into the string.
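
Since Python was mentioned, here is a minimal sketch of the same experiment using only the standard library. The variable names are illustrative; Python's str is indexed by code point, which is what makes the comparison direct. The NFC step shows the kind of precomposition that may have happened on the way through Emacs.

```python
import unicodedata

s = "Ko\u0308nig"  # K, o, COMBINING DIAERESIS, n, i, g

# len() on a Python str counts code points, not displayed characters.
print(len(s))
for cp in s:
    print(f"U+{ord(cp):04X} {unicodedata.name(cp)}")

# NFC normalization composes o + U+0308 into the single code point U+00F6.
composed = unicodedata.normalize("NFC", s)
print(len(composed))

# Overwrite the third code point (index 2) with "e".
replaced = s[:2] + "e" + s[3:]
print(replaced)
```

The string stays at 6 code points after the overwrite, but the number of displayed characters goes from 5 to 6.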
If instead I replace the second code point with a "\u0316" (Combining
Grave Accent Below):
"K\u0316\u0308nig" cr type
I get this (which looks as expected in my xterm, but not in Emacs):
K̖̈nig
The first character is now a K with a diaeresis above and a grave
accent below, and there are now a total of 4 characters, but still 6
code points in the string; the second character has been deleted by
this code-point replacement.
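
The character counts above can be checked with a rough sketch like the following. The helper clusters() is hypothetical and deliberately simplified: it attaches every code point with a non-zero canonical combining class to the preceding base, which is enough for this example but is not the full grapheme cluster algorithm of UAX #29.

```python
import unicodedata

def clusters(text):
    """Group each base code point with the combining marks that follow it.

    Simplified: uses unicodedata.combining() != 0 as the test for a mark,
    not the full UAX #29 grapheme cluster rules.
    """
    out = []
    for cp in text:
        if out and unicodedata.combining(cp) != 0:
            out[-1] += cp  # attach the mark to the previous base
        else:
            out.append(cp)  # start a new displayed character
    return out

original = "Ko\u0308nig"
# Replace the second code point (index 1) with COMBINING GRAVE ACCENT BELOW.
modified = original[:1] + "\u0316" + original[2:]

print(len(original), len(clusters(original)))
print(len(modified), len(clusters(modified)))
```

Both strings hold 6 code points, yet the first groups into 5 displayed characters and the second into 4.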
It seems to me (in my vast ignorance) that names for things should be
written in the most appropriate set of characters in the language of
the person/thing being named.
Then, when such a name is "sent out to be displayed", it is a property
of the display what character set(s) it can properly emit, and the
display should alter the string of characters as appropriate to its
capabilities.
For example, take "K\u0316\u0308nig" cr type ==> K̖̈nig
When displayed on an ASCII-only line printer it would be written Koenig
When displayed on an enhanced-ASCII printer it would be written König
When displayed on a fully functional printer it would be written K̖̈nig
The problem is the mapping function from how it should be encoded
in its own native language to what can be expressed on a particular device.
Only the display device needs to understand this mapping, and NOT the
program/software/device holding the string.
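
A device-side mapping of this kind might look like the sketch below. Everything here is an assumption made for illustration: the function render_for(), the three capability names, and the small German transliteration table are hypothetical, and a real device would need per-language rules rather than one table.

```python
import unicodedata

# Hypothetical fallback table for an ASCII-only device (German convention).
ASCII_FALLBACK = {
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
    "\u00df": "ss",
}

def render_for(text, charset):
    """Degrade text to what a device with the given charset can emit."""
    if charset == "unicode":
        return text  # full-function device: emit the string unchanged
    composed = unicodedata.normalize("NFC", text)
    if charset == "latin-1":
        # Anything outside Latin-1 is shown as "?".
        return composed.encode("latin-1", errors="replace").decode("latin-1")
    if charset == "ascii":
        out = []
        for ch in composed:
            if ch in ASCII_FALLBACK:
                out.append(ASCII_FALLBACK[ch])
            else:
                # Decompose, then drop whatever ASCII cannot represent.
                decomposed = unicodedata.normalize("NFD", ch)
                out.append(decomposed.encode("ascii", errors="ignore").decode("ascii"))
        return "".join(out)
    raise ValueError(f"unknown charset: {charset}")

name = "Ko\u0308nig"
for device in ("ascii", "latin-1", "unicode"):
    print(device, render_for(name, device))
```

The program holding the string never changes it; only render_for(), standing in for the device, decides what to emit. Marks with no table entry are simply dropped on the ASCII path, so the table has to carry whatever language knowledge the device is expected to have.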
I think people in Japan should be able to use printf by writing プリントフ.
There is way too much "English" in the way computers are being used.
It is similar to anthropomorphizing animal behavior.