Subject : Re: Tcl9: source files are interpreted as utf-8 by default
From : rich (at) *nospam* example.invalid (Rich)
Newsgroups : comp.lang.tcl
Date : 09. Jan 2025, 16:37:22
Organisation : A noiseless patient Spider
Message-ID : <vloqfi$3e04k$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.139 (x86_64))
Uwe Schmitz <schmitzu@mail.de> wrote:
> Rich,
>
> First of all, thank you very much for explaining my situation so well.
> I couldn't have argued better ;-)
>
> Let me add a note on why characters outside the ISO 7-bit range
> cannot always be replaced by the \uXXXX notation:
>
> Comments.
>
> If you want to write comments in your native language, it is not very
> readable to code e.g. German umlauts as \uXXXX.  Especially if you
> extract the program documentation from the source code in a kind of
> “literate programming” (which I often do), the \u notation is very
> cumbersome.
This was my suspicion.  In my case, the non-ASCII characters are not
part of the script's language (English in my case); they are extras
(such as arrows/lines or the degree symbol, etc.), so the script is
99.9% readable, with a few \uXXXX escapes occurring here and there.

But writing out a string where every third character is \uXXXX makes
for a string that is very hard for a human to read (be it a comment,
or a string for the code to use).
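
Just to illustrate (the string content here is invented; \u00b0, \u2026,
\u2192 and \u00b1 are the degree sign, ellipsis, right arrow and
plus-minus sign), compare:

  # literal characters -- readable at a glance
  set note "Range: 20°C … 25°C → OK, ±0.5°C tolerance"

  # the same string with the non-ASCII characters spelled out as escapes
  set note "Range: 20\u00b0C \u2026 25\u00b0C \u2192 OK, \u00b10.5\u00b0C tolerance"

The second form works fine, but nobody enjoys reading or reviewing it.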
If you develop on Linux (or have a Linux machine available) you may
wish to begin experimenting with iconv to convert some scripts to
UTF-8 encoding. If things work properly, it might be best to start
that conversion (even if you do it slowly over time) sooner rather than
later. It will be work, but it is work that you are likely going to
have to perform at some point anyway.
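
The iconv command line would be something along the lines of
"iconv -f ISO-8859-1 -t UTF-8 original.tcl > converted.tcl" (assuming
your files really are iso8859-1; substitute whatever encoding they
actually use).  If you prefer to stay inside Tcl, here is a rough
equivalent sketch using Tcl's own channel encodings instead of iconv
(the file names and the iso8859-1 source encoding are just assumptions
for the example):

  # Re-encode one script from iso8859-1 to utf-8 using Tcl channels.
  # Adjust the -encoding on the input channel to match your real files.
  proc convert_to_utf8 {src dst} {
      set in [open $src r]
      fconfigure $in -encoding iso8859-1
      set data [read $in]
      close $in

      set out [open $dst w]
      fconfigure $out -encoding utf-8
      puts -nonewline $out $data
      close $out
  }

  convert_to_utf8 original.tcl converted.tcl

Once the files are UTF-8, a Tcl 9 [source] will read them correctly by
default, which is the whole point of the exercise.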