Subject: Re: Tcl9: source files are interpreted as utf-8 by default
From: rich (at) *nospam* example.invalid (Rich)
Newsgroups: comp.lang.tcl
Date: 08. Jan 2025, 20:32:24
Organization: A noiseless patient Spider
Message-ID: <vlmjs8$2tu2l$1@dont-email.me>
References: 1 2 3 4 5 6 7 8 9 10
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.139 (x86_64))

Luc <luc@sep.invalid> wrote:
> On Wed, 8 Jan 2025 17:04:26 -0000 (UTC), Rich wrote:
>> situation. If the script is iso-8859 encoded, but Tcl's default
>> parsing reads it as UTF-8, then all of the iso-8859 characters
>> inside are already corrupted *before* even the first command in the
>> script is executed. So there's no way to "source" a
>> "set_encoding.tcl" /in the main script itself/, that would adjust
>> the encoding before the main script is parsed using the wrong
>> encoding.
>
> **************************
> I see.
> How about trading places?
> Instead of main.tcl sourcing set_encoding.tcl, starter.tcl runs some
> 'encoding' command then sources main.tcl. Basically, a wrapper.

Yes, that works. But then Uwe has to go and "wrapperize" all the
various scripts, on all the various client systems. So he's back in
the same boat of "major modifications need be made now" as changing all
the launching instances to launch with "-encoding iso-8859".
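
For reference, such a wrapper could be as small as the sketch below. It relies on the -encoding option of [source]. The proc name source_legacy, the assumption that main.tcl sits next to the wrapper, and the choice of iso8859-1 (the thread only says "iso-8859") are all illustrative, not anything Uwe has confirmed.

```tcl
# starter.tcl -- hypothetical wrapper; file names and encoding are assumptions

# Source a script with an explicit encoding instead of the interpreter default.
proc source_legacy {path {enc iso8859-1}} {
    # uplevel so the script runs in the caller's scope, as a plain [source] would
    uplevel 1 [list source -encoding $enc $path]
}

# Look for main.tcl next to this wrapper and run it if it is there.
set main [file join [file dirname [file normalize [info script]]] main.tcl]
if {[file exists $main]} {
    source_legacy $main
}
```

Note that this only covers main.tcl itself. Any further legacy files that main.tcl sources with a plain [source] would still be read with the default encoding.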

> Another option is to run 'iconv' recursively on all those source files.

I've resisted pointing this one out, but yes, updating all the scripts
to be utf-8 encoded is the right long-term answer. But that glosses
over all the short-term effort involved in doing so.
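
If iconv is not at hand on some client system, the same conversion can be sketched in Tcl itself. This is only an illustration under assumptions: the proc names to_utf8 and tcl_files are made up, the source encoding is guessed as iso8859-1, and the result is written to a sibling ".utf8" file rather than in place so nothing is lost if the guess is wrong.

```tcl
# Re-encode one file from a legacy encoding to UTF-8.
# Writes "<path>.utf8" next to the original and returns that name.
proc to_utf8 {path {from iso8859-1}} {
    set in [open $path r]
    # -translation lf keeps the original line endings untouched
    fconfigure $in -encoding $from -translation lf
    set text [read $in]
    close $in

    set out [open $path.utf8 w]
    fconfigure $out -encoding utf-8 -translation lf
    puts -nonewline $out $text
    close $out
    return $path.utf8
}

# Collect all *.tcl files below a directory, recursively.
proc tcl_files {dir} {
    set found [glob -nocomplain -types f -directory $dir *.tcl]
    foreach sub [glob -nocomplain -types d -directory $dir *] {
        lappend found {*}[tcl_files $sub]
    }
    return $found
}

# Usage (the directory name is hypothetical):
#   foreach f [tcl_files /path/to/scripts] { to_utf8 $f }
```

A file that is already utf-8 would come out double-encoded, so the list of files needs checking before anything is renamed over the originals.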

> I did something like that some 15 years ago. But my case involved a
> migration. I had a ton of legacy iso-8859 files on a system-wide
> utf-8 Linux system. That caused me problems too, but iconv fixed it.

In my case, I used the \uxxxx escapes for anything that was not plain
ASCII, so all my scripts are both "basic 8859" and "utf-8" at the same
time, and having Tcl 9 source them as utf-8 won't cause an issue. But
it sounds like Uwe directly entered the extended 8859 characters into
the scripts. Which very well may have made perfect sense if he had
more than one or two of them per script.
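
To illustrate the escape approach: a string literal written with \uxxxx escapes contains only ASCII bytes, so the file decodes identically as 8859 or utf-8. The helper below is a rough sketch for producing such literals. The name u_escape is made up, and it assumes every character fits in four hex digits, which holds for anything in the 8859 family.

```tcl
# Return $s with every non-ASCII character replaced by its \uXXXX escape,
# ready to paste into a double-quoted string in a script.
proc u_escape {s} {
    set out ""
    foreach ch [split $s ""] {
        scan $ch %c code
        if {$code < 128} {
            append out $ch
        } else {
            # assumes code <= 0xFFFF; larger code points would need \U
            append out [format "\\u%04x" $code]
        }
    }
    return $out
}

# Usage: the argument here is itself written with escapes, so this
# example file is plain ASCII too.
puts [u_escape "Gr\u00fc\u00dfe"]   ;# prints Gr\u00fc\u00dfe
```

This does not escape characters that are special inside Tcl strings, such as $, [ or the backslash itself, so the output still needs a glance before pasting.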