Luc <luc@sep.invalid> wrote:
> On Wed, 8 Jan 2025 19:32:24 -0000 (UTC), Rich wrote:
>>> Instead of main.tcl sourcing set_encoding.tcl, starter.tcl runs some
>>> 'encoding' command then sources main.tcl. Basically, a wrapper.
>>
>> Yes, that works. But then Uwe has to go and "wrapperize" all the
>> various scripts, on all the various client systems. So he's back in
>> the same boat of "major modifications need be made now" as changing all
>> the launching instances to launch with "-encoding iso8859-1".
>
> True, but he has considered that kind of effort. His words:
>
> "That means we have to add "-encoding iso8859-1"
> to ALL source and ALL tclsh calls in ALL scripts.
> So far, so good (or bad?)."
>
> "What initially seems quite doable, looks more and more scary
> to me. First, if we ever may switch encoding to utf-8 we
> have to alter all those lines again."
>
> So in my mind, the "customer" accepts (though grudgingly) making
> large-scale changes, but is concerned about possible new changes
> in the future. A wrapper can handle the future quite gracefully.

Uwe's reality is likely that at some point a "mass migration" may very
well have to be done. There are at least two possibilities:
1) Tcl9 remains as it is today, loading all scripts as UTF-8 unless
told otherwise by a user-provided option. Then one of the following
has to happen for all the iso-8859 scripts:
1a) they are re-encoded as UTF-8;
1b) they are modified to pass the -encoding parameter to [source];
1c) a wrapper is deployed that 'adjusts' things such that the main
script, and all sourced scripts, are sourced with -encoding iso8859-1.
All appear to be substantial work based on Uwe's statements so far, and
all carry the risk of overlooking one or more scripts that should have
been modified.
2) Tcl9 patch X reverts to using "system encoding" (and the users of
these scripts are on systems where "system encoding" presently returns
iso-8859). So things work again, with no changes, for the moment. But
then Windows version 1Y.Z changes things such that it now uses a system
encoding of UTF-8. Suddenly, the same problem from 1 returns, unless
the users have the ability to adjust their system encoding back (and if
'system encoding' is an "administrator controlled" setting for these
users, then this option is not available).
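To make the wrapper idea in 1c concrete, here is a rough, untested
sketch of one way it could work (the file names are made up; it relies
on [source] accepting an -encoding option, which it has for a long
time). The trick is to shadow [source] itself, so nested [source]
calls inherit the encoding choice too:

```tcl
# starter.tcl -- sketch of a wrapper that forces iso8859-1 sourcing
# without touching the wrapped scripts themselves.
rename source ::source_orig
proc source {args} {
    # If the caller already chose an encoding, respect it.
    if {[lindex $args 0] eq "-encoding"} {
        tailcall ::source_orig {*}$args
    }
    tailcall ::source_orig -encoding iso8859-1 {*}$args
}

# Demo: write a script containing a raw latin-1 byte (0xE9, "e acute")
# and source it.  Under Tcl9's default UTF-8 reading, that lone byte
# would be invalid; through the wrapper it reads cleanly.
set f [open demo.tcl wb]
puts -nonewline $f "set city Montr\xe9al\n"
close $f
source demo.tcl            ;# $city is now "Montréal"
file delete demo.tcl
```

Changing the whole fleet over to utf-8 later would then be a one-line
edit in the wrapper, which addresses Uwe's "alter all those lines
again" worry.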
So, my two cents, for what they are worth: given that I suspect this
change will eventually 'force itself' no matter what Tcl9 patch level X
might do, I would begin the process of migrating all of these scripts
to UTF-8 encoding now. It will be hard, but once done, things will
likely be stable again for the future.
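The re-encoding itself is mechanical, since every iso8859-1 byte maps
cleanly to a code point. A sketch of a converter in Tcl (the
`scripts/` directory and in-place rewrite are assumptions; run it on
copies first):

```tcl
# Re-encode one script from iso8859-1 to utf-8, in place.
proc convert_to_utf8 {path} {
    set in [open $path r]
    fconfigure $in -encoding iso8859-1   ;# decode the old bytes
    set text [read $in]
    close $in

    set out [open $path w]
    fconfigure $out -encoding utf-8      ;# re-emit as utf-8
    puts -nonewline $out $text
    close $out
}

# Hypothetical usage: convert every .tcl file under scripts/.
foreach f [glob -nocomplain -directory scripts *.tcl] {
    convert_to_utf8 $f
}
```

The hard part is not the conversion but the inventory: being sure you
found every script, which is the same "overlook one" risk as options
1a-1c above.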
>> I've resisted pointing this one out, but long term, yes, updating all
>> the scripts to be utf-8 encoded is the right, long-term answer. But
>> that belies all the current, short-term effort involved in doing so.
>
> Actually, when I mentioned my migration case, I was also thinking that
> I could afford to do it because I was migrating to Linux, and utf-8 was
> not even the future anymore, it was pretty much the present. But maybe
> running iconv wouldn't be acceptable because Uwe is (I assume) on
> Windows.

From his posts on this thread, we can assume that his scripts are being
used on windows systems. That does not imply much about where Uwe
develops those same scripts. I have lots of my own scripts that I use
on $work's windows machine, but all of them are written on Linux.
> Does a Windows user want to convert his files to utf-8?

The average/median windows user does not even know what UTF-8 means nor
why it is significant. They just expect that when they launch "icon X",
the expected program X appears, and that the text inside is as
expected. So it is much more likely the work/effort of "convert to
utf-8" will fall on Uwe, as it is very likely the windows users know
nothing of any of this (or if they 'know' anything, it is something
simple for them, such as: "set this selection box in this windows
config pane to say Y", and that ends their knowledge).
> Won't that cause problems if the system is iso-8859-1?

Only if windows tries to interpret the UTF-8 data as iso-8859
characters. But as far as the Tcl scripts go, once the scripts are
UTF-8, and [source] is using UTF-8 to read them, the fact that the
windows system might be iso-8859 is irrelevant.
> Windows still uses iso-8859-1, right?

Honestly, I have no idea. The *only* windows machine I use is $work's
windows machine, and the 'administrator' controls most of it, so I can
only adjust things in a very narrow band (very irritating at times, but
their machine, their rules).
> So yes, I guess Tcl9 causes trouble to 8859-1 users.

Only if they directly entered any code points that were beyond plain
ASCII. Code points 0 through 127 are identical between 8859 and UTF-8.
If the files used plain ASCII plus \uXXXX escapes, there would be no
trouble at all. Of course, if one is using a lot of non-English
characters for non-English languages, seeing the actual characters in
the scripts vs. walls of \u00b0 \u00a0 \u2324 everywhere makes for an
easier development effort.
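That "ASCII plus \uXXXX" property can even be checked mechanically. A
small sketch (the proc name is made up): since the two encodings can
only disagree on bytes 0x80 and above, a file with no such bytes parses
identically under either one.

```tcl
# Return 1 if the file is pure 7-bit ASCII, i.e. encoding-agnostic
# between iso8859-1 and utf-8.
proc is_ascii_only {path} {
    set ch [open $path rb]          ;# read raw bytes, no decoding
    set data [read $ch]
    close $ch
    # Any byte >= 0x80 is read differently by the two encodings.
    expr {![regexp {[\x80-\xff]} $data]}
}
```

Running something like this over the script tree would at least tell
Uwe which files are already safe and which ones actually need
attention.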
> Yes, sounds like it needs some fixing.

Agreed. Uwe may be able to put off the fixing for some more time, but
this change is going to arrive one day. He will likely have to make it
at some point.
> I have my own special case, I use Debian 9 which only ships 8.6.6 so
> I had to build 8.6.15 from source because I really need Unicode.

8.6.6 handled Unicode fine. In fact, 8.5 handled Unicode (so long as
one stuck to the BMP) just fine.
> But for some time I used Freewrap as a single-file batteries-included
> Tcl/Tk interpreter. So maybe Uwe should just use a different
> interpreter, likely just a slightly older version of Tcl/Tk, and
> embrace Tcl9 later.

That is another option: a custom build that defaults to iso-8859.
> I wonder if one can hack the encoding issue on the Tcl9 source and
> rebuild it.

The answer is likely "yes", though I've not looked at the code to know
that for sure. It just feels like a "one line change" followed by a
recompile. But then one has to also deliver that custom runtime, as
well as the scripts that go with it.