Tcl 8.6 vs 9.0 encoding plus some general confusion

Subject: Tcl 8.6 vs 9.0 encoding plus some general confusion
From: ted@loft.tnolan.com (Ted Nolan)
Newsgroups: comp.lang.tcl
Date: 22 Jun 2025, 21:05:43
Organization: loft
Message-ID: <mbr60nF6lf9U1@mid.individual.net>
User-Agent: trn 4.0-test76 (Apr 2, 2001)

I am always finding things that, in retrospect, I don't understand
as well as I thought I did.  Perhaps someone can help me get my
mind around the following...

I was surprised recently that one of the servers in our
cluster is running a Linux distribution so forward-looking
that tclsh is symlinked to tclsh9.0 instead of tclsh8.6.

This caused one of my scripts to fail (or to pitch a warning,
which had the same effect in context) about encoding while
reading data.

As background, I am reading line-oriented data that comes in as a
number of fields separated by a field separator character (backslash,
to be specific).  Everything up to the last field is pure text data.
However, data after the last separator can be binary data (with the
caveat that some light encoding is done such that it will not have
a newline character until the actual end of the record).

For Tcl 8.6, I have been setting up to read this data with something
like the following.  (I can't give the actual code here,
so bear with any typos):

    set f [open $file r]
    fconfigure $f -encoding binary -translation binary

    while {[gets $f line] >= 0} {
        do_stuff $line
    }
    close $f

The direction on the fconfigure man page for 8.6 is:

    If a file contains pure binary data (for instance, a JPEG
    image), the encoding for the channel should be configured to be
    binary.  Tcl will then assign no interpretation to the data in
    the file and simply read or write raw bytes.  The Tcl binary
    command can be used to manipulate this byte-oriented data.  It
    is usually better to set the -translation option to binary when
    you want to transfer binary data, as this turns off the other
    automatic interpretations of the bytes in the stream as well.

My understanding was that all this indicates to Tcl that we are
creating a byte array and it should not attempt to convert the
data to the internal Unicode format.

However, the warning thrown by 9.0 points me to the "chan"
man page, which says for the "-encoding" option:

    If a file contains pure binary data (for instance, a JPEG
    image), the encoding for the channel should be configured
    to be iso8859-1. Tcl will then assign no interpretation to
    the data in the file and simply read or write raw bytes.
    The Tcl binary command can be used to manipulate this
    byte-oriented data. It is usually better to set the
    -translation option to binary when you want to transfer
    binary data, as this turns off the other automatic
    interpretations of the bytes in the stream as well.

and I don't understand this at all.  If I say "-encoding iso8859-1",
am I not saying that the data is textual, and that Tcl should parse
it from "iso8859-1" into the internal Unicode as it reads it?
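
One possible reading, offered only as a sketch: iso8859-1 assigns each
byte value 0..255 to the code point with the same number, so decoding
through it would leave every value unchanged.  The byte values below
are made up for illustration, and the snippet only exercises the
"encoding" command, not a channel:

```tcl
# Four arbitrary byte values, two of them above 127 (illustrative only).
set bytes [binary format c* {1 65 128 255}]

# Decode them as iso8859-1 text, as a channel configured with
# -encoding iso8859-1 would do on input.
set text [encoding convertfrom iso8859-1 $bytes]

# Collect the code point of every resulting character.
set codes {}
foreach ch [split $text ""] {
    lappend codes [scan $ch %c]
}
puts $codes

# Encode back and list the resulting byte values.
set back [encoding convertto iso8859-1 $text]
binary scan $back cu* values
puts $values
```

If both lists come out identical to the input values, then "parsing
from iso8859-1" and "reading raw bytes" would amount to the same thing
in practice.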

Also, for both 8.6 and 9.0: when I search for the separator with "string
index" and pull off the last field with "string range", am I forcing
Tcl to treat the whole string as text, so that it tries to convert the
(potentially) binary portion at the end of the line to internal
Unicode?  I have never observed this, but thinking harder about it,
maybe it should, or could, happen?  Should I be touching
the string only with "binary" commands?
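
A small experiment along these lines, as a sketch under stated
assumptions rather than the real code: the record layout (two text
fields, then a four-byte payload), the field count, and all variable
names are hypothetical.  It opens the file with mode "rb", which
configures the channel with -translation binary without naming an
encoding, and it locates separators with "string first" from the
front, because the payload here contains a backslash byte:

```tcl
# Hypothetical payload; 92 is the backslash byte, to show that the
# separator can also occur inside the binary part.
set payload [binary format c* {1 200 92 255}]

# Write one record (two text fields, then the payload) to a temp file.
set out [file tempfile path]
fconfigure $out -translation binary
puts $out "alpha\\beta\\$payload"
close $out

# Read it back; mode "rb" configures the channel for binary data.
set in [open $path rb]
gets $in line
close $in
file delete $path

# Split off the two text fields with ordinary string commands.
set sep "\\"
set start 0
set fields {}
for {set i 0} {$i < 2} {incr i} {
    set end [string first $sep $line $start]
    lappend fields [string range $line $start [expr {$end - 1}]]
    set start [expr {$end + 1}]
}

# Whatever remains is the payload; list its byte values.
set tail [string range $line $start end]
binary scan $tail cu* tailBytes
puts $fields
puts $tailBytes
```

If the byte values printed at the end match the payload that was
written, then the string commands did not disturb the binary tail,
at least in this small case.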

Big thanks for clearing up my thinking about this!
--
columbiaclosings.com
What's not in Columbia anymore..
