Subject: Re: Decoding bytes to text strings in Python 2
From: rosuav (at) *nospam* gmail.com (Chris Angelico)
Newsgroups: comp.lang.python
Date: 24 Jun 2024, 01:30:30
Message-ID: <mailman.162.1719185446.2909.python-list@python.org>
On Mon, 24 Jun 2024 at 08:20, Rayner Lucas via Python-list
<python-list@python.org> wrote:
>
> In article <mailman.159.1718991773.2909.python-list@python.org>,
> rosuav@gmail.com says...
> >
> > If you switch to a Linux system, it should work correctly, and you'll
> > be able to migrate the rest of the way onto Python 3. Once you achieve
> > that, you'll be able to operate on Windows or Linux equivalently,
> > since Python 3 solved this problem. At least, I *think* it will; my
> > current system has a Python 2 installed, but doesn't have tkinter
> > (because I never bothered to install it), and it's no longer available
> > from the upstream Debian repos, so I only tested it in the console.
> > But the decoding certainly worked.
>
> Thank you for the idea of trying it on a Linux system. I did so, and my
> example code generated the error:
>
> _tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF)
> allowed by Tcl
>
> So it looks like the problem is ultimately due to a limitation of
> Tcl/Tk.

Yep, that seems to be the case. I'm not sure whether that's still true
on a more recent Python, but it does look like you won't get astral
characters (those above U+FFFF, outside the Basic Multilingual Plane)
into tkinter on the version you're using.
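Before handing a string to tkinter, you can check it for the characters Tcl rejects. This is a minimal Python 3 sketch, not part of tkinter itself: `find_astral` is a hypothetical helper name, and the U+FFFF cutoff is taken from the TclError message quoted above.

```python
def find_astral(text):
    """Return (index, "U+XXXXX") pairs for characters above U+FFFF.

    Hypothetical helper; assumes Python 3.3+, where each str element is
    one full code point, so an astral character counts as one character.
    """
    return [(i, "U+%04X" % ord(c)) for i, c in enumerate(text) if ord(c) > 0xFFFF]


# Example: locate the snake emoji that triggered the TclError above.
print(find_astral("snake: \U0001F40D"))  # [(7, 'U+1F40D')]
```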

> I'm still not sure why it doesn't give an error on Windows and

Because of the aforementioned weirdness of old (that is: pre-3.3)
Python versions on Windows. They were "narrow" builds, storing text as
16-bit code units in a messy, buggy hybrid of UCS-2 and UTF-16: a
character above U+FFFF became a surrogate pair, so len() and indexing
saw it as two characters. Sometimes this got you around problems, or at
least masked them, but it wasn't reliable. That's why, in Python 3.3
(PEP 393), all that was fixed :)
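The narrow-build behaviour can be illustrated from a modern Python by splitting an astral character into UTF-16 code units, which is roughly what a pre-3.3 Windows build held internally. A minimal sketch; `utf16_code_units` is a hypothetical helper, and it only approximates the old internal representation.

```python
import sys


def utf16_code_units(text):
    """Split text into 16-bit UTF-16 code units (hypothetical helper).

    This mimics how a pre-3.3 narrow build stored strings: anything above
    U+FFFF becomes a surrogate pair of two code units.
    """
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


snake = "\U0001F40D"  # U+1F40D SNAKE, an astral character

print(len(snake))  # 1: since Python 3.3 (PEP 393), one code point per character
print(sys.maxunicode == 0x10FFFF)  # True on every Python 3.3+ build
print([hex(u) for u in utf16_code_units(snake)])  # ['0xd83d', '0xdc0d']
print(len(utf16_code_units(snake)))  # 2: what len() reported on a narrow build
```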

> instead either works (when UTF-8 encoding is specified) or converts the
> out-of-range characters to ones it can display (when the encoding isn't
> specified). But now I know what the root of the problem is, I can deal
> with it appropriately (and my curiosity is at least partly satisfied).

Converting out-of-range characters is fairly straightforward, at least
as long as your Python interpreter is correctly built (so, Python 3,
or a Linux build of Python 2).

    # Keep BMP characters; replace anything above U+FFFF with "?"
    "".join(c if ord(c) < 65536 else "?" for c in text)

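Wrapped up as a reusable function, the one-liner above might look like this. A minimal Python 3 sketch; `bmp_only` and its `replacement` parameter are hypothetical, and U+FFFD (the Unicode replacement character) is shown only as an alternative placeholder to "?".

```python
def bmp_only(text, replacement="?"):
    """Replace every character above U+FFFF with `replacement`.

    Hypothetical helper; assumes Python 3 so that ord() sees each astral
    character as a single code point rather than as two surrogates.
    """
    return "".join(c if ord(c) < 0x10000 else replacement for c in text)


print(bmp_only("Python \U0001F40D rocks"))  # Python ? rocks
print(bmp_only("caf\u00e9 \U0001F40D", "\uFFFD") == "caf\u00e9 \uFFFD")  # True
```

The result can then be passed to a tkinter widget; characters inside the BMP, such as é, pass through untouched.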
> This has given me a much better understanding of what I need to do in
> order to migrate to Python 3 and add proper support for non-ASCII
> characters, so I'm very grateful for your help!
>
Excellent. Hopefully all this mess is just a transitional state and
you'll get to something that REALLY works, soon!
ChrisA