Subject : Re: From JoyceUlysses.txt -- words occurring exactly once
From : grant.b.edwards (at) *nospam* gmail.com (Grant Edwards)
Newsgroups : comp.lang.python
Date : 03. Jun 2024, 20:58:26
Other headers
Message-ID : <mailman.83.1717441107.2909.python-list@python.org>
References : 1 2 3 4 5 6
User-Agent : slrn/1.0.3 (Linux)
On 2024-06-03, Edward Teach via Python-list <python-list@python.org> wrote:
> The Gutenberg Project publishes "plain text". That's another
> problem, because "plain text" means UTF-8....and that means
> unicode...and that means running some sort of unicode-to-ascii
> conversion in order to get something like "words". A couple of
> hours....a couple of hundred lines of C....problem solved!
I'm curious. Why does it need to be converted from Unicode to ASCII?
When you read it into Python, it gets converted right back to Unicode...
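
To illustrate the point, here is a rough sketch (not anybody's actual program from this thread) of finding the words that occur exactly once without any ASCII conversion. The function name words_occurring_once, the sample string, and the choice to define a "word" as a run of Unicode letters are all my own assumptions; real text would need a decision about apostrophes and hyphens.

```python
import re
from collections import Counter

def words_occurring_once(text):
    """Return the sorted list of words that appear exactly once in text.

    Assumption: a "word" is a run of Unicode letters, compared
    case-insensitively. [^\\W\\d_] matches any letter, accented or not.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    return sorted(w for w, n in counts.items() if n == 1)

# Stand-in for the bytes of a UTF-8 "plain text" file.
sample_bytes = "Café au lait, café noir; naïve Dubliners".encode("utf-8")

# Decoding yields an ordinary Python str, which is already Unicode.
sample_text = sample_bytes.decode("utf-8")

print(words_occurring_once(sample_text))
# ['au', 'dubliners', 'lait', 'naïve', 'noir']
```

For the real file, something like open("JoyceUlysses.txt", encoding="utf-8").read() would do the decoding step, assuming the file really is UTF-8.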