Newsportal USENET - Chardet oddity

Subject : Chardet oddity
From : sjeik_appie (at) *nospam* hotmail.com (Albert-Jan Roskam)
Newsgroups : comp.lang.python
Date : 23. Oct 2024, 18:07:14

Other headers

Message-ID : <mailman.31.1729703240.4695.python-list@python.org>
References : 1

   Today I used chardet.detect in the REPL and it returned Windows-1252
   (incorrect, because decoding with it later raised a UnicodeDecodeError).
   When I ran chardet as a script (which uses UniversalDetector), it
   returned MacRoman. Isn't chardet.detect the correct way? I've used this
   method many times.
   # Interpreter
   >>> import chardet
   >>> contents = open(FILENAME, "rb").read()
   >>> chardet.detect(contents)
   {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
   # Terminal
   $ python -m chardet FILENAME
   FILENAME: MacRoman with confidence 0.7167379080370483
   Thanks!
   Albert-Jan
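
Both guesses above come with a confidence near 0.72, so neither is conclusive. One way to sanity-check a detector's guess is to try a strict decode with each candidate. The sketch below uses only the standard library; the helper name `strict_decodes` and the sample bytes are hypothetical and not taken from the original file. It relies on Python's `cp1252` codec leaving a few bytes (such as 0x8D) unassigned, while `mac_roman` assigns a character to that byte.

```python
def strict_decodes(data: bytes, encodings: list[str]) -> dict[str, bool]:
    """Report, for each candidate encoding, whether `data` decodes without error."""
    results = {}
    for name in encodings:
        try:
            data.decode(name)  # strict error handling is the default
            results[name] = True
        except UnicodeDecodeError:
            results[name] = False
    return results


# Hypothetical sample: 0x8D is "ç" in MacRoman but unassigned in Windows-1252.
sample = b"gar\x8don"
report = strict_decodes(sample, ["cp1252", "mac_roman"])
print(report)
```

A successful strict decode does not prove the encoding is right, since many single-byte encodings accept almost any input. A failed decode does rule the candidate out, which matches the UnicodeDecodeError described in the post.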

Date	Subject	#	Author
23 Oct 24	Chardet oddity	3	Albert-Jan Roskam
23 Oct 24	Re: Chardet oddity	1	Stefan Ram
23 Oct 24	Re: Chardet oddity	1	Mark Bourne