Newsportal USENET - Re: Chardet oddity

Subject: Re: Chardet oddity
From: nntp.mbourne (at) *nospam* spamgourmet.com (Mark Bourne)
Newsgroups: comp.lang.python
Date: 23 Oct 2024, 20:42:00

Organization: A noiseless patient Spider
Message-ID: <vfbjia$28es4$1@dont-email.me>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 SeaMonkey/2.53.19

Albert-Jan Roskam wrote:

Today I used chardet.detect in the REPL and it returned windows-1252
(incorrect, because it later resulted in a UnicodeDecodeError). When I ran
chardet as a script (which uses UniversalDetector) this returned
MacRoman. Isn't chardet.detect the correct way? I've used this method many
times.
# Interpreter
>>> contents = open(FILENAME, "rb").read()
>>> chardet.detect(content)

Is that copied and pasted from the terminal, or retyped with possible transcription errors? As written, you've assigned the bytes read from the file to `contents`, but passed `content` (with no "s") to `chardet.detect`, so the result would depend on whatever was previously assigned to `content`.
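
One way to rule out a stale name in the interpreter is to do the read and the detection inside a single function, so only one variable is involved. A minimal sketch: `detect_file` is a hypothetical helper name, not part of chardet, and the detector is passed in as a parameter so the helper itself only needs the standard library.

```python
from pathlib import Path


def detect_file(path, detect):
    """Read `path` as bytes and pass those same bytes to `detect`.

    `detect` is any callable that takes bytes, for example chardet.detect.
    """
    data = Path(path).read_bytes()  # bytes; the file is closed again afterwards
    return detect(data)


# Usage, assuming chardet is installed and FILENAME is defined:
#   import chardet
#   print(detect_file(FILENAME, chardet.detect))
```

If that still reports Windows-1252 while the script reports MacRoman, the difference is real and not an artifact of the interpreter session.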

{'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
# Terminal
$ python -m chardet FILENAME
FILENAME: MacRoman with confidence 0.7167379080370483
Thanks!
Albert-Jan
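
On the UnicodeDecodeError: a detector only guesses, so it can help to test each candidate against the actual bytes. The sketch below uses only the standard library; `try_decodings` is a hypothetical helper and the sample bytes are made up for illustration. Python's cp1252 codec leaves a few byte values (such as 0x8F) undefined, whereas mac_roman maps every byte, which is one way a Windows-1252 guess can fail later while MacRoman decodes cleanly.

```python
def try_decodings(data, encodings=("utf-8", "cp1252", "mac_roman")):
    """Return {encoding: True/False} for whether `data` decodes without error."""
    results = {}
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            results[name] = False
        else:
            results[name] = True
    return results


# Made-up sample: 0x8F is undefined in cp1252 and invalid as UTF-8 here.
sample = b"caf\x8f"
print(try_decodings(sample))
```

Decoding without error does not prove the guess is right (mac_roman accepts any byte sequence), so the decoded text still needs a look.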

--
Mark.

Date	Subject	#	Author
23 Oct 24	Chardet oddity	3	Albert-Jan Roskam
23 Oct 24	Re: Chardet oddity	1	Stefan Ram
23 Oct 24	Re: Chardet oddity	1	Mark Bourne