Re: Chardet oddity

Subject: Re: Chardet oddity
From: nntp.mbourne (at) *nospam* spamgourmet.com (Mark Bourne)
Newsgroups: comp.lang.python
Date: 23 Oct 2024, 21:42:00
Organization: A noiseless patient Spider
Message-ID: <vfbjia$28es4$1@dont-email.me>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 SeaMonkey/2.53.19

Albert-Jan Roskam wrote:
    Today I used chardet.detect in the REPL and it returned windows-1252
    (incorrect, because it later resulted in a UnicodeDecodeError). When I ran
    chardet as a script (which uses UniversalDetector), it returned
    MacRoman. Isn't chardet.detect the correct way? I've used this method many
    times.
    # Interpreter
    >>> contents = open(FILENAME, "rb").read()
    >>> chardet.detect(content)
Is that copied and pasted from the terminal, or retyped with possible transcription errors?  As written, you've assigned the bytes read from the file to `contents`, but passed `content` (with no "s") to `chardet.detect`, so the result would depend on whatever was previously assigned to `content`.
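Whichever way chardet is invoked, its answer is a confidence-ranked guess, so it is worth confirming that the bytes actually decode with the reported encoding. The sketch below uses only the standard library; the helper name `decodable_as` and the sample bytes are hypothetical. Python's Windows-1252 codec leaves a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined and raises `UnicodeDecodeError` on them, whereas MacRoman defines a character for every byte value, which is one way a Windows-1252 guess can fail where MacRoman succeeds.

```python
def decodable_as(data: bytes, candidates: list[str]) -> list[str]:
    """Return the candidate encodings that decode `data` without error."""
    ok = []
    for encoding in candidates:
        try:
            data.decode(encoding)  # strict mode: raises on any undecodable byte
        except UnicodeDecodeError:
            continue
        ok.append(encoding)
    return ok


# Hypothetical sample: "garçon café" encoded as MacRoman (0x8D = ç, 0x8E = é).
# 0x8D is undefined in Windows-1252, so that candidate is rejected.
sample = b"gar\x8don caf\x8e"
print(decodable_as(sample, ["Windows-1252", "MacRoman"]))  # prints ['MacRoman']
```

Applied to the file from the question, reading it once with `with open(FILENAME, "rb") as f: contents = f.read()` and then calling `decodable_as(contents, ["Windows-1252", "MacRoman"])` would show directly which of the two guesses survives a strict decode.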

    {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
    # Terminal
    $ python -m chardet FILENAME
    FILENAME: MacRoman with confidence 0.7167379080370483
    Thanks!
    Albert-Jan
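A minimal sketch for comparing the two paths on identical input, assuming the third-party chardet package is installed. It runs the same bytes through `chardet.detect()` and through a `UniversalDetector` fed one line at a time with an early stop once the detector reports `done`, which is roughly how the `python -m chardet` command-line tool reads a file. The function name `compare_detectors` is hypothetical. Passing one shared buffer to both paths rules out the `content`/`contents` mix-up as the cause of a mismatch.

```python
import io
from typing import Any


def compare_detectors(data: bytes) -> tuple[dict[str, Any], dict[str, Any]]:
    """Run chardet over the same bytes in two ways and return both results.

    `whole` comes from chardet.detect(), which feeds the entire buffer at once.
    `by_lines` feeds a UniversalDetector one line at a time and stops early once
    the detector reports it is done, roughly as `python -m chardet` does.
    """
    # Third-party dependency (pip install chardet); imported here so this
    # module can still be imported where chardet is missing.
    import chardet
    from chardet.universaldetector import UniversalDetector

    whole = chardet.detect(data)

    detector = UniversalDetector()
    for line in io.BytesIO(data):  # iterating a binary stream yields b"\n"-terminated lines
        detector.feed(line)
        if detector.done:  # detector is confident enough to stop; skip the rest
            break
    by_lines = detector.close()  # close() finalizes and returns the result dict
    return whole, by_lines
```

If both results agree on the file's bytes, the REPL discrepancy most likely came from passing a stale `content` variable; if they still differ, the line-by-line feeding is the more likely explanation.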
--
Mark.

Date       Subject                  #  Author
23 Oct 24  * Chardet oddity         3  Albert-Jan Roskam
23 Oct 24  +- Re: Chardet oddity    1  Stefan Ram
23 Oct 24  `- Re: Chardet oddity    1  Mark Bourne
