Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
From: HenHanna (at) *nospam* devnull.tb (HenHanna)
Newsgroups: comp.lang.python
Date: 31 May 2024, 04:26:37
Organization: A noiseless patient Spider
Message-ID: <v3bcgu$229eq$1@dont-email.me>
References: 1 2 3
User-Agent: Mozilla Thunderbird

On 5/30/2024 2:18 PM, dn wrote:
> On 31/05/24 08:03, HenHanna via Python-list wrote:
>>
>> Given a text file of a novel (JoyceUlysses.txt) ...
>>
>> could someone give me a pretty fast (and simple) Python program that'd give me a list of all words occurring exactly once?
>>
>> -- Also, a list of words occurring once, twice or 3 times
>>
>>
>>
>> re: hyphenated words (you can treat it anyway you like)
>>
>> but ideally, i'd treat [editor-in-chief]
>> [go-ahead] [pen-knife]
>> [know-how] [far-fetched] ...
>> as one unit.
>
> Split into words - defined as you will.
> Use Counter.
> Show some (of your) code and we'll be happy to critique...

It's hard to decide what to do with hyphens
and apostrophes
(I'd, he's, can't, haven't, A's and B's).
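
One possible way to settle the hyphen and apostrophe question is a regular expression that treats letters joined by a single hyphen or apostrophe as one word. This is only a sketch: the pattern WORD_RE and the helper tokenize() are names made up for illustration, and the pattern assumes plain ASCII letters, so accented words in the novel would be split.

```python
import re

# Assumption: a "word" is a run of ASCII letters, optionally joined to
# further runs by a single hyphen or apostrophe (straight ' or curly ’).
WORD_RE = re.compile(r"[A-Za-z]+(?:[-'’][A-Za-z]+)*")

def tokenize(text):
    """Return the lowercased words found in text (hypothetical helper)."""
    return [m.group(0).lower() for m in WORD_RE.finditer(text)]

# editor-in-chief, he's, know-how and can't each stay one unit;
# the bare "--" matches nothing and is dropped.
print(tokenize("The editor-in-chief said he's got know-how -- can't you see?"))
```

With this definition "A's" and "B's" come out as "a's" and "b's"; whether that is what you want is a judgment call.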

A 2-step process:
  1. Make a file listing all words (one word per line).
  2. Then do the counting, using
         from collections import Counter

Related code (for step 1) that I'd used before:

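    # Characters stripped from both ends of each token; hyphens and
    # apostrophes inside a word are left alone.
    PUNCT = ";:,'\"[]()*&^%$#@!,./<>?_-+="

    with open("JoyceUlysses.txt", 'r') as Rfile, open('Out.txt', 'w') as fo:
        for line in Rfile:
            wLis = line.split()          # split() already discards whitespace
            for w in wLis:
                w = w.strip(PUNCT)       # same as rstrip() followed by lstrip()
                if w != "":              # skip tokens that were only punctuation
                    fo.write(w.lower())
                    fo.write('\n')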
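
For step 2, a minimal sketch of the counting with Counter might look like the following. The function names rare_words() and read_words() are invented here, and the sample list stands in for the contents of Out.txt.

```python
from collections import Counter

def rare_words(words, max_count=1):
    """Return {word: count} for words occurring at most max_count times."""
    counts = Counter(words)
    return {w: n for w, n in counts.items() if n <= max_count}

def read_words(path):
    """Yield the non-empty lines of a one-word-per-line file."""
    with open(path) as f:
        for line in f:
            w = line.strip()
            if w:
                yield w

# Small stand-in for the word list produced by step 1.
sample = ["stately", "plump", "buck", "mulligan", "buck",
          "yes", "yes", "yes", "yes"]

print(sorted(rare_words(sample)))        # words occurring exactly once
print(rare_words(sample, max_count=3))   # words occurring once, twice or 3 times
```

On the real data the call would be something like rare_words(read_words("Out.txt")). Counter makes a single pass over the words, so this should be fast enough for a novel-sized file.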