Subject : Re: From JoyceUlysses.txt -- words occurring exactly once
From : list1 (at) *nospam* tompassin.net (Thomas Passin)
Newsgroups : comp.lang.python
Date : 31. May 2024, 23:27:00
Message-ID : <mailman.77.1717199313.2909.python-list@python.org>
References : 1 2
User-Agent : Mozilla Thunderbird
On 5/30/2024 4:03 PM, HenHanna via Python-list wrote:
Given a text file of a novel (JoyceUlysses.txt) ...
could someone give me a pretty fast (and simple) Python program that'd give me a list of all words occurring exactly once?
-- Also, a list of words occurring once, twice or 3 times
Re: hyphenated words (you can treat them any way you like)
but ideally, I'd treat [editor-in-chief]
[go-ahead] [pen-knife]
[know-how] [far-fetched] ...
as one unit.
You will probably get a thousand different suggestions, but here's a fairly direct and readable one in Python:
s1 = 'Is this word is the only word repeated in this string'
counts = {}
for w in s1.lower().split():
    counts[w] = counts.get(w, 0) + 1
print(sorted(counts.items()))
# [('in', 1), ('is', 2), ('only', 1), ('repeated', 1), ('string', 1), ('the', 1), ('this', 2), ('word', 2)]
Of course you can adjust the definition of what constitutes a word, handle punctuation and so on, and tinker with the output format to suit yourself. You would replace s1.lower().split() with, e.g., my_custom_word_splitter(s1).
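To make that last point concrete, here is one possible sketch of such a splitter, along with the filtering step the original question asked for. The regular expression, the body of my_custom_word_splitter, and the helper words_with_count are all illustrative assumptions, not the only reasonable choices. In particular, a "word" is assumed here to be a run of ASCII letters, optionally joined by internal hyphens or apostrophes, so accented letters and digits would need a wider pattern.

```python
import re
from collections import Counter

# Assumed definition of a word: runs of ASCII letters, optionally joined by
# internal hyphens or apostrophes, so "pen-knife" and "don't" stay whole.
WORD_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")

def my_custom_word_splitter(text):
    """Return the lowercased words in text, keeping hyphenated words as one unit."""
    return WORD_RE.findall(text.lower())

def words_with_count(text, low=1, high=1):
    """Return the sorted words whose frequency lies between low and high, inclusive."""
    counts = Counter(my_custom_word_splitter(text))
    return sorted(w for w, n in counts.items() if low <= n <= high)

s1 = "The pen-knife is far-fetched; the know-how is not. Is it?"
print(words_with_count(s1))        # words occurring exactly once
print(words_with_count(s1, 1, 3))  # words occurring once, twice or 3 times
```

For the novel itself, you would read the file into one string first, e.g. open('JoyceUlysses.txt', encoding='utf-8').read() (the encoding is a guess), and pass that in place of s1. Counter does the same bookkeeping as the counts.get() loop above; it is just a little shorter.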