Subject : Re: From JoyceUlysses.txt -- words occurring exactly once
From : PythonList (at) *nospam* DancesWithMice.info (dn)
Newsgroups : comp.lang.python
Date : 05. Jun 2024, 05:33:15
Organisation : DWM
Message-ID : <mailman.90.1717562014.2909.python-list@python.org>
References : 1 2 3 4 5
User-Agent : Mozilla Thunderbird
On 31/05/24 14:26, HenHanna via Python-list wrote:
> On 5/30/2024 2:18 PM, dn wrote:
>> On 31/05/24 08:03, HenHanna via Python-list wrote:
>>>
>>> Given a text file of a novel (JoyceUlysses.txt) ...
>>>
>>> could someone give me a pretty fast (and simple) Python program that'd give me a list of all words occurring exactly once?
>>>
>>> -- Also, a list of words occurring once, twice or 3 times
>>>
>>> re: hyphenated words (you can treat it anyway you like)
>>>
>>> but ideally, i'd treat [editor-in-chief]
>>> [go-ahead] [pen-knife]
>>> [know-how] [far-fetched] ...
>>> as one unit.
>>
>> Split into words - defined as you will.
>> Use Counter.
>>
>> Show some (of your) code and we'll be happy to critique...
>
> hard to decide what to do with hyphens
> and apostrophes
> (I'd, he's, can't, haven't, A's and B's)
>
> 2-step-Process
>
> 1. make a file listing all words (one word per line)
> 2. then, doing the counting. using
>    from collections import Counter

Apologies for lateness - only just able to come back to this.
This issue is not a Python problem, and it is not solved by code!
If you/your teacher can't define a "word", the code, any code, will almost certainly be wrong!
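To make that concrete, here is a minimal sketch, not a recommended solution. The two regular expressions are hypothetical definitions of a "word" (my assumptions, not anything the requirements state), as are the helper name words_occurring and the sample sentence. The counting is the same Counter call both times; only the definition changes, and so does the answer.

```python
import re
from collections import Counter

# Two hypothetical definitions of a "word"; neither is "the" right one.
# Definition A: runs of letters only, so hyphens and apostrophes split words.
WORD_SPLIT_ON_PUNCT = re.compile(r"[a-z]+")
# Definition B: letters, optionally joined by internal hyphens or apostrophes.
WORD_KEEP_JOINERS = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def words_occurring(text, pattern, max_count=1):
    """Return the sorted words that occur at most max_count times."""
    counts = Counter(pattern.findall(text.lower()))
    return sorted(word for word, n in counts.items() if n <= max_count)


# Made-up sample text, standing in for the novel.
sample = "The editor-in-chief can't find the pen-knife. The editor can find a pen."

print(words_occurring(sample, WORD_SPLIT_ON_PUNCT))
# ['a', 'chief', 'in', 'knife', 't']
print(words_occurring(sample, WORD_KEEP_JOINERS))
# ['a', 'can', "can't", 'editor', 'editor-in-chief', 'pen', 'pen-knife']
```

Passing max_count=3 gives the "once, twice or 3 times" list. Which of the two outputs is "correct" is a question for whoever set the task, not for Python.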
One of the interesting aspects of our work is that we can write all manner of tests to try to ensure that the code is correct: unit tests, integration tests, system tests, acceptance tests, eye-tests, ...
However, there is no such thing as a test (or proof) that statements of requirements are complete or correct!
(nor for any other previous stages of the full project life-cycle)
As coders we need to learn to require clear specifications and not attempt to read-between-the-lines, use our initiative, or otherwise 'not bother the ...'. When there is ambiguity, we should go back to the user/client/boss and seek clarification. They are the domain/subject-matter experts...
I'm reminded of a cartoon, possibly from some IBM source, first seen in black-and-white but here in living-color:
https://www.monolithic.org/blogs/presidents-sphere/what-the-customer-really-wants
That has been the sad history of programming and dev. projects - wherein we are blamed for every shortcoming, because no-one else understands the nuances of development projects.
If we don't insist on clarity, are we our own worst enemy?
--
Regards,
=dn