Subject : Re: From JoyceUlysses.txt -- words occurring exactly once
From : jbb (at) *nospam* notatt.com (Jeff Barnett)
Newsgroups : comp.lang.lisp comp.lang.scheme
Date : 31 May 2024, 00:33:30
Organisation : A noiseless patient Spider
Message-ID : <v3aus4$1sknf$1@dont-email.me>
References : 1
User-Agent : Mozilla Thunderbird
On 5/30/2024 2:09 PM, HenHanna wrote:
> i'd not use Gauche for this, but maybe someone can change my mind.
> _______________________
>
> From JoyceUlysses.txt -- words occurring exactly once
>
> Given a text file of a novel (JoyceUlysses.txt) ...
> could someone give me a pretty fast (and simple) program that'd give me
> a list of all words occurring exactly once?
>
> -- Also, a list of words occurring once, twice or 3 times
>
> re: hyphenated words (you can treat it anyway you like)
> ideally, i'd treat [editor-in-chief]
> [go-ahead] [pen-knife]
> [know-how] [far-fetched] ...
> as one unit.

Make a list (or array) of the individual words of the original document
(as strings, or as symbols in a special package), then sort it using the
Lisp-supplied sort function. You then write a loop, using your favorite
tools, that scans the sorted list for runs of identical adjacent words
of the required length: runs of length 1 for words occurring exactly
once, runs of length 1 through 3 for the second list. This gives you an
asymptotically efficient program: the sort dominates, so the theoretical
run time will look something like (* c N (log N)), where N is the length
of the list produced by the first step and c is some constant.
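
The sort-then-scan idea can be sketched roughly as follows. This is
only an illustration, assuming the words have already been split out
into a list of strings; the function name words-occurring and its
arguments are made up for the example and are not from any library.

```lisp
;; Hypothetical helper for illustration; assumes WORDS is a list of strings.
(defun words-occurring (words min-count max-count)
  "Return the distinct strings in WORDS that occur between MIN-COUNT and
MAX-COUNT times (inclusive), in sorted order."
  ;; SORT is destructive, so work on a copy of the caller's list.
  (let ((sorted (sort (copy-list words) #'string<))
        (result '()))
    (loop while sorted
          do (let ((word (first sorted))
                   (run-length 0))
               ;; After sorting, equal words are adjacent: consume one run.
               (loop while (and sorted (string= word (first sorted)))
                     do (incf run-length)
                        (pop sorted))
               (when (<= min-count run-length max-count)
                 (push word result))))
    (nreverse result)))
```

For instance, (words-occurring '("the" "snark" "was" "the" "boojum") 1 1)
returns ("boojum" "snark" "was"), and passing 1 and 3 instead gives the
words occurring once, twice or 3 times. The scan is a single pass over
the sorted list, so the sort is what sets the overall cost.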
Note, any solution resembling this one is not really what you want. For
example, it would think "Snark" and "Snarks" are different words. Some
differences, such as capitalization, can be suppressed by choosing a
case-insensitive sort predicate. You can, of course, write your own
sort predicate. The thing to note is that the predicate (the strict
"less than" test that sort calls) should look only at the two words it
is handed, not at the rest of the word list, and should not maintain
state between invocations; otherwise, the complexity can become
arbitrarily large.
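
A case-insensitive variant might look something like the sketch below.
It assumes the standard string-lessp and string-equal functions are an
acceptable notion of "same word"; the name words-in-count-range and the
:lessp and :same keywords are invented for this example. Whatever test
is used to detect runs has to agree with the sort predicate, which is
why the two are passed as a pair.

```lisp
;; Hypothetical helper for illustration; assumes WORDS is a list of strings.
(defun words-in-count-range (words min-count max-count
                             &key (lessp #'string-lessp)
                                  (same #'string-equal))
  "Return one spelling of each word in WORDS that occurs between MIN-COUNT
and MAX-COUNT times (inclusive), where SAME decides word identity and
LESSP is the matching strict ordering."
  ;; STABLE-SORT keeps equal words in text order, so the first element
  ;; of each run is the first spelling seen in the document.
  (let ((sorted (stable-sort (copy-list words) lessp))
        (result '()))
    (loop while sorted
          do (let ((word (first sorted))
                   (run-length 0))
               ;; Consume the run of words that SAME treats as equal.
               (loop while (and sorted (funcall same word (first sorted)))
                     do (incf run-length)
                        (pop sorted))
               (when (<= min-count run-length max-count)
                 (push word result))))
    (nreverse result)))
```

Both default predicates look only at their two arguments and keep no
state, so the cost stays at the sort's N log N. Note that "Snark" and
"Snarks" still count as different words here; folding those together
would need some kind of stemming, which this sketch does not attempt.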
-- Jeff Barnett