Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
From: No_spamming (at) *nospam* noWhere_7073.org (B. Pym)
Newsgroups: comp.lang.lisp, comp.lang.scheme
Date: 31 May 2024, 12:13:50
Organization: A noiseless patient Spider
Message-ID: <v3c7st$26biv$1@dont-email.me>
References: 1
User-Agent: XanaNews/1.18.1.6
On 5/30/2024, HenHanna wrote:
> i'd not use Gauche for this, but maybe someone can change my mind.
> _______________________
> From JoyceUlysses.txt -- words occurring exactly once
> Given a text file of a novel (JoyceUlysses.txt) ...
> could someone give me a pretty fast (and simple) program that'd give me a list of all words occurring exactly once?
> -- Also, a list of words occurring once, twice or 3 times
> re: hyphenated words (you can treat it anyway you like)
> ideally, i'd treat [editor-in-chief]
> [go-ahead] [pen-knife]
> [know-how] [far-fetched] ...
> as one unit.
Gauche Scheme
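(use file.util) ;; file->string
(use srfi-13)   ;; string-tokenize, string-upcase
(use srfi-14)   ;; character sets

;; Word counts, keyed by the upcased word.
(define h (make-hash-table 'string=?))

;; A token is a run of letters and hyphens; leading and trailing
;; hyphens are stripped before counting.
(dolist (s (string-tokenize (file->string "Alice.txt")
                            (char-set-adjoin char-set:letter #\-)))
  (hash-table-update! h
                      (regexp-replace* (string-upcase s) #/^-+/ "" #/-+$/ "")
                      (pa$ + 1) 0))

;; Keep the words that occur fewer than 3 times (that is, once or twice).
(filter (lambda (kv) (< (cdr kv) 3))
        (hash-table->alist h))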
===>
(("LASTED" . 2) ("WAY--NEVER" . 1) ("VISIT" . 1) ("CHANCED" . 1)
("WILDLY" . 2) ("BEHEAD" . 1) ("PROMISE" . 1) ("MEANWHILE" . 1)
("ENGAGED" . 1) ("KNIFE" . 2) ("ROARED" . 1) ("RETIRE" . 1)
("BLACKING" . 1) ("HATED" . 1) ("BRIGHT-EYED" . 1)
("SHEEP-BELLS" . 1) ("PROTECTION" . 1) ("CRIES" . 1) ("ADA" . 1)
("ENJOY" . 1) ("WRITHING" . 1) ("RAW" . 1) ("APPEALED" . 1)
("RELIEVED" . 1) ("CHILDHOOD" . 1) ("WEPT" . 1) ("RACE-COURSE" . 1)
("THEIRS" . 1) ("MAD--AT" . 1) ("SPOKEN" . 1) ("PENCILS" . 1)
("CLEAR" . 2) ("TREADING" . 2) ("RETURNED" . 2) ("CHERRY-TART" . 1)
("UNEASY" . 1) ("LOW-SPIRITED" . 1) ("BONE" . 1) ("PROMISED" . 1)
("HAPPENING" . 1) ("OYSTER" . 1) ("PATIENTLY" . 2) ("NEEDS" . 1)
("LESSON-BOOK" . 1) ("PITIED" . 1) ("UNCOMFORTABLY" . 1)
("ANTIPATHIES" . 1) ("PICTURED" . 1) ("DESPERATE" . 1)
("ENGRAVED" . 1)
...
)
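
The program above keeps words seen once or twice, and entries such as "WAY--NEVER" and "MAD--AT" show that a double hyphen used as a dash ends up inside a token. Below is a rough sketch of one way to get the two lists from the original request (exactly once; once, twice or 3 times). The helper names count-words and words-with-count are hypothetical, and treating a run of two or more hyphens as a word separator is an assumption on my part, not something the original post does.

```scheme
(use srfi-1)    ;; filter
(use srfi-13)   ;; string-tokenize, string-upcase, string-null?
(use srfi-14)   ;; char-set-adjoin, char-set:letter

;; Hypothetical helper: return a hash table mapping each upcased
;; word in TEXT to the number of times it occurs.
(define (count-words text)
  (let ((h (make-hash-table 'string=?))
        ;; Assumption: "--" (or longer) is a dash, so it separates words.
        (cleaned (regexp-replace-all #/--+/ text " ")))
    (dolist (s (string-tokenize cleaned
                                (char-set-adjoin char-set:letter #\-)))
      ;; Strip any stray leading or trailing single hyphen.
      (let ((w (regexp-replace* (string-upcase s) #/^-+/ "" #/-+$/ "")))
        ;; A token that was only a hyphen is now empty; skip it.
        (unless (string-null? w)
          (hash-table-update! h w (pa$ + 1) 0))))
    h))

;; Hypothetical helper: list the words whose count is between
;; LO and HI inclusive. The order of the result is unspecified.
(define (words-with-count h lo hi)
  (map car
       (filter (lambda (kv) (<= lo (cdr kv) hi))
               (hash-table->alist h))))

;; Possible usage on the novel (needs (use file.util) for file->string):
;;   (define h (count-words (file->string "JoyceUlysses.txt")))
;;   (words-with-count h 1 1)   ; words occurring exactly once
;;   (words-with-count h 1 3)   ; words occurring once, twice or 3 times
```

Counting once and querying the table twice avoids reading the file a second time. Single hyphens inside a word are kept, so "pen-knife" still counts as one unit.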