porkchop@invalid.foo (Mike Sanders) writes:
>Hi folks, hope you all are doing well.
>
>Please excuse long post, wanted to share this, some might find
>it handy given a certain context. Must run, I'm very behind in
>my work (hey I'm always running behind!)
Using a word list, I found some odd matches. For example:
>
$ echo "drunkeness indigestion" | awk -f metaphone.awk -v find=texas
drunkeness
indigestion
>
Are these really metaphone matches for "texas"? It's possible (I don't
know the algorithm at all well) but I found it surprising.
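One rough cross-check is to run the same words through plain Soundex,
the older scheme Metaphone is meant to improve on. What follows is only
a minimal sketch of American Soundex in awk, not Metaphone and not the
posted script; the names soundex_words, soundex and digit are made up
for illustration.

```sh
# soundex_words: print each word on stdin followed by its Soundex code.
# Hypothetical helper for illustration; not part of metaphone.awk.
soundex_words() {
  awk '
  # Map an uppercase letter to its Soundex digit.
  # Returns "" for vowels and for H, W, Y.
  function digit(c) {
    if (index("BFPV", c)) return "1"
    if (index("CGJKQSXZ", c)) return "2"
    if (index("DT", c)) return "3"
    if (c == "L") return "4"
    if (index("MN", c)) return "5"
    if (c == "R") return "6"
    return ""
  }
  # Return the 4-character Soundex code of w ("" if w has no letters).
  function soundex(w,    i, c, d, prev, code) {
    w = toupper(w)
    gsub(/[^A-Z]/, "", w)
    if (w == "") return ""
    code = substr(w, 1, 1)
    prev = digit(code)
    for (i = 2; i <= length(w) && length(code) < 4; i++) {
      c = substr(w, i, 1)
      d = digit(c)
      # Append a digit unless it repeats the previous letter code.
      if (d != "" && d != prev) code = code d
      # H and W do not separate equal codes; vowels and Y do.
      if (c != "H" && c != "W") prev = d
    }
    return substr(code "000", 1, 4)
  }
  { for (i = 1; i <= NF; i++) print $i, soundex($i) }
  '
}
```

$ echo "texas drunkeness indigestion" | soundex_words
texas T220
drunkeness D652
indigestion I532

Even this coarser scheme keeps the three words apart. That says nothing
definite about what Metaphone should produce, but it does make the
matches above look more suspicious.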
# metaphone.awk: Michael Sanders - 2024
#
# example invocation:
#
# echo "texas taxes taxi" | awk -f metaphone.awk -v find=texas
#
# notes:
#
# ever notice when you search for (say):
#
# 'i went to the zu'
#
# & your chosen search engine suggests something like:
#
# 'did you mean i went to the zoo'
#
# the metaphone algorithm handles such cases pretty well actually...
#
# Metaphone is a phonetic algorithm, published by Lawrence Philips in
# 1990, for indexing words by their English pronunciation. It
# fundamentally improves on the Soundex algorithm by using information
# about variations and inconsistencies in English spelling and
# pronunciation to produce a more accurate encoding, which does a
# better job of matching words and names which sound similar.
# https://en.wikipedia.org/wiki/Metaphone
#
# english only (sorry)
#
# not extensively tested, nevertheless a solid start, if you
# improve this code please share your results
#
# other implementations...
#
# gist: https://gist.github.com/Rostepher/b688f709587ac145a0b3
#
# BASIC: http://aspell.net/metaphone/metaphone.basic
#
# C: http://aspell.net/metaphone/metaphone-kuhn.txt
I wanted a "reference" implementation I could try, but this is not a
useful C program. It's in an odd dialect (it uses void but has K&R
function definitions) and has loads of undefined behaviours (strcpy of
overlapping strings, use of uninitialised variables, etc.).