Re: (Long post) Metaphone Algorithm In AWK

Subject: Re: (Long post) Metaphone Algorithm In AWK
From: ben (at) *nospam* bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Date: 19 Aug 2024, 00:46:46
Organization: A noiseless patient Spider
Message-ID: <878qwts8bd.fsf@bsb.me.uk>
References: 1
User-Agent: Gnus/5.13 (Gnus v5.13)
porkchop@invalid.foo (Mike Sanders) writes:

> Hi folks, hope you all are doing well.
>
> Please excuse long post, wanted to share this, some might find
> it handy given a certain context. Must run, I'm very behind in
> my work (hey I'm always running behind!)

Using a word list, I found some odd matches.  For example:

$ echo "drunkeness indigestion" | awk -f metaphone.awk -v find=texas
drunkeness
indigestion

Are these really metaphone matches for "texas"?  It's possible (I don't
know the algorithm at all well) but I found it surprising.

> # metaphone.awk: Michael Sanders - 2024
> #
> # example invocation:
> #
> # echo "texas taxes taxi" | awk -f metaphone.awk -v find=texas
> #
> # notes:
> #
> # ever notice when you search for (say):
> #
> # 'i went to the zu'
> #
> # & your chosen search engine suggests something like:
> #
> # 'did you mean i went to the zoo'
> #
> # the metaphone algorithm handles such cases pretty well actually...
> #
> # Metaphone  is a phonetic algorithm, published by Lawrence Philips in
> # 1990,   for  indexing  words  by  their  English  pronunciation.  It
> # fundamentally improves on the Soundex algorithm by using information
> # about   variations  and  inconsistencies  in  English  spelling  and
> # pronunciation  to  produce  a  more  accurate encoding, which does a
> # better job of matching words and names which sound similar.
> # https://en.wikipedia.org/wiki/Metaphone
> #
> # english only (sorry)
> #
> # not extensively tested, nevertheless a solid start, if you
> # improve this code please share your results
> #
> # other implementations...
> #
> # gist:  https://gist.github.com/Rostepher/b688f709587ac145a0b3
> #
> # BASIC: http://aspell.net/metaphone/metaphone.basic
> #
> # C:     http://aspell.net/metaphone/metaphone-kuhn.txt

I wanted a "reference" implementation I could try, but this is not a
useful C program.  It's in an odd dialect (it uses void but has K&R
function definitions) and has loads of undefined behaviours (strcpy of
overlapping strings, use of uninitialised variables etc).

> # check if a character is a vowel
> function isvowel(c, is_vowel) {
>   is_vowel = c ~ /[AEIOU]/
>   return is_vowel
> }

I was not going to comment on the code, but this hit me just before I
posted.  Given the odd way AWK functions have to define locals, I tend
to use them only when really needed.  Here I think I would just write

function isvowel(c) {
   return c ~ /[AEIOU]/
}
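
For anyone who wants to try this, here is a small sketch that runs
the shorter isvowel and also shows the "extra parameter" idiom for
locals mentioned above.  The count_vowels helper and the test values
are my own illustration, not part of metaphone.awk, and I am assuming
the input has already been upper-cased, since the pattern only lists
capital letters.

```shell
result=$(awk '
# Simplified form from the post: the match expression is returned directly.
function isvowel(c) {
    return c ~ /[AEIOU]/
}

# Hypothetical helper, not from metaphone.awk: "n" and "i" are local only
# because they are listed as extra parameters that callers never pass.
function count_vowels(s,    n, i) {
    for (i = 1; i <= length(s); i++)
        n += isvowel(substr(s, i, 1))
    return n + 0
}

BEGIN {
    n = 99    # a global with the same name as the local in count_vowels
    print isvowel("A"), isvowel("X"), count_vowels("TEXAS"), n
}')
printf '%s\n' "$result"
```

The global n is still 99 afterwards because the function's n is a
separate local.  Note also that the pattern is unanchored, so a longer
string that merely contains a vowel matches too; /^[AEIOU]$/ would be
the stricter test if that ever matters.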

--
Ben.

