Re: (Long post) Metaphone Algorithm In AWK

Subject: Re: (Long post) Metaphone Algorithm In AWK
From: ben (at) *nospam* bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.awk
Date: 19. Aug 2024, 02:15:43
Organization: A noiseless patient Spider
Message-ID: <87r0alqpmo.fsf@bsb.me.uk>
References: 1 2
User-Agent: Gnus/5.13 (Gnus v5.13)

A correction...

Ben Bacarisse <ben@bsb.me.uk> writes:

> porkchop@invalid.foo (Mike Sanders) writes:
>
>> Hi folks, hope you all are doing well.
>>
>> Please excuse long post, wanted to share this, some might find
>> it handy given a certain context. Must run, I'm very behind in
>> my work (hey I'm always running behind!)
>
> Using a word list, I found some odd matches.  For example:
>
> $ echo "drunkeness indigestion" | awk -f metaphone.awk -v find=texas
> drunkeness
> indigestion
>
> Are these really metaphone matches for "texas"?  It's possible (I don't
> know the algorithm at all well) but I found it surprising.

I got the C code to compile and these should not match if the C code is
working correctly.

>> # metaphone.awk: Michael Sanders - 2024
>> #
>> # example invocation:
>> #
>> # echo "texas taxes taxi" | awk -f metaphone.awk -v find=texas
>> #
>> # notes:
>> #
>> # ever notice when you search for (say):
>> #
>> # 'i went to the zu'
>> #
>> # & your chosen search engine suggests something like:
>> #
>> # 'did you mean i went to the zoo'
>> #
>> # the metaphone algorithm handles such cases pretty well actually...
>> #
>> # Metaphone  is a phonetic algorithm, published by Lawrence Philips in
>> # 1990,   for  indexing  words  by  their  English  pronunciation.  It
>> # fundamentally improves on the Soundex algorithm by using information
>> # about   variations  and  inconsistencies  in  English  spelling  and
>> # pronunciation  to  produce  a  more  accurate encoding, which does a
>> # better job of matching words and names which sound similar.
>> # https://en.wikipedia.org/wiki/Metaphone
>> #
>> # english only (sorry)
>> #
>> # not extensively tested, nevertheless a solid start, if you
>> # improve this code please share your results
>> #
>> # other implementations...
>> #
>> # gist:  https://gist.github.com/Rostepher/b688f709587ac145a0b3
>> #
>> # BASIC: http://aspell.net/metaphone/metaphone.basic
>> #
>> # C:     http://aspell.net/metaphone/metaphone-kuhn.txt
>
> I wanted a "reference" implementation I could try, but this is not a
> useful C program.  It's in an odd dialect (it uses void but has K&R
> function definitions) and has loads of undefined behaviours (strcpy of
> overlapping strings, use of uninitialised variables etc).

The uninitialised variables were due to an undefined function.  Most
likely, that function was intended to initialise the array.  I've mocked
up the two undefined functions and can now get the code to run.  I don't
see any uninitialised variables being used now.  The code still has
undefined behaviour in some cases but I think that is limited to the use
of strcpy.
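
To illustrate the strcpy point: the C standard leaves strcpy undefined
when source and destination overlap, which is exactly what happens when
a string is shortened in place by copying its tail over itself.  The
sketch below is not taken from the C code under discussion; the helper
name drop_chars and its interface are hypothetical, made up only to
show the usual fix, which is memmove (defined for overlapping regions).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the code under discussion):
 * remove `count` characters starting at index `pos` from the
 * NUL-terminated string `s`, shifting the tail left in place.
 */
void drop_chars(char *s, size_t pos, size_t count)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);

    if (pos >= len)
        return;                 /* nothing to remove */
    if (count > len - pos)
        count = len - pos;      /* clamp to the end of the string */

    /*
     * strcpy(s + pos, s + pos + count) would be undefined here because
     * the two regions overlap.  memmove is specified to handle overlap;
     * the "+ 1" copies the terminating NUL along with the tail.
     */
    memmove(s + pos, s + pos + count, len - pos - count + 1);
}
```

Called as drop_chars(buf, 0, 1) on a writable buffer holding "knight",
this leaves "night" in buf.  Whether the original code needs exactly
this shape of fix is an assumption; the point is only that each
overlapping strcpy can be replaced by a memmove of the tail plus its
terminator.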

--
Ben.
