Liste des Groupes | Revenir à cl c |
On 25/05/2024 10:12, Tim Rentsch wrote:
>bart <bc@freeuk.com> writes:>
[...]
It looks like your hash function was tuned for this testing
setup. With different choices for testing it does much
worse.
It wasn't tuned at all; it was a copy of what I've longed used
within my lexers.
>
It is designed to give a good spread when inputs are similar or
change sequentially. (Well, not exactly designed; more trial
and error.)
The testing is done with malloc() blocks all of the same>
requested size, and that size is 1. Typical workloads
are likely to be both larger and more variable.
>
When adding a new entry finds a collision with an old
entry, linear probing is used to find a free slot. It's
well understood that linear probing suffers badly as the
load factor goes up. Better to take a few high bits of
the hash value -- as few as five or six is fine -- to
have the reprobe sequences have different strides.
>
Your hash function is expensive to compute, moreso even
than the "FNV" function shown earlier.
You mean Malcolm's version? That also uses a loop. Neither
need to process all 8 bytes of a 64-bit pointer; 6 will do, and
probably just 4.
And such a loop can be trivially unrolled.
In a case like>
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
My first thought in this thread was to just make use of the
pointer value directly. My version was written only to compare
against Malcolm's FNV code.
>
If I try the other suggestions, RH's and KK's which simply
shift the pointer value (similar to my idea), the results I got
were interesting: sometimes there were zero clashes (almost as
though bits 4 to 19 of the pointer were a perfect hash), but
sometimes there were millions.
>
But when I tried to mix up the input values, then all methods
seemed to converge to similar results.
>
In that case you might as well use the simplest and fastest
method.
>
However it really needs testing within the actual application.
Les messages affichés proviennent d'usenet.