On 25/05/2024 19:12, Tim Rentsch wrote:
> bart <bc@freeuk.com> writes:
>> On 25/05/2024 10:12, Tim Rentsch wrote:
>>> bart <bc@freeuk.com> writes:
[...]
>>> It looks like your hash function was tuned for this testing
>>> setup.  With different choices for testing it does much
>>> worse.
>> It wasn't tuned at all; it was a copy of what I've long used
>> within my lexers.
>>
>> It is designed to give a good spread when inputs are similar or
>> change sequentially.  (Well, not exactly designed; more trial
>> and error.)
> The trial and error process you mention had the effect of tuning
> your function for the testing setup used, whether you meant it to
> or not.
>
> The testing is done with malloc() blocks all of the same
> requested size, and that size is 1.  Typical workloads
> are likely to be both larger and more variable.
>
> When adding a new entry finds a collision with an old
> entry, linear probing is used to find a free slot.  It's
> well understood that linear probing suffers badly as the
> load factor goes up.  Better to take a few high bits of
> the hash value -- as few as five or six is fine -- to
> have the reprobe sequences have different strides.
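The stride idea can be sketched like so (a hypothetical table layout, not code from either poster): the low bits of the hash pick the starting slot, and a few high bits -- forced odd so the stride is coprime with a power-of-two table size and visits every slot -- pick the stride, so keys that collide on the first slot still follow different probe sequences:

```c
#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE (1u << 16)   /* power-of-two table (hypothetical size) */

/* Probe sequence: start slot from the low bits of the hash,
   stride from the top 6 bits, forced odd so it is coprime with
   the power-of-two table size. */
static size_t probe(uint64_t hash, unsigned attempt)
{
    size_t start  = (size_t)(hash & (TABLE_SIZE - 1));
    size_t stride = (size_t)((hash >> 58) | 1);   /* top 6 bits, made odd */
    return (start + attempt * stride) & (TABLE_SIZE - 1);
}
```

Two keys with equal low bits but different high bits then diverge after the first collision, instead of marching through the same run of occupied slots.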
>>> Your hash function is expensive to compute, more so even
>>> than the "FNV" function shown earlier.
>> You mean Malcolm's version?  That also uses a loop.  Neither
>> needs to process all 8 bytes of a 64-bit pointer; 6 will do, and
>> probably just 4.
>>
>> And such a loop can be trivially unrolled.
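As a concrete sketch of that point (standard 32-bit FNV-1a constants, used here as an illustration -- this is not the exact function Malcolm posted), a pointer hash that processes only the low 4 bytes of the pointer's value looks like:

```c
#include <stdint.h>

/* 32-bit FNV-1a over only the low 4 bytes of a pointer's value;
   call as hash_ptr_low4((uintptr_t)p).  Constants are the standard
   FNV-1a offset basis and prime -- illustrative, not thread code. */
static uint32_t hash_ptr_low4(uintptr_t v)
{
    uint32_t h = 2166136261u;          /* FNV-1a offset basis */
    for (int i = 0; i < 4; i++) {      /* 4 low bytes, not all 8 */
        h ^= (uint32_t)(v & 0xff);
        h *= 16777619u;                /* FNV prime */
        v >>= 8;
    }
    return h;
}
```

On typical allocators the high bytes of a pointer barely vary, so dropping them loses little while halving the work.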
> Both your function and the function posted by Malcolm are slow.
> It's just that yours is slower.
With unoptimised code, mine is up to 50% faster (suggesting an
inherently faster method).

Optimised via gcc -O3, it's 17% slower.

If I duplicate the tests in my language, which has a meagre optimiser,
then mine is 3 times the speed (I haven't looked at why; something
about the MM code that keeps locals out of registers, I guess):

             Unopt    gcc -O3    My lang opt (uses 64 bits)

   MM hash    2.7      0.34       1.9      (seconds)
   BC hash    1.7      0.4        0.6

There are in any case any number of tweaks that can be made: no need
to do all 64 bits, for a start, and the loops can be manually
unrolled.  Suggesting it is slower is premature.
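The unrolling tweak is mechanical.  A sketch of both forms of a 4-byte multiply/xor hash (again with standard FNV-1a constants as a hypothetical stand-in for either function, so the two can be compared for equality):

```c
#include <stdint.h>

#define FNV_PRIME 16777619u

/* Loop form: hashes only the low 4 bytes of the value. */
static uint32_t hash4_loop(uintptr_t v)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < 4; i++) {
        h = (h ^ (uint32_t)(v & 0xff)) * FNV_PRIME;
        v >>= 8;
    }
    return h;
}

/* The same computation, manually unrolled: no loop counter,
   no dependent shift of v between steps. */
static uint32_t hash4_unrolled(uintptr_t v)
{
    uint32_t h = 2166136261u;
    h = (h ^ (uint32_t)( v        & 0xff)) * FNV_PRIME;
    h = (h ^ (uint32_t)((v >> 8)  & 0xff)) * FNV_PRIME;
    h = (h ^ (uint32_t)((v >> 16) & 0xff)) * FNV_PRIME;
    h = (h ^ (uint32_t)((v >> 24) & 0xff)) * FNV_PRIME;
    return h;
}
```

The two produce identical results; whether the unrolled form is actually faster depends on how good the compiler's own unroller is.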
> It isn't hard to produce a
> hash function of equal quality that runs more than twice as
> fast.
With or without the aid of an optimising compiler?  You can't always
tell with the latter whether it's you who's clever, or the compiler.
> Let me tell you a little secret.  Hash functions are like random
> number generators:  good ones never do exceptionally well or
> exceptionally poorly.  The reason for this is simple -- whenever
> there are scenarios where one does well then inevitably there are
> other scenarios where it does poorly.  The ones
> that are left are all, with high probability, just fine no matter
> what application they are used in.  That is not to say there is
> no need to test within the intended application, it's always good
> to do a sanity check, but the point of the sanity check is just
> to make sure the earlier process didn't miss something; it isn't
> the right way to compare alternatives.
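A sanity check of the kind described can be very small.  A sketch (the function under test here is a hypothetical multiplicative hash, not any function from the thread; sequential 16-byte-aligned values stand in for malloc results):

```c
#include <stdint.h>
#include <string.h>

enum { BUCKETS = 1024, KEYS = 64 * 1024 };

/* Hypothetical function under test: Fibonacci (multiplicative)
   hashing by a 64-bit constant, keeping bits from the top. */
static uint32_t hash_under_test(uint64_t v)
{
    return (uint32_t)((v * 0x9E3779B97F4A7C15ULL) >> 40);
}

/* Hash KEYS sequential 16-byte-aligned "addresses" into BUCKETS
   buckets and return the load of the fullest bucket.  A decent
   hash keeps this close to the average, KEYS / BUCKETS. */
static unsigned worst_bucket_load(void)
{
    static unsigned count[BUCKETS];
    memset(count, 0, sizeof count);
    for (uint64_t i = 0; i < KEYS; i++)
        count[hash_under_test(i * 16) % BUCKETS]++;
    unsigned worst = 0;
    for (int b = 0; b < BUCKETS; b++)
        if (count[b] > worst)
            worst = count[b];
    return worst;
}
```

If the worst bucket is far above the average load, the function has failed the sanity check for this workload; if it is close, that says it hasn't obviously failed, not that it beats some alternative.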
So what's a good one for use within the identifier lookup of a
compiler's lexer?