Re: Good hash for pointers

Liste des GroupesRevenir à l c 
Sujet : Re: Good hash for pointers
De : tr.17687 (at) *nospam* z991.linuxsc.com (Tim Rentsch)
Groupes : comp.lang.c
Date : 05. Jun 2024, 17:58:46
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <86a5jzmle1.fsf@linuxsc.com>
References : 1 2 3 4 5 6 7 8 9 10 11 12
User-Agent : Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Michael S <already5chosen@yahoo.com> writes:

On Sun, 26 May 2024 10:20:55 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[..comments on a hash function that simply multiplies the key
 by a large prime (18446744073709551557) to give the hash
 value, and noting that tests have shown poor performance..]

18446744073709551557 is indeed very bad (too close to 2**64).
But I tried other prime and at least in my simple tests it performs
rather well.
May be my tests are to simplistic, I don't pretend to understand much
about it.
>
[lots of C code]
>
Compiled as:
gcc -s -Wall -O2 -march=native ptr_hash.c hash_bonita.c hash_aes4.c

I couldn't get the AES-based hash function to compile.  There
was a complaint about not being able to inline an 'always-inline'
function.  I played around with it for a while but didn't get
anywhere.

I did get your own hash function out and put it into my little
test rig.  There may be summary comments below.

Run as:
./a 100000 75000 80 1000
>
Result:
ref: 75000 keys, 100000 slots, 52799 occupied 17431 collisions.  [...]
uut: 75000 keys, 100000 slots, 53201 occupied 17139 collisions.  [...]

Both of these occupancy rates are close to the expected theoretical
average, which I calculated to be 52764.  If I had been more
ambitious I might have done some statistical sampling to get some
idea of the standard deviation, but I wasn't that energetic
today. :)

If we're trying to write a general hash function, it should work
well across a wide range of input key sets, and also for different
kinds of hash table management.  Let's look at the output side
first.  There are two independent axes for hash tables:  table
size, and direct mapping versus using rehashing.  For table size,
the two prominent choices are some prime in the appropriate range,
or else a power of two.  Clearly any size could be chosen, but
primes have the nice property that they divide up the space of hash
values very evenly, whereas powers of two allow a cheaper way of
reducing a hash value to the range of table indices (anding versus a
modulo operation).  A good hash function should work well with both.
The choice of direct mapping versus rehashing depends on expected
occupancy rates (which I am used to calling "load factor").  For a
load factor of 75 to 80 percent or less, generally rehashing is
better.  Direct mapping has the advantage that it can handle load
factors above 100%, at a cost of dealing with whatever subsidiary
data structure (usually linked lists) is used when two or more keys
map to the same index in the hash table.  (I expect you know all
that, I wrote it mostly just to clarify my own thinking.)

For the recent testing I did, I chose a power of two (65536) for the
table size, and rehashing rather than direct mapping.  Both of these
choices make it easier to evaluate how well hash functions perform.
Having a power-of-two table size reveals weaknesses in the hash
function more readily than taking the hash value modulo a prime;
using rehashing allows an easy computation of how many probes on
average are needed to do an insertion, which is an easier metric
to use for comparisons than occupancy rates (which typically need
some statistical calculations to appreciate the meaning of different
resulting occupancy rates).

Let me emphasize that I don't mean to advocate any of the different
hash table models over the others.  The key point is that a good
hash function should work with all of them.

On the input side, it's important to have a wide range of input
key sets, with different sorts of distributions.  The input key
sets might include:  linear addresses out of a single array, so
a+i*m with i in the range [0..k], for different values of m (to
be thorough all m in the range [1..64], all multiples of 2 from
64 to 128, all multiples of 4 from 128 to 256, all multiples of 8
from 256 to 512, and so forth up to all multiples of 64 up from
2048 to 4096);  results of malloc() using fixed sizes (all of the
m values from the last example);  results of malloc() using a
range of sizes, for several kinds of random distributions);
results of malloc() using linearly increasing sizes; and other
kinds of variations.  In my testing I used several fixed sizes to
malloc() all less than 60;  rand()%997 + 43 for malloc() sizes;
malloc( i/7 + 1 ) for each key value sub i;  malloc( i/17 + 1 )
for each key value sub i;  linear addressing in an array, with
various multipliers (ad hoc rather than systematic choices);
quadratic addressing in an array (which means array + i*i*m, for
index i and multiplier choice m) -- here again the choices for
m were ad hoc rather than system.

An interesting result under the "in array" choices is that they
sometimes would change from run to run, presumably as the array was
put at different starting positions.  Sometimes the differences were
surprisingly large.

Like I said before, a good hash function should produce average
or near-average values for all of these scenarios.  It might be
nice to see some above-average results under some workloads, but
they don't make up for well below-average results under other
workloads.  The situation is pretty much the opposite of what
happens with quicksort:  the worst case behavior of quicksort is
quadratic, but we know that in practice that almost never
happens, so quicksort is used because its typical or average
performance is better than that of other choices.  With hashing,
better-than-average performance is rare, and doesn't give much
benefit, but worse-than-average performance is more likely, and
can be disastrous.  Obviously there can be other factors that
might motivate other choices in some situations (eg, perfect
hashing), but for a general-use hash function the first design
criterion should be close-to-average behavior under all scenarios
tested.

Date Sujet#  Auteur
23 May 24 * Good hash for pointers111Malcolm McLean
23 May 24 +* Re: Good hash for pointers3Richard Harnden
24 May 24 i`* Re: Good hash for pointers2Keith Thompson
24 May 24 i `- Re: Good hash for pointers1Malcolm McLean
23 May 24 +- Re: Good hash for pointers1Kaz Kylheku
24 May 24 +* Re: Good hash for pointers16Tim Rentsch
24 May 24 i`* Re: Good hash for pointers15Malcolm McLean
24 May 24 i `* Re: Good hash for pointers14Tim Rentsch
24 May 24 i  `* Re: Good hash for pointers13Malcolm McLean
24 May 24 i   +* Re: Good hash for pointers11Tim Rentsch
24 May 24 i   i`* Re: Good hash for pointers10bart
24 May 24 i   i +* Re: Good hash for pointers3Malcolm McLean
24 May 24 i   i i`* Re: Good hash for pointers2Chris M. Thomasson
24 May 24 i   i i `- Re: Good hash for pointers1Chris M. Thomasson
24 May 24 i   i +* Re: Good hash for pointers3Tim Rentsch
24 May 24 i   i i`* Re: Good hash for pointers2Malcolm McLean
25 May 24 i   i i `- Re: Good hash for pointers1Tim Rentsch
24 May 24 i   i `* Re: Good hash for pointers3David Brown
24 May 24 i   i  `* Re: Good hash for pointers2Malcolm McLean
24 May 24 i   i   `- Re: Good hash for pointers1Michael S
24 May 24 i   `- Re: Good hash for pointers1David Brown
24 May 24 +* Re: Good hash for pointers3Chris M. Thomasson
24 May 24 i`* Re: Good hash for pointers2Chris M. Thomasson
24 May 24 i `- Re: Good hash for pointers1jak
24 May 24 +* Re: Good hash for pointers84Bonita Montero
24 May 24 i`* Re: Good hash for pointers83Malcolm McLean
25 May 24 i `* Re: Good hash for pointers82bart
25 May 24 i  `* Re: Good hash for pointers81Tim Rentsch
25 May 24 i   +* Re: Good hash for pointers4bart
25 May 24 i   i`* Re: Good hash for pointers3Tim Rentsch
25 May 24 i   i `* Re: Good hash for pointers2bart
26 May 24 i   i  `- Re: Good hash for pointers1Tim Rentsch
25 May 24 i   `* Re: Good hash for pointers76Bonita Montero
25 May 24 i    `* Re: Good hash for pointers75Tim Rentsch
25 May 24 i     +* Re: Good hash for pointers9Malcolm McLean
25 May 24 i     i+- Re: Good hash for pointers1Tim Rentsch
25 May 24 i     i+* Re: Good hash for pointers5Janis Papanagnou
26 May 24 i     ii`* Re: Good hash for pointers4Malcolm McLean
26 May 24 i     ii `* Re: Good hash for pointers3bart
26 May 24 i     ii  +- Re: Good hash for pointers1Richard Harnden
27 May 24 i     ii  `- Re: Good hash for pointers1Kaz Kylheku
26 May 24 i     i`* Re: Good hash for pointers2Bonita Montero
26 May 24 i     i `- Re: Good hash for pointers1Bonita Montero
26 May 24 i     `* Re: Good hash for pointers65Bonita Montero
26 May 24 i      `* Re: Good hash for pointers64Tim Rentsch
26 May 24 i       `* Re: Good hash for pointers63Bonita Montero
26 May 24 i        `* Re: Good hash for pointers62Tim Rentsch
26 May 24 i         +* Re: Good hash for pointers29Bonita Montero
26 May 24 i         i+- Re: Good hash for pointers1Bonita Montero
27 May 24 i         i+* Re: Good hash for pointers5Bonita Montero
28 May 24 i         ii`* Re: Good hash for pointers4Ben Bacarisse
30 May 24 i         ii `* Re: Good hash for pointers3Bonita Montero
30 May 24 i         ii  +- Re: Good hash for pointers1Bonita Montero
31 May 24 i         ii  `- Re: Good hash for pointers1Tim Rentsch
31 May 24 i         i`* Re: Good hash for pointers22Tim Rentsch
2 Jun 24 i         i `* Re: Good hash for pointers21Michael S
2 Jun 24 i         i  +* Re: Good hash for pointers2Chris M. Thomasson
3 Jun 24 i         i  i`- Re: Good hash for pointers1Chris M. Thomasson
3 Jun 24 i         i  +* Re: Good hash for pointers5Tim Rentsch
3 Jun 24 i         i  i`* Re: Good hash for pointers4Michael S
4 Jun 24 i         i  i `* Re: Good hash for pointers3Tim Rentsch
4 Jun 24 i         i  i  `* Re: Good hash for pointers2Michael S
5 Jun 24 i         i  i   `- Re: Good hash for pointers1Tim Rentsch
3 Jun 24 i         i  `* Re: Good hash for pointers13Bonita Montero
3 Jun 24 i         i   `* Re: Good hash for pointers12Michael S
3 Jun 24 i         i    `* Re: Good hash for pointers11Bonita Montero
3 Jun 24 i         i     +* Re: Good hash for pointers8bart
3 Jun 24 i         i     i+* Re: Good hash for pointers6Michael S
3 Jun 24 i         i     ii+* Re: Good hash for pointers4bart
3 Jun 24 i         i     iii+- Re: Good hash for pointers1Michael S
3 Jun 24 i         i     iii+- Re: Good hash for pointers1Michael S
4 Jun 24 i         i     iii`- Re: Good hash for pointers1Tim Rentsch
4 Jun 24 i         i     ii`- Re: Good hash for pointers1Tim Rentsch
3 Jun 24 i         i     i`- Re: Good hash for pointers1Bonita Montero
3 Jun 24 i         i     `* Re: Good hash for pointers2Michael S
3 Jun 24 i         i      `- Re: Good hash for pointers1Bonita Montero
4 Jun 24 i         `* Re: Good hash for pointers32Michael S
5 Jun 24 i          +* Re: Good hash for pointers4Bonita Montero
5 Jun 24 i          i`* Re: Good hash for pointers3Michael S
5 Jun 24 i          i `* Re: Good hash for pointers2Bonita Montero
5 Jun 24 i          i  `- Re: Good hash for pointers1Michael S
5 Jun 24 i          +* Re: Good hash for pointers17Tim Rentsch
5 Jun 24 i          i+* AES problem (was: Good hash for pointers)2Michael S
6 Jun 24 i          ii`- Re: AES problem (was: Good hash for pointers)1Tim Rentsch
5 Jun 24 i          i+* Re: Good hash for pointers11Michael S
6 Jun 24 i          ii`* Re: Good hash for pointers10Tim Rentsch
6 Jun 24 i          ii `* Re: Good hash for pointers9Michael S
17 Jun 24 i          ii  `* Re: Good hash for pointers8Tim Rentsch
17 Jun 24 i          ii   `* Re: Good hash for pointers7Michael S
18 Jun 24 i          ii    `* Re: Good hash for pointers6Tim Rentsch
19 Jun 24 i          ii     +* Re: Good hash for pointers2Keith Thompson
19 Jun 24 i          ii     i`- Re: Good hash for pointers1Tim Rentsch
19 Jun 24 i          ii     `* Re: Good hash for pointers3James Kuyper
19 Jun 24 i          ii      +- Re: Good hash for pointers1Keith Thompson
23 Jun 24 i          ii      `- Re: Good hash for pointers1Tim Rentsch
6 Jun 24 i          i`* Re: Good hash for pointers3Michael S
16 Jun 24 i          i `* Re: Good hash for pointers2Tim Rentsch
16 Jun 24 i          i  `- Re: Good hash for pointers1Chris M. Thomasson
7 Jun 24 i          `* Re: Good hash for pointers10Bonita Montero
9 Jun 24 i           `* Re: Good hash for pointers9Bonita Montero
9 Jun 24 i            +* Re: Good hash for pointers2Richard Harnden
10 Jun 24 i            `* Re: Good hash for pointers6Malcolm McLean
26 May 24 +* Re: Good hash for pointers2Bonita Montero
26 May 24 `- Re: Good hash for pointers1Bonita Montero

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal