Re: Good hash for pointers

Liste des GroupesRevenir à cl c  
Sujet : Re: Good hash for pointers
De : tr.17687 (at) *nospam* z991.linuxsc.com (Tim Rentsch)
Groupes : comp.lang.c
Date : 03. Jun 2024, 01:02:05
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <86le3nne36.fsf@linuxsc.com>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Michael S <already5chosen@yahoo.com> writes:

On Thu, 30 May 2024 19:27:48 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
Bonita Montero <Bonita.Montero@gmail.com> writes:
>
Am 26.05.2024 um 19:20 schrieb Tim Rentsch:
>
I say the output quality is poor because I have run tests that
show the poor output quality.
>
If you chose a prime whose double is beyond 64 bit there's an
equal distribution among the 64 bit modulos.
>
I've done that with a prime of my own choosing and also with
18446744073709551557, the value you suggested.
>
Show me your code.
>
Oh get real.  It's not my job to make up for your
laziness.
>
So, what were your conclusions?
Ignoring the speed of computation, would something like
cryptographic hash scaled to bucket size be a best hash for
this type of application?  Or some sort of less thorough
grinding of the bits is better?

Factors that matter:
  will hashing be done using rehashing or aggregates (lists
    or trees, etc) grouped under the first probe location?
    (or both?)

  can keys be compared for ordering or just for equality?

  how expensive is it to test for key equality?
    (a) when the keys are equal
    (b) when the keys are not equal

  32-bit hash values suffice for (a) re-hashing schemes with
    tables up to perhaps 2**30 entries, or (b) schemes that
    use direct hashing and bucket aggregation with tables up
    to 2**32 slots.  Anything bigger should use 64-bit hash
    values.

  hash functions should always produce a full-size hash value
    (either 32 bits or 64 bits).  Fitting the hash value to
    the size of the table is the responsibility of the hash
    table management code, not the hash function.

The goal:
  An ideal hash function has the property that changing any
  single bit of the input key has a 50% chance of flipping
  any output bit, with no correlation between output bit
  m and output bit n (m != n) for changing a given input
  bit, and no correlation between changes to output bit n
  for changing input bit i and changing input bit j (i != j)

Evaluating hash functions:
  One way to evaluate a hash function is to measure changes
  in output bits as a function of changing various inputs
  bits, either exhaustively or statistically, looking for an
  ideal distribution as described above.  Usually doing this
  isn't very helpful, because most hash functions don't come
  close to the ideal, and it's hard to interpret results that
  are not close to the ideal, even though those hashes may be
  perfectly fine in practice

  So basically we are left with comparing a candidate hash
  function, performance-wise, to other hash functions that
  are going to have good performance results.  Three obvious
  choices for those:  a slow but high quality hash such as a
  cryptographic hash;  using a good random number generator
  to produce keys where the hash value is equal to the key
  value;  comparing against theoretical ideals.  Looking at
  each in turn:

  High-quality-but-slow-hash:  it isn't hard to make a hash
  function that approximates an ideal hash, as long as we
  are willing to have it run like a turtle.  Or simply use
  one of the many known cryptographic hashes and fold the
  output down to a reasonably size hash values (most of
  these cryptographic hash functions produce large hash
  values, 128 bits and up, which isn't a great fit for
  ordinary hash table use).  In my recent testing I simply
  rolled my own high quality hash, which ran about 10 times
  slower than the slowest of the other hashes I looked at.

  Using a good RNG to produce keys the same as the hash
  value:  I expect this method to be self-explanatory.

  Comparing against theoretical ideals:  it isn't hard to
  do a calculation that computes a statistical average for
  the two kinds of hashing methods.  More specifically,

    (a) for re-hashing, compute the average number of
        probes taken for a hash table with k out of n
        slots already filled, and use that to compute
        the average number of probes need to fill the
        hash table up to, say, 85%

    (b) for direct hashing, compute the average number
        of slots occupied after inserting some number
        of values representing a fixed fraction of the
        table size.  For example, if we have a table
        with 100,000 slots, if we insert 100,000 values
        then on the average about 63% of the slots
        should be occupied (very close to 1 - 1/e).

  Use one or more of the above comparison values (they
  should all be very close) to compare against the
  performance of the candidate hash function for each
  of the possible workloads.  My workload set included:

    (a) results of malloc() with fixed sized blocks,
        with a size going from 1 to some modest value
        (50 or 100)

    (b) results of malloc() with randomly chosen sizes
        based on some distribution

    (c) sequential addresses in an array, varying the
        element size between 1 and some modest value
        (maybe 50)

    (d) results of malloc() with a deterministic set
        of sizes.  I tried varying ones of these, such
        as size(i) == i/17 + 1 (filling a table of
        65636 entries varying percentages full).

What we would like to see:
  for every workload tested, results with 10 or 20%
  of the comparison metric used

What we don't want to see (this is important):
  any result far away from the comparison metric on any
  workload.  Importantly this includes results that are
  significantly _better_ than the comparison metric.
  The reason for this is that doing a lot better on one
  test inevitably means doing worse on a different test.
  The same is true of random number generators:  the
  best ones look "averagely random" all the time, never
  either "sub random" or "super random".

Of course we always want high quality, but the standards
are higher for (a) direct hashing, and/or (b) cases where
comparing keys is expensive.  Note that if comparing keys
is expensive we might choose to store the full hash value
with each table entry, to make "trivial reject" on key
comparison be faster.

Probably much of the above was obvious.  I make no apologies for
that.  Also it may seem disjointed or unorganized.  If so then
sorry about that chief, it's a big topic and there's a lot of
ground to cover, and I tried to hit the most important
highlights.

Date Sujet#  Auteur
23 May 24 * Good hash for pointers111Malcolm McLean
23 May 24 +* Re: Good hash for pointers3Richard Harnden
24 May 24 i`* Re: Good hash for pointers2Keith Thompson
24 May 24 i `- Re: Good hash for pointers1Malcolm McLean
23 May 24 +- Re: Good hash for pointers1Kaz Kylheku
24 May 24 +* Re: Good hash for pointers16Tim Rentsch
24 May 24 i`* Re: Good hash for pointers15Malcolm McLean
24 May 24 i `* Re: Good hash for pointers14Tim Rentsch
24 May 24 i  `* Re: Good hash for pointers13Malcolm McLean
24 May 24 i   +* Re: Good hash for pointers11Tim Rentsch
24 May 24 i   i`* Re: Good hash for pointers10bart
24 May 24 i   i +* Re: Good hash for pointers3Malcolm McLean
24 May 24 i   i i`* Re: Good hash for pointers2Chris M. Thomasson
24 May 24 i   i i `- Re: Good hash for pointers1Chris M. Thomasson
24 May 24 i   i +* Re: Good hash for pointers3Tim Rentsch
24 May 24 i   i i`* Re: Good hash for pointers2Malcolm McLean
25 May 24 i   i i `- Re: Good hash for pointers1Tim Rentsch
24 May 24 i   i `* Re: Good hash for pointers3David Brown
24 May 24 i   i  `* Re: Good hash for pointers2Malcolm McLean
24 May 24 i   i   `- Re: Good hash for pointers1Michael S
24 May 24 i   `- Re: Good hash for pointers1David Brown
24 May 24 +* Re: Good hash for pointers3Chris M. Thomasson
24 May 24 i`* Re: Good hash for pointers2Chris M. Thomasson
24 May 24 i `- Re: Good hash for pointers1jak
24 May 24 +* Re: Good hash for pointers84Bonita Montero
24 May 24 i`* Re: Good hash for pointers83Malcolm McLean
25 May 24 i `* Re: Good hash for pointers82bart
25 May 24 i  `* Re: Good hash for pointers81Tim Rentsch
25 May 24 i   +* Re: Good hash for pointers4bart
25 May 24 i   i`* Re: Good hash for pointers3Tim Rentsch
25 May 24 i   i `* Re: Good hash for pointers2bart
26 May 24 i   i  `- Re: Good hash for pointers1Tim Rentsch
25 May 24 i   `* Re: Good hash for pointers76Bonita Montero
25 May 24 i    `* Re: Good hash for pointers75Tim Rentsch
25 May 24 i     +* Re: Good hash for pointers9Malcolm McLean
25 May 24 i     i+- Re: Good hash for pointers1Tim Rentsch
25 May 24 i     i+* Re: Good hash for pointers5Janis Papanagnou
26 May 24 i     ii`* Re: Good hash for pointers4Malcolm McLean
26 May 24 i     ii `* Re: Good hash for pointers3bart
26 May 24 i     ii  +- Re: Good hash for pointers1Richard Harnden
27 May 24 i     ii  `- Re: Good hash for pointers1Kaz Kylheku
26 May 24 i     i`* Re: Good hash for pointers2Bonita Montero
26 May 24 i     i `- Re: Good hash for pointers1Bonita Montero
26 May 24 i     `* Re: Good hash for pointers65Bonita Montero
26 May 24 i      `* Re: Good hash for pointers64Tim Rentsch
26 May 24 i       `* Re: Good hash for pointers63Bonita Montero
26 May 24 i        `* Re: Good hash for pointers62Tim Rentsch
26 May 24 i         +* Re: Good hash for pointers29Bonita Montero
26 May 24 i         i+- Re: Good hash for pointers1Bonita Montero
27 May 24 i         i+* Re: Good hash for pointers5Bonita Montero
28 May 24 i         ii`* Re: Good hash for pointers4Ben Bacarisse
30 May 24 i         ii `* Re: Good hash for pointers3Bonita Montero
30 May 24 i         ii  +- Re: Good hash for pointers1Bonita Montero
31 May 24 i         ii  `- Re: Good hash for pointers1Tim Rentsch
31 May 24 i         i`* Re: Good hash for pointers22Tim Rentsch
2 Jun 24 i         i `* Re: Good hash for pointers21Michael S
2 Jun 24 i         i  +* Re: Good hash for pointers2Chris M. Thomasson
3 Jun 24 i         i  i`- Re: Good hash for pointers1Chris M. Thomasson
3 Jun 24 i         i  +* Re: Good hash for pointers5Tim Rentsch
3 Jun 24 i         i  i`* Re: Good hash for pointers4Michael S
4 Jun 24 i         i  i `* Re: Good hash for pointers3Tim Rentsch
4 Jun 24 i         i  i  `* Re: Good hash for pointers2Michael S
5 Jun 24 i         i  i   `- Re: Good hash for pointers1Tim Rentsch
3 Jun 24 i         i  `* Re: Good hash for pointers13Bonita Montero
3 Jun 24 i         i   `* Re: Good hash for pointers12Michael S
3 Jun 24 i         i    `* Re: Good hash for pointers11Bonita Montero
3 Jun 24 i         i     +* Re: Good hash for pointers8bart
3 Jun 24 i         i     i+* Re: Good hash for pointers6Michael S
3 Jun 24 i         i     ii+* Re: Good hash for pointers4bart
3 Jun 24 i         i     iii+- Re: Good hash for pointers1Michael S
3 Jun 24 i         i     iii+- Re: Good hash for pointers1Michael S
4 Jun 24 i         i     iii`- Re: Good hash for pointers1Tim Rentsch
4 Jun 24 i         i     ii`- Re: Good hash for pointers1Tim Rentsch
3 Jun 24 i         i     i`- Re: Good hash for pointers1Bonita Montero
3 Jun 24 i         i     `* Re: Good hash for pointers2Michael S
3 Jun 24 i         i      `- Re: Good hash for pointers1Bonita Montero
4 Jun 24 i         `* Re: Good hash for pointers32Michael S
5 Jun 24 i          +* Re: Good hash for pointers4Bonita Montero
5 Jun 24 i          i`* Re: Good hash for pointers3Michael S
5 Jun 24 i          i `* Re: Good hash for pointers2Bonita Montero
5 Jun 24 i          i  `- Re: Good hash for pointers1Michael S
5 Jun 24 i          +* Re: Good hash for pointers17Tim Rentsch
5 Jun 24 i          i+* AES problem (was: Good hash for pointers)2Michael S
6 Jun 24 i          ii`- Re: AES problem (was: Good hash for pointers)1Tim Rentsch
5 Jun 24 i          i+* Re: Good hash for pointers11Michael S
6 Jun 24 i          ii`* Re: Good hash for pointers10Tim Rentsch
6 Jun 24 i          ii `* Re: Good hash for pointers9Michael S
17 Jun 24 i          ii  `* Re: Good hash for pointers8Tim Rentsch
17 Jun 24 i          ii   `* Re: Good hash for pointers7Michael S
18 Jun 24 i          ii    `* Re: Good hash for pointers6Tim Rentsch
19 Jun 24 i          ii     +* Re: Good hash for pointers2Keith Thompson
19 Jun 24 i          ii     i`- Re: Good hash for pointers1Tim Rentsch
19 Jun 24 i          ii     `* Re: Good hash for pointers3James Kuyper
19 Jun 24 i          ii      +- Re: Good hash for pointers1Keith Thompson
23 Jun 24 i          ii      `- Re: Good hash for pointers1Tim Rentsch
6 Jun 24 i          i`* Re: Good hash for pointers3Michael S
16 Jun 24 i          i `* Re: Good hash for pointers2Tim Rentsch
16 Jun 24 i          i  `- Re: Good hash for pointers1Chris M. Thomasson
7 Jun 24 i          `* Re: Good hash for pointers10Bonita Montero
9 Jun 24 i           `* Re: Good hash for pointers9Bonita Montero
9 Jun 24 i            +* Re: Good hash for pointers2Richard Harnden
10 Jun 24 i            `* Re: Good hash for pointers6Malcolm McLean
26 May 24 +* Re: Good hash for pointers2Bonita Montero
26 May 24 `- Re: Good hash for pointers1Bonita Montero

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal