Re: Good hash for pointers

Liste des GroupesRevenir à l c 
Sujet : Re: Good hash for pointers
De : tr.17687 (at) *nospam* z991.linuxsc.com (Tim Rentsch)
Groupes : comp.lang.c
Date : 16. Jun 2024, 13:38:58
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <86cyohktgt.fsf@linuxsc.com>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Michael S <already5chosen@yahoo.com> writes:

On Wed, 05 Jun 2024 08:58:46 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
If we're trying to write a general hash function, it should work
well across a wide range of input key sets, and also for different
kinds of hash table management.  Let's look at the output side
first.  There are two independent axes for hash tables:  table
size, and direct mapping versus using rehashing.  For table size,
the two prominent choices are some prime in the appropriate range,
or else a power of two.  Clearly any size could be chosen, but
primes have the nice property that they divide up the space of hash
values very evenly, whereas powers of two allow a cheaper way of
reducing a hash value to the range of table indices (anding versus a
modulo operation).  A good hash function should work well with both.
The choice of direct mapping versus rehashing depends on expected
occupancy rates (which I am used to calling "load factor").  For a
load factor of 75 to 80 percent or less, generally rehashing is
better.  Direct mapping has the advantage that it can handle load
factors above 100%, at a cost of dealing with whatever subsidiary
data structure (usually linked lists) is used when two or more keys
map to the same index in the hash table.  (I expect you know all
that, I wrote it mostly just to clarify my own thinking.)
>
I never had interest in implementation of hash methods, typically
just took what language or library provides and used it.
All my knowledge of internals are 50 y.o.  news (TAoCP, and no I
did not read it 50 years ago, but I did read it first time
slightly less than 40 years ago and not sure if I re-read this
particular topic since then).
So, I am not sure that I understood everything above.
In particular, w.r.t. rehashing, IIRC, Knuth came to conclusion
that simple circular increment of the index is superior to more
elaborate methods.  I don't know if conclusion was changed in the
following years, not even sure that I recollect it correctly.

Hashing is discussed in volume 3 of TAOCP.  I first read volume 3
just after it came out;  I reviewed the section on hashing again
after reading your message.  Your remembrance doesn't match what
is said in that section.  Moreover some of the conclusions Knuth
wrote in 1973 are certainly out of date now.  General rehashing,
also called open addressing, is overall a net win, especially for
hash tables that have an above-average load factor.  Also there
are variations such as cuckoo hash that depend on using open
addressing.

Despite (or due to?) my relative ignorance of the topic, I'd dare
to disagree with couple of your points:
1. I don't think that hash function should be evaluated in
isolation from the rest of algorithm.  IMHO, they should be
co-developed.  There is nothing wrong with hash function that
works well with one particular "back end" and poorly with all
others as long as limitations are clearly stated.

There are several reasons why a co-development approach usually
isn't a good way to go.  Malcolm started this thread by asking for
a good hash function for pointers (to be used in a portable ANSI C
setting).  We don't know, nor do we have any control over, how
Malcolm might use the function (and that goes double for any random
reader of comp.lang.c who might see some posted hash function and
decide to use it).  Or we might want to use some sort of generic
table manager that takes a pointer-to-function to do the hashing,
but no way to affect how the hash values are used.  Another reason
has to do with total code size.  If there are M data types to be
hashed, and N applications that use hash values, coupling the code
that does the hashing with the code that makes use of the hash
values means there will be M*N parts to program, whereas separating
the two aspects means there will be only M+N parts to program.  The
M+N path almost certainly means a big reduction in the overall
amount of effort needed.

Of course there are some specialized applications, such as perfect
hashing, where the hashing method needs to be tailored to some sort
of external considerations.  In most cases though a general hashing
function is a better choice.

As for how evaluation should be done, it's important to evaluate
hash functions in isolation for the same reasons it's important to
evaluate random number generators in isolation:  the results need
to be repeatable, and they need to give a quantitative measure of
how "mixed up" the outputs are for similar inputs.  Knuth talks
about random numbers in volume 2 of TAOCP, spending more than 100
pages on the subject, but he gives an essential precept before the
end of page 5: "Random numbers should not be generated with a
method chosen at random.  Some theory should be used."  The same is
true of hash functions, and a repeatable quantitative evaluation of
the function by itself is a key element of that process.

2. I don't like "modulo" method of reduction of hash value to
range.  I don't like it for any chosen size of the bucket, not
just for power of two, but even for prime.  Of course, for power
of two size my dislike to it is deeper.

I understand your reaction, but this question is a separate topic.
Part of the reason for designing a high-quality hash function is so
that it doesn't matter which bits are used:  high order bits, low
order bits, some of each, every other bit, or every third or fourth
bit, they should all give good results (where "good" means nearby
inputs give what appear to be uncorrelated outputs).  If we design
our hash function to do that then it doesn't matter whether a hash
table does a mod or uses a different method.  Obviously that is
better than a hash function that works okay with some hash tables
but works poorly with others.

Date Sujet#  Auteur
23 May 24 * Good hash for pointers111Malcolm McLean
23 May 24 +* Re: Good hash for pointers3Richard Harnden
24 May 24 i`* Re: Good hash for pointers2Keith Thompson
24 May 24 i `- Re: Good hash for pointers1Malcolm McLean
23 May 24 +- Re: Good hash for pointers1Kaz Kylheku
24 May 24 +* Re: Good hash for pointers16Tim Rentsch
24 May 24 i`* Re: Good hash for pointers15Malcolm McLean
24 May 24 i `* Re: Good hash for pointers14Tim Rentsch
24 May 24 i  `* Re: Good hash for pointers13Malcolm McLean
24 May 24 i   +* Re: Good hash for pointers11Tim Rentsch
24 May 24 i   i`* Re: Good hash for pointers10bart
24 May 24 i   i +* Re: Good hash for pointers3Malcolm McLean
24 May 24 i   i i`* Re: Good hash for pointers2Chris M. Thomasson
24 May 24 i   i i `- Re: Good hash for pointers1Chris M. Thomasson
24 May 24 i   i +* Re: Good hash for pointers3Tim Rentsch
24 May 24 i   i i`* Re: Good hash for pointers2Malcolm McLean
25 May 24 i   i i `- Re: Good hash for pointers1Tim Rentsch
24 May 24 i   i `* Re: Good hash for pointers3David Brown
24 May 24 i   i  `* Re: Good hash for pointers2Malcolm McLean
24 May 24 i   i   `- Re: Good hash for pointers1Michael S
24 May 24 i   `- Re: Good hash for pointers1David Brown
24 May 24 +* Re: Good hash for pointers3Chris M. Thomasson
24 May 24 i`* Re: Good hash for pointers2Chris M. Thomasson
24 May 24 i `- Re: Good hash for pointers1jak
24 May 24 +* Re: Good hash for pointers84Bonita Montero
24 May 24 i`* Re: Good hash for pointers83Malcolm McLean
25 May 24 i `* Re: Good hash for pointers82bart
25 May 24 i  `* Re: Good hash for pointers81Tim Rentsch
25 May 24 i   +* Re: Good hash for pointers4bart
25 May 24 i   i`* Re: Good hash for pointers3Tim Rentsch
25 May 24 i   i `* Re: Good hash for pointers2bart
26 May 24 i   i  `- Re: Good hash for pointers1Tim Rentsch
25 May 24 i   `* Re: Good hash for pointers76Bonita Montero
25 May 24 i    `* Re: Good hash for pointers75Tim Rentsch
25 May 24 i     +* Re: Good hash for pointers9Malcolm McLean
25 May 24 i     i+- Re: Good hash for pointers1Tim Rentsch
25 May 24 i     i+* Re: Good hash for pointers5Janis Papanagnou
26 May 24 i     ii`* Re: Good hash for pointers4Malcolm McLean
26 May 24 i     ii `* Re: Good hash for pointers3bart
26 May 24 i     ii  +- Re: Good hash for pointers1Richard Harnden
27 May 24 i     ii  `- Re: Good hash for pointers1Kaz Kylheku
26 May 24 i     i`* Re: Good hash for pointers2Bonita Montero
26 May 24 i     i `- Re: Good hash for pointers1Bonita Montero
26 May 24 i     `* Re: Good hash for pointers65Bonita Montero
26 May 24 i      `* Re: Good hash for pointers64Tim Rentsch
26 May 24 i       `* Re: Good hash for pointers63Bonita Montero
26 May 24 i        `* Re: Good hash for pointers62Tim Rentsch
26 May 24 i         +* Re: Good hash for pointers29Bonita Montero
26 May 24 i         i+- Re: Good hash for pointers1Bonita Montero
27 May 24 i         i+* Re: Good hash for pointers5Bonita Montero
28 May 24 i         ii`* Re: Good hash for pointers4Ben Bacarisse
30 May 24 i         ii `* Re: Good hash for pointers3Bonita Montero
30 May 24 i         ii  +- Re: Good hash for pointers1Bonita Montero
31 May 24 i         ii  `- Re: Good hash for pointers1Tim Rentsch
31 May 24 i         i`* Re: Good hash for pointers22Tim Rentsch
2 Jun 24 i         i `* Re: Good hash for pointers21Michael S
2 Jun 24 i         i  +* Re: Good hash for pointers2Chris M. Thomasson
3 Jun 24 i         i  i`- Re: Good hash for pointers1Chris M. Thomasson
3 Jun 24 i         i  +* Re: Good hash for pointers5Tim Rentsch
3 Jun 24 i         i  i`* Re: Good hash for pointers4Michael S
4 Jun 24 i         i  i `* Re: Good hash for pointers3Tim Rentsch
4 Jun 24 i         i  i  `* Re: Good hash for pointers2Michael S
5 Jun 24 i         i  i   `- Re: Good hash for pointers1Tim Rentsch
3 Jun 24 i         i  `* Re: Good hash for pointers13Bonita Montero
3 Jun 24 i         i   `* Re: Good hash for pointers12Michael S
3 Jun 24 i         i    `* Re: Good hash for pointers11Bonita Montero
3 Jun 24 i         i     +* Re: Good hash for pointers8bart
3 Jun 24 i         i     i+* Re: Good hash for pointers6Michael S
3 Jun 24 i         i     ii+* Re: Good hash for pointers4bart
3 Jun 24 i         i     iii+- Re: Good hash for pointers1Michael S
3 Jun 24 i         i     iii+- Re: Good hash for pointers1Michael S
4 Jun 24 i         i     iii`- Re: Good hash for pointers1Tim Rentsch
4 Jun 24 i         i     ii`- Re: Good hash for pointers1Tim Rentsch
3 Jun 24 i         i     i`- Re: Good hash for pointers1Bonita Montero
3 Jun 24 i         i     `* Re: Good hash for pointers2Michael S
3 Jun 24 i         i      `- Re: Good hash for pointers1Bonita Montero
4 Jun 24 i         `* Re: Good hash for pointers32Michael S
5 Jun 24 i          +* Re: Good hash for pointers4Bonita Montero
5 Jun 24 i          i`* Re: Good hash for pointers3Michael S
5 Jun 24 i          i `* Re: Good hash for pointers2Bonita Montero
5 Jun 24 i          i  `- Re: Good hash for pointers1Michael S
5 Jun 24 i          +* Re: Good hash for pointers17Tim Rentsch
5 Jun 24 i          i+* AES problem (was: Good hash for pointers)2Michael S
6 Jun 24 i          ii`- Re: AES problem (was: Good hash for pointers)1Tim Rentsch
5 Jun 24 i          i+* Re: Good hash for pointers11Michael S
6 Jun 24 i          ii`* Re: Good hash for pointers10Tim Rentsch
6 Jun 24 i          ii `* Re: Good hash for pointers9Michael S
17 Jun 24 i          ii  `* Re: Good hash for pointers8Tim Rentsch
17 Jun 24 i          ii   `* Re: Good hash for pointers7Michael S
18 Jun 24 i          ii    `* Re: Good hash for pointers6Tim Rentsch
19 Jun 24 i          ii     +* Re: Good hash for pointers2Keith Thompson
19 Jun 24 i          ii     i`- Re: Good hash for pointers1Tim Rentsch
19 Jun 24 i          ii     `* Re: Good hash for pointers3James Kuyper
19 Jun 24 i          ii      +- Re: Good hash for pointers1Keith Thompson
23 Jun 24 i          ii      `- Re: Good hash for pointers1Tim Rentsch
6 Jun 24 i          i`* Re: Good hash for pointers3Michael S
16 Jun 24 i          i `* Re: Good hash for pointers2Tim Rentsch
16 Jun 24 i          i  `- Re: Good hash for pointers1Chris M. Thomasson
7 Jun 24 i          `* Re: Good hash for pointers10Bonita Montero
9 Jun 24 i           `* Re: Good hash for pointers9Bonita Montero
9 Jun 24 i            +* Re: Good hash for pointers2Richard Harnden
10 Jun 24 i            `* Re: Good hash for pointers6Malcolm McLean
26 May 24 +* Re: Good hash for pointers2Bonita Montero
26 May 24 `- Re: Good hash for pointers1Bonita Montero

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal