Sujet : Re: Good hash for pointers
De : tr.17687 (at) *nospam* z991.linuxsc.com (Tim Rentsch)
Groupes : comp.lang.cDate : 17. Jun 2024, 08:56:40
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <86zfrkj93b.fsf@linuxsc.com>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
User-Agent : Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Michael S <
already5chosen@yahoo.com> writes:
On Wed, 05 Jun 2024 21:40:27 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
Michael S <already5chosen@yahoo.com> writes:
>
On Wed, 05 Jun 2024 08:58:46 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
I did get your own hash function out and put it into my little
test rig. There may be summary comments below.
>
<snip>
>
As you probably already paid attention, I like bottom lines.
What is a bottom line here?
>
The bottom line is both the multiply-by-a-large-prime hashes
should be avoided.
>
I am not convinced.
Let me amend my earlier statement. A hash function that simply
multiplies its input by an integer constant should be avoided
if the goal is to produce a hash function of decent quaility
rather than one of mediocre quality.
Of course, it is no good for Malcolm, because it can't be coded
in what Malcolm erroneously call "ANSI C".
I don't know why you say that. C was an ANSI standard before it
was an ISO standard. Or is it that you think that the language
Malcolm is intent on using does not conform to C90/C89/ANSI C?
Did you you encounter cases in which almost-bonita's-but-not-quite
hash function performed badly?
>
Yes. A clear example is using directly mapped hash table that
has a power of two elements, with inputs all from malloc( 48 )
calls. All keys map to one of only 1024 entries, in a hash
table that has 65536 slots.
>
I don't see it.
$ ./a 0x10000 0xC000 48
ref: 49152 keys, 65536 slots, 34590 occupied 11295 collisions.
Worst collision: 7
uut: 49152 keys, 65536 slots, 35791 occupied 11872 collisions.
Worst collision: 3
You have run a poor test. The point of quality evaluation
testing is to look for potential problems, not to disguise
them.
It sounds like in your tests you don't use 'linear scaling to
table size' for translation of hash value to table index.
IMHO, you should.
Like I said in another response, the question of how hash tables
should make use of hash values is a separate topic. The point
of the tests I ran was to evaluate the quality of hash functions,
not to look at how hash tables should be coded.
Here are some statistics I gathered from a more comprehensive
set of tests for several hash functions. The tests look at
aggregates of measured values for all output bits (considered
eight bits at a time) as a function of all input bits (considered
sixteen bits at a time. The first line in each pair summarizes
the low values in each set, and the second line summarizes the
high values in each set, in each case showing the range, average,
and standard deviation. The first set, hash 0, uses the aes
code you posted. I think it's easy to see the quality get
worse going from top to bottom.
hash 0
186 .. 201 avg: 195.39 dev: 3.71
312 .. 339 avg: 322.31 dev: 5.85
hash 1
186 .. 203 avg: 195.48 dev: 4.01
310 .. 344 avg: 322.45 dev: 6.31
hash 2
160 .. 202 avg: 193.56 dev: 7.70
311 .. 348 avg: 322.22 dev: 7.91
hash 3
143 .. 202 avg: 185.52 dev: 10.15
311 .. 387 avg: 329.52 dev: 14.10
hash 4
0 .. 212 avg: 9.94 dev: 45.16
270 .. 65536 avg: 44546.56 dev: 28809.47