Subject : Re: Good hash for pointers
From : Bonita.Montero (at) *nospam* gmail.com (Bonita Montero)
Newsgroups : comp.lang.c
Date : 07. Jun 2024, 19:53:49
Organisation : A noiseless patient Spider
Message-ID : <v3vkvp$268b2$1@raubtier-asyl.eternal-september.org>
User-Agent : Mozilla Thunderbird
I tested the AES hash function with a C++20 program. The hash function
is also rewritten in C++ style:
#include <iostream>
#include <utility>
#include <vector>
#include <cstdint>
#include <cstdlib>
#include <algorithm>
#include <execution>
#include <thread>
#include <span>
#include <intrin.h>
using namespace std;
uint64_t MichaelsHash( uint64_t key );
int main()
{
    static_assert(sizeof(size_t) == 8);
    // number of buckets
#if defined(NDEBUG)
    constexpr ptrdiff_t N_BUCKETS = 1ull << 32;
#else
    constexpr ptrdiff_t N_BUCKETS = 1 << 16;
#endif
    // don't zero-initialize the update-array
    union ndi_sz { size_t s; ndi_sz() {} };
    vector<ndi_sz> rawUpdates( N_BUCKETS );
    // native size_t span over the update-array
    span<size_t> updates( &rawUpdates[0].s, N_BUCKETS );
    int hc = thread::hardware_concurrency();
    // end of the current thread partition of the update-array
    ptrdiff_t end = N_BUCKETS;
    // partition index into the update-array
    ptrdiff_t p = hc;
    vector<jthread> threads;
    threads.reserve( hc );
    do
    {
        // begin of the current thread partition of the update-array
        ptrdiff_t begin = (ptrdiff_t)((double)--p / hc * N_BUCKETS);
        threads.emplace_back( [&]( size_t begin, size_t end )
            {
                for( ; begin != end; ++begin )
                    updates[begin] = MichaelsHash( begin ) % N_BUCKETS;
            }, begin, end );
        // the next partition ends where this one begins
        end = begin;
    } while( p > 0 );
    // wait for all threads to finish
    threads.resize( 0 );
    // sort the update-array in parallel
    sort( execution::par, updates.begin(), updates.end() );
    vector<uint8_t> buckets( N_BUCKETS );
    // update the buckets according to the update-list;
    // bail out if a uint8_t bucket counter would overflow
    for( size_t u : updates )
        if( !++buckets[u] )
            return EXIT_FAILURE;
    // release the memory of the update-list
    rawUpdates.clear();
    rawUpdates.shrink_to_fit();
    // sum up the loads
    vector<size_t> loads;
    for( size_t bucketLoad : buckets )
    {
        if( loads.size() <= bucketLoad )
            loads.resize( bucketLoad + 1 );
        ++loads[bucketLoad];
    }
    // print the load percentages
    for( size_t ld = 0; ptrdiff_t nLoad : loads )
        cout << ld++ << ": " << 100.0 * nLoad / (ptrdiff_t)buckets.size() << "%" << endl;
}
uint64_t MichaelsHash( uint64_t key )
{
    __m128i xkey = _mm_set_epi64x( key, 42 );
    using bar_t = pair<uint64_t, uint64_t>;
    static bar_t const bars[8] =
    {
        { 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 },
        { 0x94F0535CB06D4060, 0x939C756246DBFD1D },
        { 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 },
        { 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C },
    };
    for( bar_t const &bar : bars )
    {
        __m128i rkey;
        // MSVC-specific element access into __m128i
        rkey.m128i_u64[0] = bar.first;
        rkey.m128i_u64[1] = bar.second;
        xkey = _mm_aesenc_si128( xkey, rkey );
    }
    return xkey.m128i_u64[0];
}
The hash function is tested with 2 ^ 32 buckets. The problem with that
is that the bucket counters are updated with totally random memory
accesses, which is extremely slow. So I first prepare an update-list,
which is a vector of size_t's holding the indices of the buckets to be
updated later. This update-list is written with linear memory accesses,
which is very fast, and it is filled by the available number of hardware
threads. After that the list is sorted; the sorting also has largely
linear memory accesses, and to speed it up further it is done with a
parallel execution policy. The update-list takes 2 ^ 32 size_t's, which
is 32 GiB. Walking the sorted update-list then results in linear updates
of the buckets.
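Stripped of the threading, the core pattern is small. A condensed sketch
of the same defer-sort-apply idea (countLoads, hashFn and nBuckets are
placeholder names here, not taken from the program above):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <execution>
#include <vector>

// Count how often each bucket is hit, without random-access increments.
std::vector<uint8_t> countLoads( uint64_t (*hashFn)( uint64_t ), size_t nBuckets )
{
    // 1. record the target bucket of every key: purely sequential writes
    std::vector<size_t> updates( nBuckets );
    for( size_t key = 0; key != nBuckets; ++key )
        updates[key] = hashFn( key ) % nBuckets;
    // 2. sort the recorded indices so the increments below walk the bucket
    //    array front to back instead of jumping around randomly
    std::sort( std::execution::par, updates.begin(), updates.end() );
    // 3. apply the deferred increments in ascending bucket order
    std::vector<uint8_t> buckets( nBuckets );
    for( size_t u : updates )
        ++buckets[u];
    return buckets;
}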
This sounds rather complex, but I first tried it with random updates to
the buckets on a single core, and on my 7950X Zen4 CPU I stopped that
run after 20 minutes.
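The abandoned single-core variant boiled down to something roughly like
this (my reconstruction, not the original code):

// direct updates: each increment lands at a random spot in the 2 ^ 32
// one-byte counters (4 GiB), so almost every access misses the caches
std::vector<uint8_t> buckets( N_BUCKETS );
for( uint64_t key = 0; key != N_BUCKETS; ++key )
    ++buckets[MichaelsHash( key ) % N_BUCKETS];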
The resulting load depths are in line with the coupon collector's
problem: the expected fraction of buckets with load N is 1 / (e * N!):
C:\Users\Boni\Documents\Source\MichaelAesHash>timep x64\Release\MichaelAesHash.exe
0: 36.7875%
1: 36.7884%
2: 18.3945%
3: 6.13053%
4: 1.53304%
5: 0.306619%
6: 0.0510489%
7: 0.00729787%
8: 0.000908948%
9: 0.000102189%
10: 1.07335e-05%
11: 8.14907e-07%
12: 4.65661e-08%
real 53.783s
user 11m:39.984s
sys 6.141s
cycles 3.446.315.603.052
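For comparison, the expected fractions 1 / (e * N!) are quick to
compute; a small standalone check (not part of the program above):

#include <cmath>
#include <cstdio>

int main()
{
    double f = 1.0 / std::exp( 1.0 ); // N = 0: 1 / e ~ 36.7879 %
    for( int n = 0; n <= 12; ++n )
    {
        std::printf( "%d: %g%%\n", n, 100.0 * f );
        f /= n + 1; // 1 / (e * (N+1)!) = (1 / (e * N!)) / (N + 1)
    }
}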