A minimal perfect hash function (MPHF) maps a set S of n keys to the first n integers without collisions. Perfect hash functions have applications in databases, bioinformatics, and as a building block of various space-efficient data structures.
FiPS (Finperprint Perfect Hashing through Sorting) is a very simple and fast implementation of perfect hashing through fingerprinting.
The idea of perfect hashing through fingerprinting is to hash each input key to a fingerprint. An array stores a bit for each possible fingerprint indicating whether keys collided. Colliding keys are handled in another layer of the same data structure. There are many implementations of the approach, for example FMPH (Rust).
The construction of perfect hashing through fingerprinting is usually implemented as a linear scan, producing a fault for every key. Instead, FiPS is based on sorting the keys, which is more cache friendly and faster. Also, it interleaves its select data structure with the payload data, which enables very fast queries. Lastly, the FiPS implementation is very simple, with just about 200 lines of code.
Clone this repository (with submodules) and add the following to your CMakeLists.txt.
add_subdirectory(path/to/FiPS)
target_link_libraries(YourTarget PRIVATE FiPS::fips)
You can construct a FiPS perfect hash function as follows.
std::vector<std::string> keys = {"abc", "def", "123", "456"};
fips::FiPS<> hashFunc(keys, /* gamma = */ 2.0f);
std::cout << hashFunc("abc") << std::endl;This code is licensed under the GPLv3.