Portability | portable |
---|---|
Stability | unstable |
Maintainer | Bryan O'Sullivan <bos@serpentine.com> |
Safe Haskell | None |
Data.BloomFilter.Easy
Description
An easy-to-use Bloom filter interface.
Easy creation and querying
data Bloom a
An immutable Bloom filter, suitable for querying from pure code.
Arguments
:: Hashable a | |
=> Double | desired false positive rate (0 < e < 1) |
-> [a] | values to populate with |
-> Bloom a |
Create a Bloom filter with the given false positive rate and
members. The hash functions used are computed by the cheapHashes
function from the Hash
module.
Query an immutable Bloom filter for membership. If the value is
present, return True
. If the value is not present, there is
still some possibility that True
will be returned.
notElemB :: a -> Bloom a -> Bool
Query an immutable Bloom filter for non-membership. If the value
is present, return False
. If the value is not present, there
is still some possibility that True
will be returned.
Example: a spell checker
This example reads a dictionary file containing one word per line,
constructs a Bloom filter with a 1% false positive rate, and
spellchecks its standard input. Like the Unix spell
command, it
prints each word that it does not recognize.
import Data.BloomFilter.Easy (easyList, elemB) main = do filt <- (easyList
0.01 . words) `fmap` readFile "usrsharedictwords" let check word |elemB
word filt = "" | otherwise = word ++ "\n" interact (concat . map check . lines)
Useful defaults for creation
Arguments
:: Int | expected maximum capacity |
-> Double | desired false positive rate (0 < e < 1) |
-> Either String (Int, Int) |
Suggest a good combination of filter size and number of hash functions for a Bloom filter, based on its expected maximum capacity and a desired false positive rate.
The false positive rate is the rate at which queries against the
filter should return True
when an element is not actually
present. It should be a fraction between 0 and 1, so a 1% false
positive rate is represented by 0.01.