HyperLogLog primer
Count billions of distinct items in kilobytes of memory. HyperLogLog trades exact answers for a stable error bound around 0.81%, making it the default tool for unique-visitor counts, distinct-query tracking, and stream deduplication at Meridian scale.
1. The intuition
Hash every incoming element to a uniform bitstring. Count the leading zeros of each hash. The maximum run of leading zeros seen across N hashes is roughly log2(N) — long runs are rare, so observing a long run is evidence many distinct items have been processed.
HyperLogLog refines this by splitting hashes across m registers and taking a harmonic mean. The result is an estimator with relative error 1.04 / sqrt(m).
2. Register layout
Meridian uses m = 2^14 = 16384 registers of 6 bits each — 12 KB per sketch, 1.62% error, counts comfortably to 10^9 distinct.
// register update hash = murmur3_64(item) bucket = hash >> (64 - 14) // top 14 bits tail = hash << 14 // remaining 50 bits rho = clz(tail) + 1 // leading zeros + 1 M[bucket] = max(M[bucket], rho)
3. Merging and querying
Two sketches with the same m merge by taking the per-register maximum — this is what makes HLL associative and parallelizable. Shard your stream, count locally, fold globally.
Cardinality is the bias-corrected harmonic mean of 2^M[i]. Small cardinalities fall back to linear counting; large cardinalities use the raw HLL estimate. Meridian handles the switch automatically.