RECIPE / CARDINALITY

HyperLogLog primer

Count billions of distinct items in kilobytes of memory. HyperLogLog trades exact answers for a stable error bound around 0.81%, making it the default tool for unique-visitor counts, distinct-query tracking, and stream deduplication at Meridian scale.

1. The intuition

Hash every incoming element to a uniform bitstring. Count the leading zeros of each hash. The maximum run of leading zeros seen across N hashes is roughly log2(N) — long runs are rare, so observing a long run is evidence many distinct items have been processed.

HyperLogLog refines this by splitting hashes across m registers and taking a harmonic mean. The result is an estimator with relative error 1.04 / sqrt(m).

2. Register layout

Meridian uses m = 2^14 = 16384 registers of 6 bits each — 12 KB per sketch, 1.62% error, counts comfortably to 10^9 distinct.

// register update
hash    = murmur3_64(item)
bucket  = hash >> (64 - 14)          // top 14 bits
tail    = hash << 14                 // remaining 50 bits
rho     = clz(tail) + 1              // leading zeros + 1
M[bucket] = max(M[bucket], rho)

3. Merging and querying

Two sketches with the same m merge by taking the per-register maximum — this is what makes HLL associative and parallelizable. Shard your stream, count locally, fold globally.

Cardinality is the bias-corrected harmonic mean of 2^M[i]. Small cardinalities fall back to linear counting; large cardinalities use the raw HLL estimate. Meridian handles the switch automatically.