1. When to reach for it
Use a Count-Min Sketch when you need approximate frequencies for a high-cardinality stream and exact counts are too expensive in memory. Typical fits: heavy-hitter detection, rate-limiting, anomaly scoring, and tail-latency budgeting on routers.
- Stream cardinality is huge (millions+ of distinct keys).
- You can tolerate over-counts but never under-counts.
- You need constant memory regardless of input size.
2. Sizing the sketch
Pick width w = ceil(e / epsilon) and depth d = ceil(ln(1 / delta)). With epsilon = 0.001 and delta = 0.01 you get w = 2719 and d = 5, a 13.6k-counter table that holds for any stream length.
Each row uses an independent hash. On update, increment all d cells. On query, return the minimum across rows: collisions can only inflate counts, never deflate.
3. Reference implementation
The Meridian gateway ships a Count-Min Sketch behind the abuse-detector for per-tenant token spend. Here is the minimal shape:
class CountMinSketch:
def __init__(self, w=2719, d=5):
self.w, self.d = w, d
self.table = [[0] * w for _ in range(d)]
self.seeds = [hash(("cms", i)) for i in range(d)]
def _idx(self, key, row):
return (hash((self.seeds[row], key))) % self.w
def add(self, key, count=1):
for r in range(self.d):
self.table[r][self._idx(key, r)] += count
def estimate(self, key):
return min(
self.table[r][self._idx(key, r)]
for r in range(self.d)
)