ClickHouse Primer
A practical introduction to ClickHouse for Meridian operators. Learn how columnar storage, MergeTree engines, and materialized views combine to give you sub-second analytics over billions of inference events without the operational pain of traditional OLAP stacks.
1. Why ClickHouse for Meridian
Meridian writes one row per inference call: tokens in, tokens out, latency, model alias, customer id. ClickHouse stores these columns contiguously, so a query that touches three columns reads three columns from disk, not the entire row. The result is order-of-magnitude faster scans than Postgres on the same hardware, and storage that compresses to under 10% of raw JSON.
- Append-only ingest at 100k rows/sec on a single node
- Sub-second percentiles over a month of traffic
- ZSTD compression cuts disk by 12x vs JSON logs
2. The MergeTree table that powers /v1/usage
Every Meridian deployment ships with a base table partitioned by day and ordered by customer + timestamp. This is the schema the public usage endpoint queries directly:
CREATE TABLE meridian.events ( ts DateTime64(3, 'UTC'), customer_id String, model LowCardinality(String), tokens_in UInt32, tokens_out UInt32, latency_ms UInt32, cost_usd Decimal(10, 6), status LowCardinality(String) ) ENGINE = MergeTree PARTITION BY toYYYYMMDD(ts) ORDER BY (customer_id, ts) TTL ts + INTERVAL 90 DAY SETTINGS index_granularity = 8192;
3. Materialized views for the admin console
The admin console never scans the raw events table at read time. A materialized view rolls each event into a per-customer, per-hour bucket on insert, so the cost cockpit query stays under 50 milliseconds even when the base table holds two billion rows. When you add a new metric, create a fresh view rather than altering the existing one.
The pattern: raw MergeTree for forensics, AggregatingMergeTree views for dashboards. Keep the raw retention short, the rollups long.