Redshift Primer
A condensed guide for connecting Meridian to Amazon Redshift clusters, tuning warehouse queries against high-cardinality event streams, and shipping cost-aware analytics without falling into the classic distribution-key traps.
1. Wire the connection
Meridian talks to Redshift over standard Postgres wire protocol on port 5439. Provision a least-privilege user, whitelist the Meridian egress range, and store the credential as an encrypted secret rather than an environment variable for production workloads.
-- 1. Provision the role CREATE USER meridian_reader WITH PASSWORD '<rotate-me>'; GRANT USAGE ON SCHEMA analytics TO meridian_reader; GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO meridian_reader; -- 2. Register the cluster meridian sources add redshift \ --host warehouse.abc.us-east-1.redshift.amazonaws.com \ --port 5439 --db analytics --user meridian_reader
2. Pick the right distribution style
Redshift sorts and distributes data at write time. For event tables where Meridian will join on user_id, use a KEY distribution on that column so co-located joins skip the network shuffle. Reference dimensions under a few million rows belong on ALL distribution. Reserve EVEN only for truly unjoined fact tables that exceed the broadcast threshold.
- KEY: large fact tables, joined on a stable column
- ALL: small dimensions read by every node
- EVEN: load-balanced staging tables with no join key
- AUTO: let Redshift decide for early-stage schemas
3. Tune for cost and latency
Meridian streams aggregations on a schedule. Enable result caching, pin heavy dashboards to a dedicated WLM queue, and prefer materialized views for nested CTEs that the planner cannot fold. The snippet below shows a sort-key + materialized-view combo that drops a 90-day rollup from 14 seconds to under 400 milliseconds on a 2-node ra3.xlplus cluster.
CREATE MATERIALIZED VIEW analytics.daily_active
DISTKEY (user_id)
SORTKEY (event_day)
AS
SELECT
date_trunc('day', occurred_at) AS event_day,
user_id,
count(*) AS events
FROM analytics.event_stream
WHERE occurred_at > current_date - 90
GROUP BY 1, 2;
REFRESH MATERIALIZED VIEW analytics.daily_active;