Recipe

Redshift Primer

A condensed guide for connecting Meridian to Amazon Redshift clusters, tuning warehouse queries against high-cardinality event streams, and shipping cost-aware analytics without falling into the classic distribution-key traps.

1. Wire the connection

Meridian talks to Redshift over standard Postgres wire protocol on port 5439. Provision a least-privilege user, whitelist the Meridian egress range, and store the credential as an encrypted secret rather than an environment variable for production workloads.

-- 1. Provision the role
CREATE USER meridian_reader WITH PASSWORD '<rotate-me>';
GRANT USAGE ON SCHEMA analytics TO meridian_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics
  TO meridian_reader;

-- 2. Register the cluster
meridian sources add redshift \
  --host warehouse.abc.us-east-1.redshift.amazonaws.com \
  --port 5439 --db analytics --user meridian_reader

2. Pick the right distribution style

Redshift sorts and distributes data at write time. For event tables where Meridian will join on user_id, use a KEY distribution on that column so co-located joins skip the network shuffle. Reference dimensions under a few million rows belong on ALL distribution. Reserve EVEN only for truly unjoined fact tables that exceed the broadcast threshold.

  • KEY: large fact tables, joined on a stable column
  • ALL: small dimensions read by every node
  • EVEN: load-balanced staging tables with no join key
  • AUTO: let Redshift decide for early-stage schemas

3. Tune for cost and latency

Meridian streams aggregations on a schedule. Enable result caching, pin heavy dashboards to a dedicated WLM queue, and prefer materialized views for nested CTEs that the planner cannot fold. The snippet below shows a sort-key + materialized-view combo that drops a 90-day rollup from 14 seconds to under 400 milliseconds on a 2-node ra3.xlplus cluster.

CREATE MATERIALIZED VIEW analytics.daily_active
DISTKEY (user_id)
SORTKEY (event_day)
AS
SELECT
  date_trunc('day', occurred_at) AS event_day,
  user_id,
  count(*) AS events
FROM analytics.event_stream
WHERE occurred_at > current_date - 90
GROUP BY 1, 2;

REFRESH MATERIALIZED VIEW analytics.daily_active;