Synthetic data generator

Generate realistic fake datasets for testing, demos, and load profiling without touching production data. Meridian ships a built-in recipe that produces CSV, JSON, or Parquet output with configurable row counts, column types, and statistical distributions.

Quick start

meridian recipe run synthetic-data \
  --rows 50000 \
  --columns name,email,age,score \
  --format parquet \
  --output ./demo.parquet

Column types

name — first + last from census-weighted distribution
email — derived from name with realistic domain pools
age — gaussian (μ=38, σ=14), clamped 18–90
score — uniform float [0,1] or custom range
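
To make the four column types concrete, here is a minimal Python sketch of how one such row could be produced. It is not Meridian's implementation: the name and domain pools are hypothetical placeholders, names are picked uniformly rather than from a census-weighted distribution, and the function name make_row is invented for illustration. Only the age parameters (μ=38, σ=14, clamped 18–90) and the [0,1] score range come from the list above.

```python
import random

# Hypothetical pools for illustration; the recipe's real pools are not shown in these docs.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev"]
LAST_NAMES = ["Ito", "Khan", "Lopez", "Smith"]
DOMAINS = ["example.com", "example.org", "example.net"]


def make_row(rng: random.Random) -> dict:
    """Build one synthetic row with name, email, age, and score columns."""
    # Assumption: uniform choice stands in for the census-weighted distribution.
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    # email is derived from the name so the two columns stay consistent.
    email = f"{first}.{last}@{rng.choice(DOMAINS)}".lower()
    # age: gaussian with mu=38, sigma=14, rounded, then clamped to 18..90.
    age = min(90, max(18, round(rng.gauss(38, 14))))
    # score: uniform float; random() is half-open, so values fall in [0, 1).
    score = rng.random()
    return {"name": f"{first} {last}", "email": email, "age": age, "score": score}


rng = random.Random(42)  # fixed seed so repeated runs give the same rows
rows = [make_row(rng) for _ in range(1000)]
```

Clamping pulls out-of-range draws onto the boundaries, so ages of exactly 18 and 90 appear somewhat more often than a pure gaussian would predict. Resampling until the value is in range would avoid that pile-up, but the list above describes clamping.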

Distribution tuning

Pass --dist with a JSON string to override the default distributions. Supported distribution types are normal, uniform, lognormal, and categorical with user-supplied weights.
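
The four distribution types can be sketched in Python as follows. The JSON shape used here (the keys type, mean, stddev, min, max, mu, sigma, values, and weights) is an assumption made for illustration and is not Meridian's documented --dist schema; check the recipe's reference for the real key names. The sample function is likewise hypothetical.

```python
import json
import random


def sample(spec: dict, rng: random.Random):
    """Draw one value from a distribution spec (hypothetical schema, not Meridian's)."""
    kind = spec["type"]
    if kind == "normal":
        return rng.gauss(spec["mean"], spec["stddev"])
    if kind == "uniform":
        return rng.uniform(spec["min"], spec["max"])
    if kind == "lognormal":
        # mu and sigma describe the underlying normal distribution, not the output.
        return rng.lognormvariate(spec["mu"], spec["sigma"])
    if kind == "categorical":
        # Weights are relative; they do not need to sum to 1.
        return rng.choices(spec["values"], weights=spec["weights"], k=1)[0]
    raise ValueError(f"unsupported distribution: {kind}")


# Illustrative JSON only; the key names are assumptions.
dist_json = '{"type": "categorical", "values": ["free", "pro", "team"], "weights": [0.7, 0.2, 0.1]}'
spec = json.loads(dist_json)

rng = random.Random(7)  # fixed seed for reproducible draws
draws = [sample(spec, rng) for _ in range(10_000)]
```

With these weights, roughly 70% of the draws should be "free", which gives a quick sanity check to run on generated output before relying on it.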

Tip: Combine with the benchmark runner recipe to stress-test pipelines with known-good synthetic data before running against real workloads.