Synthetic data generator
Generate realistic fake datasets for testing, demos, and load profiling without touching production data. Meridian ships a built-in recipe that produces CSV, JSON, or Parquet output with configurable row counts, column types, and statistical distributions.
Quick start
meridian recipe run synthetic-data \
--rows 50000 \
--columns name,email,age,score \
--format parquet \
--output ./demo.parquet
Column types
- name: first and last name drawn from a census-weighted distribution
- email: derived from the name, with domains drawn from realistic domain pools
- age: Gaussian (μ=38, σ=14), clamped to 18–90
- score: uniform float in [0,1], or in a custom range
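To make the age and score definitions concrete, here is a minimal Python sketch of the two distributions using only the standard library. It is an illustration of the statistics described above, not Meridian's implementation. The function names, the seed, and the choice to round ages to whole years are assumptions.

```python
import random


def sample_age(rng, mu=38.0, sigma=14.0, low=18, high=90):
    """Draw one age from a Gaussian, round it, and clamp it into [low, high]."""
    value = rng.gauss(mu, sigma)
    # Rounding to whole years is an assumption made for this sketch.
    return int(min(max(round(value), low), high))


def sample_score(rng, low=0.0, high=1.0):
    """Draw one score uniformly from [low, high]."""
    return rng.uniform(low, high)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
ages = [sample_age(rng) for _ in range(10_000)]
scores = [sample_score(rng) for _ in range(10_000)]
```

Note that clamping moves out-of-range draws onto the boundaries, so a clamped Gaussian shows a small spike at 18 and a sample mean slightly above 38.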
Distribution tuning
Pass --dist with a JSON string to override the default distributions. The supported distributions are normal, uniform, lognormal, and categorical (with user-supplied weights).
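The four distribution families can be sketched in Python as follows. The JSON keys used here (type, mean, stddev, min, max, mu, sigma, weights) and the plan-tier example are hypothetical and chosen for illustration only. They are not Meridian's documented --dist schema, so check the recipe's own reference for the actual key names.

```python
import json
import random


def make_sampler(spec, rng):
    """Return a zero-argument function that draws one value per call.

    `spec` follows a made-up schema for this sketch; it is NOT the
    documented --dist format.
    """
    kind = spec["type"]
    if kind == "normal":
        return lambda: rng.gauss(spec["mean"], spec["stddev"])
    if kind == "uniform":
        return lambda: rng.uniform(spec["min"], spec["max"])
    if kind == "lognormal":
        # mu and sigma describe the underlying normal distribution.
        return lambda: rng.lognormvariate(spec["mu"], spec["sigma"])
    if kind == "categorical":
        values = list(spec["weights"].keys())
        weights = list(spec["weights"].values())
        # Weights are relative; they do not need to sum to 1.
        return lambda: rng.choices(values, weights=weights, k=1)[0]
    raise ValueError(f"unsupported distribution: {kind!r}")


rng = random.Random(7)  # fixed seed so the sketch is reproducible
# Hypothetical categorical spec with user-supplied weights.
spec = json.loads(
    '{"type": "categorical",'
    ' "weights": {"free": 0.7, "pro": 0.25, "enterprise": 0.05}}'
)
draw = make_sampler(spec, rng)
tiers = [draw() for _ in range(10_000)]
```

With these weights, roughly 70% of the drawn values should be "free". Comparing observed frequencies against the requested weights is a quick sanity check on any generated categorical column.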
Tip: Combine with the benchmark runner recipe to stress-test pipelines with known-good synthetic data before running against real workloads.