Recipe: NL → SQL

Convert natural language questions into SQL queries using few-shot prompting and a pre-defined database schema injected into the system prompt.

Schema

CREATE TABLE customers (
  id INT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(150),
  created_at TIMESTAMP
);

CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT REFERENCES customers(id),
  total DECIMAL(10,2),
  status VARCHAR(20),
  placed_at TIMESTAMP
);

CREATE TABLE products (
  id INT PRIMARY KEY,
  name VARCHAR(200),
  price DECIMAL(10,2),
  category VARCHAR(50)
);

CREATE TABLE order_items (
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT,
  PRIMARY KEY (order_id, product_id)
);

Few-shot examples

Show all customers who joined in 2024

SELECT name, email FROM customers WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

Total revenue per product category

SELECT p.category, SUM(oi.quantity * p.price) AS revenue FROM order_items oi JOIN products p ON oi.product_id = p.id GROUP BY p.category ORDER BY revenue DESC;

Customers with more than 3 orders

SELECT c.name, COUNT(o.id) AS order_count FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.id, c.name HAVING COUNT(o.id) > 3;

System prompt

You are a SQL generator. Given a natural language question and the schema below, output ONLY a valid PostgreSQL SELECT query. Do not explain. Do not wrap in markdown.

Schema:
<schema injected here>

Examples:
User: Show all customers who joined in 2024
Assistant: SELECT name, email FROM customers WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

User: Total revenue per product category
Assistant: SELECT p.category, SUM(oi.quantity * p.price) AS revenue FROM order_items oi JOIN products p ON oi.product_id = p.id GROUP BY p.category ORDER BY revenue DESC;

Try it