Recipe: NL → SQL
Convert natural language questions into SQL queries using few-shot prompting and a pre-defined database schema injected into the system prompt.
Schema
CREATE TABLE customers ( id INT PRIMARY KEY, name VARCHAR(100), email VARCHAR(150), created_at TIMESTAMP ); CREATE TABLE orders ( id INT PRIMARY KEY, customer_id INT REFERENCES customers(id), total DECIMAL(10,2), status VARCHAR(20), placed_at TIMESTAMP ); CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(200), price DECIMAL(10,2), category VARCHAR(50) ); CREATE TABLE order_items ( order_id INT REFERENCES orders(id), product_id INT REFERENCES products(id), quantity INT, PRIMARY KEY (order_id, product_id) );
Few-shot examples
“Show all customers who joined in 2024”
SELECT name, email FROM customers WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';“Total revenue per product category”
SELECT p.category, SUM(oi.quantity * p.price) AS revenue FROM order_items oi JOIN products p ON oi.product_id = p.id GROUP BY p.category ORDER BY revenue DESC;“Customers with more than 3 orders”
SELECT c.name, COUNT(o.id) AS order_count FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.id, c.name HAVING COUNT(o.id) > 3;System prompt
You are a SQL generator. Given a natural language question and the schema below, output ONLY a valid PostgreSQL SELECT query. Do not explain. Do not wrap in markdown. Schema: <schema injected here> Examples: User: Show all customers who joined in 2024 Assistant: SELECT name, email FROM customers WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'; User: Total revenue per product category Assistant: SELECT p.category, SUM(oi.quantity * p.price) AS revenue FROM order_items oi JOIN products p ON oi.product_id = p.id GROUP BY p.category ORDER BY revenue DESC;