Recipe: Invoice line-item parser
Extract structured line items from raw invoice text using Meridian structured output. Handles multi-line descriptions, tax breakdowns, and currency normalization.
Overview
This recipe walks through building a reliable invoice parser that takes unstructured text from PDFs, emails, or OCR output and returns clean JSON line items ready for accounting systems.
Schema design
Define a strict JSON schema for each line item: description, quantity, unit price, total, and optional tax code. Meridian enforces types and required fields so partial parses never reach your downstream pipeline.
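As a rough illustration of the schema described above, the sketch below writes the five fields as a JSON-Schema-style dictionary and checks a parsed item against it. The field names (unit_price, tax_code), the LINE_ITEM_SCHEMA constant, and the validate_line_item helper are assumptions made for this example; they are not Meridian's API, and the hand-rolled check only stands in for whatever enforcement Meridian performs.

```python
from typing import Any

# Hypothetical schema for one line item; the field names are illustrative.
LINE_ITEM_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
        "quantity": {"type": "number"},
        "unit_price": {"type": "number"},
        "total": {"type": "number"},
        "tax_code": {"type": "string"},  # optional: not listed in "required"
    },
    "required": ["description", "quantity", "unit_price", "total"],
    "additionalProperties": False,
}

# Maps the two JSON types used above to Python types.
_JSON_TYPES = {"string": str, "number": (int, float)}


def validate_line_item(item: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the item conforms."""
    errors = []
    props = LINE_ITEM_SCHEMA["properties"]
    for name in LINE_ITEM_SCHEMA["required"]:
        if name not in item:
            errors.append(f"missing required field: {name}")
    for name, value in item.items():
        if name not in props:
            errors.append(f"unexpected field: {name}")
            continue
        expected = _JSON_TYPES[props[name]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {name}")
    return errors


good = {"description": "Widget", "quantity": 2, "unit_price": 4.5, "total": 9.0}
partial = {"description": "Widget", "quantity": 2, "unit_price": 4.5}
print(validate_line_item(good))     # []
print(validate_line_item(partial))  # ['missing required field: total']
```

Rejecting an item with any missing required field is what keeps a partial parse out of the downstream pipeline. Amounts are plain JSON numbers here for brevity; a real ledger would likely convert them to a decimal type before doing arithmetic.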
Handling edge cases
- Multi-line descriptions that span page breaks
- Currencies with varied symbol placement ($ prefix, € suffix, R$ prefix)
- Tax-inclusive vs tax-exclusive line totals
- Negative quantities for credit memos and adjustments
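Three of the cases above (symbol placement, tax-inclusive totals, and negative quantities) can be handled in a post-processing step. The sketch below is one possible approach, not the recipe's actual implementation. The SYMBOLS table and the function names are hypothetical, and the separator rule is an assumption: a final "." or "," followed by one or two digits is read as the decimal separator, and every other separator is treated as grouping. Multi-line descriptions are not covered here.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Illustrative symbol table; "R$" comes first so it wins over the bare "$".
SYMBOLS = [("R$", "BRL"), ("€", "EUR"), ("$", "USD")]
CENT = Decimal("0.01")


def parse_amount(text: str) -> tuple[str, Decimal]:
    """Parse strings like '$1,234.56', '1.234,56 €' or 'R$ 10,00'."""
    raw = text.strip()
    negative = False
    # Accounting-style negatives: "($5.00)" means -5.00.
    if raw.startswith("(") and raw.endswith(")"):
        negative, raw = True, raw[1:-1].strip()
    if raw.startswith("-"):
        negative, raw = True, raw[1:].strip()
    currency = None
    for symbol, code in SYMBOLS:
        if raw.startswith(symbol):
            currency, raw = code, raw[len(symbol):]
            break
        if raw.endswith(symbol):
            currency, raw = code, raw[: -len(symbol)]
            break
    if currency is None:
        raise ValueError(f"no known currency symbol in {text!r}")
    raw = raw.strip()
    if raw.startswith("-"):
        negative, raw = True, raw[1:].strip()
    if not re.fullmatch(r"\d(?:[\d.,\s]*\d)?", raw):
        raise ValueError(f"unrecognized amount in {text!r}")
    # Assumption: a trailing separator plus 1-2 digits is the decimal part.
    dec = re.search(r"[.,](\d{1,2})$", raw)
    if dec:
        whole = re.sub(r"[.,\s]", "", raw[: dec.start()])
        number = Decimal(f"{whole}.{dec.group(1)}")
    else:
        number = Decimal(re.sub(r"[.,\s]", "", raw))
    return currency, -number if negative else number


def net_total(total: Decimal, tax_rate: Decimal, tax_inclusive: bool) -> Decimal:
    """Return the tax-exclusive total, rounded to cents."""
    net = total / (1 + tax_rate) if tax_inclusive else total
    return net.quantize(CENT, rounding=ROUND_HALF_UP)


def total_is_consistent(quantity: Decimal, unit_price: Decimal, total: Decimal) -> bool:
    """Signed check, so a credit memo with quantity -2 expects a negative total."""
    expected = (quantity * unit_price).quantize(CENT, rounding=ROUND_HALF_UP)
    return expected == total.quantize(CENT, rounding=ROUND_HALF_UP)


print(parse_amount("1.234,56 €"))  # ('EUR', Decimal('1234.56'))
print(net_total(Decimal("120.00"), Decimal("0.20"), tax_inclusive=True))  # 100.00
```

Decimal is used instead of float so that rounding to cents is exact. The separator heuristic reads "1.234" as 1234, which suits money amounts but would be wrong for a fractional quantity, so quantities should be parsed separately.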
Batch processing
Submit multiple invoices as Meridian async jobs. Each parse runs independently with its own retry budget, so one failing invoice does not block the others. Collect the results into a unified ledger export.
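The batch pattern described above can be sketched with asyncio. Since the Meridian job API is not shown in this recipe, the example takes the parse call as a parameter: parse is a hypothetical async function that accepts invoice text and returns a list of line-item dictionaries. The names parse_with_retries and parse_batch, the default of three attempts, and the exponential backoff are all assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature for the call that parses one invoice.
ParseFn = Callable[[str], Awaitable[list[dict]]]


async def parse_with_retries(
    parse: ParseFn, text: str, max_attempts: int = 3, base_delay: float = 0.5
) -> list[dict]:
    """Run one parse with its own retry budget and exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await parse(text)
        except Exception:
            if attempt >= max_attempts:
                raise  # budget exhausted: surface the last error
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


async def parse_batch(
    parse: ParseFn,
    invoices: dict[str, str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> tuple[list[dict], dict[str, BaseException]]:
    """Parse all invoices concurrently; return (ledger rows, failures by id)."""
    ids = list(invoices)
    results = await asyncio.gather(
        *(parse_with_retries(parse, invoices[i], max_attempts, base_delay) for i in ids),
        return_exceptions=True,  # one failed invoice must not cancel the rest
    )
    ledger: list[dict] = []
    failures: dict[str, BaseException] = {}
    for invoice_id, result in zip(ids, results):
        if isinstance(result, BaseException):
            failures[invoice_id] = result
        else:
            # Tag each line item with its invoice so the ledger stays traceable.
            ledger.extend({"invoice_id": invoice_id, **item} for item in result)
    return ledger, failures
```

A caller would run something like asyncio.run(parse_batch(my_parse, {"INV-001": text})) and write the returned ledger rows to the export, logging or re-queueing whatever lands in failures. Retrying on every exception is a simplification; in practice only transient errors are worth retrying.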
The full code sample and prompt template are available in the Meridian cookbook.