Recipe: Invoice line-item parser

Extract structured line items from raw invoice text using Meridian structured output. Handles multi-line descriptions, tax breakdowns, and currency normalization.

Overview

This recipe walks through building a reliable invoice parser that takes unstructured text from PDFs, emails, or OCR output and returns clean JSON line items ready for accounting systems.

Schema design

Define a strict JSON schema for each line item: description, quantity, unit price, total, and optional tax code. Meridian enforces types and required fields so partial parses never reach your downstream pipeline.
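As a rough illustration, the line-item schema described above might look like the following. The field names (`description`, `quantity`, `unit_price`, `total`, `tax_code`) and the `validate_line_item` helper are assumptions made for this sketch, not Meridian's actual API. The validator is plain Python standing in for whatever enforcement Meridian performs.

```python
import json

# Hypothetical JSON Schema for one line item; field names are assumptions.
LINE_ITEM_SCHEMA = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
        "quantity": {"type": "number"},
        "unit_price": {"type": "number"},
        "total": {"type": "number"},
        "tax_code": {"type": "string"},  # optional
    },
    "required": ["description", "quantity", "unit_price", "total"],
    "additionalProperties": False,
}

# Map JSON Schema type names to Python types for this minimal check.
_PY_TYPES = {"string": str, "number": (int, float)}


def validate_line_item(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item conforms."""
    errors = []
    props = LINE_ITEM_SCHEMA["properties"]
    for name in LINE_ITEM_SCHEMA["required"]:
        if name not in item:
            errors.append(f"missing required field: {name}")
    for name, value in item.items():
        if name not in props:
            errors.append(f"unexpected field: {name}")
            continue
        expected = _PY_TYPES[props[name]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {name}: expected {props[name]['type']}")
    return errors


if __name__ == "__main__":
    raw = '{"description": "Widget", "quantity": 2, "unit_price": 9.5, "total": 19.0}'
    print(validate_line_item(json.loads(raw)))
```

Running the same check on your side of the boundary is a cheap second guard: an item missing `total`, or carrying a quantity as a string, is reported before it reaches the accounting system.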

Handling edge cases

  • Multi-line descriptions that span page breaks
  • Currencies with varied symbol placement ($ prefix, € suffix, R$ prefix)
  • Tax-inclusive vs tax-exclusive line totals
  • Negative quantities for credit memos and adjustments
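Three of the edge cases above (symbol placement, tax-inclusive totals, and negative amounts on credit memos) can be handled in post-processing. The sketch below is one possible approach and not part of Meridian: the `SYMBOLS` table, `parse_amount`, and `net_total` are hypothetical names, and the rule that a separator followed by exactly two final digits is the decimal mark is a simplifying assumption that will not hold for every locale.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical symbol table; extend it for the currencies you actually see.
# Longer symbols come first so "R$" is matched before "$".
SYMBOLS = [("R$", "BRL"), ("€", "EUR"), ("$", "USD")]


def parse_amount(text: str) -> tuple[Decimal, str]:
    """Parse an amount with a leading or trailing symbol into (amount, ISO code)."""
    text = text.strip()
    for symbol, code in SYMBOLS:
        if text.startswith(symbol) or text.endswith(symbol):
            number = text.replace(symbol, "").strip()
            break
    else:
        raise ValueError(f"no known currency symbol in {text!r}")

    match = re.fullmatch(r"(-?)([\d.,\s]*\d)", number)
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    sign, digits = match.groups()

    # Assumption: '.' or ',' followed by exactly two final digits is the decimal mark;
    # every other '.', ',' or space is a thousands separator.
    decimal_part = "0"
    tail = re.search(r"[.,](\d{2})$", digits)
    if tail:
        decimal_part = tail.group(1)
        digits = digits[: tail.start()]
    whole = re.sub(r"[.,\s]", "", digits) or "0"
    return Decimal(f"{sign}{whole}.{decimal_part}"), code


def net_total(total: Decimal, tax_rate: Decimal, tax_inclusive: bool) -> Decimal:
    """Return the tax-exclusive total rounded to cents; negative totals pass through."""
    net = total / (1 + tax_rate) if tax_inclusive else total
    return net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for sample in ("$1,234.50", "1.234,50 €", "R$ 99,90"):
        print(sample, "->", parse_amount(sample))
    # A credit memo line: negative total, tax included at 19%.
    print(net_total(Decimal("-119.00"), Decimal("0.19"), tax_inclusive=True))
```

Using `Decimal` rather than `float` keeps totals exact to the cent, which matters once line items are summed into a ledger.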

Batch processing

Run multiple invoices through Meridian async jobs. Each parse runs independently with its own retry budget, so one failing invoice does not block the others. Collect the results into a unified ledger export.
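The batch flow described above can be sketched with Python's `asyncio`. This is an illustration under stated assumptions, not Meridian's job API: `parse_invoice` is a placeholder standing in for the real parse call, and `parse_with_retries`, `parse_batch`, and the `max_attempts` parameter are hypothetical names.

```python
import asyncio


async def parse_invoice(text: str) -> list[dict]:
    """Placeholder for the real parse call; swap in your Meridian client here."""
    if not text.strip():
        raise ValueError("empty invoice text")
    return [{"description": text.strip(), "quantity": 1}]


async def parse_with_retries(text: str, max_attempts: int = 3) -> list[dict]:
    """Retry one invoice up to max_attempts times, independent of other invoices."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await parse_invoice(text)
        except Exception:
            if attempt == max_attempts:
                raise  # retry budget exhausted for this invoice only
            await asyncio.sleep(0.01 * 2 ** attempt)  # short exponential backoff
    raise ValueError("max_attempts must be at least 1")


async def parse_batch(invoices: dict[str, str]) -> tuple[list[dict], dict[str, str]]:
    """Parse all invoices concurrently; return (ledger rows, failures by invoice id)."""
    ids = list(invoices)
    results = await asyncio.gather(
        *(parse_with_retries(invoices[i]) for i in ids),
        return_exceptions=True,  # a failed invoice must not cancel the others
    )
    ledger, failures = [], {}
    for invoice_id, result in zip(ids, results):
        if isinstance(result, Exception):
            failures[invoice_id] = str(result)
        else:
            ledger.extend({"invoice_id": invoice_id, **item} for item in result)
    return ledger, failures


if __name__ == "__main__":
    ledger, failures = asyncio.run(parse_batch({"inv-1": "Widget", "inv-2": "  "}))
    print(ledger)
    print(failures)
```

Returning failures alongside the ledger rows, rather than raising on the first error, lets the export proceed with the invoices that parsed while the rest are queued for review.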

The full code sample and prompt template are available in the Meridian cookbook.