Recipe
Log pattern miner
Extract recurring patterns from unstructured log streams using frequency analysis and token clustering. Ship a lightweight miner that surfaces anomalies without sending raw logs off-device.
Overview
The miner ingests line-delimited log files or ETW trace output, tokenizes each line on whitespace and punctuation boundaries, and builds a frequency table of token n-grams. Lines that deviate from high-frequency templates are flagged as anomalies and surfaced via a local dashboard or structured JSON export.
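The n-gram frequency table described above can be sketched roughly as follows. This is a minimal illustration, not the miner's actual code: the names `tokenize` and `ngram_counts` are hypothetical, and lowercasing is borrowed from the Tokenize step of the pipeline.

```python
import re
from collections import Counter

# Assumption: "whitespace and punctuation boundaries" is approximated by
# keeping runs of letters, digits, and underscores as tokens.
_TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def tokenize(line):
    """Return the lowercased tokens of one log line."""
    return [tok.lower() for tok in _TOKEN_RE.findall(line)]


def ngram_counts(lines, n=3):
    """Count every run of n consecutive tokens across all lines."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


sample = [
    "INFO conn open peer 10.0.0.1",
    "INFO conn open peer 10.0.0.2",
    "ERR conn reset peer 10.0.0.3",
]
counts = ngram_counts(sample, n=3)
print(counts.most_common(3))
```

In this sample, n-grams shared by many lines accumulate high counts, while the n-grams unique to the `ERR` line stay at one, which is the signal the miner would use to separate templates from outliers.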
Pipeline
- Ingest: Tail a file or subscribe to an ETW provider. Buffer lines in a ring buffer capped at 64 MiB.
- Tokenize: Split on whitespace, strip punctuation, lowercase. Replace numeric tokens with a <NUM> placeholder.
- Cluster: Group lines by token count and leading-token signature. Build a template for each cluster using the most frequent token at each position.
- Score: Compare each incoming line against its cluster template. Lines whose edit-distance ratio exceeds a configurable threshold (deviation_threshold) are flagged.
- Export: Write flagged lines to a local SQLite database with timestamp, template ID, and deviation score. Expose via a read-only HTTP endpoint on localhost.
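The Tokenize, Cluster, and Score steps can be sketched as below. This is an illustrative sketch under stated assumptions rather than the miner's implementation: all function names are hypothetical, the edit distance is a token-level Levenshtein distance normalized by the longer sequence, and a line whose signature has no template is assumed to score 1.0.

```python
import re
from collections import Counter, defaultdict


def tokenize(line):
    """Split on whitespace, strip punctuation, lowercase, mask numbers."""
    tokens = []
    for raw in line.split():
        tok = re.sub(r"[^\w]", "", raw).lower()
        if tok:
            tokens.append("<NUM>" if tok.isdigit() else tok)
    return tokens


def build_templates(lines):
    """Group by (token count, leading token); keep the modal token per position."""
    clusters = defaultdict(list)
    for line in lines:
        tokens = tokenize(line)
        if tokens:
            clusters[(len(tokens), tokens[0])].append(tokens)
    return {
        sig: [Counter(column).most_common(1)[0][0] for column in zip(*members)]
        for sig, members in clusters.items()
    }


def edit_distance(a, b):
    """Token-level Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def score_line(line, templates):
    """Return the edit-distance ratio of a line against its cluster template."""
    tokens = tokenize(line)
    if not tokens:
        return 0.0
    template = templates.get((len(tokens), tokens[0]))
    if template is None:
        return 1.0  # assumption: an unseen signature is maximally deviant
    return edit_distance(tokens, template) / max(len(tokens), len(template))


history = [
    "INFO conn open peer 10.0.0.1:443",
    "INFO conn open peer 10.0.0.2:443",
    "INFO conn open peer 10.0.0.3:443",
]
templates = build_templates(history)
threshold = 0.35  # default deviation_threshold from the configuration table

for incoming in ["INFO conn open peer 10.0.0.9:443", "INFO cache flush took 250ms"]:
    score = score_line(incoming, templates)
    print(incoming, round(score, 2), "FLAG" if score > threshold else "ok")
```

Because stripping punctuation turns an address such as `10.0.0.1:443` into a run of digits, it collapses to a single `<NUM>` token, so lines that differ only in address or port land in the same cluster.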
Configuration
| Key | Default | Description |
|---|---|---|
| ngram_size | 3 | Token n-gram width for clustering |
| deviation_threshold | 0.35 | Edit-distance ratio above which a line is flagged |
| ring_buffer_mb | 64 | Maximum in-memory line buffer size, in MiB |
| listen_port | 9127 | Localhost dashboard port |
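One way to carry these settings in code is a small validated dataclass. The sketch below is hypothetical: `MinerConfig` and `load_config` are illustrative names, and the validation ranges (a threshold between 0 and 1, a valid TCP port) are assumptions inferred from the descriptions above, not documented limits.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MinerConfig:
    # Defaults mirror the configuration table.
    ngram_size: int = 3
    deviation_threshold: float = 0.35
    ring_buffer_mb: int = 64
    listen_port: int = 9127

    def __post_init__(self):
        # Assumed sanity checks; the real miner may enforce different limits.
        if self.ngram_size < 1:
            raise ValueError("ngram_size must be at least 1")
        if not 0.0 <= self.deviation_threshold <= 1.0:
            raise ValueError("deviation_threshold must be within [0, 1]")
        if self.ring_buffer_mb < 1:
            raise ValueError("ring_buffer_mb must be at least 1")
        if not 1 <= self.listen_port <= 65535:
            raise ValueError("listen_port must be a valid TCP port")

    @property
    def ring_buffer_bytes(self):
        """Buffer cap in bytes, reading ring_buffer_mb as MiB."""
        return self.ring_buffer_mb * 1024 * 1024


def load_config(overrides):
    """Build a config from a dict of overrides, rejecting unknown keys."""
    known = {f.name for f in fields(MinerConfig)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return MinerConfig(**overrides)


cfg = load_config({"deviation_threshold": 0.5})
print(cfg, cfg.ring_buffer_bytes)
```

Rejecting unknown keys up front means a misspelled setting fails loudly instead of silently falling back to its default.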
Output schema
{
  "timestamp": "2026-05-26T14:32:01.123Z",
  "template_id": "tpl_0x4a2f",
  "raw_line": "ERR conn reset peer 192.168.1.42:8443",
  "deviation_score": 0.72,
  "cluster_size": 1843
}
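A record of this shape could be persisted to SQLite and read back as in the sketch below. The table name `flagged_lines`, its column types, and the helper names are assumptions made for illustration; the miner's real database layout is not documented here. An in-memory database keeps the example self-contained.

```python
import json
import sqlite3

COLUMNS = ("timestamp", "template_id", "raw_line", "deviation_score", "cluster_size")

RECORD_JSON = """
{
  "timestamp": "2026-05-26T14:32:01.123Z",
  "template_id": "tpl_0x4a2f",
  "raw_line": "ERR conn reset peer 192.168.1.42:8443",
  "deviation_score": 0.72,
  "cluster_size": 1843
}
"""


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS flagged_lines (
               timestamp TEXT NOT NULL,
               template_id TEXT NOT NULL,
               raw_line TEXT NOT NULL,
               deviation_score REAL NOT NULL,
               cluster_size INTEGER NOT NULL
           )"""
    )
    return conn


def insert_record(conn, record):
    """Insert one flagged-line record using named placeholders."""
    conn.execute(
        "INSERT INTO flagged_lines VALUES "
        "(:timestamp, :template_id, :raw_line, :deviation_score, :cluster_size)",
        record,
    )
    conn.commit()


def top_deviations(conn, limit=10):
    """Return the most deviant records as dicts, highest score first."""
    rows = conn.execute(
        "SELECT timestamp, template_id, raw_line, deviation_score, cluster_size "
        "FROM flagged_lines ORDER BY deviation_score DESC LIMIT ?",
        (limit,),
    )
    return [dict(zip(COLUMNS, row)) for row in rows]


conn = open_store()
insert_record(conn, json.loads(RECORD_JSON))
results = top_deviations(conn)
print(json.dumps(results, indent=2))
```

Storing the ISO 8601 timestamp as text keeps rows sortable by time with a plain string comparison, provided every record uses the same UTC format.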