
Log pattern miner

Extract recurring patterns from unstructured log streams using frequency analysis and token clustering. The result is a lightweight miner that surfaces anomalies locally, so raw logs never leave the device.

Overview

The miner ingests line-delimited log files or ETW trace output, tokenizes each line on whitespace and punctuation boundaries, and builds a frequency table of token n-grams. Lines that deviate from high-frequency templates are flagged as anomalies and surfaced via a local dashboard or structured JSON export.
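As a rough illustration of the frequency table described above, the sketch below counts token n-grams across a batch of lines. The names `tokenize` and `ngram_counts` are hypothetical, and the regular expression is only one plausible reading of "whitespace and punctuation boundaries"; the miner's actual tokenizer may differ.

```python
import re
from collections import Counter

# Assumption: any run of non-word characters (or underscores) is a boundary.
BOUNDARY = re.compile(r"[\W_]+")

def tokenize(line):
    """Lowercase the line and split it on whitespace/punctuation runs."""
    return [tok for tok in BOUNDARY.split(line.lower()) if tok]

def ngram_counts(lines, n=3):
    """Count every contiguous run of n tokens across all lines."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

counts = ngram_counts(["ERR conn reset peer", "ERR conn reset host"], n=3)
```

N-grams with high counts belong to recurring templates; a line made up mostly of rare n-grams is a candidate anomaly.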

Pipeline

Ingest: Tail a file or subscribe to an ETW provider. Buffer lines in a ring buffer capped at 64 MiB (ring_buffer_mb).
Tokenize: Split on whitespace, strip punctuation, and lowercase. Replace numeric tokens with a <NUM> placeholder.
Cluster: Group lines by token count and leading-token signature. Build a template for each cluster from the most frequent token at each position.
Score: Compare each incoming line against its cluster template. Lines whose edit-distance ratio exceeds deviation_threshold are flagged.
Export: Write flagged lines to a local SQLite database with timestamp, template ID, and deviation score. Expose them through a read-only HTTP endpoint on localhost (listen_port).
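The Tokenize, Cluster, and Score stages can be sketched as follows. This is a minimal illustration under several assumptions rather than the miner's actual code: all function names are hypothetical, a token counts as numeric when it consists of digit groups joined by "." or ":", punctuation is stripped only from token edges, the score is the token-level Levenshtein distance divided by template length, and a line with no matching cluster is treated as fully deviant.

```python
import re
import string
from collections import Counter, defaultdict

NUM = "<NUM>"
# Assumption: numeric means digit groups joined by "." or ":" (covers IP:port).
NUMERIC = re.compile(r"\d+(?:[.:]\d+)*")

def tokenize(line):
    """Split on whitespace, strip edge punctuation, lowercase, mask numbers."""
    tokens = []
    for raw in line.split():
        tok = raw.strip(string.punctuation).lower()
        if tok:
            tokens.append(NUM if NUMERIC.fullmatch(tok) else tok)
    return tokens

def cluster_key(tokens):
    """Cluster signature: token count plus leading token."""
    return (len(tokens), tokens[0])

def build_templates(lines):
    """Pick the most frequent token at each position within every cluster."""
    clusters = defaultdict(list)
    for line in lines:
        tokens = tokenize(line)
        if tokens:
            clusters[cluster_key(tokens)].append(tokens)
    # zip(*members) yields one column of tokens per position.
    return {
        key: [Counter(col).most_common(1)[0][0] for col in zip(*members)]
        for key, members in clusters.items()
    }

def edit_distance(a, b):
    """Token-level Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def score_line(line, templates):
    """Return the edit-distance ratio between a line and its template."""
    tokens = tokenize(line)
    template = templates.get(cluster_key(tokens)) if tokens else None
    if template is None:
        return 1.0  # assumption: unseen cluster counts as fully deviant
    return edit_distance(tokens, template) / len(template)

history = [
    "ERR conn reset peer 10",
    "ERR conn reset peer 22",
    "ERR conn reset peer 31",
    "ERR disk full now 5",
]
templates = build_templates(history)
flagged = [line for line in history if score_line(line, templates) > 0.35]
```

Here all four lines share the signature (5, "err"), so the template is dominated by the three connection-reset lines and only the disk line exceeds the default threshold of 0.35.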

Configuration

Key	Default	Description
ngram_size	3	Token n-gram width for clustering
deviation_threshold	0.35	Edit-distance ratio above which a line is flagged
ring_buffer_mb	64	Maximum in-memory line buffer size, in MiB
listen_port	9127	Localhost dashboard port
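One way to carry these settings in code is a small validated container. The sketch below is illustrative only: the class name `MinerConfig`, the `from_dict` helper, and the validation bounds (for example, a threshold between 0 and 1) are assumptions, since the recipe does not specify a configuration file format or allowed ranges.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class MinerConfig:
    # Defaults mirror the table above.
    ngram_size: int = 3
    deviation_threshold: float = 0.35
    ring_buffer_mb: int = 64
    listen_port: int = 9127

    def __post_init__(self):
        # Assumed bounds; the recipe does not state valid ranges.
        if self.ngram_size < 1:
            raise ValueError("ngram_size must be at least 1")
        if not 0.0 <= self.deviation_threshold <= 1.0:
            raise ValueError("deviation_threshold must be within [0, 1]")
        if self.ring_buffer_mb < 1:
            raise ValueError("ring_buffer_mb must be at least 1")
        if not 1 <= self.listen_port <= 65535:
            raise ValueError("listen_port must be a valid TCP port")

    @classmethod
    def from_dict(cls, overrides):
        """Build a config from a dict, rejecting keys not in the table."""
        known = {f.name for f in fields(cls)}
        unknown = set(overrides) - known
        if unknown:
            raise KeyError(f"unknown config keys: {sorted(unknown)}")
        return cls(**overrides)

config = MinerConfig.from_dict({"deviation_threshold": 0.5})
```

Rejecting unknown keys up front catches typos such as `deviation_treshold` before they silently fall back to a default.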

Output schema

{
  "timestamp": "2026-05-26T14:32:01.123Z",
  "template_id": "tpl_0x4a2f",
  "raw_line": "ERR conn reset peer 192.168.1.42:8443",
  "deviation_score": 0.72,
  "cluster_size": 1843
}
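A record of this shape could be persisted and re-exported roughly as shown below, using Python's standard `sqlite3` and `json` modules. The table name `flagged_lines`, the column types, and the helper names are assumptions made for illustration; the recipe only states that flagged lines go to a local SQLite database and can be exported as structured JSON.

```python
import json
import sqlite3

COLUMNS = ("timestamp", "template_id", "raw_line", "deviation_score", "cluster_size")

def open_store(path=":memory:"):
    """Open (or create) the store; the table layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flagged_lines ("
        "timestamp TEXT NOT NULL, template_id TEXT NOT NULL, "
        "raw_line TEXT NOT NULL, deviation_score REAL NOT NULL, "
        "cluster_size INTEGER NOT NULL)"
    )
    return conn

def store_record(conn, record):
    """Insert one flagged line, passing values as bound parameters."""
    conn.execute(
        "INSERT INTO flagged_lines VALUES (?, ?, ?, ?, ?)",
        tuple(record[col] for col in COLUMNS),
    )
    conn.commit()

def export_json(conn):
    """Return all flagged lines as a JSON array matching the schema above."""
    rows = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM flagged_lines ORDER BY timestamp"
    )
    return json.dumps([dict(zip(COLUMNS, row)) for row in rows], indent=2)

conn = open_store()
store_record(conn, {
    "timestamp": "2026-05-26T14:32:01.123Z",
    "template_id": "tpl_0x4a2f",
    "raw_line": "ERR conn reset peer 192.168.1.42:8443",
    "deviation_score": 0.72,
    "cluster_size": 1843,
})
exported = export_json(conn)
```

Storing timestamps as ISO 8601 text in UTC keeps lexical order equal to chronological order, so a plain `ORDER BY timestamp` is enough.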
