Back to Docs
Recipe

Log pattern miner

Extract recurring patterns from unstructured log streams using frequency analysis and token clustering. Ship a lightweight miner that surfaces anomalies without shipping raw logs off-device.

Overview

The miner ingests line-delimited log files or ETW trace output, tokenizes each line on whitespace and punctuation boundaries, and builds a frequency table of token n-grams. Lines that deviate from high-frequency templates are flagged as anomalies and surfaced via a local dashboard or structured JSON export.

Pipeline

  1. IngestTail a file or subscribe to an ETW provider. Buffer lines in a ring buffer capped at 64 MiB.
  2. TokenizeSplit on whitespace, strip punctuation, lowercase. Replace numeric tokens with a <NUM> placeholder.
  3. ClusterGroup lines by token-length and leading-token signature. Build a template for each cluster using the most frequent token at each position.
  4. ScoreCompare each incoming line against its cluster template. Lines with edit distance above a configurable threshold are flagged.
  5. ExportWrite flagged lines to a local SQLite database with timestamp, template ID, and deviation score. Expose via a read-only HTTP endpoint on localhost.

Configuration

KeyDefaultDescription
ngram_size3Token n-gram width for clustering
deviation_threshold0.35Edit-distance ratio to flag a line
ring_buffer_mb64Max in-memory line buffer size
listen_port9127Localhost dashboard port

Output schema

{
  "timestamp": "2026-05-26T14:32:01.123Z",
  "template_id": "tpl_0x4a2f",
  "raw_line": "ERR conn reset peer 192.168.1.42:8443",
  "deviation_score": 0.72,
  "cluster_size": 1843
}