Recipe

Metric naming + label taxonomy

A consistent schema for naming metrics and structuring labels so dashboards stay readable, queries stay fast, and on-call engineers stop guessing.

Metric name structure

Every metric follows the pattern <domain>_<noun>_<unit>. Domains are short prefixes like http, db, or queue. Nouns describe what is measured (requests, latency, errors). Units are always plural and unabbreviated: seconds, not sec. For example: http_latency_seconds.
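The pattern above can be checked mechanically. The following is a minimal sketch of such a check, not a definitive linter: the DOMAINS and UNITS allow-lists and the check_metric_name helper are hypothetical names introduced for illustration, and the lists would need extending to match your own domains and units.

```python
import re

# Hypothetical allow-lists (assumptions, not part of the recipe itself).
DOMAINS = {"http", "db", "queue"}
UNITS = {"seconds", "bytes"}

# Lowercase alphanumeric words joined by single underscores.
NAME_RE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def check_metric_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name conforms."""
    if not NAME_RE.match(name):
        return ["must be lowercase words joined by underscores"]
    problems = []
    parts = name.split("_")
    if len(parts) < 3:
        problems.append("expected <domain>_<noun>_<unit>")
    if parts[0] not in DOMAINS:
        problems.append(f"unknown domain {parts[0]!r}")
    if parts[-1] not in UNITS:
        problems.append(f"unknown or abbreviated unit {parts[-1]!r}")
    return problems
```

Running a check like this in CI or a pre-commit hook catches abbreviated units such as sec before they reach a dashboard.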

Label cardinality budget

Labels are expensive: every distinct combination of label values creates a separate time series. Cap total label combinations per metric at 10,000. Never put user IDs, request IDs, or unbounded strings in labels. Use structured log attributes for high-cardinality dimensions and link back via trace ID.
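To see how quickly the budget is consumed, multiply the number of distinct values each label can take. The sketch below computes that worst-case bound; the helper names and the per-label counts are illustrative assumptions, and real series counts are usually lower because not every combination occurs.

```python
from math import prod

CARDINALITY_BUDGET = 10_000  # per-metric cap from this recipe


def worst_case_series(label_values: dict[str, int]) -> int:
    """Upper bound on series count: product of distinct values per label."""
    return prod(label_values.values())


def within_budget(label_values: dict[str, int],
                  budget: int = CARDINALITY_BUDGET) -> bool:
    """True if the worst-case series count fits inside the budget."""
    return worst_case_series(label_values) <= budget


# Hypothetical distinct-value counts for one HTTP metric.
labels = {"status": 3, "error_type": 4, "method": 7, "endpoint": 40}
print(worst_case_series(labels))  # 3 * 4 * 7 * 40 = 3360
print(within_budget(labels))      # True

# Adding an unbounded label such as a user ID exceeds the budget immediately.
labels_with_user = {**labels, "user_id": 50_000}
print(within_budget(labels_with_user))  # False
```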

Standard label set

  • status: success | error | degraded
  • error_type: timeout | ratelimit | internal | upstream
  • method: HTTP verb or RPC name, lowercased
  • endpoint: normalized path with parameter placeholders
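The standard label set can be produced by one small helper so that every service normalizes values the same way. This is a sketch under stated assumptions: the :id and :uuid placeholder syntax, the rule that only numeric and UUID path segments are dynamic, the use of an empty error_type to mean "no error", and the function names are all illustrative choices, not something this recipe prescribes.

```python
import re

STATUSES = {"success", "error", "degraded"}
ERROR_TYPES = {"timeout", "ratelimit", "internal", "upstream"}

# Assumption: numeric IDs and UUIDs are the only dynamic path segments.
_UUID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def normalize_endpoint(path: str) -> str:
    """Drop the query string and replace dynamic segments with placeholders."""
    segments = []
    for seg in path.split("?", 1)[0].split("/"):
        if seg.isdigit():
            segments.append(":id")
        elif _UUID.match(seg):
            segments.append(":uuid")
        else:
            segments.append(seg)
    return "/".join(segments)


def build_labels(method: str, path: str, status: str,
                 error_type: str = "") -> dict[str, str]:
    """Build the standard label set, rejecting values outside the taxonomy."""
    if status not in STATUSES:
        raise ValueError(f"unknown status {status!r}")
    # Assumption: an empty error_type means the request did not fail.
    if error_type and error_type not in ERROR_TYPES:
        raise ValueError(f"unknown error_type {error_type!r}")
    return {
        "status": status,
        "error_type": error_type,
        "method": method.lower(),
        "endpoint": normalize_endpoint(path),
    }
```

For example, build_labels("GET", "/users/42/orders/7?page=2", "success") yields the endpoint /users/:id/orders/:id and the method get, so two requests for different users land in the same series.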

Histogram bucket policy

Use explicit buckets tuned to your SLO, not default Prometheus buckets. For latency SLOs at 100ms, buckets should cluster around 50ms–200ms with a long tail for outliers. Document bucket choices in a comment above the metric definition.
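As an illustration of the policy, here is one possible bucket layout for a 100 ms latency SLO, with a small check that enough bounds fall near the threshold. The exact bounds, the "half to double the SLO" window, the minimum of four bounds, and the helper names are assumptions made for this sketch; tune them to your own traffic.

```python
SLO_SECONDS = 0.100  # the 100 ms latency SLO used as the example above

# Illustrative bounds in seconds (an assumption, not prescribed values):
# dense between 50 ms and 200 ms, with a sparse tail for outliers.
LATENCY_BUCKETS = (
    0.010, 0.025, 0.050, 0.075, 0.100, 0.125, 0.150, 0.200,
    0.500, 1.0, 2.5, 5.0,
)


def bounds_near_slo(buckets, slo, low=0.5, high=2.0):
    """Return the bucket bounds that fall within [low * slo, high * slo]."""
    return [b for b in buckets if low * slo <= b <= high * slo]


def validate_buckets(buckets, slo, min_near=4):
    """Raise ValueError if bounds are unordered or too sparse near the SLO."""
    bounds = list(buckets)
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("bucket bounds must be strictly increasing")
    near = bounds_near_slo(bounds, slo)
    if len(near) < min_near:
        raise ValueError(
            f"only {len(near)} bounds near the SLO; expected at least {min_near}"
        )


validate_buckets(LATENCY_BUCKETS, SLO_SECONDS)  # passes silently
```

A coarse, general-purpose layout with only one or two bounds between 50 ms and 200 ms would fail this check, which is the situation the policy is meant to prevent.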

Naming anti-patterns

  • No dots in metric names — use underscores
  • No unit stated twice: the name suffix already carries the unit (see Metric name structure), so do not add a unit label as well
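  • No hand-added _total suffix on counters where the client library already appends it (the Python client does, the Go client does not); the exposed name should end in _total exactly once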
  • No service name in the metric — that is a label
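Two of these anti-patterns, dots and embedded service names, are easy to detect automatically. The sketch below covers only those two; SERVICE_NAMES and lint_name are hypothetical names for illustration, and a real check would load the service list from wherever your services are registered.

```python
# Hypothetical service names (an assumption for this sketch).
SERVICE_NAMES = {"checkout", "billing", "search"}


def lint_name(name: str) -> list[str]:
    """Flag dots and embedded service names in a metric name."""
    problems = []
    if "." in name:
        problems.append("contains dots; use underscores")
    # Treat dots as separators too, so both problems are reported together.
    for token in name.replace(".", "_").split("_"):
        if token in SERVICE_NAMES:
            problems.append(f"embeds service name {token!r}; move it to a label")
    return problems
```

For instance, lint_name("checkout.http_latency_seconds") reports both problems, while lint_name("http_latency_seconds") returns an empty list.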