Metric naming + label taxonomy

A consistent schema for naming metrics and structuring labels so dashboards stay readable, queries stay fast, and on-call engineers stop guessing.

Metric name structure

Every metric follows the pattern <domain>_<noun>_<unit>. Domains are short prefixes such as http, db, or queue. Nouns describe what is measured (requests, latency, errors). Units are always plural and unabbreviated: seconds, not sec.
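A minimal sketch of a name builder that enforces the pattern above. The helper `metric_name`, the unit allow-list, and the abbreviation map are all hypothetical; adjust them to the units your metrics actually use.

```python
import re

# Assumed allow-list of spelled-out, plural units; extend it for your own metrics.
ALLOWED_UNITS = {"seconds", "bytes"}
# Abbreviations this sketch rejects, mapped to the unit to use instead
# (for "ms", the recorded values must be converted to seconds too).
ABBREVIATIONS = {"s": "seconds", "sec": "seconds", "secs": "seconds", "ms": "seconds", "b": "bytes"}

# Lowercase words joined by single underscores, e.g. "request_duration".
_SEGMENT = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def metric_name(domain: str, noun: str, unit: str) -> str:
    """Build a <domain>_<noun>_<unit> name, rejecting bad segments and abbreviated units."""
    for part in (domain, noun):
        if not _SEGMENT.match(part):
            raise ValueError(f"invalid name segment {part!r}")
    if unit in ABBREVIATIONS:
        raise ValueError(f"abbreviated unit {unit!r}; use {ABBREVIATIONS[unit]!r}")
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unit {unit!r} is not in the allow-list")
    return f"{domain}_{noun}_{unit}"


print(metric_name("http", "request_duration", "seconds"))  # http_request_duration_seconds
```

Calling `metric_name("db", "query_latency", "sec")` raises a `ValueError` that suggests `seconds`, so the mistake surfaces at definition time rather than on a dashboard.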

Label cardinality budget

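Labels are expensive: every distinct combination of label values becomes its own time series, and each series costs memory, index space, and query time. Cap total label combinations per metric at 10,000. Never put user IDs, request IDs, or other unbounded strings in labels. Record high-cardinality dimensions as structured log attributes instead, and link back to them via the trace ID.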
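A rough way to check a metric against the budget is to multiply the number of distinct values per label. This is a worst-case bound, since not every combination occurs in practice. The helper names and the method and endpoint counts below are illustrative assumptions; the status and error_type counts come from this recipe's standard label set.

```python
from math import prod

CARDINALITY_BUDGET = 10_000  # per-metric cap from this recipe


def series_upper_bound(distinct_values: dict[str, int]) -> int:
    """Worst-case series count for one metric: the product of distinct values per label."""
    return prod(distinct_values.values())  # an unlabeled metric is 1 series (empty product)


def within_budget(distinct_values: dict[str, int], budget: int = CARDINALITY_BUDGET) -> bool:
    return series_upper_bound(distinct_values) <= budget


# 3 statuses, 4 error types, plus assumed counts of 4 methods and 50 normalized endpoints.
standard = {"status": 3, "error_type": 4, "method": 4, "endpoint": 50}
print(series_upper_bound(standard), within_budget(standard))  # 2400 True

# One unbounded label blows the budget immediately.
with_user = {**standard, "user_id": 100_000}
print(series_upper_bound(with_user), within_budget(with_user))  # 240000000 False
```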

Standard label set

status — success | error | degraded
error_type — timeout | ratelimit | internal | upstream
method — HTTP verb or RPC name, lowercased
endpoint — normalized path with parameter placeholders
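The sketch below builds the standard label set and validates the enumerated values. The `{id}` placeholder style and the rule for which path segments count as IDs (all digits or a UUID) are assumptions. If your web framework exposes the matched route template, prefer using that directly over regex normalization.

```python
import re
from typing import Optional

STATUSES = {"success", "error", "degraded"}
ERROR_TYPES = {"timeout", "ratelimit", "internal", "upstream"}

# Segments treated as IDs in this sketch: all digits, or a canonical UUID.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def normalize_endpoint(path: str) -> str:
    """Drop the query string and replace ID-like path segments with '{id}'."""
    path = path.split("?", 1)[0]
    return "/".join("{id}" if _ID_SEGMENT.match(seg) else seg for seg in path.split("/"))


def standard_labels(status: str, method: str, path: str,
                    error_type: Optional[str] = None) -> dict[str, str]:
    """Return the standard label set, validating the enumerated values."""
    if status not in STATUSES:
        raise ValueError(f"unknown status {status!r}")
    if status == "error" and error_type not in ERROR_TYPES:
        raise ValueError(f"error status needs an error_type from {sorted(ERROR_TYPES)}")
    if status != "error" and error_type is not None:
        raise ValueError("error_type is only set when status is 'error'")
    return {
        "status": status,
        "error_type": error_type or "",  # empty value: Prometheus treats the label as absent
        "method": method.lower(),
        "endpoint": normalize_endpoint(path),
    }


print(standard_labels("error", "GET", "/users/42/orders?page=2", "timeout"))
```

Emitting an empty error_type for non-error requests keeps the set of label names fixed, which client libraries that declare label names up front expect, without inventing a placeholder value.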

Histogram bucket policy

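Use explicit buckets tuned to your SLO instead of the client library's default buckets. For a 100 ms latency SLO, cluster buckets between 50 ms and 200 ms, put one boundary exactly at 100 ms so the share of requests meeting the SLO can be read straight from a bucket, and keep a few wider buckets in the tail for outliers. Document the bucket choices in a comment above the metric definition.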
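To show why a boundary at the SLO threshold matters, here is a minimal stdlib simulation of a cumulative histogram with Prometheus `le` semantics (a bucket counts every observation less than or equal to its bound). The bucket values and helper names are illustrative assumptions. The final ratio corresponds to dividing the rate of the `_bucket{le="0.1"}` series by the rate of `_count` in PromQL.

```python
from bisect import bisect_left
from itertools import accumulate

# SLO: 100 ms. Buckets (in seconds) cluster between 50 ms and 200 ms, include an
# exact 0.1 s boundary for the SLO, and keep a sparse tail for outliers.
SLO_SECONDS = 0.1
LATENCY_BUCKETS = (0.025, 0.05, 0.075, 0.1, 0.125, 0.15, 0.2, 0.5, 1.0, 2.5)


def cumulative_counts(observations, buckets=LATENCY_BUCKETS):
    """Cumulative count per bucket (value <= bound); the last slot is +Inf."""
    counts = [0] * (len(buckets) + 1)
    for value in observations:
        counts[bisect_left(buckets, value)] += 1  # index of the first bound >= value
    return list(accumulate(counts))


def fraction_within_slo(observations, buckets=LATENCY_BUCKETS, slo=SLO_SECONDS):
    """Share of observations at or under the SLO, read from the SLO's own bucket."""
    if slo not in buckets:
        raise ValueError("add a bucket boundary exactly at the SLO threshold")
    cumulative = cumulative_counts(observations, buckets)
    if cumulative[-1] == 0:
        raise ValueError("no observations")
    return cumulative[buckets.index(slo)] / cumulative[-1]


print(fraction_within_slo([0.03, 0.08, 0.1, 0.12, 0.6]))  # 0.6
```

Without a 0.1 boundary, the SLO ratio would have to be interpolated between neighboring buckets, which is exactly the imprecision this policy avoids.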

Naming anti-patterns

No dots in metric names — use underscores
No unit labels; the unit belongs in the metric name suffix
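No bare counter names: counters end in _total. The Prometheus server does not append it, though some client libraries, such as the Python client, add it automatically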
No service name in the metric — that is a label
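A small lint pass can catch several of these anti-patterns before a metric ships. This sketch covers dots, a `unit` label, and a service name embedded in the metric name. The function name and the `service_names` input (for example, a list pulled from your service registry) are hypothetical.

```python
def lint_metric_name(name: str, label_names: set[str], service_names: set[str]) -> list[str]:
    """Return anti-pattern findings for one metric; an empty list means it passes."""
    problems = []
    if "." in name:
        problems.append(f"dots in name; use {name.replace('.', '_')!r}")
    if "unit" in label_names:
        problems.append("'unit' label; the unit belongs in the name suffix")
    # Compare whole underscore-separated words so 'order' does not match 'orders'.
    words = set(name.replace(".", "_").split("_"))
    for service in sorted(service_names & words):
        problems.append(f"service name {service!r} in metric name; use a label instead")
    return problems


for finding in lint_metric_name("checkout.http_latency_seconds", {"unit"}, {"checkout"}):
    print(finding)
```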
