Recipe

Recipe: Kubernetes HPA design

A production-ready Horizontal Pod Autoscaler configuration tuned for latency-sensitive workloads running on Meridian infrastructure.

Target workload

Stateless API servers behind a Layer-7 ingress. Average request latency p99 must stay under 200ms during traffic spikes.

Metrics source

Prometheus adapter exposing http_requests_per_second per pod. Fallback to CPU utilization when custom metrics are unavailable.

Scaling thresholds

Target: 80 requests/second per pod
Scale-up window: 60s stabilization
Scale-down window: 300s stabilization
Min replicas: 3, Max replicas: 20

Manifest

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "80"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300

Caveats

HPA reacts to metric spikes, not queuing depth. Pair with a pod-overhead budget so cluster autoscaler has headroom. Test scale-from-zero cold starts before enabling in production.

← Back to docs