AWS Step Functions state-machine design
A repeatable pattern for designing, testing, and deploying Step Functions state machines with error handling, retry policies, and human-in-the-loop approval steps.
Ingredients
- AWS SAM CLI or CDK project scaffolded
- ASL definition written in YAML or JSON
- Lambda functions for each Task state
- SNS topic for approval notifications
- CloudWatch Logs group for execution history
Steps
- Model the workflow: map every decision point, parallel branch, and terminal state on a whiteboard before writing ASL.
- Define retry policies: attach Retry blocks to each Task with exponential backoff (IntervalSeconds, BackoffRate) and a MaxAttempts limit.
- Add Catch clauses: route known error types to compensating workflows or a dead-letter queue.
- Wire the approval gate: use a Task with the .waitForTaskToken integration pattern to publish the task token to the SNS topic; the approver's response calls SendTaskSuccess or SendTaskFailure with that token to resume the execution.
- Deploy and smoke-test: invoke with a test payload and inspect the execution event history in the console.
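The retry and catch steps can be sketched as a Python function that emits an ASL Task state. This is a minimal illustration, not the recipe's own template: the Lambda ARN, queue URL, state names, and backoff numbers are hypothetical, and the SQS send is just one way to implement the dead-letter route. The retry_delays helper shows the wait before each retry, IntervalSeconds multiplied by BackoffRate once per previous retry.

```python
import json

# Hypothetical ARNs and state names, for illustration only.
PROCESS_ORDER_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process-order"
DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-dlq"

# Transient Lambda service errors that are usually safe to retry.
TRANSIENT_LAMBDA_ERRORS = [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException",
    "Lambda.TooManyRequestsException",
]


def task_with_retry_and_catch(resource, next_state, fallback_state):
    """Build an ASL Task state with exponential-backoff retries and a Catch."""
    return {
        "Type": "Task",
        "Resource": resource,
        "Retry": [
            {
                "ErrorEquals": TRANSIENT_LAMBDA_ERRORS,
                "IntervalSeconds": 2,  # first retry waits 2 s
                "BackoffRate": 2.0,    # each later wait doubles
                "MaxAttempts": 3,      # give up after three retries
            }
        ],
        "Catch": [
            {
                # Whatever still fails goes to the fallback state, with the
                # error name and cause stored under $.error.
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.error",
                "Next": fallback_state,
            }
        ],
        "Next": next_state,
    }


def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Wait before retry k (1-based) is interval * rate ** (k - 1)."""
    return [interval_seconds * backoff_rate ** (k - 1) for k in range(1, max_attempts + 1)]


definition = {
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": task_with_retry_and_catch(
            PROCESS_ORDER_ARN, "Done", "SendToDLQ"
        ),
        "SendToDLQ": {
            # Park the failed input (plus $.error) on an SQS dead-letter queue.
            "Type": "Task",
            "Resource": "arn:aws:states:::sqs:sendMessage",
            "Parameters": {"QueueUrl": DLQ_URL, "MessageBody.$": "$"},
            "Next": "Failed",
        },
        "Done": {"Type": "Succeed"},
        "Failed": {"Type": "Fail", "Error": "OrderFailed"},
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
    print(retry_delays(2, 3, 2.0))
```

With these settings a failing call is retried after 2, 4, and 8 seconds before the Catch routes it to the dead-letter queue. Generating ASL from code keeps the retry policy identical across Tasks; the same dict can be dumped to the YAML or JSON definition file.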
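The approval gate can be sketched the same way. In this hypothetical version the Task publishes the task token (read from the context object as $$.Task.Token) to the SNS topic and waits; a Choice state then branches on the approver's reply. The topic ARN, state names, 24-hour timeout, and the {"approved": true|false} reply shape are all assumptions for illustration.

```python
import json

# Hypothetical topic ARN; substitute the approval SNS topic from Ingredients.
APPROVAL_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:order-approvals"


def approval_gate(topic_arn, approved_state, rejected_state, timeout_seconds=86400):
    """ASL states for a human approval step using the .waitForTaskToken pattern."""
    return {
        "RequestApproval": {
            "Type": "Task",
            # The .waitForTaskToken suffix pauses this state until a caller
            # returns the token via SendTaskSuccess or SendTaskFailure.
            "Resource": "arn:aws:states:::sns:publish.waitForTaskToken",
            "Parameters": {
                "TopicArn": topic_arn,
                "Message": {
                    "TaskToken.$": "$$.Task.Token",  # token from the context object
                    "Input.$": "$",
                },
            },
            # Without a response in time, the state fails with States.Timeout.
            "TimeoutSeconds": timeout_seconds,
            # Keep the original input and add the approver's output under $.approval.
            "ResultPath": "$.approval",
            "Next": "IsApproved",
        },
        "IsApproved": {
            "Type": "Choice",
            "Choices": [
                {
                    # Assumes the approver sends {"approved": true|false}.
                    "Variable": "$.approval.approved",
                    "BooleanEquals": True,
                    "Next": approved_state,
                }
            ],
            "Default": rejected_state,
        },
    }


def approval_response(task_token, approved):
    """Keyword arguments for the SendTaskSuccess call that resumes the execution."""
    return {"taskToken": task_token, "output": json.dumps({"approved": approved})}


if __name__ == "__main__":
    print(json.dumps(approval_gate(APPROVAL_TOPIC_ARN, "Fulfil", "Rejected"), indent=2))
```

On the approver's side (for example a small handler behind an approval link), the reply might be sent with boto3.client("stepfunctions").send_task_success(**approval_response(token, True)). A rejection could instead use send_task_failure, which surfaces as an error that a Catch clause can route.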
Guardrails
- Keep each Standard workflow execution under the 25,000-event execution history quota
- Use nested workflows for reusable sub-processes; each child execution has its own event history
- Enable CloudWatch Logs logging on every deployed state machine
- Tag all resources with meridian:recipe
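The logging and tagging guardrails can be baked into the request that creates the state machine. The sketch below builds keyword arguments for the Step Functions CreateStateMachine call; the role ARN, log group ARN, and name are hypothetical, and treating meridian:recipe as a tag key with the value "true" is an assumption, since the guardrail does not say whether it is a key or a key:value pair.

```python
import json

# Hypothetical ARNs; replace with the project's own role and log group.
ROLE_ARN = "arn:aws:iam::123456789012:role/orders-state-machine-role"
LOG_GROUP_ARN = (
    "arn:aws:logs:us-east-1:123456789012:log-group:/aws/vendedlogs/states/orders:*"
)


def state_machine_request(name, definition, role_arn, log_group_arn):
    """Keyword arguments for the Step Functions CreateStateMachine call."""
    return {
        "name": name,
        "definition": json.dumps(definition),
        "roleArn": role_arn,
        "type": "STANDARD",
        "loggingConfiguration": {
            # ALL records every state event; ERROR or FATAL log less.
            "level": "ALL",
            # Also log state input and output (may include sensitive data).
            "includeExecutionData": True,
            "destinations": [
                {"cloudWatchLogsLogGroup": {"logGroupArn": log_group_arn}}
            ],
        },
        # Assumption: the recipe tag is a key with an arbitrary value.
        "tags": [{"key": "meridian:recipe", "value": "true"}],
    }


if __name__ == "__main__":
    minimal = {"StartAt": "Done", "States": {"Done": {"Type": "Succeed"}}}
    print(json.dumps(state_machine_request("orders", minimal, ROLE_ARN, LOG_GROUP_ARN), indent=2))
```

The result could be passed as boto3.client("stepfunctions").create_state_machine(**request); a SAM or CDK project would express the same logging and tag settings in its own resource properties.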
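A nested workflow also helps with the history quota, because the child's state transitions are recorded in the child execution rather than the parent's. A minimal sketch of the parent-side Task, assuming a hypothetical child state machine ARN and input shape, uses the startExecution.sync:2 integration so the parent waits for the child and receives its output as JSON.

```python
import json

# Hypothetical child state machine ARN, for illustration only.
CHILD_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:validate-address"


def child_workflow_state(child_arn, next_state):
    """ASL Task that runs a nested state machine and waits for it to finish."""
    return {
        "Type": "Task",
        # .sync:2 waits for the child to complete and returns its output as JSON.
        "Resource": "arn:aws:states:::states:startExecution.sync:2",
        "Parameters": {
            "StateMachineArn": child_arn,
            "Input": {
                "payload.$": "$",
                # Links the child execution to this parent in the console.
                "AWS_STEP_FUNCTIONS_STARTED_BY_EXECUTION_ID.$": "$$.Execution.Id",
            },
        },
        # Keep only the child's output rather than the full execution description.
        "OutputPath": "$.Output",
        "Next": next_state,
    }


if __name__ == "__main__":
    print(json.dumps(child_workflow_state(CHILD_ARN, "Done"), indent=2))
```

Reusable sub-processes written this way can be shared by several parent state machines, each calling the same child ARN.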