1. Why a queue at all
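HTTP handlers must answer within seconds. Anything slower (model inference, batch embeddings, S3 uploads) belongs on a queue so the request can return immediately and the worker fleet can scale independently. BullMQ stores every job in Redis, so a job whose worker crashes is not lost: it is detected as stalled and returned to the queue for another worker to run again from the start. It does not resume mid-stream, so job processors should be idempotent, and durability is only as good as the Redis persistence configuration.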
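As a rough illustration of the enqueue-and-return pattern, here is a minimal sketch. The names `EmbeddingJob`, `JobQueue`, `InMemoryQueue`, and `handleEmbedRequest` are hypothetical and not taken from any codebase. The in-memory queue is a stand-in so the example runs without Redis; in a real service a BullMQ `Queue`, whose `add(name, data)` method has a compatible shape, would presumably sit behind the same interface.

```typescript
// Hypothetical payload; only the shape of the pattern matters here.
interface EmbeddingJob {
  documentId: string;
}

// Minimal surface the handler needs. BullMQ's Queue exposes an
// add(name, data) method that could be placed behind this interface.
interface JobQueue<T> {
  add(name: string, data: T): Promise<{ id?: string }>;
}

// In-memory stand-in so the sketch runs without Redis. It is NOT durable.
class InMemoryQueue<T> implements JobQueue<T> {
  readonly jobs: { id: string; name: string; data: T }[] = [];

  async add(name: string, data: T): Promise<{ id?: string }> {
    const job = { id: String(this.jobs.length + 1), name, data };
    this.jobs.push(job);
    return { id: job.id };
  }
}

interface HandlerResponse {
  status: number;
  body: { jobId?: string };
}

// The handler only enqueues; the slow work runs later in a worker process.
async function handleEmbedRequest(
  queue: JobQueue<EmbeddingJob>,
  documentId: string,
): Promise<HandlerResponse> {
  const job = await queue.add("embed-document", { documentId });
  // 202 Accepted: the work is queued, not finished.
  return { status: 202, body: { jobId: job.id } };
}
```

Returning the job id with a 202 lets the client poll for the result later, while the handler itself never waits on the slow work.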