RECIPE

io_uring Primer

io_uring is the Linux asynchronous I/O interface that supersedes legacy AIO and can replace epoll-driven event loops, built around a pair of ring buffers shared between the application and the kernel. This primer walks through the submission and completion model, sketches the first step of a minimal accept-read-write loop, and lists the pitfalls that bite high-throughput services first.

1. The two rings

io_uring exposes a Submission Queue (SQ) and a Completion Queue (CQ), both mapped into the application's address space and shared with the kernel. The application writes Submission Queue Entries (SQEs) and the kernel writes Completion Queue Entries (CQEs). A single io_uring_enter(2) call can submit a whole batch of SQEs and wait for completions, so the per-operation syscall cost shrinks; with SQPOLL enabled, a busy loop can often submit without entering the kernel at all.
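To make the head/tail mechanics concrete, here is a toy single-producer, single-consumer ring in plain C. It is only a model: the names toy_ring, toy_ring_push, and toy_ring_pop are invented for illustration, the layout is not the kernel's, and the memory barriers that the real shared rings require are left out because this version runs in one thread.

```c
#include <stdint.h>

#define RING_ENTRIES 8u                  /* must be a power of two */
#define RING_MASK    (RING_ENTRIES - 1u)

/* Toy ring for illustration only; NOT the real SQ/CQ memory layout. */
struct toy_ring {
    uint32_t head;                       /* advanced only by the consumer */
    uint32_t tail;                       /* advanced only by the producer */
    uint64_t slots[RING_ENTRIES];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
int toy_ring_push(struct toy_ring *r, uint64_t value)
{
    if (r->tail - r->head == RING_ENTRIES)
        return -1;                       /* consumer has not caught up yet */
    r->slots[r->tail & RING_MASK] = value;
    r->tail++;                           /* publish the new entry */
    return 0;
}

/* Consumer side: returns 0 and stores the entry in *out, or -1 if empty. */
int toy_ring_pop(struct toy_ring *r, uint64_t *out)
{
    if (r->head == r->tail)
        return -1;                       /* nothing published yet */
    *out = r->slots[r->head & RING_MASK];
    r->head++;                           /* hand the slot back to the producer */
    return 0;
}
```

In this model the application plays producer on the SQ and consumer on the CQ, and the kernel plays the opposite roles. The indices grow without wrapping explicitly; masking with RING_MASK maps them onto slots, which is why the entry count has to be a power of two.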

Setup happens through io_uring_setup(2), which returns a file descriptor; you mmap that descriptor to obtain the rings and the SQE array. Most code uses liburing, so you never touch the raw syscalls.
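For readers curious about what liburing hides, the following sketch calls io_uring_setup(2) directly and computes the ring sizes that would be passed to mmap. The helper names ring_setup, sq_ring_bytes, and cq_ring_bytes are hypothetical; the size formulas follow the example in the io_uring man pages. It assumes a Linux system with kernel headers that provide <linux/io_uring.h>, and it returns a negative errno where io_uring is unavailable or blocked by a sandbox.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

/* Create a ring with room for `entries` SQEs.
 * Returns the ring fd, or a negative errno (e.g. -ENOSYS, -EPERM). */
int ring_setup(unsigned entries, struct io_uring_params *p)
{
    memset(p, 0, sizeof(*p));            /* zeroed flags: default setup */
    long fd = syscall(__NR_io_uring_setup, entries, p);
    return fd < 0 ? -errno : (int)fd;
}

/* Bytes to mmap for the SQ ring (mapped at offset IORING_OFF_SQ_RING). */
size_t sq_ring_bytes(const struct io_uring_params *p)
{
    return p->sq_off.array + p->sq_entries * sizeof(unsigned);
}

/* Bytes to mmap for the CQ ring (mapped at offset IORING_OFF_CQ_RING). */
size_t cq_ring_bytes(const struct io_uring_params *p)
{
    return p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
}
```

The kernel fills in the offsets and the actual entry counts in the params structure, so the sizes are only meaningful after ring_setup succeeds. liburing's io_uring_queue_init performs these steps, plus the mmap calls, on your behalf.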

2. A minimal accept-read-write loop

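The following sketch covers the accept step for one connection; the read and write steps repeat the same pattern of getting an SQE, preparing it, submitting, and waiting for the CQE. In production you would chain SQEs with IOSQE_IO_LINK and reuse fixed buffers, but the shape stays identical.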

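#include <liburing.h>
#include <stdint.h>
#define ACCEPT_OP 1  /* user_data tag that identifies accept completions */
/* Accepts one connection; returns the client fd or a negative errno. */
int accept_one(int listen_fd)
{
    struct io_uring ring;
    int ret = io_uring_queue_init(256, &ring, 0);  /* 256-entry SQ, no flags */
    if (ret < 0) return ret;

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);  /* non-NULL: the ring is empty */
    io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
    io_uring_sqe_set_data(sqe, (void *)(uintptr_t)ACCEPT_OP);
    ret = io_uring_submit(&ring);                  /* SQEs submitted, or negative errno */

    struct io_uring_cqe *cqe;
    if (ret >= 0 && (ret = io_uring_wait_cqe(&ring, &cqe)) == 0) {
        ret = cqe->res;                            /* client fd, or negative errno */
        io_uring_cqe_seen(&ring, cqe);             /* hand the CQ slot back */
    }
    io_uring_queue_exit(&ring);                    /* a real server keeps the ring alive */
    return ret;
}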
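The sketch above tags its SQE with ACCEPT_OP so the completion can be recognised later. One common way to extend that idea to the read and write steps is to pack an operation kind and the file descriptor into the 64-bit user_data value, then dispatch on it when the CQE arrives. The code below is a pure-C illustration of that bookkeeping; the enum values, the helper names, and the follow-up policy are all assumptions made for this example, not part of liburing.

```c
#include <stdint.h>

/* Hypothetical operation kinds for an accept-read-write loop. */
enum op_kind { OP_ACCEPT = 1, OP_READ = 2, OP_WRITE = 3 };

/* Hypothetical follow-up actions chosen after a completion. */
enum next_step { SUBMIT_READ, SUBMIT_WRITE, CLOSE_CONN, REPORT_ERROR };

/* Pack the kind into the upper 32 bits and the fd into the lower 32 bits. */
uint64_t make_user_data(enum op_kind kind, int fd)
{
    return ((uint64_t)kind << 32) | (uint32_t)fd;
}

enum op_kind user_data_kind(uint64_t user_data)
{
    return (enum op_kind)(user_data >> 32);
}

int user_data_fd(uint64_t user_data)
{
    return (int)(uint32_t)user_data;
}

/* Decide what to submit next, given a completion's tag and its cqe->res. */
enum next_step decide_next(uint64_t user_data, int res)
{
    if (res < 0)
        return REPORT_ERROR;             /* res holds a negative errno */
    switch (user_data_kind(user_data)) {
    case OP_ACCEPT:
        return SUBMIT_READ;              /* res is the new client fd */
    case OP_READ:
        return res == 0 ? CLOSE_CONN     /* zero bytes read means EOF */
                        : SUBMIT_WRITE;  /* echo back the res bytes */
    case OP_WRITE:
        return SUBMIT_READ;              /* echo sent; wait for more input */
    }
    return REPORT_ERROR;                 /* unknown tag */
}
```

Recent liburing releases provide io_uring_sqe_set_data64 and io_uring_cqe_get_data64 for storing such a value without pointer casts; check that your installed version has them before relying on this.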

3. Pitfalls and tuning

  • Buffers must outlive the operation, not just the submit call. The kernel may read or write them at any point until the matching CQE is posted.
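  • SQPOLL removes the submit syscall from the hot path, but it dedicates a kernel polling thread to the ring, and that thread burns CPU while it spins; budget for it. Once the thread goes idle, the application must wake it with io_uring_enter(2).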
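  • Use registered file descriptors and fixed buffers to skip per-operation work: registered files avoid taking and dropping a file reference on every request, and fixed buffers avoid mapping and pinning user pages each time.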
  • Kernel versions matter: multi-shot accept landed in 5.19 and zero-copy send in 6.0, so gate these features on the kernel you actually run.
  • Always check cqe->res for negative errno values before trusting the result.
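
The last pitfall can be sketched as a small helper that classifies cqe->res before the caller uses it, plus a check for short writes. The names and the retry policy here are assumptions for illustration: treating EAGAIN and EINTR as retryable is a common convention, but which errors deserve a resubmit depends on the operation and the application.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome categories for a completion. */
enum cqe_outcome { CQE_OK, CQE_RETRY, CQE_FATAL };

/* Classify a cqe->res value: non-negative means success, negative is -errno. */
enum cqe_outcome classify_cqe_res(int res)
{
    if (res >= 0)
        return CQE_OK;
    switch (-res) {
    case EAGAIN:
    case EINTR:
        return CQE_RETRY;    /* transient: resubmitting is usually reasonable */
    default:
        return CQE_FATAL;    /* e.g. ECONNRESET or EBADF: tear the operation down */
    }
}

/* Bytes still unsent after a write completed with `res` out of `len` bytes.
 * A failed write (negative res) leaves the whole buffer outstanding. */
size_t bytes_remaining(int res, size_t len)
{
    if (res < 0)
        return len;
    return (size_t)res >= len ? 0 : len - (size_t)res;
}
```

A completion handler would call classify_cqe_res first and only interpret res as a byte count or file descriptor when the outcome is CQE_OK. A non-zero bytes_remaining after a successful write means the rest of the buffer needs another SQE.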