Protobuf Encoding Deep Dive
Protocol Buffers serialize structured data into a compact, schema-aware wire format. This recipe walks through varint encoding, tag composition, and zig-zag encoding so you can read raw bytes by hand and debug wire-level mismatches without a decoder.
1. Varints — the load-bearing primitive
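Most integers in protobuf are varints: tags, lengths, and the int32, int64, uint32, uint64, sint32, sint64, bool, and enum field types. (The fixed32 and fixed64 families are the exception and always occupy four or eight bytes.) A varint is a base-128 little-endian stream where the high bit of each byte signals continuation. Small numbers cost one byte; only a value that needs all 64 bits pays the full ten. The encoder emits the value seven bits at a time, least-significant group first, and sets the MSB on every byte except the last so the parser knows where the number ends.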
Result: a varint field with value 1 costs two bytes total, one for the tag and one for the payload, as long as the field number is between 1 and 15 so the tag itself fits in one byte.
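The encoding loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation; the names encode_varint and decode_varint are hypothetical helpers chosen for this recipe, and the sketch assumes non-negative input.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative helper)."""
    if value < 0:
        raise ValueError("encode_varint expects a non-negative integer")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # lowest seven bits go out first
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more groups follow: set the continuation bit
        else:
            out.append(low7)         # last group: continuation bit stays clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]              # raises IndexError on a truncated buffer
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # continuation bit clear: this was the last byte
            return result, pos
        shift += 7
```

For example, 150 splits into the seven-bit groups 0010110 and 0000001, so encode_varint(150) yields the bytes 96 01, while the largest 64-bit value needs ten bytes.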
2. Tag composition — field number meets wire type
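Each field begins with a tag varint formed by (field_number << 3) | wire_type. The bottom three bits encode the wire type (0 varint, 1 fixed64, 2 length-delimited, 5 fixed32; 3 and 4 are the deprecated group markers); the remaining bits carry the field number. A parser can skip an unknown field from the wire type alone, because the wire type says how many bytes the payload occupies or, for length-delimited fields, that a length varint comes next. This is why forward compatibility survives schema drift.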
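As a quick check of the tag formula, the following Python sketch composes and splits tags. The helper names make_tag and split_tag and the WIRE_TYPES table are hypothetical and exist only for illustration.

```python
from typing import Tuple

# Wire types named in the text above (3 and 4, the deprecated groups, are omitted).
WIRE_TYPES = {0: "varint", 1: "fixed64", 2: "length-delimited", 5: "fixed32"}


def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and a wire type into the integer tag value."""
    return (field_number << 3) | wire_type


def split_tag(tag: int) -> Tuple[int, int]:
    """Recover (field_number, wire_type) from an integer tag value."""
    return tag >> 3, tag & 0x07
```

make_tag(2, 2) gives 0x12, and split_tag(0x12) returns (2, 2). Because the tag is itself a varint, field number 16 produces the value 128, which no longer fits in a single byte.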
3. Zig-zag — making signed ints small again
Plain int32 and int64 fields sign-extend negative values to 64 bits before varint encoding, so -1 costs ten varint bytes. The sint32 and sint64 types apply zig-zag first, interleaving positive and negative values (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3), so small magnitudes stay one byte regardless of sign: any value from -64 to 63 fits in a single varint byte.
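The zig-zag mapping can be written with a shift and an XOR. The Python sketch below is illustrative only; zigzag_encode, zigzag_decode, and varint_size are hypothetical helper names, and the bits parameter is an assumption used to cover both sint32 and sint64.

```python
def zigzag_encode(n: int, bits: int = 32) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    # n >> (bits - 1) is 0 for non-negative n and -1 (all ones) for negative n.
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode: even values are non-negative, odd values negative."""
    return (z >> 1) ^ -(z & 1)


def varint_size(unsigned_value: int) -> int:
    """Number of varint bytes needed for a non-negative integer."""
    return max(1, (unsigned_value.bit_length() + 6) // 7)


# A plain int32/int64 holding -1 is sign-extended to 64 bits on the wire.
naive_minus_one = -1 & 0xFFFFFFFFFFFFFFFF
```

Here varint_size(naive_minus_one) is 10, while varint_size(zigzag_encode(-1)) is 1.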
message Trade {
  int32 id = 1;       // tag 0x08, varint payload
  string symbol = 2;  // tag 0x12, length-delimited
  sint32 delta = 3;   // tag 0x18, zig-zag varint
}
// Trade{id:1, symbol:"NVDA", delta:-1} on the wire:
// 08 01 -> field 1 varint = 1
// 12 04 4E 56 44 41 -> field 2 len=4 "NVDA"
// 18 01 -> field 3 zig-zag(-1) = 1
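To tie the three pieces together, here is a minimal schema-free field walker in Python that reproduces the breakdown above. It is a sketch under stated assumptions: read_varint and walk_fields are hypothetical helper names, groups (wire types 3 and 4) are rejected, and truncated input is not handled gracefully.

```python
from typing import List, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_position) for the varint starting at buf[pos]."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def walk_fields(buf: bytes) -> List[Tuple[int, int, Union[int, bytes]]]:
    """Split a message into (field_number, wire_type, raw_value) triples."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                      # varint payload
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:                    # fixed64: eight raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:                    # length varint, then that many bytes
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:                    # fixed32: four raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# The Trade message bytes from the breakdown above.
wire = bytes.fromhex("08 01 12 04 4E 56 44 41 18 01")
trade_fields = walk_fields(wire)
```

Without the schema, field 3 comes back as the raw varint 1. Only the sint32 declaration tells the reader to zig-zag decode it, (1 >> 1) ^ -(1 & 1), which gives -1.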