Building A Reliable Udp Messaging Protocol: Packet Loss Recovery With Fec And Selective Acks

2024-05-22 · Leonardo Benicio

Summary

A comprehensive technical exploration of building a reliable udp messaging protocol: packet loss recovery with fec and selective acks, covering key concepts, practical implementations, and real-world applications.

Contents

From Firehose to Fountain: Mastering the Art of Reliable Data Transfer Over UDP

The world’s most popular internet protocol for reliability is a liar. TCP, the stalwart of web browsing, email, and file downloads, promises you one thing above all else: perfect, in-order delivery. And it delivers on that promise, but it does so with a single-minded, often crippling, obsession. It will wait. It will buffer. It will re-transmit lost packets from the beginning of a stream before it even looks at the packets that arrived successfully moments later. For the vast majority of human-centric applications, this is a feature. You don’t want a web page to render a picture of a cat if the text below it is missing, or for your email to arrive with its signature arriving before the body. TCP’s strict ordering and retransmission logic are the bedrock of textual reliability.

But what happens when you’re not sending text? What happens when you’re building a real-time multiplayer game, a live video streaming service, or a high-frequency trading engine? In these worlds, the “perfect” promise of TCP becomes a liability. A packet arriving 200 milliseconds late, perfectly in order, is worse than a packet that never arrived at all. The game character has already been shot. The stock price has already moved. The video frame is already frozen. The cost of reliability is latency, and for a growing class of applications, latency is the only currency that matters.

This is where User Datagram Protocol (UDP) steps onto the stage. UDP is the honest, minimalist messenger. It says, “I will try to get this data out the door as fast as possible. If it gets lost on the way, I have no idea, and I don’t care.” This lack of overhead—no connection handshake, no congestion control, no delivery guarantees—gives us the raw, unadulterated speed we crave. It’s a firehose of data. The problem, of course, is that a firehose isn’t a precision instrument. It can’t distinguish between a vital control message and a redundant update. It can’t reorder the torrent if packets arrive scrambled. And most critically, it cannot guarantee that any single drop of that firehose will ever reach its destination.

So, how do we turn a firehose into a fountain—a controlled, reliable stream that preserves the low latency of UDP while adding just enough reliability for our application’s needs? The answer lies in building our own transport layer on top of UDP. In this post, we will dissect the anatomy of UDP, explore real-world use cases where TCP fails and UDP triumphs, and then dive deep into the art of crafting custom reliability mechanisms: sequence numbers, acknowledgments, retransmission timers, selective NACKs, and more. We’ll walk through practical code examples in Python and C++ to illustrate how you can implement a simple reliable protocol over UDP, and then we’ll examine how production-grade solutions like WebRTC and QUIC tackle the same challenges. By the end, you’ll understand not just why you might choose UDP, but how to build the reliability you need on top of its simple, fast foundation.

Chapter 1: The Honest Protocol – UDP Under the Microscope

Before we begin engineering solutions, we must fully understand the problem space. UDP is defined in RFC 768, a document so short it can be printed on a single page. Its header is a mere 8 bytes, divided into four two-byte fields:

Source Port
Destination Port
Length (of header and data)
Checksum (optional in IPv4, mandatory in IPv6)

That’s it. No sequence numbers, no acknowledgment numbers, no flags for SYN, ACK, FIN, or RST. No window scaling, no congestion window, no SACK blocks. UDP is stateless from the protocol’s perspective. Each datagram is an independent island. The kernel does not maintain a connection state for a UDP socket. There is no handshake to initiate communication. You simply bind a socket to a port and start reading and writing.

1.1 The Cost of Abstraction

The beauty of UDP is its minimalism, but that minimalism places a heavy burden on the application developer. Let’s enumerate the responsibilities that TCP handles automatically but UDP leaves to you:

Framing: TCP is a stream protocol. The kernel delivers bytes in order, and the application can read arbitrarily sized chunks. UDP is message-oriented. Each sendto() call produces exactly one datagram (up to ~64 KB, but realistically limited by the MTU—typically 1500 bytes for Ethernet). If your application needs to send a larger message, you must fragment and reassemble it yourself.
Delivery Guarantee: TCP ensures data reaches the destination (barring catastrophic failure) via retransmissions. UDP makes no such promise. Every datagram you send may never arrive.
Ordering: TCP reassembles out-of-order packets before handing them to the application. UDP delivers datagrams in the order they arrive from the network, which may differ from the order they were sent.
Congestion Control: TCP implements a sophisticated algorithm (AIMD, slow start, fast retransmit, etc.) to avoid overwhelming the network. UDP does nothing. A misbehaving UDP application can (and often does) saturate a link, causing packet loss for itself and others.
Flow Control: TCP uses advertised window sizes to prevent a fast sender from overwhelming a slow receiver. UDP has no such mechanism. The receiver must be able to consume datagrams as fast as they arrive, or its receive buffer fills and packets are dropped.
Error Detection: TCP’s checksum covers both header and payload. UDP’s checksum is optional in IPv4 and mandatory in IPv6. If you disable it (a bad idea), you lose error detection.
Connection Management: TCP requires a three-way handshake (SYN, SYN-ACK, ACK) to establish a connection and a four-way handshake to tear it down. UDP has no connection state—the same socket can communicate with multiple peers, and the kernel doesn’t track which client sent what.

1.2 Why Bother? The Latency Argument

Given this laundry list of things you have to re-implement, why would anyone choose UDP over TCP? The answer is latency, and more specifically, the head-of-line blocking problem and the handshake overhead problem.

Head-of-Line Blocking (HoLB): TCP delivers a byte stream in strict order. If packet 1 is lost, all data following packet 1 must wait in the kernel’s buffer until packet 1 is retransmitted and arrives successfully. In a real-time game, the state update for frame N might contain vital information (player position), while the state update for frame N+1 might be a redundant intermediate position. Under TCP, if the packet for frame N is lost, frame N+1 is hidden from the application, even if it arrived. The player sees a frozen screen. Under UDP, both packets arrive; the application can choose to render the latest known state (frame N+1) and skip the missing frame N. The game appears smooth.

Handshake Overhead: TCP connections require a full round-trip (RTT) before you can send data. For a worst-case scenario where the client-server RTT is 200ms, the user must wait 200ms for the connection to be established before any application data flows. For short-lived connections (e.g., DNS queries, HTTP requests for small resources), this overhead is significant. QUIC (which runs over UDP) reduces this to 0 RTT on subsequent connections using pre-shared keys, but TCP cannot.

Tail Latency and Retransmission Ambiguity: TCP’s retransmission timer (RTO) is based on estimated RTT and typically starts at a multiple of the SRTT (smoothed RTT). The minimum RTO in many TCP stacks is 200ms. When a packet is lost, you wait at least 200ms before retransmitting. Under UDP, you can implement your own retransmission scheme with finer granularity—e.g., 10ms or even 1ms for high-frequency trading.

Thus, UDP is the foundation for applications where the most recent data is more important than all the data. The classic examples:

Real-time audio/video: Lost a video frame? Show the previous frame. Lost part of an audio stream? The codec can mask the glitch. Under TCP, a retransmission would cause the video to pause (stutter) while waiting for the lost packet.
Online gaming: Player positions, action commands, and world state updates are inherently ephemeral. If the current position is lost, you just overwrite it with the next one. A TCP retransmission that delivers an old position would actually break the game.
High-frequency trading (HFT): Every nanosecond matters. HFT firms custom-build UDP-based protocols (often called “market data feeds”) that achieve microsecond-level latency. They accept occasional packet loss in exchange for not having a slow retransmit timer.
VPN/tunneling protocols: WireGuard, OpenVPN (in some modes), and many custom tunnels use UDP to encapsulate traffic because they need to maintain low latency and avoid TCP-over-TCP meltdown (when TCP is tunnelled inside TCP, the retransmission timers interact destructively).
DNS: The Domain Name System originally used UDP for its speed and simplicity, relying on application-layer retries for reliability.

Chapter 2: The Minimal Reliable Layer – Sequence Numbers and ACKs

Now we enter the engineering core. We want to create a protocol over UDP that provides some reliability—for example, “at most X seconds of retransmission” or “all critical commands must be delivered in order, but state updates are best-effort.” Let’s start with the simplest possible reliable mechanism: a stop-and-wait protocol.

2.1 Stop-and-Wait Over UDP

Imagine we have a client sending a series of messages to a server, and we require every message to be delivered and processed in order. The naive UDP approach would just sendto() each message and hope. But we want 100% delivery (or at least, we want to know if a message was lost). Stop-and-wait works:

Sender sends a message with a sequence number (SN).
Sender starts a timer.
Receiver receives the message, processes it, and sends back an acknowledgment (ACK) with the same SN.
If the sender receives the ACK before the timer expires, it increments its SN and sends the next message.
If the timer expires, the sender retransmits the same message.

This is essentially TCP’s behavior in its simplest form (though TCP uses sliding windows). Let’s implement this in Python.

import socket
import struct
import time
import threading

# Simple Stop-and-Wait sender
class StopWaitSender:
    def __init__(self, remote_addr, timeout=0.5):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.settimeout(timeout)  # for receiving ACKs
        self.remote_addr = remote_addr
        self.timeout = timeout
        self.seq = 0  # sequence number (0 or 1, but we can use any int)

    def send(self, data: bytes) -> bool:
        # Prepend sequence number as a 4-byte integer
        packet = struct.pack('I', self.seq) + data
        while True:
            self.sock.sendto(packet, self.remote_addr)
            try:
                ack, _ = self.sock.recvfrom(1024)
                ack_seq = struct.unpack('I', ack[:4])[0]
                if ack_seq == self.seq:
                    self.seq ^= 1  # flip for next (alternating bit)
                    return True
                # else: received ACK for wrong sequence (stale retransmission)
            except socket.timeout:
                # No ACK, retransmit
                pass

This is functional but terrible. It suffers from:

No pipelining: Only one message in flight at a time. Throughput = packet size / RTT. For a large RTT, throughput is abysmal.
Duplicate detection: Our alternating bit works, but if the ACK is lost, we retransmit, and the receiver gets duplicates. The receiver must also be able to handle duplicates (by ignoring them if they’ve already processed that seq).
Timer granularity: Python’s socket timeout is coarse (minimum ~1ms on some systems, often 100ms). For real-time apps, we need microsecond-resolution timers.

Despite its flaws, stop-and-wait illustrates the core concepts of sequence numbers and ACKs. Now let’s extend it.

2.2 Sliding Window – The Go-Back-N Protocol

To improve throughput, we allow multiple packets to be in flight before waiting for ACKs. This is the sliding window. A classic approach is Go-Back-N (GBN), where the sender has a window size N. Packets are sent with increasing sequence numbers. The receiver accepts only in-order packets and discards any out-of-order ones (it does not buffer them). The sender maintains a timer for the oldest unacknowledged packet. When that timer expires, the sender retransmits all packets from that point forward.

Go-Back-N is simple but inefficient under loss: one lost packet causes retransmission of many packets. A better alternative is Selective Repeat (SR), where the receiver buffers out-of-order packets and sends NACKs for missing ones. But before we go there, let’s understand the concept of cumulative ACKs.

Cumulative ACK: A receiver acknowledges the highest in-order sequence number received so far. For example, if receiver has seen packets 0, 1, 2, 3, then packet 5 arrives, it sends ACK for 3 (cumulative). This tells the sender that everything up to 3 is received; packet 4 is missing. The sender can then retransmit packet 4 alone.

This is what TCP does with its ACK numbers (though TCP also uses SACK—selective acknowledgments—to provide more detail). For our custom UDP protocol, we can adopt a similar scheme.

2.3 Designing a Simple Reliable-Ordered Protocol Over UDP

We’ll design a protocol called “RUDP” (Reliable UDP) that provides:

In-order delivery of messages.
Congestion control (optional, but we’ll include a simple AIMD).
Sliding window with cumulative ACKs.
Retransmission with exponential backoff.

Packet Format

+--------+--------+--------+--------+--------+--------+--------+--------+
|  Seq Number (4 bytes)                                               |
+--------+--------+--------+--------+--------+--------+--------+--------+
|  ACK Number (4 bytes)                                               |
+--------+--------+--------+--------+--------+--------+--------+--------+
|  Flags (1 byte)  |  Window (2 bytes)  |  Payload ...                |
+--------+--------+--------+--------+--------+--------+--------+--------+

Seq Number: Sequence number of this packet (modulo some large number, e.g., 2^32).
ACK Number: The cumulative ACK (highest in-order received sequence).
Flags: Bit 0 = ACK flag (packet is an acknowledgment only), Bit 1 = SYN, Bit 2 = FIN, etc.
Window: Advertised receive window (flow control).
Payload: Variable-length data (max ~1400 bytes to avoid IP fragmentation).

State on Sender

send_base: oldest unacknowledged sequence number.
next_seq: next sequence number to send.
window_size: congestion window + receiver window min.
retransmissions_count[seq]: number of times a packet has been retransmitted.
timer for the oldest packet.

State on Receiver

expected_seq: next expected sequence number (in-order).
received_buffer: map of out-of-order packets (if we support selective retransmission).
advertised_window: available buffer space.

Sender Logic (Pseudo-code)

// Assume we have a list of outstanding packets (seq -> data)
void send_packet() {
    while (next_seq < send_base + window_size) {
        Packet pkt = create_packet(next_seq, data[next_seq]);
        udp_send(pkt);
        start_timer_for_seq(next_seq);
        next_seq++;
    }
}

void on_ack_received(uint32_t ack_num) {
    if (ack_num > send_base) {
        // Cumulative ack, all packets up to ack_num are confirmed
        // Remove acknowledged packets from outstanding list
        for (seq = send_base; seq <= ack_num; seq++) {
            remove_from_outstanding(seq);
            cancel_timer(seq);
        }
        send_base = ack_num + 1;
        // Update RTT estimates for retransmission timeout
        update_rtt();
        // Send new packets if window opens
        send_packet();
    }
}

void on_timeout(uint32_t seq) {
    // Retransmit all packets from seq to next_seq-1 (Go-Back-N style)
    for (s = seq; s < next_seq; s++) {
        if (is_outstanding(s)) {
            udp_send(outstanding[s]);
            restart_timer(s); // possibly with double timeout
        }
    }
}

Receiver Logic

void on_packet_received(Packet pkt) {
    uint32_t seq = pkt.seq;
    if (seq == expected_seq) {
        // In-order packet
        process_data(pkt.payload);
        expected_seq++;
        // Check if we have buffered out-of-order packets that now become in-order
        while (has_packet(expected_seq)) {
            process_data(buffer[expected_seq]);
            remove_buffer(expected_seq);
            expected_seq++;
        }
        send_ack(expected_seq - 1); // cumulative ACK
    } else if (seq < expected_seq) {
        // Duplicate of already processed packet - ignore or send ACK again
        send_ack(expected_seq - 1); // re-ACK
    } else {
        // Out-of-order packet - buffer it
        buffer[seq] = pkt.payload;
        send_ack(expected_seq - 1); // cumulative ack indicates missing packets
    }
}

2.4 Adding Flow Control and Congestion Control

Our simple protocol above sends as many packets as the window allows. But if the sender is much faster than the receiver, the receiver’s buffer may overflow. We need flow control: the receiver includes its advertised window in each packet (the Window field). The sender’s effective window is min(congestion_window, advertised_window).

Congestion control is more complex. We can adopt TCP Reno style:

Slow start: For each ACK received, increase congestion_window by 1 MSS (max segment size) per ACK, effectively doubling per RTT.
Congestion avoidance: After a threshold (ssthresh), increase by 1 MSS per RTT (linear growth).
Packet loss detection: If a retransmission timeout occurs, set ssthresh = cwnd/2, cwnd = 1, and re-enter slow start.
Fast retransmit: If we receive 3 duplicate ACKs (i.e., cumulative ACK number not moving), we assume a packet loss and retransmit the missing packet without waiting for timeout.

Implementing this in userspace over UDP is entirely feasible and is essentially what all modern custom reliable protocols do.

Chapter 3: Real-World Implementations – WebRTC, QUIC, and Game Engines

Rather than reinventing the wheel (though it’s educational), many production systems rely on well-tested libraries. Let’s examine a few.

3.1 WebRTC (Data Channels)

WebRTC is a set of protocols for peer-to-peer real-time communication (audio, video, data) over UDP. Its data channel component provides a reliable, ordered or unordered delivery mechanism over SCTP (Stream Control Transmission Protocol) encapsulated over DTLS (Datagram Transport Layer Security) over UDP. SCTP itself is a transport protocol that provides message-oriented, multi-streaming, and multi-homing features. In WebRTC, SCTP runs on top of DTLS (for encryption), which runs on top of UDP.

Why not just use TCP for data channels? Because WebRTC often operates in environments where UDP is the only option (e.g., NAT traversal via ICE/STUN/TURN). Additionally, SCTP allows multiple streams within one connection, avoiding head-of-line blocking between independent messages. For example, a game can send chat messages on one stream and position updates on another; a lost chat message won’t block the position updates.

3.2 QUIC (HTTP/3)

QUIC (Quick UDP Internet Connections) is the foundation for HTTP/3. It’s a transport protocol designed to replace TCP+TLS for web traffic, but with drastically reduced latency. QUIC provides:

0-RTT connection establishment (on subsequent connections).
Stream multiplexing without head-of-line blocking (like SCTP).
Authenticated encryption (TLS 1.3 integrated).
Customizable congestion control and loss detection.
Packet pacing and ECN support.

QUIC is implemented entirely in userspace (e.g., Chromium’s implementation, or the quiche library). Operating systems are not required to support QUIC; it runs over UDP. This allows rapid deployment of new features without waiting for kernel updates. The reliability mechanisms in QUIC are similar to what we described: packet number (monotonically increasing), ACK frames with ranges, selective retransmission, and a loss detection engine based on packet thresholds (e.g., after receiving N later packets, the missing one is assumed lost).

3.3 Game Networking Libraries

In the game industry, several UDP-based protocols are widely used:

ENet: A library that provides reliable and unreliable channels over UDP. It supports sequencing, fragmentation, and multiple channels.
RakNet (now part of Oculus): A full networking engine with object replication, voice chat, and NAT traversal.
Lidgren.Network: A C# library used in many indie games.
SteamNetworkingSockets: Valve’s library that wraps SCTP-like functionality over UDP, used in games like Dota 2 and CS:GO.

These libraries typically expose two modes:

Unreliable: raw UDP, no overhead.
Reliable ordered: similar to our protocol above.
Reliable unordered: messages are delivered reliably but may be out of order.
Unreliable sequenced: messages are delivered in order but gaps are filled with the most recent data (e.g., for position updates).

Chapter 4: Advanced Topics – When Reliability Becomes the Enemy

Not all applications can tolerate even the minimal reliability of a custom protocol. Let’s explore scenarios where you might intentionally sacrifice reliability for speed.

4.1 Real-Time Audio – The Case for Ignoring Loss

In a VoIP call, lost packets can be compensated using packet loss concealment (PLC) algorithms. The audio codec (e.g., Opus, Speex, Silk) can interpolate missing samples. Retransmitting a lost packet would introduce unacceptable delay (the audio must be played out within ~20-50ms). Therefore, even a custom reliable protocol that adds a 100ms retransmission budget is harmful. The application should simply send packets and forget.

4.2 Video Streaming (Live)

Live video streaming over RTMP or HLS often uses TCP, but for ultra-low-latency streaming (e.g., WebRTC, SRT, FEC-based systems), UDP is preferred. Forward error correction (FEC) can be used: instead of retransmitting lost packets, send redundant data (e.g., XOR parity packets) that allow the receiver to recover from a certain percentage of loss without retransmission. FEC adds overhead but avoids round-trip delay.

4.3 Distributed Consensus – The Need for Strong Ordering

In distributed systems like Raft or Paxos, TCP is often used because the protocols require total order and reliability. However, some modern implementations (e.g., Raft over QUIC) use UDP to reduce latency, especially in geo-distributed deployments. The key is that the consensus protocol itself handles retransmission and ordering; the transport layer just provides a message-passing channel with minimal overhead.

4.4 High-Frequency Trading – Bypassing the OS

HFT firms often bypass the OS network stack entirely by using kernel bypass technologies like DPDK (Data Plane Development Kit) or RDMA (Remote Direct Memory Access) over InfiniBand or RoCE (RDMA over Converged Ethernet). These operate at the Ethernet layer, not IP/UDP. However, when UDP is used (e.g., for market data feeds), they implement custom reliable protocols with nanosecond timers and zero-copy buffer management.

Chapter 5: Concrete Code – A Minimal RUDP Library in C

To ground our discussion, let’s implement a minimal sliding-window RUDP library in C that compiles on Linux. We’ll support cumulative ACKs, retransmission with fixed timer, and flow control via advertised window. This is a simplification but captures the essence.

// rudp.h
#ifndef RUDP_H
#define RUDP_H

#include <stdint.h>
#include <sys/types.h>

#define MAX_PACKET_SIZE 1472 // 1500 - 20 (IP) - 8 (UDP) - room for headers
#define MAX_WINDOW 64
#define TIMEOUT_MS 100

struct rudp_header {
    uint32_t seq;
    uint32_t ack_seq;
    uint16_t window;
    uint8_t flags; // 0x01: ACK, 0x02: SYN, 0x04: FIN
};

struct rudp_packet {
    struct rudp_header hdr;
    uint8_t data[MAX_PACKET_SIZE - sizeof(struct rudp_header)];
    size_t data_len;
};

struct rudp_state {
    int sock_fd;
    struct sockaddr_in peer;
    // Sender state
    uint32_t send_base;
    uint32_t next_seq;
    uint32_t window_size;
    struct rudp_packet *outstanding[MAX_WINDOW];
    struct timeval timer[MAX_WINDOW];
    // Receiver state
    uint32_t expected_seq;
    struct rudp_packet *buffer[MAX_WINDOW];
    uint32_t last_ack_sent;
};

// Init, send, recv, close
int rudp_init(struct rudp_state *state, int port, const char *peer_ip, int peer_port);
int rudp_send(struct rudp_state *state, const uint8_t *data, size_t len);
int rudp_recv(struct rudp_state *state, uint8_t *buf, size_t *len);
void rudp_close(struct rudp_state *state);
void rudp_tick(struct rudp_state *state); // call periodically

#endif // RUDP_H

The implementation would involve:

rudp_send: if next_seq - send_base < window_size, create packet, send via sendto, add to outstanding array, set timer, increment next_seq.
rudp_recv: receive from socket, parse header, update receiver state, send ACK.
rudp_tick: iterate over outstanding, check timers, retransmit if expired.
Congestion control omitted for brevity.

This is a good exercise for the reader; implementing it reveals the intricacies of timer management, sequence number overflow, and memory management.

Chapter 6: Pitfalls and Best Practices

Building a reliable protocol over UDP is rewarding but treacherous. Here are common mistakes:

6.1 Ignoring MTU

Sending a packet larger than the path MTU (typically 1500 bytes for Ethernet) causes IP fragmentation. Fragments can be lost independently, increasing failure probability. Worse, some routers drop fragments. Always keep your UDP payload below ~1400 bytes (to account for IP and UDP headers). Use path MTU discovery or just cap at 1400.

6.2 Sequence Number Wrap

32-bit sequence numbers wrap after 4 billion packets. At high rates (e.g., 1 million pkts/sec), wrap occurs in about 72 minutes. Implement wrap detection (e.g., use a relative comparison: (int32_t)(seq - last) > 0 to detect advance, assuming wraps are not pathological).

6.3 Timer Precision

select/poll with millisecond precision may not be sufficient for sub-millisecond loss detection. Use timerfd_create (Linux) or epoll with timeout. For HFT, user-space spinning with rdtsc is common.

6.4 Denial of Service

UDP is easy to spoof. If you rely on ACKs for flow control, an attacker can send forged ACKs to artificially inflate the sender’s window. Always verify the source address and use a secret cookie per connection.

6.5 NAT Traversal

UDP is often blocked by NATs if the application doesn’t maintain binding. Use STUN/TURN or periodic keep-alive packets.

6.6 Buffer Bloat

Without congestion control, a UDP sender can fill router buffers (bufferbloat), causing latency spikes for all flows sharing the link. Always implement some form of congestion control, even if it’s a simple drop-threshold on the sender.

Chapter 7: Conclusion – Choose Wisely

TCP is not a liar; it’s a faithful servant that understands its domain. UDP is not a firehose; it’s a toolkit. The decision to use UDP over TCP is a decision to take control. You take the responsibility for reliability, ordering, congestion, and flow control. In return, you gain the ability to tailor the transport precisely to your application’s needs.

For 90% of network applications, TCP is the right answer. Use HTTP/2, WebSockets, or raw TCP. It’s battle-tested and efficient. But for that 10%—the games, the calls, the streams, the trades—UDP offers a path to performance that TCP cannot match. By understanding the underlying mechanisms of sequence numbers, ACKs, windows, and timers, you can build a transport layer that fits like a glove.

And if you don’t want to build it yourself, remember that QUIC, WebRTC, and specialized libraries like ENet are waiting for you. They have already solved the hard problems. Use them.

The next time you hear someone say “TCP is reliable,” remember that reliability is a spectrum. The question isn’t whether you can deliver every byte, but at what cost. With UDP, you decide the cost. That is the power of the honest messenger.

Further Reading:

RFC 768 – User Datagram Protocol
RFC 9000 – QUIC: A UDP-Based Multiplexed and Secure Transport
“TCP/IP Illustrated” by W. Richard Stevens
“The Design and Implementation of a Reliable UDP Protocol” (research paper)
ENet Library: http://enet.bespin.org/