Log multiplexing is how Docker sends logs from containers to whatever client requests them. It works by combining the data from the output streams of the process running in the container into one stream, which is then passed on to clients.
The Problem Docker Tried to Solve
Whenever a program executes, there are two streams to which it can write results: standard output (stdout) and standard error (stderr); there is also standard input (stdin), but that is mostly for feeding input into programs. When you run a command like docker logs <container>, the problem Docker had to solve was how to ensure that output from the different streams arrives in the exact order it was produced inside the container, while also making sure clients requesting the logs can tell the streams apart.
The naive approach would be to manage two different network connections: results from stdout passed over one, and results from stderr passed over the other. But that would mean managing two separate network streams and all the complexity that comes with it. If one stream goes down, what happens? How do we keep track of the order in which the outputs were produced? These questions, among others, are the reason for multiplexing.
Multiplexer
In electronics, a multiplexer looks something like this.
Even at a glance, you can see that a number of input streams are combined to form one output. Docker log multiplexing is the same concept: combining the various streams into one single stream.
Decoding the Multiplexed Stream
Docker uses a simple framing protocol. Each chunk of data gets prefixed with an 8-byte header. The first byte identifies which stream it came from (0 for stdin, 1 for stdout, 2 for stderr), bytes 1-3 are reserved (always 0x00), and bytes 4-7 contain the payload size as a big-endian unsigned 32-bit integer.
The header looks like this:
[STREAM_TYPE] [0x00, 0x00, 0x00] [PAYLOAD_SIZE]
The PAYLOAD_SIZE lets the client read exactly that many bytes of content from the incoming stream, without buffering or guessing where one chunk ends and another begins.
This framing header lets the receiving end parse the stream unambiguously. The data stays intact and in order, but now it carries metadata about its origin. When you use commands like docker logs, you're actually reading from this multiplexed stream that Docker has stored. Docker can show you just stdout, just stderr, or both together, because the multiplexing metadata tells it which bytes came from which stream. This also solves the problem of managing two different network streams and the complexity that comes with them: you only have a single stream to deal with.
Breakdown of the 8-byte header
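The layout can be made concrete by packing a frame ourselves. The snippet below is a sketch of the framing, not Docker's own code; struct.pack with the big-endian format ">B3xI" produces exactly the 8-byte header described above:

```python
import struct

def make_frame(stream_type: int, payload: bytes) -> bytes:
    """Build one Docker stream frame: 8-byte header + payload.

    Header layout:
      byte 0      stream type (0 = stdin, 1 = stdout, 2 = stderr)
      bytes 1-3   reserved, always 0x00 (the '3x' padding below)
      bytes 4-7   payload size, big-endian unsigned 32-bit integer
    """
    header = struct.pack(">B3xI", stream_type, len(payload))
    return header + payload

frame = make_frame(1, b"Hello")
print(frame.hex(" "))  # 01 00 00 00 00 00 00 05 48 65 6c 6c 6f
```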
Here's a simple example of how you might parse this in Python when connecting to Docker's attach API:
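A minimal sketch of such a parser is shown below. It reads from any file-like object; a socket attached to Docker's API would work the same way, but here an in-memory buffer stands in for the wire so the example is self-contained:

```python
import io
import struct

STREAMS = {0: "stdin", 1: "stdout", 2: "stderr"}

def demux(stream):
    """Yield (stream_name, payload) pairs from a multiplexed Docker stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:          # end of stream
            return
        stream_type, size = struct.unpack(">B3xI", header)
        payload = stream.read(size)  # PAYLOAD_SIZE tells us exactly how much to read
        yield STREAMS.get(stream_type, "unknown"), payload

# Two frames: "Hello" on stdout, then "oops" on stderr
raw = io.BytesIO(
    b"\x01\x00\x00\x00\x00\x00\x00\x05Hello"
    b"\x02\x00\x00\x00\x00\x00\x00\x04oops"
)
for name, data in demux(raw):
    print(name, data.decode())
# stdout Hello
# stderr oops
```

Because each header states its payload's exact size, the loop never has to scan for delimiters; it just reads 8 bytes, then exactly `size` more.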
What goes over the wire for Hello?
Assuming your container writes Hello to standard output, here is what Docker sends over the wire:
[01 00 00 00 00 00 00 05] [48 65 6c 6c 6f]
where [48 65 6c 6c 6f] is the string "Hello" in hex-encoded ASCII. The 05 is the payload size, and the first byte, 01, is the stream type, in this case stdout.
This idea of multiplexing is not unique to Docker. It's a common pattern in systems engineering whenever multiple logical channels need to share a single physical channel: HTTP/2 does it with web requests, and SSH does it when you run multiple terminal sessions over one encrypted connection.