
Multi‐Threading

Eric Rothstein edited this page Jan 25, 2024 · 44 revisions

The tee utility uses multi-threading and triple buffering.

Multiple threads are used to maximize throughput: one thread reads incoming data from the stdin stream into the internal buffer, another thread writes outgoing data from the internal buffer to the stdout stream, and, for each output file, a separate thread dumps (copies) the buffered data into the respective file. All of these threads run concurrently, so data can be read and written at the same time.

In order to avoid conflicts, three separate buffers are used. The buffers are organized as a ring, i.e. every time a new chunk of data is stored, the buffer containing the "oldest" chunk is overwritten. We also make sure that a chunk can only be overwritten after it has been written out completely. Access to the buffers is protected by R/W locks, which allow multiple "reader" threads to access a buffer at the same time, while the "writer" thread always acquires exclusive access to the buffer.

Illustration

The following figures illustrate the buffering/threading scheme:

[Figures: Step #1 through Step #5 of the buffering/threading scheme]

Note

Each orange arrow represents a separate thread. There are four concurrent threads in this example.

Number of Buffers

In theory, this buffering/threading scheme could be implemented with just two buffers (double buffering); at least two buffers are required to allow concurrent reading and writing. However, note that the above illustration shows only one possible sequence! For example, more than one of the buffered chunks may still be in the process of being written to their destinations when the next input chunk becomes available. In this situation, double buffering would force the pipeline to stall until a "free" buffer becomes available, whereas triple buffering avoids the stall.

As an example, the situation may look like this:

[Figure: alternative sequence]

Note

It was found that triple buffering gives a small but measurable performance improvement over the simpler double-buffering scheme. Using more than three buffers did not improve performance any further in our tests, though.

Write Combining

By default, each incoming chunk of data is placed in a separate buffer. Also, the buffer is released immediately, so that the data can be written to the destinations as soon as possible. This minimizes response times, but it can be detrimental to overall performance if the "source" application produces a large number of small chunks at a high frequency.

With write combining enabled (option -b), multiple small chunks of input data are concatenated into the same buffer, and the buffer is not released until a certain minimum amount of data has been accumulated. This can improve throughput, but it comes at the cost of increased latency. Therefore, the option is disabled by default.

[Figures: write combining, steps 1 through 3]
