Terminal Performance
Terminal output on Microsoft Windows is notoriously slow. Applications that write a lot of text to the terminal, e.g. via the puts() or printf() functions, can therefore easily be bottlenecked by terminal output! A typical symptom is that the application makes very slow progress while using very little CPU time. The application is not limited by the CPU; it spends most of its time blocked in functions like puts() or printf(), which can take a very long time to return when the destination of the write operation is a terminal window.
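One way to check for this symptom is to compare wall-clock time with CPU time while writing. The following is only a minimal sketch: the helper name measure_output and the default line count are hypothetical and not part of the original test setup. It relies on time.process_time() counting only the CPU time of the current process, so time spent blocked on output does not show up there.

```python
import io
import time


def measure_output(stream, lines=10_000):
    """Write `lines` short lines to `stream`; return (wall_seconds, cpu_seconds).

    Hypothetical helper for illustration only.
    """
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time of this process only
    for i in range(lines):
        stream.write(f"{i}\n")
    stream.flush()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu
```

Calling measure_output(sys.stdout) in a terminal window and printing the two values to stderr afterwards should show a wall-clock time far above the CPU time if the program is blocked on terminal output. When writing to an in-memory stream such as io.StringIO(), the two values should be close to each other.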
It is not known why terminal output on Windows is slow, but it is probably related to the way Microsoft implemented the inter-process communication between the console application and the terminal window. In any case, it was found that using tee as an intermediary buffer between the console application and the terminal window can greatly improve performance!

For this specific purpose, we can invoke the tee program with NUL as the destination file. With the NUL parameter, tee just forwards the input data from the stdin stream to the stdout stream, without also copying the data into a file.
gizmo.exe [...] | tee.exe NUL
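To make the role of the intermediary more concrete, here is a rough Python sketch of the forwarding step. It is not how tee is actually implemented; the function name forward and the 64 KiB chunk size are assumptions chosen for illustration. It shows only the idea of a separate process that reads from its stdin and passes the data on to its stdout.

```python
import io


def forward(src, dst, chunk_size=64 * 1024):
    """Copy bytes from `src` to `dst` in chunks; return the number of bytes copied.

    Hypothetical stand-in for the forwarding role of tee, for illustration only.
    """
    total = 0
    while True:
        # read1() returns whatever is available, up to chunk_size bytes,
        # using at most one underlying read call; b"" signals end of input.
        chunk = src.read1(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total
```

In a real pipeline this would be called as forward(sys.stdin.buffer, sys.stdout.buffer), so that the producing application writes into a pipe instead of directly into the terminal window. Whether such a script gives the same speed-up as tee has not been measured here.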
Here is a simple test program that prints one million pseudo-random numbers and reports how long that took:
import time
from random import random, seed

seed(42)
time_enter = time.monotonic()
for _ in range(1000000):
    print(random())
time_leave = time.monotonic()
print("----")
print("Execution time: {:.2f} sec".format(time_leave - time_enter))

Execution time when running directly in the Windows terminal:
C:\dev>python rand.py … Execution time: 202.56 sec
Execution time when running the same program and using tee as an intermediary buffer:
C:\dev>python rand.py | tee-x64.exe NUL … Execution time: 42.19 sec
Simply passing the output through tee results in a speed-up of ~4.8× 😎
Note
Tested on the following machine:
