ANSI CSI/OSC/SGR color escape sequences are divisive. Many enjoy the extra,
categorical emphasis colors can yield. Others dislike their interference with
tools oriented around unembellished text. One compromise is `$NOCOLOR`, as
advocated by https://nocolor.org/. Another idea is an easy tool to wedge into a
pipeline to sanitize input for a next stage &| to test "uncolored readability"
(e.g. for the color blind).
In the latter case, a simple `sed 's/\x1b\[[^m]*m//g'` filter "mostly" does the job,
but it misses some corner cases of CSI/OSC syntax. E.g., a stray newline
embedded in an Esc-[.. sequence can cause trouble. So, a new, more careful
filter utility is motivated.
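To make the newline corner case concrete, here is a minimal Python sketch that mimics a line-at-a-time substitution. The regex is an assumed stand-in for the `sed` expression (Esc-[ up to the next `m`), and `strip_per_line` is a hypothetical helper name, not part of `noc`.

```python
import re

# Assumed stand-in for the sed pattern: Esc-[ ... up to the next 'm'.
SGR = re.compile(rb"\x1b\[[^m]*m")

def strip_per_line(data: bytes) -> bytes:
    """Apply the substitution one line at a time, as sed does."""
    lines = data.splitlines(keepends=True)
    return b"".join(SGR.sub(b"", line) for line in lines)

# Well-formed color codes on a single line are removed cleanly.
clean = strip_per_line(b"\x1b[31mred\x1b[0m\n")

# A newline inside the Esc-[ sequence splits it across two lines, so the
# per-line pattern never sees the opening and the final 'm' together.
broken = strip_per_line(b"\x1b[3\n1mred\x1b[0m\n")

print(clean)   # b'red\n'
print(broken)  # b'\x1b[3\n1mred\n'
```

The second result still carries the escape byte plus stray parameter text, which is the kind of residue a line-oriented filter leaves behind.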
`noc` (short for "nocolor" or "noCSIOSC") is just a standard input-to-standard
output filter with no options or other command syntax.
If given a whole, memory mappable file, `noc` does a single pass. Otherwise a
stdio buffered mode is used.
Broken &| hostile input can leave a CSI/OSC construct unterminated, potentially to EOF. This can expand an IO buffer to hold the entire input and, more notably, unless one propagates parser state across buffers, force a parse re-start after each read, repeating work & making total CPU time quadratic (to emit very little!). So, while a naive takeaway from reading the code might be "Much sound & fury to avoid repeating work", that machinery is actually there to cope with bad input.
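The idea of carrying parser state across buffers can be sketched as a small state machine. This is only an illustration in Python, not the `noc` implementation: the class `EscapeStripper`, the helper `filter_stream`, the state names, and the simplified grammar (CSI ends at any byte in 0x40..0x7E; OSC ends at BEL or Esc-\; other two-byte escapes are dropped) are all assumptions.

```python
ESC, BEL = 0x1B, 0x07
TEXT, SAW_ESC, IN_CSI, IN_OSC, OSC_ESC = range(5)

class EscapeStripper:
    """Drop CSI/OSC sequences; parser state survives between chunks."""

    def __init__(self):
        self.state = TEXT

    def feed(self, chunk: bytes) -> bytes:
        out = bytearray()
        state = self.state
        for b in chunk:                  # each byte is examined exactly once
            if state == TEXT:
                if b == ESC:
                    state = SAW_ESC
                else:
                    out.append(b)
            elif state == SAW_ESC:
                if b == 0x5B:            # '[' opens a CSI sequence
                    state = IN_CSI
                elif b == 0x5D:          # ']' opens an OSC sequence
                    state = IN_OSC
                elif b != ESC:           # assumption: drop other 2-byte escapes
                    state = TEXT
            elif state == IN_CSI:
                if 0x40 <= b <= 0x7E:    # final byte closes the CSI sequence
                    state = TEXT
            elif state == IN_OSC:
                if b == BEL:             # BEL terminates OSC
                    state = TEXT
                elif b == ESC:           # possible start of Esc-\ (ST)
                    state = OSC_ESC
            else:                        # OSC_ESC
                if b == 0x5C:            # '\' completes ST
                    state = TEXT
                elif b != ESC:
                    state = IN_OSC
        self.state = state               # remembered for the next chunk
        return bytes(out)

def filter_stream(src, dst, bufsize=65536):
    """Copy src to dst in fixed-size reads, stripping escapes as we go."""
    stripper = EscapeStripper()
    while chunk := src.read(bufsize):
        dst.write(stripper.feed(chunk))
```

Calling `filter_stream(sys.stdin.buffer, sys.stdout.buffer)` (after `import sys`) would turn this into a pipeline filter. Because `feed` never re-reads earlier bytes, an escape sequence left open to EOF costs one pass over the input and needs no growing buffer.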
For example, one can create a 100 MB file of input:
$ printf '\e]%100000000sm' | tr ' ' '\n' > /dev/shm/hard
This input breaks `sed`, which takes 3.5-ish seconds and produces 100 MLines of
output, not 0 bytes. It also blows up CPU time on a naive buffered
implementation of `noc` to many more seconds. But `noc` itself dispatches the
work in 38 ms as a whole file and about 135 ms in a pipeline [1], producing
correct, empty output both ways at about 750..2600 MB/s. [2]
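For a rough sense of the quadratic blow-up, consider a naive buffered parser that rescans everything held so far after each read. With an input of $N$ bytes read in chunks of $B$ bytes, the estimate below counts bytes scanned. The chunk size $B = 65536$ is purely an assumption for illustration, not a value taken from `noc`.

```latex
\[
\text{bytes scanned}
  \;=\; \sum_{k=1}^{N/B} kB
  \;=\; \frac{B}{2}\cdot\frac{N}{B}\left(\frac{N}{B}+1\right)
  \;\approx\; \frac{N^{2}}{2B}
\]
\[
N = 10^{8},\quad B = 65536
  \;\Longrightarrow\;
  \frac{N^{2}}{2B} \approx 7.6\times 10^{10}\ \text{bytes}
\]
```

Under those assumed numbers, that is roughly 760 times the work of a single pass over the same 100 MB.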
Footnotes
1. While faster here, the memory mapped way was done more to verify the more complex, pipeline-friendly buffered implementation.
2. This is fast enough for my purposes here - hey, 25-100X faster than a `sed` that I never felt too slow. `cligen/textUt.noCSI_OSC` only works byte-at-a-time. So, `memchr`-like SIMD optimization (possibly just using `memchr-\e`) can surely speed it up, at the cost of substantial complexity; CFRO anyone? ;-)