Commit

Expand on some probably non-obvious "scare" quotes.
c-blake committed Apr 4, 2024
1 parent e5a90e0 commit e5f3ffa
Showing 1 changed file with 11 additions and 5 deletions.
16 changes: 11 additions & 5 deletions doc/widths.md
@@ -70,16 +70,16 @@ shows summary stats on line widths for this source code repository at the time
of writing this document. (Note the `'i'` in `-oi` and `.Ni` must match.)

All `niom` information can be derived from the exact histogram, but if there was
-a lot of input and, say, you cared about "time series" properties for a custom
-analysis, then the (tiny 19 lines of real logic[^2]) source code of `widths.nim`
+a lot of input and, say, you cared about "time series"[^2] properties for custom
+analysis, then the (tiny 19 lines of real logic[^3]) source code of `widths.nim`
shows how to use [nio](https://github.com/c-blake/nio) to "stay in binary".

Avoiding oft recommended but expensive binary -> ASCII -> binary conversion
cycles can sometimes mean orders of magnitude speed-ups. E.g., running `widths
**.c >/dev/null` on Linux-6.2.8 source unpacked in /dev/shm (about 658 MB and
22.75e6 lines in ~32e3 files) took 17.5 seconds. Simply adding in `-oi` took
the time down to 0.755s - over 23X faster. Adding `|cstats` or `|niom` changes
-these times to 26.33 & 1.00 because of parsing costs, a worse ratio.[^3] { Just
+these times to 26.33 & 1.00 because of parsing costs, a worse ratio.[^4] { Just
mapping files & framing lines with memchr via `widths -d **.c >/dev/null` takes
0.57sec or ~25ns/line. }

@@ -91,12 +91,18 @@ program is one way to maybe decide what length to give it.
[^1]: `awk '{a[length($0)]++} END{for(i in a)print i,a[i]}'` also works, but is
~3X slower than `widths` for ASCII out & supports no binary out mode easily.

-[^2]: As with many `bu` utilities, documenting it for broader consumption &|
+[^2]: The "independent/x"-axis here is not time but depth into input(s), but
+methods in statistics related to order-dependent independent variable axes go by
+"time" series since times/dates are the most common ordering. Of course, the
+fact that in "linear data entry" you type, hit ENTER, type some more relates the
+two here, but only weakly since, you know, text editors can Arrow-Up. ;-)
+
+[^3]: As with many `bu` utilities, documenting it for broader consumption &|
appreciation is most of the work. This one is especially trivial, actually --
more about measuring IO than anything else { which I think could be a tad faster
with a file size check; stdio is probably faster for < ~2048 bytes.. }

-[^3]: On a personal note, not trusting number parsing but especially formatting
+[^4]: On a personal note, not trusting number parsing but especially formatting
to be "essentially free" was among my first lessons learning systems programming
decades ago. The way "Unix philosophy" is often presented makes this a lesson
learnt anew generation after generation. That is more a failure in teaching &|
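
The "stay in binary" idea in the changed text can be illustrated with a short Python sketch. This is not the `widths.nim` source: the names `line_widths` and `histogram` are hypothetical, and the use of native C `int` records is only an assumption about what an `-oi` style output might look like. The point is that the consumer rebuilds a histogram straight from fixed-size binary records, with no number formatting on the producer side and no number parsing on the consumer side.

```python
from array import array
from collections import Counter


def line_widths(data: bytes) -> array:
    """Return the width in bytes of every line in `data` (hypothetical helper)."""
    widths = array("i")                 # native C ints; an assumed record layout
    start = 0
    while True:
        end = data.find(b"\n", start)   # frame lines by scanning for newlines
        if end < 0:
            break
        widths.append(end - start)
        start = end + 1
    if start < len(data):               # last line lacking a trailing newline
        widths.append(len(data) - start)
    return widths


def histogram(raw: bytes) -> Counter:
    """Rebuild a width histogram directly from binary records (no text parsing)."""
    widths = array("i")
    widths.frombytes(raw)
    return Counter(widths)


sample = b"ab\nabcd\n\nab\n"
raw = line_widths(sample).tobytes()     # what a binary output mode might emit
hist = histogram(raw)                   # what a downstream consumer might do
```

Because the records keep input order, the same `raw` buffer could also feed an order-dependent ("time series") analysis, which a histogram alone cannot support.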
