This page explains the column planner of tabulate, which automatically decides the width of each column of the rendered table, in order to minimize the total number of lines taken by the table.
We start from a dynamic-programming planner that makes each decision by brute-force search, and then explore how to speed it up with a bisection algorithm, by leveraging the monotonicity of a lower bound of our objective.
## Introduction
Notation:

- Let $n$ denote the total number of columns whose widths are not specified by the user.
- Let $W$ denote the total width budget of the table.
- Let $f_i(j, w)$ denote the minimum number of lines taken by the $i$-th row and the first $j$ columns of the table, with a total width budget of $w$ over the $j$ columns. $\sum_i f_i(n, W)$ will be the minimum total number of lines taken by the table.
- Let $c_i(j, w)$ denote the number of lines taken by the $i$-th row and the $j$-th column of the table, whose column width is $w$. This quantity is reported by the underlying line-wrapping algorithm. By the nature of the line wrapping problem, we assume that $c_i(j, w)$ is monotonically non-increasing in $w$.
- Let $r_i$ denote the number of lines taken by the $i$-th row over all columns whose widths have been specified by the user. This quantity is also reported by the line-wrapping algorithm.
- Let $g_i(j, w, d) = \max\bigl(f_i(j-1, w-d),\, c_i(j, d)\bigr)$ be a proposal of allocating width $d$ to the $j$-th column while leaving width $w - d$ to the first $j - 1$ columns. $f_i(j, w)$ equals one of the $g_i(j, w, d)$, namely the one with the optimal decision $d = d^*$.
- In code, $f$ is denoted by `dp`, and $c$ is denoted by `nl`.
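To make `nl` concrete, here is a minimal stand-in based on Python's `textwrap`; the helper name and the greedy wrapping strategy are illustrative assumptions, not tabulate's actual line-wrapping backend.

```python
import textwrap

def nl(cell: str, width: int) -> int:
    """Number of lines a cell occupies when wrapped to the given width.

    Stand-in for the real line-wrapping backend (greedy word wrap).
    """
    return max(1, len(textwrap.wrap(cell, width)))

# The line count is monotonically non-increasing in the width:
# widening a column never adds lines.
counts = [nl("the quick brown fox jumps over the lazy dog", w)
          for w in range(5, 45)]
assert all(a >= b for a, b in zip(counts, counts[1:]))
```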
Almost immediately, we have the inductive definition of $f_i$ as follows:

$$
f_i(0, w) = r_i, \qquad f_i(j, w) = g_i(j, w, d^*) = \max\bigl(f_i(j-1, w-d^*),\, c_i(j, d^*)\bigr),
$$

where

$$
d^* = \operatorname*{arg\,min}_{1 \le d \le w} \sum_i g_i(j, w, d).
$$

Apparently, $d^*$ is the optimal decision: the optimal width allocated to the $j$-th column while the first $j - 1$ columns admit a total width budget of $w - d^*$.
## Brute-force decision maker
We simply traverse all $d \in \{1, \dots, w\}$ and find the $d^*$ that minimizes $\sum_i g_i(j, w, d)$.
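A minimal sketch of this decision step, with hypothetical names: `dp_prev[v]` holds the per-row values $f_i(j-1, v)$ for each budget `v`, and `nl_col(i, d)` returns $c_i(j, d)$.

```python
from typing import Callable, List, Tuple

def brute_decision(
    dp_prev: List[List[int]],             # dp_prev[v][i] = f_i(j-1, v)
    nl_col: Callable[[int, int], float],  # nl_col(i, d) = c_i(j, d)
    w: int,                               # width budget for the first j columns
) -> Tuple[int, float]:
    """Return (d_star, cost) minimizing sum_i max(f_i(j-1, w-d), c_i(j, d)).

    Assumes at least one width unit is left to the preceding columns,
    hence d ranges over 1..w-1.
    """
    best_d, best_cost = 1, float("inf")
    for d in range(1, w):
        cost = sum(max(f, nl_col(i, d))
                   for i, f in enumerate(dp_prev[w - d]))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost
```

Each decision thus costs time linear in the budget `w`.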
Can we do better?
## Bisect decision maker
Before reaching our bisect optimization, we first observe that:

**Theorem 1.** *$f_i(j, w)$ is monotonically non-increasing in $w$.*

*Proof.*

We proceed by induction on $j$.

Since by assumption $c_i(1, w)$ is monotonically non-increasing in $w$, and $r_i$ is constant in $w$, we may conclude that $f_i(1, w) = \max\bigl(c_i(1, w),\, r_i\bigr)$ is monotonically non-increasing.

Now suppose that $f_i(j-1, w)$ is monotonically non-increasing in $w$; then similarly, $g_i(j, w, d) = \max\bigl(f_i(j-1, w-d),\, c_i(j, d)\bigr)$ is monotonically non-increasing in $w$ for any fixed decision $d$.

But $f_i(j, w)$ is a $g_i(j, w, d)$ with the optimal decision $d = d^*$, so it is likewise monotonically non-increasing in $w$.

QED.
Given that $f_i(j-1, w)$ is monotonically non-increasing in $w$, we naturally notice that $f_i(j-1, w-d)$ is monotonically non-decreasing in $d$ for any fixed $w$.

Thus, while finding the optimal decision for the $j$-th column, we are minimizing $\sum_i \max\bigl(f_i(j-1, w-d),\, c_i(j, d)\bigr)$ over $d$, with the two arguments of the maximum being monotonically non-decreasing and non-increasing in $d$, respectively.
We may draw inspiration from a second theorem:
**Theorem 2.** *The minimum of the maximum of a monotonically non-decreasing series and a monotonically non-increasing series can be found using a bisection algorithm.*

*Proof.*

The minimum must be achieved where the two series meet (or, over integers, at one of the two points adjacent to the crossing).

By assumption, the difference between the two series (the non-decreasing one minus the non-increasing one) is monotonically non-decreasing.

So we are able to solve the original problem by finding the zero of this non-decreasing difference, which can be done with bisection.

If the difference never hits exactly zero, bisection still locates the sign change, and the better of the two adjacent points is our solution.

QED.
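The bisection of Theorem 2 can be sketched as follows; `a` and `b` stand for the non-decreasing and non-increasing series, evaluated on integer decisions in `[lo, hi]` (the function names are illustrative).

```python
def bisect_min_max(a, b, lo, hi):
    """Return the integer d in [lo, hi] minimizing max(a(d), b(d)),
    where a is non-decreasing and b is non-increasing in d."""
    left, right = lo, hi
    while left < right:
        mid = (left + right) // 2
        if a(mid) >= b(mid):
            right = mid       # the crossing is at mid or to its left
        else:
            left = mid + 1    # the crossing is strictly to the right
    # left is now the smallest d with a(d) >= b(d) (or hi if they never
    # cross); the minimum sits there or one step before, where max = b.
    return min((d for d in (left - 1, left) if lo <= d <= hi),
               key=lambda d: max(a(d), b(d)))
```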
Since summation preserves monotonicity, we are now able to efficiently find the minimizer of

$$
L(d) = \max\Bigl(\sum_i f_i(j-1, w-d),\ \sum_i c_i(j, d)\Bigr).
$$

There is still a gap, since what we want to solve instead is the minimizer of the true objective

$$
F(d) = \sum_i g_i(j, w, d) = \sum_i \max\bigl(f_i(j-1, w-d),\, c_i(j, d)\bigr).
$$

But the gap can be closed under a specific condition:

**Lemma 1.** *$L(d)$ is a lower bound of our objective $F(d)$, since a sum of maxima dominates the corresponding maximum of sums.*
**Theorem 3.** *If $d^\dagger$ minimizes $L$ over a set $S$ of decisions and the bound $L(d^\dagger)$ is tight, then $d^\dagger$ is also the minimizer of our objective $F$ over the set $S$.*

*Proof.*

Note the inequality chain: for any $d \in S$,

$$
F(d^\dagger) \overset{(1)}{=} L(d^\dagger) \overset{(2)}{\le} L(d) \overset{(3)}{\le} F(d),
$$

where the reasons for the equality and the inequalities are: (1) the bound is tight at $d^\dagger$; (2) $d^\dagger$ is the minimizer of $L$ over $S$; (3) $L$ is a lower bound of our objective $F$ (Lemma 1).

QED.
In fact, we may efficiently search the lower bound first, and then explore around its minimum to find the real minimum of the objective:
**Lemma 2.** *Suppose $d^\dagger$ minimizes $L(d)$; then $L$ is monotonically non-increasing over $\{d : d \le d^\dagger\}$, and monotonically non-decreasing over $\{d : d \ge d^\dagger\}$.*
**Theorem 4.** *Suppose $d^\dagger$ is the minimizer of $L$. Suppose $d_1$ is the largest position no greater than $d^\dagger$ where $L$ is tight, and $d_2$ is the smallest position no less than $d^\dagger$ where $L$ is tight; then the minimizer of $F$ lies in $[d_1, d_2]$.*

*Proof.*

By Lemma 2, $d_1$ is the minimizer of $L$ over $\{d : d \le d_1\}$.

By Theorem 3, $d_1$ is also the minimizer of $F$ over $\{d : d \le d_1\}$.

Similarly, $d_2$ is the minimizer of $F$ over $\{d : d \ge d_2\}$.

Therefore, we only need to brute-force search the range $[d_1, d_2]$ to guarantee finding the real minimum.

QED.
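Putting Theorems 2–4 together, one decision step may be sketched as below. `F_prev[v]` holds the per-row values $f_i(j-1, v)$ and `C(d)` returns the per-row values $c_i(j, d)$; all names are illustrative assumptions, not tabulate's actual API.

```python
def plan_decision(F_prev, C, w):
    """Pick the width d in [1, w-1] for the current column (sketch)."""
    A = lambda d: sum(F_prev[w - d])   # non-decreasing in d
    B = lambda d: sum(C(d))            # non-increasing in d
    L = lambda d: max(A(d), B(d))      # lower bound
    F = lambda d: sum(max(f, c) for f, c in zip(F_prev[w - d], C(d)))

    # Bisect for the minimizer of the lower bound L (Theorem 2).
    lo, hi = 1, w - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if A(mid) >= B(mid):
            hi = mid
        else:
            lo = mid + 1
    d_dag = min((d for d in (lo - 1, lo) if 1 <= d <= w - 1), key=L)

    # Walk out to the nearest tight positions d1 <= d_dag <= d2 (Theorem 4).
    d1 = d_dag
    while d1 > 1 and L(d1) < F(d1):
        d1 -= 1
    d2 = d_dag
    while d2 < w - 1 and L(d2) < F(d2):
        d2 += 1

    # Brute-force search only the remaining gap [d1, d2].
    return min(range(d1, d2 + 1), key=F)
```

If no position is tight, the walk reaches the ends of the range and the step degenerates to plain brute force, which is consistent with the lack of a theoretical guarantee discussed next.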
How much computation can be saved, then?

Unfortunately, there is no theoretical guarantee.

But we may show the savings empirically through simulation.
For example, the two figures below show the lower bound $L$ (orange) and the real objective $F$ (blue).
We only need to search the region where they don't match:
## Infinite line consumption
We assign $c_i(j, w) = \infty$ if the column at width $w$ is not wide enough to hold certain words, such that they would protrude out of the column and damage the table layout.

The infinite cost prevents such cases from happening whenever possible.

We justify this setup below:
**Theorem 5.** *$f_i(j, w)$ remains monotonically non-increasing in $w$ in the presence of infinity.*

*Proof.*

It is easy to show that introducing infinity does not ruin the monotonicity of $c_i(j, w)$ in $w$: if $c_i(j, w) = \infty$, then $c_i(j, w') = \infty$ for every $w' \le w$, since a narrower column cannot hold a word that a wider column fails to hold; if $c_i(j, w) < \infty$, then $c_i(j, w') \le c_i(j, w)$ for every $w' \ge w$. The induction of Theorem 1 then goes through unchanged.

QED.
**Theorem 6.** *The lower bound $L(d) = \max\bigl(\sum_i f_i(j-1, w-d),\ \sum_i c_i(j, d)\bigr)$ is infinite if and only if, for some row $i$, either $f_i(j-1, w-d)$ or $c_i(j, d)$ is infinity.*
We now show an important theorem:
**Theorem 7.** *If the lower bound $L(d) = \infty$, it is tight.*

*Proof.*

By Theorem 6, some $f_i(j-1, w-d)$ or $c_i(j, d)$ is infinite, so the corresponding term $\max\bigl(f_i(j-1, w-d),\, c_i(j, d)\bigr)$ of the objective $F(d)$ is infinite as well; hence $F(d) = \infty = L(d)$.

QED.
Therefore, when we perform the local line search of Theorem 4, arriving at an infinite lower bound also indicates that tightness has been reached.
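In Python, IEEE infinity already has the required algebra, so no special-casing is needed in the planner. A small illustration of Theorem 7 (the row values are made up for the example):

```python
inf = float("inf")

# Suppose one row cannot fit the current column at width d: c_i(j, d) = inf.
f_prev = [2, 3, 4]    # f_i(j-1, w-d) per row
c_col = [1, inf, 2]   # c_i(j, d) per row

F = sum(max(f, c) for f, c in zip(f_prev, c_col))  # objective
L = max(sum(f_prev), sum(c_col))                   # lower bound

# The two blow up together: an infinite lower bound is automatically tight.
assert F == inf and L == inf and F == L
```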
## Benchmark
We empirically showcase the efficiency of the bisect optimization.
## Conclusion
Instead of brute-force searching all possible decisions $d$ at each step, we may bisect on the lower bound and then run a local search in its surroundings.

This greatly enhances efficiency.