added implementation documentation and bibliography
zvr committed Oct 15, 2017
1 parent 50ae8e6 commit 9c07495
Showing 3 changed files with 385 additions and 0 deletions.
133 changes: 133 additions & 0 deletions Implementation.rst
@@ -0,0 +1,133 @@
Implementation Notes
====================

The main requirement for this program
was to be easily deployable.
Therefore, the following restrictions were put in place:

(1) it should be written in an interpreted language, so Python 3 was chosen;
(2) the whole program should be a single file; and
(3) it should not require any library besides the modules
coming with a standard Python distribution.


XML pretty-printing
-------------------

While the standard Python `xml`_ module provides pretty-printing functionality,
it is not customizable enough.

.. _xml: https://docs.python.org/3.6/library/xml.html

The desired functionality included:

- ability to specify which tags should be considered *inline* or *block*;
- ability to have text wrap (at word level) at a maximum line length;
- ability to specify the indentation depth per level;
- ... and possibly others.

Therefore I wrote the pretty-printing code myself,
working on the XML tree node by node.
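
As a rough illustration of the node-by-node approach (not the actual code of
this program), the sketch below walks an ``xml.etree.ElementTree`` tree and
keeps a hypothetical set of *inline* tags on the same line as their
surrounding text. The names ``INLINE_TAGS``, ``render_inline`` and ``pretty``
are assumptions made for this example; attributes, line wrapping, and text
sitting directly between block children are deliberately ignored.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical set of tags that stay on the same line as the text around them.
INLINE_TAGS = {"b", "i", "code"}


def is_inline(elem):
    return elem.tag in INLINE_TAGS


def render_inline(elem):
    """Serialize an inline element, its children and its tail on one line."""
    inner = escape(elem.text or "") + "".join(render_inline(c) for c in elem)
    return "<{0}>{1}</{0}>{2}".format(elem.tag, inner, escape(elem.tail or ""))


def pretty(elem, depth=0, indent="  "):
    """Return the list of output lines for a block element."""
    pad = indent * depth
    children = list(elem)
    # A block whose children are all inline (or absent) fits on one line.
    if all(is_inline(c) for c in children):
        inner = escape(elem.text or "")
        inner += "".join(render_inline(c) for c in children)
        return ["{0}<{1}>{2}</{1}>".format(pad, elem.tag, inner.strip())]
    # Otherwise every child is printed as a block, one level deeper.
    lines = ["{0}<{1}>".format(pad, elem.tag)]
    for child in children:
        lines.extend(pretty(child, depth + 1, indent))
    lines.append("{0}</{1}>".format(pad, elem.tag))
    return lines


root = ET.fromstring("<doc><p>Hello <b>bold</b> world</p><p>Bye</p></doc>")
print("\n".join(pretty(root)))
```

Making the inline/block decision a plain set lookup keeps the first item of
the wish list above configurable without touching the traversal itself.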

For anyone who has never attempted pretty-printing before,
two of the most important references are:

- John Hughes, “The design of a pretty-printing library” in *Advanced
Functional Programming: First International Spring School on Advanced
Functional Programming Techniques, Båstad, Sweden, May 24–30, 1995*,
Johan Jeuring and Erik Meijer (eds.), Springer Berlin
Heidelberg, Berlin, Heidelberg, pp. 53–96, 1995.
https://doi.org/10.1007/3-540-59451-5_3
- Philip Wadler, “A prettier printer” in *Journal of functional
programming*, pp. 223–244, 1998.


Line wrapping
-------------

Again, the standard Python library includes `textwrap.fill`_,
but no combination of its options could guarantee that the wrapped text
matched the original exactly (e.g. without spaces being added or removed
in significant places).

.. _textwrap.fill: https://docs.python.org/3/library/textwrap.html
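
A small illustration of the kind of change ``textwrap.fill`` makes with its
default options (``expand_tabs``, ``replace_whitespace`` and
``drop_whitespace`` are all enabled by default). The sample string is made up
for this example.

```python
import textwrap

# Sample input with a tab and a double space, chosen only for illustration.
text = "alpha\tbeta  gamma delta"

wrapped = textwrap.fill(text, width=12)

# With default options the tab is expanded to spaces and whitespace at the
# points where lines are broken is dropped, so the original characters can
# no longer be recovered from the wrapped output.
print(repr(text))
print(repr(wrapped))
```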

I ended up implementing the Knuth–Plass algorithm (used in TeX etc.)
for breaking a text into a series of balanced lines.
Coming up with an efficient implementation was a matter
of reading the appropriate papers in the literature.
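
To give a flavour of the underlying idea, here is a deliberately simplified
sketch: a quadratic-time dynamic program that minimizes the sum of squared
trailing space over all lines except the last. It is neither the full
Knuth–Plass model (no stretchable glue, penalties or hyphenation) nor one of
the faster algorithms from the papers listed below, and it is not the code
used in this program; the function name ``break_lines`` is an assumption.

```python
def break_lines(words, width):
    """Split words into lines, minimizing the sum of squared trailing space.

    The last line is not penalized. A word longer than ``width`` is placed
    on a line of its own.
    """
    n = len(words)
    # best[i]: minimal cost of laying out words[i:]
    # nxt[i]:  index of the first word of the line following the one at i
    best = [0.0] * (n + 1)
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        length = -1  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width and j > i + 1:
                break  # words[i:j] no longer fits on one line
            if j == n:
                cost = 0.0  # the last line is free
            else:
                slack = max(width - length, 0)
                cost = slack ** 2 + best[j]
            if cost < best[i]:
                best[i], nxt[i] = cost, j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


print(break_lines("aaa bb cc ddddd".split(), 6))
```

On this input a greedy first-fit wrapper would produce ``aaa bb`` / ``cc`` /
``ddddd``, leaving a very short middle line; the dynamic program instead
moves ``bb`` down to even out the lines.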

For the curious, this was my bibliography;
the first entry is the original Knuth and Plass paper,
while the rest deal with optimizations.

1. Donald E. Knuth and Michael F. Plass, “Breaking paragraphs into
lines”, *Software: Practice and Experience*, vol. 11, no. 11, pp.
1119–1184, 1981. https://doi.org/10.1002/spe.4380111102

2. A Aggarwal, M Klawe, S Moran, P Shor, and R Wilber, “Geometric
applications of a matrix searching algorithm” in *Proceedings of the
second annual symposium on computational geometry* (SCG ’86), pp.
285–292, 1986. https://doi.org/10.1145/10515.10546

3. Daniel S. Hirschberg and Lawrence Louis Larmore, “New applications of
failure functions”, *Journal of the ACM*, vol. 34, no. 3, pp. 616–625,
1987. https://doi.org/10.1145/28869.28875

4. Daniel S. Hirschberg and Lawrence Louis Larmore, “The least weight
subsequence problem”, *SIAM Journal on Computing*, vol. 16, no. 4, pp.
628–638, 1987. https://doi.org/10.1137/0216043

5. Robert Wilber, “The concave least-weight subsequence problem
revisited”, *Journal of Algorithms*, vol. 9, no. 3, pp. 418–425, 1988.
https://doi.org/10.1016/0196-6774(88)90032-6

6. Zvi Galil and Raffaele Giancarlo, “Speeding up dynamic programming
with application to molecular biology”, *Theoretical Computer Science*,
vol. 64, no. 1, pp. 107–118, 1989.
https://doi.org/10.1016/0304-3975(89)90101-1

7. David Eppstein, “Sequence comparison with mixed convex and concave
costs”, *Journal of Algorithms*, vol. 11, no. 1, pp. 85–101, 1990.
https://doi.org/10.1016/0196-6774(90)90031-9

8. Zvi Galil and Kunsoo Park, “A linear-time algorithm for concave
one-dimensional dynamic programming”, *Information Processing Letters*,
vol. 33, no. 6, pp. 309–311, 1990.
https://doi.org/10.1016/0020-0190(90)90215-J

9. David Eppstein, Zvi Galil, Raffaele Giancarlo, and Giuseppe F.
Italiano, “Sparse dynamic programming ii: Convex and concave cost
functions”, *Journal of the ACM*, vol. 39, no. 3, pp. 546–567, 1992.
https://doi.org/10.1145/146637.146656

10. Alok Aggarwal and Takeshi Tokuyama, “Consecutive interval query and
dynamic programming on intervals” in *Algorithms and Computation: 4th
International Symposium, ISAAC ’93 Hong Kong, December 15–17, 1993
Proceedings*, K. W. Ng, P. Raghavan, N. V. Balasubramanian and F. Y. L.
Chin (eds.), Springer Berlin Heidelberg, Berlin, Heidelberg, pp.
466–475, 1993. https://doi.org/10.1007/3-540-57568-5_278

11. Peter Becker, “Construction of nearly optimal multiway trees” in
*Computing and Combinatorics: Third Annual International Conference,
COCOON ’97 Shanghai, China, August 20–22, 1997 Proceedings*, Tao Jiang
and D. T. Lee (eds.), Springer Berlin Heidelberg, Berlin, Heidelberg,
pp. 294–303, 1997. https://doi.org/10.1007/BFb0045096

12. Oege de Moor and Jeremy Gibbons, “Bridging the algorithm gap: A
linear-time functional program for paragraph formatting”, *Science of
Computer Programming*, vol. 35, no. 1, pp. 3–27, 1999.
https://doi.org/10.1016/S0167-6423(99)00005-2


Argument processing
-------------------

I usually use `click`_
for command-line utilities,
but because of restriction (3) above,
all the argument processing was written in pure `argparse`_.

.. _click: http://click.pocoo.org/
.. _argparse: https://docs.python.org/3/library/argparse.html
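
For reference, a command-line interface for a tool of this kind can be
declared with ``argparse`` alone along the following lines. Every option name
and default below is hypothetical and chosen only for illustration; they are
not necessarily the options this program accepts.

```python
import argparse


def build_parser():
    # All argument names and defaults here are illustrative assumptions.
    parser = argparse.ArgumentParser(description="Pretty-print an XML file.")
    parser.add_argument("infile", help="XML file to read")
    parser.add_argument("-w", "--width", type=int, default=80,
                        help="maximum line length")
    parser.add_argument("-i", "--indent", type=int, default=2,
                        help="spaces per indentation level")
    parser.add_argument("--inline", action="append", default=[],
                        metavar="TAG",
                        help="treat TAG as inline (may be repeated)")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["doc.xml", "--width", "72", "--inline", "b"])
print(args)
```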

229 changes: 229 additions & 0 deletions linebreaking.bib
@@ -0,0 +1,229 @@

% references for implementing (efficiently) a line-breaking algorithm
% the first is the original Knuth algorithm; the rest are optimizations
@article {SPE:SPE4380111102,
author = {Knuth, Donald E. and Plass, Michael F.},
title = {Breaking paragraphs into lines},
journal = {Software: Practice and Experience},
volume = {11},
number = {11},
publisher = {John Wiley & Sons, Ltd.},
issn = {1097-024X},
url = {http://dx.doi.org/10.1002/spe.4380111102},
doi = {10.1002/spe.4380111102},
pages = {1119--1184},
year = {1981},
}


@article{Hirschberg:1987:LWS:33367.33370,
author = {Hirschberg, Daniel S. and Larmore, Lawrence Louis},
title = {The Least Weight Subsequence Problem},
journal = {SIAM Journal on Computing},
issue_date = {August 1987},
volume = {16},
number = {4},
month = aug,
year = {1987},
issn = {0097-5397},
pages = {628--638},
numpages = {11},
url = {http://dx.doi.org/10.1137/0216043},
doi = {10.1137/0216043},
acmid = {33370},
publisher = {Society for Industrial and Applied Mathematics},
address = {Philadelphia, PA, USA},
}
@article{Hirschberg:1987:NAF:28869.28875,
author = {Hirschberg, Daniel S. and Larmore, Lawrence Louis},
title = {New Applications of Failure Functions},
journal = {Journal of the ACM},
issue_date = {July 1987},
volume = {34},
number = {3},
month = jul,
year = {1987},
issn = {0004-5411},
pages = {616--625},
numpages = {10},
url = {http://doi.acm.org/10.1145/28869.28875},
doi = {10.1145/28869.28875},
acmid = {28875},
publisher = {ACM},
address = {New York, NY, USA},
}
@inproceedings{Aggarwal:1986:GAM:10515.10546,
author = {Aggarwal, A and Klawe, M and Moran, S and Shor, P and Wilber, R},
title = {Geometric Applications of a Matrix Searching Algorithm},
booktitle = {Proceedings of the Second Annual Symposium on Computational Geometry},
series = {SCG '86},
year = {1986},
isbn = {0-89791-194-6},
location = {Yorktown Heights, New York, USA},
pages = {285--292},
numpages = {8},
url = {http://doi.acm.org/10.1145/10515.10546},
doi = {10.1145/10515.10546},
acmid = {10546},
publisher = {ACM},
address = {New York, NY, USA},
}
@article{Wilber:1988:CLS:51368.51379,
author = {Wilber, Robert},
title = {The Concave Least-weight Subsequence Problem Revisited},
journal = {Journal of Algorithms},
issue_date = {September 1988},
volume = {9},
number = {3},
month = sep,
year = {1988},
issn = {0196-6774},
pages = {418--425},
numpages = {8},
url = {http://dx.doi.org/10.1016/0196-6774(88)90032-6},
doi = {10.1016/0196-6774(88)90032-6},
acmid = {51379},
publisher = {Academic Press, Inc.},
address = {Duluth, MN, USA},
}
@article{Galil:1989:SUD:64154.64250,
author = {Galil, Zvi and Giancarlo, Raffaele},
title = {Speeding Up Dynamic Programming with Application to Molecular Biology},
journal = {Theoretical Computer Science},
issue_date = {April 1989},
volume = {64},
number = {1},
month = apr,
year = {1989},
issn = {0304-3975},
pages = {107--118},
numpages = {12},
url = {https://doi.org/10.1016/0304-3975(89)90101-1},
doi = {10.1016/0304-3975(89)90101-1},
acmid = {64250},
publisher = {Elsevier Science Publishers Ltd.},
address = {Essex, UK},
}

@article{Galil:1990:LAC:79790.79800,
author = {Galil, Zvi and Park, Kunsoo},
title = {A Linear-time Algorithm for Concave One-dimensional Dynamic Programming},
journal = {Information Processing Letters},
issue_date = {February 1990},
volume = {33},
number = {6},
month = feb,
year = {1990},
issn = {0020-0190},
pages = {309--311},
numpages = {3},
url = {http://dx.doi.org/10.1016/0020-0190(90)90215-J},
doi = {10.1016/0020-0190(90)90215-J},
acmid = {79800},
publisher = {Elsevier North-Holland, Inc.},
address = {Amsterdam, The Netherlands},
}

@article{Eppstein:1990:SCM:82765.82778,
author = {Eppstein, David},
title = {Sequence Comparison with Mixed Convex and Concave Costs},
journal = {Journal of Algorithms},
issue_date = {March 1990},
volume = {11},
number = {1},
month = feb,
year = {1990},
issn = {0196-6774},
pages = {85--101},
numpages = {17},
url = {http://dx.doi.org/10.1016/0196-6774(90)90031-9},
doi = {10.1016/0196-6774(90)90031-9},
acmid = {82778},
publisher = {Academic Press, Inc.},
address = {Duluth, MN, USA},
}

@article{Eppstein:1992:SDP:146637.146656,
author = {Eppstein, David and Galil, Zvi and Giancarlo, Raffaele and Italiano, Giuseppe F.},
title = {Sparse Dynamic Programming II: Convex and Concave Cost Functions},
journal = {Journal of the ACM},
issue_date = {July 1992},
volume = {39},
number = {3},
month = jul,
year = {1992},
issn = {0004-5411},
pages = {546--567},
numpages = {22},
url = {http://doi.acm.org/10.1145/146637.146656},
doi = {10.1145/146637.146656},
acmid = {146656},
publisher = {ACM},
address = {New York, NY, USA},
}

@Inbook{Becker1997,
author="Becker, Peter",
editor="Jiang, Tao and Lee, D. T.",
title="Construction of nearly optimal multiway trees",
bookTitle="Computing and Combinatorics: Third Annual International Conference, COCOON '97 Shanghai, China, August 20--22, 1997 Proceedings",
year="1997",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="294--303",
isbn="978-3-540-69522-6",
doi="10.1007/BFb0045096",
url="https://doi.org/10.1007/BFb0045096"
}

@article{DEMOOR19993,
title = "Bridging the algorithm gap: A linear-time functional program for paragraph formatting",
journal = "Science of Computer Programming",
volume = "35",
number = "1",
pages = "3--27",
year = "1999",
note = "",
issn = "0167-6423",
doi = "10.1016/S0167-6423(99)00005-2",
url = "http://www.sciencedirect.com/science/article/pii/S0167642399000052",
author = "Oege de Moor and Jeremy Gibbons",
}

@Inbook{Aggarwal1993,
author="Aggarwal, Alok and Tokuyama, Takeshi",
editor="Ng, K. W. and Raghavan, P. and Balasubramanian, N. V. and Chin, F. Y. L.",
title="Consecutive interval query and dynamic programming on intervals",
bookTitle="Algorithms and Computation: 4th International Symposium, ISAAC '93 Hong Kong, December 15--17, 1993 Proceedings",
year="1993",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="466--475",
isbn="978-3-540-48233-8",
doi="10.1007/3-540-57568-5_278",
url="https://doi.org/10.1007/3-540-57568-5_278"
}


23 changes: 23 additions & 0 deletions prettyprinting.bib
@@ -0,0 +1,23 @@
@Inbook{Hughes1995,
author="Hughes, John",
editor="Jeuring, Johan and Meijer, Erik",
title="The design of a pretty-printing library",
bookTitle="Advanced Functional Programming: First International Spring School on Advanced Functional Programming Techniques B{\aa}stad, Sweden, May 24--30, 1995 Tutorial Text",
year="1995",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="53--96",
isbn="978-3-540-49270-2",
doi="10.1007/3-540-59451-5_3",
url="https://doi.org/10.1007/3-540-59451-5_3"
}

@INPROCEEDINGS{Wadler98aprettier,
author = {Philip Wadler},
title = {A Prettier Printer},
booktitle = {Journal of Functional Programming},
year = {1998},
pages = {223--244},
publisher = {Palgrave Macmillan}
}
