added implementation documentation and bibliography
Showing 3 changed files with 385 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
Implementation Notes | ||
==================== | ||
|
||
The main requirement for this program | ||
was to be easily deployable. | ||
Therefore, the following restrictions were put in place: | ||
|
||
(1) it should be written in an interpreted language (Python 3 was chosen); | ||
(2) the whole program should be a single file; and | ||
(3) it should not require any library besides the modules | ||
coming with a standard Python distribution. | ||
|
||
|
||
XML pretty-printing | ||
------------------- | ||
|
||
While the standard Python `xml`_ package provides pretty-printing functionality, | ||
that functionality is not customizable enough. | ||
|
||
.. _xml: https://docs.python.org/3.6/library/xml.html | ||
|
||
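To illustrate the limitation, here is a minimal sketch that assumes the pretty-printer meant above is ``toprettyxml`` from ``xml.dom.minidom``; the sample markup is made up for the example. In mixed content every text node and every child element is placed on its own indented line, so an inline element such as ``<b>`` is torn out of its sentence and extra whitespace ends up in the text.

```python
from xml.dom import minidom

# Made-up sample: a paragraph with an inline <b> element in the middle.
source = "<p>Some <b>bold</b> text.</p>"

# toprettyxml has no notion of inline vs. block elements: it only takes
# the indentation string, the newline string and an optional encoding.
pretty = minidom.parseString(source).toprettyxml(indent="  ")

print(pretty)
```

The only knobs are ``indent``, ``newl``, ``encoding`` and (since Python 3.9) ``standalone``, which is why the requirements in the following list could not be met this way.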
The desired functionality included: | ||
|
||
- ability to specify which tags should be considered *inline* or *block*; | ||
- ability to have text wrap (at word level) at a maximum line length; | ||
- ability to specify the indentation depth of each level; | ||
- ... and possibly others. | ||
|
||
Therefore I wrote the pretty-printing myself, | ||
working on the XML tree node by node. | ||
|
||
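The node-by-node approach can be sketched roughly as follows. This is an illustration only, not the program's actual code: the function names, the ``inline_tags`` parameter and the sample document are all hypothetical, and the sketch assumes that an element with block-level children carries no significant text of its own.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def start_tag(elem):
    """Return the opening tag of elem, including its attributes."""
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    return f"<{elem.tag}{attrs}>"


def serialize_inline(elem):
    """Serialize elem and everything below it on a single line."""
    body = escape(elem.text or "")
    for child in elem:
        body += serialize_inline(child) + escape(child.tail or "")
    return f"{start_tag(elem)}{body}</{elem.tag}>"


def render(elem, inline_tags, indent="  ", level=0):
    """Return the output lines for elem (hypothetical helper)."""
    pad = indent * level
    # An element whose children are all inline keeps its text on one line.
    if all(child.tag in inline_tags for child in elem):
        return [pad + serialize_inline(elem)]
    # Otherwise each child gets its own line(s), one level deeper.
    # Assumption: text placed directly between block children is ignored.
    lines = [pad + start_tag(elem)]
    for child in elem:
        lines.extend(render(child, inline_tags, indent, level + 1))
    lines.append(f"{pad}</{elem.tag}>")
    return lines


doc = ET.fromstring("<doc><p>Some <b>bold</b> text.</p><p>More.</p></doc>")
result = "\n".join(render(doc, inline_tags={"b"}))
print(result)
```

Wrapping the single-line paragraphs at a maximum width is a separate step and is not shown here.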
For anyone who has never written a pretty-printer before, | ||
two of the most important references are: | ||
|
||
- John Hughes, “The design of a pretty-printing library” in *Advanced | ||
functional programming: First international spring school on advanced | ||
functional programming techniques, Båstad, Sweden, May 24–30, 1995*, | ||
Johan Jeuring and Erik Meijer (eds.), Springer Berlin | ||
Heidelberg, Berlin, Heidelberg, pp. 53–96, 1995. | ||
https://doi.org/10.1007/3-540-59451-5_3 | ||
- Philip Wadler, “A prettier printer” in *Journal of functional | ||
programming*, pp. 223–244, 1998. | ||
|
||
|
||
Line wrapping | ||
------------- | ||
|
||
Again, the standard Python library includes `textwrap.fill`_, | ||
but no combination of its options could guarantee that the wrapped text | ||
matches the original exactly (i.e. without spaces being added or removed | ||
in significant places). | ||
|
||
.. _textwrap.fill: https://docs.python.org/3/library/textwrap.html | ||
|
||
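A small example of the kind of mismatch meant here, using default options and a made-up input string: whitespace that falls on a line break is dropped, so the original text cannot be recovered from the wrapped one.

```python
import textwrap

# Made-up input: three spaces between the two words.
original = "aaa   bbb"

# With the default drop_whitespace=True, the run of spaces that lands on
# the line break is discarded.
wrapped = textwrap.fill(original, width=4)

# Joining the lines again yields a single space, not the original three.
restored = wrapped.replace("\n", " ")

print(repr(wrapped), repr(restored))
```

Turning off ``drop_whitespace`` keeps the spaces but leaves them at the start or end of the wrapped lines, which is a different kind of mismatch.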
I ended up implementing the Knuth–Plass algorithm (used in TeX, among others) | ||
for breaking a text into a series of balanced lines. | ||
Coming up with an efficient implementation was a matter | ||
of reading the appropriate papers in the literature. | ||
|
||
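The idea of optimal line breaking can be sketched with a plain dynamic program. This is a simplification and not the program's implementation: it is not the full Knuth–Plass model (no stretchable glue, penalties or hyphenation), the cost function (squared unused columns on every line except the last) is an assumption, and the name ``break_lines`` is hypothetical. Its running time is quadratic in the worst case; as far as I understand them, the optimization papers in the bibliography show how to bring this kind of recurrence down to near-linear or linear time when the cost function is concave or convex.

```python
def break_lines(words, width):
    """Split words into lines of at most `width` characters, minimizing
    the sum of squared slack over all lines except the last one."""
    if any(len(word) > width for word in words):
        raise ValueError("a word is longer than the line width")
    n = len(words)
    # best[i]: minimal cost of laying out words[i:]
    # nxt[i]:  index of the first word of the line after the one starting at i
    best = [float("inf")] * n + [0]
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # no space before the first word of a line
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width:
                break
            # The last line is free; other lines pay for unused columns.
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


balanced = break_lines("aaa bb cc ddddd".split(), width=6)
print(balanced)
```

On this input a greedy first-fit wrapper puts ``aaa bb`` on the first line and leaves ``cc`` alone on the second, while the dynamic program prefers the more even split ``aaa`` / ``bb cc`` / ``ddddd``.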
For the curious, this was my bibliography; | ||
the first is the original Knuth and Plass paper, | ||
while the rest deal with optimizations. | ||
|
||
1. Donald E. Knuth and Michael F. Plass, “Breaking paragraphs into | ||
lines”, *Software: Practice and Experience*, vol. 11, no. 11, pp. | ||
1119–1184, 1981. https://doi.org/10.1002/spe.4380111102 | ||
|
||
2. A Aggarwal, M Klawe, S Moran, P Shor, and R Wilber, “Geometric | ||
applications of a matrix searching algorithm” in *Proceedings of the | ||
second annual symposium on computational geometry* (SCG ’86), pp. | ||
285–292, 1986. https://doi.org/10.1145/10515.10546 | ||
|
||
3. Daniel S. Hirschberg and Lawrence Louis Larmore, “New applications of | ||
failure functions”, *Journal of the ACM*, vol. 34, no. 3, pp. 616–625, | ||
1987. https://doi.org/10.1145/28869.28875 | ||
|
||
4. Daniel S. Hirschberg and Lawrence Louis Larmore, “The least weight | ||
subsequence problem”, *SIAM Journal on Computing*, vol. 16, no. 4, pp. | ||
628–638, 1987. https://doi.org/10.1137/0216043 | ||
|
||
5. Robert Wilber, “The concave least-weight subsequence problem | ||
revisited”, *Journal of Algorithms*, vol. 9, no. 3, pp. 418–425, 1988. | ||
https://doi.org/10.1016/0196-6774(88)90032-6 | ||
|
||
6. Zvi Galil and Raffaele Giancarlo, “Speeding up dynamic programming | ||
with application to molecular biology”, *Theoretical Computer Science*, | ||
vol. 64, no. 1, pp. 107–118, 1989. | ||
https://doi.org/10.1016/0304-3975(89)90101-1 | ||
|
||
7. David Eppstein, “Sequence comparison with mixed convex and concave | ||
costs”, *Journal of Algorithms*, vol. 11, no. 1, pp. 85–101, 1990. | ||
https://doi.org/10.1016/0196-6774(90)90031-9 | ||
|
||
8. Zvi Galil and Kunsoo Park, “A linear-time algorithm for concave | ||
one-dimensional dynamic programming”, *Information Processing Letters*, | ||
vol. 33, no. 6, pp. 309–311, 1990. | ||
https://doi.org/10.1016/0020-0190(90)90215-J | ||
|
||
9. David Eppstein, Zvi Galil, Raffaele Giancarlo, and Giuseppe F. | ||
Italiano, “Sparse dynamic programming II: Convex and concave cost | ||
functions”, *Journal of the ACM*, vol. 39, no. 3, pp. 546–567, 1992. | ||
https://doi.org/10.1145/146637.146656 | ||
|
||
10. Alok Aggarwal and Takeshi Tokuyama, “Consecutive interval query and | ||
dynamic programming on intervals” in *Algorithms and computation: 4th | ||
international symposium, ISAAC ’93 Hong Kong, December 15–17, 1993 | ||
proceedings*, K. W. Ng, P. Raghavan, N. V. Balasubramanian and F. Y. L. | ||
Chin (eds.), Springer Berlin Heidelberg, Berlin, Heidelberg, pp. | ||
466–475, 1993. https://doi.org/10.1007/3-540-57568-5_278 | ||
|
||
11. Peter Becker, “Construction of nearly optimal multiway trees” in | ||
*Computing and combinatorics: Third annual international conference, | ||
COCOON ’97 Shanghai, China, August 20–22, 1997 proceedings*, Tao Jiang | ||
and D. T. Lee (eds.), Springer Berlin Heidelberg, Berlin, Heidelberg, | ||
pp. 294–303, 1997. https://doi.org/10.1007/BFb0045096 | ||
|
||
12. Oege de Moor and Jeremy Gibbons, “Bridging the algorithm gap: A | ||
linear-time functional program for paragraph formatting”, *Science of | ||
Computer Programming*, vol. 35, no. 1, pp. 3–27, 1999. | ||
https://doi.org/10.1016/S0167-6423(99)00005-2 | ||
|
||
|
||
Argument processing | ||
------------------- | ||
|
||
I usually use `click`_ | ||
for command-line utilities, | ||
but because of restriction (3) above, | ||
all the argument processing was written with plain `argparse`_. | ||
|
||
.. _click: http://click.pocoo.org/ | ||
.. _argparse: https://docs.python.org/3/library/argparse.html | ||
|
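As an illustration of what a standard-library-only command line might look like, here is a minimal ``argparse`` sketch. The option names (``--indent``, ``--width``, ``--inline``) and their defaults are hypothetical, chosen to mirror the pretty-printing features listed earlier; they are not the program's actual interface.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical options for an XML pretty-printer."""
    parser = argparse.ArgumentParser(description="Pretty-print an XML file.")
    parser.add_argument("file", help="input XML file")
    parser.add_argument("--indent", type=int, default=2,
                        help="number of spaces per indentation level")
    parser.add_argument("--width", type=int, default=80,
                        help="maximum line length for wrapped text")
    # action="append" lets the option be repeated, once per inline tag.
    parser.add_argument("--inline", action="append", default=[],
                        metavar="TAG", help="treat TAG as an inline element")
    return parser


# Passing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(
    ["doc.xml", "--width", "72", "--inline", "b", "--inline", "i"]
)
print(args)
```

``argparse`` generates ``--help`` output and usage errors automatically, which covers most of what a small utility needs from a library like click.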
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,229 @@ | ||
|
||
% references for implementing (efficiently) a line-breaking algorithm | ||
% the first is the original Knuth algorithm; the rest are optimizations | ||
@article {SPE:SPE4380111102, | ||
author = {Knuth, Donald E. and Plass, Michael F.}, | ||
title = {Breaking paragraphs into lines}, | ||
journal = {Software: Practice and Experience}, | ||
volume = {11}, | ||
number = {11}, | ||
publisher = {John Wiley \& Sons, Ltd.}, | ||
issn = {1097-024X}, | ||
url = {http://dx.doi.org/10.1002/spe.4380111102}, | ||
doi = {10.1002/spe.4380111102}, | ||
pages = {1119--1184}, | ||
year = {1981}, | ||
} | ||
|
||
|
||
@article{Hirschberg:1987:LWS:33367.33370, | ||
author = {Hirschberg, Daniel S. and Larmore, Lawrence Louis}, | ||
title = {The Least Weight Subsequence Problem}, | ||
journal = {SIAM Journal on Computing}, | ||
issue_date = {August 1987}, | ||
volume = {16}, | ||
number = {4}, | ||
month = aug, | ||
year = {1987}, | ||
issn = {0097-5397}, | ||
pages = {628--638}, | ||
numpages = {11}, | ||
url = {http://dx.doi.org/10.1137/0216043}, | ||
doi = {10.1137/0216043}, | ||
acmid = {33370}, | ||
publisher = {Society for Industrial and Applied Mathematics}, | ||
address = {Philadelphia, PA, USA}, | ||
} | ||
@article{Hirschberg:1987:NAF:28869.28875, | ||
author = {Hirschberg, Daniel S. and Larmore, Lawrence Louis}, | ||
title = {New Applications of Failure Functions}, | ||
journal = {Journal of the ACM}, | ||
issue_date = {July 1987}, | ||
volume = {34}, | ||
number = {3}, | ||
month = jul, | ||
year = {1987}, | ||
issn = {0004-5411}, | ||
pages = {616--625}, | ||
numpages = {10}, | ||
url = {http://doi.acm.org/10.1145/28869.28875}, | ||
doi = {10.1145/28869.28875}, | ||
acmid = {28875}, | ||
publisher = {ACM}, | ||
address = {New York, NY, USA}, | ||
} | ||
@inproceedings{Aggarwal:1986:GAM:10515.10546, | ||
author = {Aggarwal, A and Klawe, M and Moran, S and Shor, P and Wilber, R}, | ||
title = {Geometric Applications of a Matrix Searching Algorithm}, | ||
booktitle = {Proceedings of the Second Annual Symposium on Computational Geometry}, | ||
series = {SCG '86}, | ||
year = {1986}, | ||
isbn = {0-89791-194-6}, | ||
location = {Yorktown Heights, New York, USA}, | ||
pages = {285--292}, | ||
numpages = {8}, | ||
url = {http://doi.acm.org/10.1145/10515.10546}, | ||
doi = {10.1145/10515.10546}, | ||
acmid = {10546}, | ||
publisher = {ACM}, | ||
address = {New York, NY, USA}, | ||
} | ||
@article{Wilber:1988:CLS:51368.51379, | ||
author = {Wilber, Robert}, | ||
title = {The Concave Least-weight Subsequence Problem Revisited}, | ||
journal = {Journal of Algorithms}, | ||
issue_date = {September 1988}, | ||
volume = {9}, | ||
number = {3}, | ||
month = sep, | ||
year = {1988}, | ||
issn = {0196-6774}, | ||
pages = {418--425}, | ||
numpages = {8}, | ||
url = {http://dx.doi.org/10.1016/0196-6774(88)90032-6}, | ||
doi = {10.1016/0196-6774(88)90032-6}, | ||
acmid = {51379}, | ||
publisher = {Academic Press, Inc.}, | ||
address = {Duluth, MN, USA}, | ||
} | ||
@article{Galil:1989:SUD:64154.64250, | ||
author = {Galil, Zvi and Giancarlo, Raffaele}, | ||
title = {Speeding Up Dynamic Programming with Application to Molecular Biology}, | ||
journal = {Theoretical Computer Science}, | ||
issue_date = {April 1989}, | ||
volume = {64}, | ||
number = {1}, | ||
month = apr, | ||
year = {1989}, | ||
issn = {0304-3975}, | ||
pages = {107--118}, | ||
numpages = {12}, | ||
url = {https://doi.org/10.1016/0304-3975(89)90101-1}, | ||
doi = {10.1016/0304-3975(89)90101-1}, | ||
acmid = {64250}, | ||
publisher = {Elsevier Science Publishers Ltd.}, | ||
address = {Essex, UK}, | ||
} | ||
|
||
@article{Galil:1990:LAC:79790.79800, | ||
author = {Galil, Zvi and Park, Kunsoo}, | ||
title = {A Linear-time Algorithm for Concave One-dimensional Dynamic Programming}, | ||
journal = {Information Processing Letters}, | ||
issue_date = {February 1990}, | ||
volume = {33}, | ||
number = {6}, | ||
month = feb, | ||
year = {1990}, | ||
issn = {0020-0190}, | ||
pages = {309--311}, | ||
numpages = {3}, | ||
url = {http://dx.doi.org/10.1016/0020-0190(90)90215-J}, | ||
doi = {10.1016/0020-0190(90)90215-J}, | ||
acmid = {79800}, | ||
publisher = {Elsevier North-Holland, Inc.}, | ||
address = {Amsterdam, The Netherlands}, | ||
} | ||
|
||
@article{Eppstein:1990:SCM:82765.82778, | ||
author = {Eppstein, David}, | ||
title = {Sequence Comparison with Mixed Convex and Concave Costs}, | ||
journal = {Journal of Algorithms}, | ||
issue_date = {March 1990}, | ||
volume = {11}, | ||
number = {1}, | ||
month = mar, | ||
year = {1990}, | ||
issn = {0196-6774}, | ||
pages = {85--101}, | ||
numpages = {17}, | ||
url = {http://dx.doi.org/10.1016/0196-6774(90)90031-9}, | ||
doi = {10.1016/0196-6774(90)90031-9}, | ||
acmid = {82778}, | ||
publisher = {Academic Press, Inc.}, | ||
address = {Duluth, MN, USA}, | ||
} | ||
|
||
@article{Eppstein:1992:SDP:146637.146656, | ||
author = {Eppstein, David and Galil, Zvi and Giancarlo, Raffaele and Italiano, Giuseppe F.}, | ||
title = {Sparse Dynamic Programming II: Convex and Concave Cost Functions}, | ||
journal = {Journal of the ACM}, | ||
issue_date = {July 1992}, | ||
volume = {39}, | ||
number = {3}, | ||
month = jul, | ||
year = {1992}, | ||
issn = {0004-5411}, | ||
pages = {546--567}, | ||
numpages = {22}, | ||
url = {http://doi.acm.org/10.1145/146637.146656}, | ||
doi = {10.1145/146637.146656}, | ||
acmid = {146656}, | ||
publisher = {ACM}, | ||
address = {New York, NY, USA}, | ||
} | ||
|
||
@Inbook{Becker1997, | ||
author="Becker, Peter", | ||
editor="Jiang, Tao and Lee, D. T.", | ||
title="Construction of nearly optimal multiway trees", | ||
bookTitle="Computing and Combinatorics: Third Annual International Conference, COCOON '97 Shanghai, China, August 20--22, 1997 Proceedings", | ||
year="1997", | ||
publisher="Springer Berlin Heidelberg", | ||
address="Berlin, Heidelberg", | ||
pages="294--303", | ||
isbn="978-3-540-69522-6", | ||
doi="10.1007/BFb0045096", | ||
url="https://doi.org/10.1007/BFb0045096" | ||
} | ||
|
||
@article{DEMOOR19993, | ||
title = "Bridging the algorithm gap: A linear-time functional program for paragraph formatting", | ||
journal = "Science of Computer Programming", | ||
volume = "35", | ||
number = "1", | ||
pages = "3--27", | ||
year = "1999", | ||
note = "", | ||
issn = "0167-6423", | ||
doi = "10.1016/S0167-6423(99)00005-2", | ||
url = "http://www.sciencedirect.com/science/article/pii/S0167642399000052", | ||
author = "Oege de Moor and Jeremy Gibbons", | ||
} | ||
|
||
@Inbook{Aggarwal1993, | ||
author="Aggarwal, Alok and Tokuyama, Takeshi", | ||
editor="Ng, K. W. and Raghavan, P. and Balasubramanian, N. V. and Chin, F. Y. L.", | ||
title="Consecutive interval query and dynamic programming on intervals", | ||
bookTitle="Algorithms and Computation: 4th International Symposium, ISAAC '93 Hong Kong, December 15--17, 1993 Proceedings", | ||
year="1993", | ||
publisher="Springer Berlin Heidelberg", | ||
address="Berlin, Heidelberg", | ||
pages="466--475", | ||
isbn="978-3-540-48233-8", | ||
doi="10.1007/3-540-57568-5_278", | ||
url="https://doi.org/10.1007/3-540-57568-5_278" | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
@Inbook{Hughes1995, | ||
author="Hughes, John", | ||
editor="Jeuring, Johan and Meijer, Erik", | ||
title="The design of a pretty-printing library", | ||
bookTitle="Advanced Functional Programming: First International Spring School on Advanced Functional Programming Techniques B{\aa}stad, Sweden, May 24--30, 1995 Tutorial Text", | ||
year="1995", | ||
publisher="Springer Berlin Heidelberg", | ||
address="Berlin, Heidelberg", | ||
pages="53--96", | ||
isbn="978-3-540-49270-2", | ||
doi="10.1007/3-540-59451-5_3", | ||
url="https://doi.org/10.1007/3-540-59451-5_3" | ||
} | ||
|
||
@INPROCEEDINGS{Wadler98aprettier, | ||
author = {Philip Wadler}, | ||
title = {A Prettier Printer}, | ||
booktitle = {Journal of Functional Programming}, | ||
year = {1998}, | ||
pages = {223--244}, | ||
publisher = {Palgrave Macmillan} | ||
} | ||
|