@@ -3,7 +3,7 @@ \section{\ocaml Implementation}
3
3
4
4
\lstset {language=[Objective]Caml}
5
5
6
- We also implemented the complete
6
+ We also implemented our
7
7
language generation algorithm in \ocaml .
8
8
% The \ocaml version only implements the ``latest'' version of the
9
9
% algorithm with a segmented representation, fast backward lookup and convolutions
@@ -91,12 +91,12 @@ \section{\ocaml Implementation}
91
91
\autoref{code:sigs:word} contains the signature for words.
92
92
It provides
93
93
the empty word (for \code {One}),
94
- singleton words (for \code {Atom}), and to append two words .
94
+ singleton words (for \code {Atom}), and append.
95
95
Neither an ordering nor a length operation is needed:
96
96
Comparison is encapsulated in the segment
97
97
data structure and the length of a word is the index of the segment in
98
98
which it appears.
99
-
99
+ %
100
100
This signature is satisfied by the \ocaml \code {string}
101
101
type (\ie arrays of bytes), arrays, lists of characters, or ropes. The
102
102
type of individual characters is unrestricted.
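
As an illustration only, a signature with these three operations could look as follows.
The names \code{WORD}, \code{letter}, \code{StringWord} and \code{ListWord} are assumptions made for this sketch and need not match the signature in \autoref{code:sigs:word}.

\begin{lstlisting}
(* Hypothetical signature: all names here are assumptions. *)
module type WORD = sig
  type letter                  (* type of individual characters *)
  type t                       (* type of words *)
  val empty : t                (* the empty word, used for One *)
  val singleton : letter -> t  (* one-letter word, used for Atom *)
  val append : t -> t -> t     (* concatenation of two words *)
end

(* OCaml strings satisfy the signature directly. *)
module StringWord : WORD with type letter = char and type t = string =
struct
  type letter = char
  type t = string
  let empty = ""
  let singleton c = String.make 1 c
  let append = ( ^ )
end

(* So do lists of characters. *)
module ListWord : WORD with type letter = char and type t = char list =
struct
  type letter = char
  type t = char list
  let empty = []
  let singleton c = [c]
  let append = ( @ )
end
\end{lstlisting}

Neither module exposes a comparison or a length function, in line with the remark above that the segment data structure is responsible for both.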
@@ -106,8 +106,7 @@ \section{\ocaml Implementation}
106
106
\autoref{code:sigs:segment} contains the signature for segments.
107
107
% The first group of operations creates and tests for empty segments and
108
108
% singleton segments.
109
- The main requirement is to support the operations on power series as described in \autoref {sec:gener-cross-sect }.
110
- We also requires the set operations
109
+ The main requirement is to support the operations on power series as described in \autoref{sec:gener-cross-sect} and the set operations
111
110
\code{union}, \code{inter} and \code{diff}.
112
111
%
113
112
The product described in \autoref{eq:1} is decomposed into two parts:
@@ -119,7 +118,7 @@ \section{\ocaml Implementation}
119
118
by invocations of \code {append}.
120
119
\end{itemize}
121
120
%
122
- Experimentation with transient data-structures require s
121
+ Experimentation with transient data-structures requires
123
122
an explicit \code {memoize} function that avoids recomputing segments accessed
124
123
multiple times.
125
124
%
@@ -317,7 +316,7 @@ \subsection{Data Structures}
317
316
%
318
317
Such a memoization function incurs a linear cost on enumerations. To test
319
318
if this operation is worthwhile, we implemented two modules:
320
- \code {ThunkList} where memoization is the identity and \code {ThunkListMemo}
319
+ \code {ThunkList} without memoization and \code {ThunkListMemo}
321
320
with the implementation described above.
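
The following is a minimal sketch of how such a \code{memoize} function might be written for lists of suspensions.
It is not necessarily the implementation used in \code{ThunkListMemo}; the type names \code{node} and \code{thunk\_list} and the helper \code{to\_list} are assumptions for this example.

\begin{lstlisting}
(* Hypothetical representation of a list of suspensions. *)
type 'a node = Nil | Cons of 'a * 'a thunk_list
and 'a thunk_list = unit -> 'a node

(* Wrap every suspension in a lazy cell so that it is forced
   at most once. One cell is allocated per element visited,
   hence the linear cost on enumerations. *)
let rec memoize (s : 'a thunk_list) : 'a thunk_list =
  let cell = lazy (match s () with
    | Nil -> Nil
    | Cons (x, rest) -> Cons (x, memoize rest))
  in
  fun () -> Lazy.force cell

(* Without memoization, the function is simply the identity. *)
let no_memoize (s : 'a thunk_list) : 'a thunk_list = s

(* Helper for inspection: force the whole list. *)
let rec to_list (s : 'a thunk_list) : 'a list =
  match s () with
  | Nil -> []
  | Cons (x, rest) -> x :: to_list rest
\end{lstlisting}

Traversing a memoized list a second time reuses the cached cells instead of rerunning the underlying suspensions.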
322
321
323
322
\paragraph{Lazy Lists}
@@ -338,8 +337,8 @@ \subsection{Data Structures}
338
337
339
338
As the main operations on segments are set operations, one might
340
339
expect a set implementation to perform well. We implemented segments as sets
341
- of words using \ocaml 's built-in \code {Set} module. \ocaml sets are implemented
342
- using balanced binary trees.
340
+ of words using \ocaml 's built-in \code{Set} module, which relies on
341
+ balanced binary trees.
343
342
The only operations not implemented by \ocaml 's standard library are
344
343
the n-way merge and the product.
345
344
% , which can be implemented using folds and unions.
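
One possible way to write the two missing operations on top of the standard \code{Set} module is sketched below, assuming words are \ocaml strings.
The module name \code{S} and the function names \code{merge} and \code{append} are chosen for this example only.

\begin{lstlisting}
(* Segments as sets of words; words are assumed to be strings. *)
module S = Set.Make (String)

(* n-way merge: the union of all the given segments. *)
let merge (segments : S.t list) : S.t =
  List.fold_left S.union S.empty segments

(* Product: concatenate every word of a with every word of b. *)
let append (a : S.t) (b : S.t) : S.t =
  S.fold
    (fun u acc ->
       S.fold (fun v acc -> S.add (u ^ v) acc) b acc)
    a S.empty
\end{lstlisting}

This version of \code{append} performs one insertion per pair of words, so its cost grows with the product of the sizes of the two segments.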
@@ -351,38 +350,29 @@ \subsection{Data Structures}
351
350
as maps from words to values where a word belongs to its domain if there is a
352
351
path reaching a value labeled with the characters in the word.
353
352
Tries seem well adapted to our problem:
354
- \begin {itemize }[leftmargin=*]
355
- \item As all words in a segment have the same length, we only need values at the leaves.
356
- % As no prefixes need to be represented.
357
- \item The \code {append} operation on tries can be implemented by
358
- grafting the second trie to all the leaves of the first one.
359
- \end {itemize }
360
-
353
+ since all words in a segment have the same length, we only need values at the leaves.
354
+ % % As no prefixes need to be represented.
355
+ % \item The \code{append} operation on tries can be implemented by
356
+ % grafting the second trie to all the leaves of the first one.
357
+ % \end{itemize}
358
+ %
361
359
Hence, we can implement tries like tries of integers \cite{Okasaki98fastmergeable}.
362
360
For simplicity, we do not use path compression, which means
363
361
that branches are always labeled with one character.
364
- A trie is either \code {Empty}, a \code {Leaf} containing a value, or a \code {Node} containing a map from characters
362
+ A trie is either \code {Empty}, a \code {Leaf} or a \code {Node} containing a map from characters
365
363
to its child tries.
366
364
% As we are only interested in the paths, we consider tries
367
365
% without values.
368
-
369
- \begin{lstlisting}
370
- type trie =
371
- | Empty
372
- | Leaf
373
- | Node of trie CharMap.t
374
- \end{lstlisting}
375
-
366
+ %
376
367
% The implementation of most operations follows the literature.
377
368
The only novel operation is \code {append} which computes the product of two sets.
378
- As we only store values at the leaves,
379
- it can be implemented in a single traversal which will graft the appended trie
380
- \code {t0} at each leaf of \code {t}, without copies.
369
+ It can be implemented in a single traversal which grafts the
370
+ appended trie \code {t0} at each leaf of \code {t}, without copies.
381
371
382
372
\begin{lstlisting}
373
+ type trie = Empty | Leaf | Node of trie CharMap.t
383
374
let rec append t t0 = match t with
384
- | Empty -> Empty
385
- | Leaf -> t0
375
+ | Empty -> Empty | Leaf -> t0
386
376
| Node map ->
387
377
    Node (CharMap.map (fun t' -> append t' t0) map)
388
378
\end{lstlisting}
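
To try the grafting on a small input, the definitions can be completed into a self-contained sketch.
We assume here that \code{CharMap} is \code{Map.Make (Char)}; the helpers \code{singleton}, \code{union} and \code{to\_list} are hypothetical and only serve this example.
The \code{Node} branch wraps the mapped children in \code{Node} again so that the result has type \code{trie}.

\begin{lstlisting}
module CharMap = Map.Make (Char)   (* assumption *)

type trie = Empty | Leaf | Node of trie CharMap.t

(* Hypothetical helper: the trie containing exactly the word w. *)
let singleton (w : string) : trie =
  let rec go i =
    if i = String.length w then Leaf
    else Node (CharMap.singleton w.[i] (go (i + 1)))
  in
  go 0

(* Hypothetical helper: union of two tries whose words all
   have the same length. *)
let rec union t1 t2 = match t1, t2 with
  | Empty, t | t, Empty -> t
  | Leaf, _ | _, Leaf -> Leaf
  | Node m1, Node m2 ->
    Node (CharMap.union (fun _ a b -> Some (union a b)) m1 m2)

(* Graft t0 at each leaf of t. *)
let rec append t t0 = match t with
  | Empty -> Empty
  | Leaf -> t0
  | Node map -> Node (CharMap.map (fun t' -> append t' t0) map)

(* Hypothetical helper: list the words in lexicographic order. *)
let to_list (t : trie) : string list =
  let rec go prefix t acc = match t with
    | Empty -> acc
    | Leaf -> prefix :: acc
    | Node map ->
      CharMap.fold
        (fun c child acc -> go (prefix ^ String.make 1 c) child acc)
        map acc
  in
  List.rev (go "" t [])

let example =
  let t = union (singleton "a") (singleton "b") in
  let t0 = union (singleton "c") (singleton "d") in
  to_list (append t t0)
\end{lstlisting}

Here \code{example} evaluates to the four words \code{ac}, \code{ad}, \code{bc} and \code{bd}, and the trie \code{t0} is shared between both branches of the result rather than copied.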