196196
197197%\listoftables
198198
199+
199200%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
200201\chapter*{Preface}
201202\addcontentsline{toc}{fmbm}{Preface}
@@ -247,7 +248,7 @@ \chapter*{Preface}
247248 the fundamental tools of compiler construction: \emph{abstract
248249 syntax trees} and \emph{recursive functions}.
249250{\if\edition\pythonEd
250- \item In Chapter~\ref{ch:parsing-Lvar } we learn how to use the Lark
251+ \item In Chapter~\ref{ch:parsing} we learn how to use the Lark
251252 parser generator to create a parser for the language of integer
252253 arithmetic and local variables. We learn about the parsing
253254 algorithms inside Lark, including Earley and LALR(1).
@@ -307,14 +308,13 @@ \chapter*{Preface}
307308mathematics.
308309%
309310At the beginning of the course, students form groups of two to four
310- people. The groups complete one chapter every two weeks, starting
311- with chapter~\ref{ch:Lvar} and finishing with
312- chapter~\ref{ch:Llambda}. Many chapters include a challenge problem
313- that we assign to the graduate students. The last two weeks of the
311+ people. The groups complete approximately one chapter every two
312+ weeks, starting with chapter~\ref{ch:Lvar}. The last two weeks of the
314313course involve a final project in which students design and implement
315314a compiler extension of their choosing. The last few chapters can be
316- used in support of these projects. For compiler courses at
317- universities on the quarter system (about ten weeks in length), we
315+ used in support of these projects. Many chapters include a challenge
316+ problem that we assign to the graduate students. For compiler courses
317+ at universities on the quarter system (about ten weeks in length), we
318318recommend completing the course through chapter~\ref{ch:Lvec} or
319319chapter~\ref{ch:Lfun} and providing some scaffolding code to the
320320students for each compiler pass.
@@ -337,7 +337,6 @@ \chapter*{Preface}
337337Technology, University of Freiburg, University of Massachusetts
338338Lowell, and the University of Vermont.
339339
340-
341340\begin{figure}[tp]
342341\begin{tcolorbox}[colback=white]
343342 {\if\edition\racketEd
@@ -370,32 +369,35 @@ \chapter*{Preface}
370369\fi}
371370{\if\edition\pythonEd
372371\begin{tikzpicture}[baseline=(current bounding box.center)]
373- \node (C1) at (0,1.5) {\small Ch.~\ref{ch:trees-recur} Preliminaries};
374- \node (C2) at (4,1.5) {\small Ch.~\ref{ch:Lvar} Variables};
375- \node (C3) at (8,1.5) {\small Ch.~\ref{ch:register-allocation-Lvar} Registers};
376- \node (C4) at (0,0) {\small Ch.~\ref{ch:Lif} Conditionals};
377- \node (C5) at (4,0) {\small Ch.~\ref{ch:Lvec} Tuples};
378- \node (C6) at (8,0) {\small Ch.~\ref{ch:Lfun} Functions};
379- \node (C9) at (0,-1.5) {\small Ch.~\ref{ch:Lwhile} Loops};
380- \node (C8) at (4,-1.5) {\small Ch.~\ref{ch:Ldyn} Dynamic};
372+ \node (Prelim) at (0,1.5) {\small Ch.~\ref{ch:trees-recur} Preliminaries};
373+ \node (Var) at (4,1.5) {\small Ch.~\ref{ch:Lvar} Variables};
374+ \node (Parse) at (8,1.5) {\small Ch.~\ref{ch:parsing} Parsing};
375+ \node (Reg) at (0,0) {\small Ch.~\ref{ch:register-allocation-Lvar} Registers};
376+ \node (Cond) at (4,0) {\small Ch.~\ref{ch:Lif} Conditionals};
377+ \node (Loop) at (8,0) {\small Ch.~\ref{ch:Lwhile} Loops};
378+ \node (Fun) at (0,-1.5) {\small Ch.~\ref{ch:Lfun} Functions};
379+ \node (Tuple) at (4,-1.5) {\small Ch.~\ref{ch:Lvec} Tuples};
380+ \node (Dyn) at (8,-1.5) {\small Ch.~\ref{ch:Ldyn} Dynamic};
381381% \node (CO) at (0,-3) {\small Ch.~\ref{ch:Lobject} Objects};
382- \node (C7) at (8,-1.5) {\small Ch.~\ref{ch:Llambda} Lambda};
383- \node (C10) at (4,-3) {\small Ch.~\ref{ch:Lgrad} Gradual Typing};
384- \node (C11) at (8,-3) {\small Ch.~\ref{ch:Lpoly} Generics};
385-
386- \path[->] (C1) edge [above] node {} (C2);
387- \path[->] (C2) edge [above] node {} (C3);
388- \path[->] (C3) edge [above] node {} (C4);
389- \path[->] (C4) edge [above] node {} (C5);
390- \path[->,style=dotted] (C5) edge [above] node {} (C6);
391- \path[->] (C5) edge [above] node {} (C7);
392- \path[->] (C6) edge [above] node {} (C7);
393- \path[->] (C4) edge [above] node {} (C8);
394- \path[->] (C4) edge [above] node {} (C9);
395- \path[->] (C7) edge [above] node {} (C10);
396- \path[->] (C8) edge [above] node {} (C10);
397- % \path[->] (C8) edge [above] node {} (CO);
398- \path[->] (C10) edge [above] node {} (C11);
382+ \node (Lam) at (0,-3) {\small Ch.~\ref{ch:Llambda} Lambda};
383+ \node (Gradual) at (4,-3) {\small Ch.~\ref{ch:Lgrad} Gradual Typing};
384+ \node (Generic) at (8,-3) {\small Ch.~\ref{ch:Lpoly} Generics};
385+
386+ \path[->] (Prelim) edge [above] node {} (Var);
387+ \path[->] (Var) edge [above] node {} (Reg);
388+ \path[->] (Var) edge [above] node {} (Parse);
389+ \path[->] (Reg) edge [above] node {} (Cond);
390+ \path[->] (Cond) edge [above] node {} (Tuple);
391+ \path[->,style=dotted] (Tuple) edge [above] node {} (Fun);
392+ \path[->] (Cond) edge [above] node {} (Fun);
393+ \path[->] (Tuple) edge [above] node {} (Lam);
394+ \path[->] (Fun) edge [above] node {} (Lam);
395+ \path[->] (Cond) edge [above] node {} (Dyn);
396+ \path[->] (Cond) edge [above] node {} (Loop);
397+ \path[->] (Lam) edge [above] node {} (Gradual);
398+ \path[->] (Dyn) edge [above] node {} (Gradual);
399+ % \path[->] (Dyn) edge [above] node {} (CO);
400+ \path[->] (Gradual) edge [above] node {} (Generic);
399401\end{tikzpicture}
400402\fi}
401403\end{tcolorbox}
@@ -506,9 +508,11 @@ \chapter{Preliminaries}
506508 syntax}\index{subject}{abstract syntax
507509 tree}\index{subject}{AST}\index{subject}{program}\index{subject}{parse}
508510The process of translating from concrete syntax to abstract syntax is
509- called \emph{parsing}~\citep{Aho:2006wb}\python{ and is studied in
510- chapter~\ref{ch:parsing-Lvar}}.
511- \racket{This book does not cover the theory and implementation of parsing.}%
511+ called \emph{parsing}\python{ and is studied in
512+ chapter~\ref{ch:parsing}}.
513+ \racket{This book does not cover the theory and implementation of parsing.
514+ We refer the readers interested in parsing to the thorough treatment
515+ of parsing by \citet{Aho:2006wb}.}%
512516%
513517\racket{A parser is provided in the support code for translating from
514518 concrete to abstract syntax.}%
@@ -4090,23 +4094,23 @@ \section{Challenge: Partial Evaluator for \LangVar{}}
40904094%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
40914095{\if\edition\pythonEd
40924096\chapter{Parsing}
4093- \label{ch:parsing-Lvar }
4097+ \label{ch:parsing}
40944098\setcounter{footnote}{0}
40954099\index{subject}{parsing}
40964100
40974101In this chapter we learn how to use the Lark parser
4098- generator ~\citep{shinan20:_lark_docs} to translate the concrete syntax
4102+ framework~\citep{shinan20:_lark_docs} to translate the concrete syntax
40994103of \LangInt{} (a sequence of characters) into an abstract syntax tree.
41004104You will then be asked to use Lark to create a parser for \LangVar{}.
4101- We then learn about the parsing algorithms used inside Lark, studying
4102- the \citet{Earley:1970ly} and LALR algorithms.
4105+ We also describe the parsing algorithms used inside Lark, studying the
4106+ \citet{Earley:1970ly} and LALR(1) algorithms.
41034107
4104- A parser generator takes in a specification of the concrete syntax and
4105- produces a parser. Even though a parser generator does most of the
4106- work for us, using one properly requires some knowledge. In
4107- particular, we must learn about the specification languages used by
4108- parser generators and we must learn how to deal with ambiguity in our
4109- language specifications.
4108+ A parser framework such as Lark takes in a specification of the
4109+ concrete syntax and the input program and produces a parse tree. Even
4110+ though a parser framework does most of the work for us, using one
4111+ properly requires some knowledge. In particular, we must learn about
4112+ its specification languages and we must learn how to deal with
4113+ ambiguity in our language specifications.
41104114
41114115The process of parsing is traditionally subdivided into two phases:
41124116\emph{lexical analysis} (also called scanning) and \emph{syntax
@@ -4119,16 +4123,16 @@ \chapter{Parsing}
41194123the use of a faster but less powerful algorithm for lexical analysis
41204124and the use of a slower but more powerful algorithm for parsing.
41214125%
4122- Likewise, parser generators typical come in pairs, with separate
4123- generators for the lexical analyzer (or lexer for short) and for the
4124- parser. A paricularly influential pair of generators were
4125- \texttt{lex} and \texttt{yacc}. The \texttt{lex} generator was written
4126- by \citet{Lesk:1975uq} at Bell Labs. The \texttt{yacc} generator was
4127- written by \citet{Johnson:1979qy} at AT\&T and stands for Yet Another
4128- Compiler Compiler.
4129-
4130- The Lark parse generator that we use in this chapter includes both a
4131- lexical analyzer and a parser . The next section discusses lexical
4126+ %% Likewise, parser generators typical come in pairs, with separate
4127+ %% generators for the lexical analyzer (or lexer for short) and for the
4128+ %% parser. A paricularly influential pair of generators were
4129+ %% \texttt{lex} and \texttt{yacc}. The \texttt{lex} generator was written
4130+ %% by \citet{Lesk:1975uq} at Bell Labs. The \texttt{yacc} generator was
4131+ %% written by \citet{Johnson:1979qy} at AT\&T and stands for Yet Another
4132+ %% Compiler Compiler.
4133+ %
4134+ The Lark parser framework that we use in this chapter includes both
4135+ lexical analyzers and parsers. The next section discusses lexical
41324136analysis and the remainder of the chapter discusses parsing.
41334137
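As a rough illustration of the lexical-analysis phase described above, here is a minimal sketch of a regular-expression-based lexer written with Python's standard `re` module. It is not how Lark implements lexing; the token names (`INT`, `PLUS`, `MINUS`, `WS`) and the function `tokenize` are hypothetical and chosen only for this example.

```python
import re

# Hypothetical token definitions for a tiny arithmetic language.
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("WS", r"[ \t]+"),  # whitespace is recognized but not emitted
]
# Combine the definitions into one pattern with a named group per token kind.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Translate a string of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

A parser would then consume the resulting token sequence rather than the raw characters, which is what allows the slower, more powerful algorithm to run on a shorter input.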
41344138
@@ -4522,10 +4526,13 @@ \section{The Earley Algorithm}
45224526more efficient but can only handle a subset of the context-free
45234527grammars.
45244528
4525- The Earley algorithm uses a data structure called a
4526- \emph{chart}\index{subject}{chart} to keep track of its progress. The
4527- chart is an array with one slot for each position in the input string,
4528- where position $0$ is before the first character and position $n$ is
4529+ The Earley algorithm can be viewed as an interpreter; it treats the
4530+ grammar as the program being interpreted and it treats the concrete
4531+ syntax of the program-to-be-parsed as its input. The Earley algorithm
4532+ uses a data structure called a \emph{chart}\index{subject}{chart} to
4533+ keep track of its progress and to memoize its results. The chart is an
4534+ array with one slot for each position in the input string, where
4535+ position $0$ is before the first character and position $n$ is
45294536immediately after the last character. So the array has length $n+1$
45304537for an input string of length $n$. Each slot in the chart contains a
45314538set of \emph{dotted rules}. A dotted rule is simply a grammar rule
@@ -4553,8 +4560,8 @@ \section{The Earley Algorithm}
45534560\begin{lstlisting}
45544561 lang_int: . stmt_list (0)
45554562\end{lstlisting}
4556- in slot $0$ of the chart. The algorithm then proceeds to its
4557- \emph{prediction} phase in which it adds more dotted rules to the
4563+ in slot $0$ of the chart. The algorithm then proceeds with
4564+ \emph{prediction} actions in which it adds more dotted rules to the
45584565chart based on which nonterminals come after a period. In the above,
45594566the nonterminal \code{stmt\_list} appears after a period, so we add all
45604567the rules for \code{stmt\_list} to slot $0$, with a period at the
@@ -4767,13 +4774,15 @@ \section{The Earley Algorithm}
47674774\section{The LALR(1) Algorithm}
47684775\label{sec:lalr}
47694776
4770- The LALR(1) algorithm consists of a finite automata and a stack to
4771- record its progress in parsing the input string. Each element of the
4772- stack is a pair: a state number and a grammar symbol (a terminal or
4773- nonterminal). The symbol characterizes the input that has been parsed
4774- so-far and the state number is used to remember how to proceed once
4775- the next symbol-worth of input has been parsed. Each state in the
4776- finite automata represents where the parser stands in the parsing
4777+ The LALR(1) algorithm can be viewed as a two-phase approach in which
4778+ it first compiles the grammar into a state machine and then runs the
4779+ state machine to parse the input string. The state machine also uses
4780+ a stack to record its progress in parsing the input string. Each
4781+ element of the stack is a pair: a state number and a grammar symbol (a
4782+ terminal or nonterminal). The symbol characterizes the input that has
4783+ been parsed so far and the state number is used to remember how to
4784+ proceed once the next symbol-worth of input has been parsed. Each
4785+ state in the machine represents where the parser stands in the parsing
47774786process with respect to certain grammar rules. In particular, each
47784787state is associated with a set of dotted rules.
47794788
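The second phase described above, running the state machine with a stack of (state number, grammar symbol) pairs, can be sketched as follows. This is only an illustration under stated assumptions: the grammar, the rule numbers, and the hand-written `ACTION` and `GOTO` tables are hypothetical and do not correspond to the state diagram or state numbers used in this chapter, and a real LALR(1) implementation would compute the tables from the grammar in the first phase.

```python
# Hypothetical grammar (not the one used in this chapter):
#   rule 1: exp -> exp PLUS INT
#   rule 2: exp -> INT
# Each rule maps to (left-hand side, length of right-hand side).
RULES = {1: ("exp", 3), 2: ("exp", 1)}

# Hand-written tables for the grammar above, keyed by (state, symbol).
ACTION = {
    (0, "INT"): ("shift", 1),
    (1, "PLUS"): ("reduce", 2), (1, "$END"): ("reduce", 2),
    (2, "PLUS"): ("shift", 3), (2, "$END"): ("accept", None),
    (3, "INT"): ("shift", 4),
    (4, "PLUS"): ("reduce", 1), (4, "$END"): ("reduce", 1),
}
GOTO = {(0, "exp"): 2}

def parse(tokens):
    """Run the state machine; return the rule numbers in the order they are reduced."""
    stack = [(0, None)]  # each element is a (state number, grammar symbol) pair
    tokens = list(tokens) + ["$END"]
    pos = 0
    reductions = []
    while True:
        state = stack[-1][0]
        tok = tokens[pos]
        if (state, tok) not in ACTION:
            raise SyntaxError(f"unexpected {tok} in state {state}")
        kind, arg = ACTION[(state, tok)]
        if kind == "shift":
            # Push the token together with the target state.
            stack.append((arg, tok))
            pos += 1
        elif kind == "reduce":
            # Pop one stack element per right-hand-side symbol, then push
            # the left-hand side with the state given by the GOTO table.
            lhs, length = RULES[arg]
            del stack[-length:]
            stack.append((GOTO[(stack[-1][0], lhs)], lhs))
            reductions.append(arg)
        else:  # accept
            return reductions
```

For the token sequence `INT PLUS INT`, the sketch shifts `INT`, reduces by rule 2, shifts `PLUS` and `INT`, and finally reduces by rule 1 before accepting.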
@@ -4797,7 +4806,7 @@ \section{The LALR(1) Algorithm}
47974806\emph{item}. There are several rules that could apply next, both rule
479848072 and 3, so state 1 also shows those rules with a period at the
47994808beginning of their right-hand sides. The edges between states indicate
4800- which transitions the automata should make depending on the next input
4809+ which transitions the machine should make depending on the next input
48014810token. So, for example, if the next input token is \code{INT} then the
48024811parser will push \code{INT} and the target state 4 on the stack and
48034812transition to state 4. Suppose we are now at the end of the input. In
@@ -10155,7 +10164,7 @@ \subsection{Optimize Blocks}
1015510164the constant \TRUE{} in \code{explicate\_pred}, in which we discard the
1015610165\code{els} continuation.
1015710166%
10158- {\if\edition\racketEd
10167+ {\if\edition\racketEd
1015910168The following example program falls into this
1016010169case, and it creates two unused blocks.
1016110170\begin{center}
@@ -10277,11 +10286,12 @@ \subsection{Optimize Blocks}
1027710286 [else
1027810287 (let ([label (gensym 'block)])
1027910288 (set! basic-blocks (cons (cons label t) basic-blocks))
10280- (Goto label))]))
10289+ (Goto label))])))
1028110290\end{lstlisting}
1028210291\end{minipage}
1028310292\end{center}
1028410293\fi}
10294+
1028510295{\if\edition\pythonEd
1028610296%
1028710297Here is the new version of the \code{create\_block} auxiliary function
@@ -20663,6 +20673,7 @@ \section{Type Checking \LangGrad{}}
2066320673
2066420674\fi}
2066520675
20676+
2066620677\clearpage
2066720678
2066820679\section{Interpreting \LangCast{}}
@@ -20780,7 +20791,7 @@ \section{Interpreting \LangCast{}}
2078020791from \CANYTY{} to \INTTY{}.
2078120792}
2078220793\python{
20783- For the subscript \code{v[i]} in \code{f([ v[i])} of \code{map\_inplace},
20794+ For the subscript \code{v[i]} in \code{f(v[i])} of \code{map\_inplace},
2078420795 the proxy casts the integer from \INTTY{} to \CANYTY{}.
2078520796 For the subscript on the left of the assignment,
2078620797 the proxy casts the tagged value from \CANYTY{} to \INTTY{}.