\documentclass[11pt]{article}
%\usepackage[top=20mm,left=20mm,right=20mm,bottom=15mm,a4paper]{geometry} % see geometry.pdf on how to lay out the page. There's lots.
\usepackage[top=20mm,left=20mm,right=20mm,bottom=15mm,headsep=15pt,footskip=15pt,a4paper]{geometry} % see geometry.pdf on how to lay out the page. There's lots.
%\geometry{a4paper} % or letter or a5paper or ... etc
% \geometry{landscape} % rotated page geometry
\usepackage[round]{natbib}
\setlength{\bibsep}{0.0pt}
\usepackage{color}
\usepackage{times}
%\usepackage[T1]{fontenc}
%\usepackage{mathptmx}
\usepackage{tikz-dependency}
\usepackage{enumitem}
%\usepackage{times}
\usepackage[procnames]{listings}
\usepackage{todonotes}
\newenvironment{titlemize}[1]{%
\paragraph{#1}
\begin{itemize}
\setlength\itemsep{0pt}}
{\end{itemize}}
% See the ``Article customise'' template for some common customisations
\newcommand{\refeq}[1]{Equation~\ref{eq:#1}}
\newcommand{\reffig}[1]{Figure~\ref{fig:#1}}
\newcommand{\reftab}[1]{Table~\ref{tab:#1}}
\newcommand{\refsec}[1]{\textsection\ref{sec:#1}}
\newcommand{\newsec}[2]{\section{#1}\label{sec:#2}\noindent}
\newcommand{\newsubsec}[2]{\subsection{#1}\label{sec:#2}\noindent}
\newcommand{\argmax}{\operatornamewithlimits{argmax}}
\newcommand{\argmin}{\operatornamewithlimits{argmin}}
\makeatletter
\def\@maketitle{ % custom maketitle
\begin{center}%
{\bfseries \@title}%
{\bfseries \@author}%
\end{center}%
\smallskip \hrule \bigskip }
% custom section
\renewcommand{\section}{\@startsection
{section}% % the name
{1}% % the level
{0mm}% % the indent
{-0.8\baselineskip}% % the before skip
{0.3\baselineskip}% % the after skip
{\bfseries\large}}% the style
% custom subsection
\renewcommand{\subsection}{\@startsection
{subsection}% % the name
{2}% % the level
{0mm}% % the indent
{-0.8\baselineskip}% % the before skip
{0.3\baselineskip}% % the after skip
{\bfseries\large}}% the style
\renewcommand{\paragraph}{%
\@startsection{paragraph}{4}%
{\z@}{1.5ex \@plus 1ex \@minus .2ex}{-1em}%
{\normalfont\normalsize\bfseries}%
}\makeatother
%\title{{\LARGE Universal Parser (UP)}\\[-8mm]
%\includegraphics[height=8mm]{RUPA}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\includegraphics[height=8mm]{RUPA}}
\title{{\LARGE Natural Language Processing}\\[1.5mm]{\large Assignment 3: Syntax}}
\author{}
\date{} % delete this line to display the current date
%%% BEGIN DOCUMENT
\begin{document}
\definecolor{keywords}{RGB}{255,0,90}
\definecolor{comments}{RGB}{0,0,113}
\definecolor{red}{RGB}{160,0,0}
\definecolor{green}{RGB}{0,150,0}
\lstset{language=Python,
basicstyle=\ttfamily\small,
keywordstyle=\color{keywords},
commentstyle=\color{comments},
stringstyle=\color{red},
showstringspaces=false,
identifierstyle=\color{green},
procnamekeys={def,class}}
\maketitle
%\tableofcontents
%\vspace{3mm}
%\section{Introduction}
\noindent
% This assignment involves material from lectures 8 to 10. You should
% have watched the relevant videos, read the relevant chapters in the
% textbooks and made a serious attempt to complete the relevant labs
% before you attempt this assignment. If you feel you have done that
% and still find the instructions unclear, you are welcome to email the
% course teachers and/or go to office hours to ask for help.
\section{Introduction}
\noindent This assignment involves material from classes 8, 9 and 10. You should
have watched the relevant videos, read the relevant chapters in the textbook and
made a serious attempt at completing the relevant labs before you attempt this
assignment. If you feel you have done that and still find the instructions
unclear, you are welcome to email the course teachers and/or go to office hours
to ask for help. We indicate after each subtask how much we expect you to write.
If you are interested in receiving a VG grade, you may complete the
\textit{optional} exercise available to you, which is related to syntactic
formalisms. Note that the exercise must be completed in full in order to receive
VG credit. Please do not submit more than 5 pages overall. All answers should be
self-contained.
\section{Assignment}
\noindent
The assignment is based on the sentences in the \textbf{en10} set, which you annotated in Lab 8, parsed in Lab 9
and wrote a grammar for in Lab 10:
\begin{enumerate}[noitemsep]
\item He worked for the BBC for a decade.
\item She spoke to CNN Style about the experience.
\item Global warming has caused a change in the pattern of the rainy seasons.
\item I also wonder whether the Davis Cup played a part.
\item The scheme makes money through sponsorship and advertising.
\item If a Turkish employee quits, then the Turkish work councils come.
\item A witness told police that the victim had attacked the suspect in April.
\item Mr Osborne signed up with a US speakers agency after being sacked in July.
\item The RHS collected comments sent in by schoolchildren and teachers involved in the experiment.
\item National reaction to the events in Kansas demonstrated how deeply divided the country had become.
\end{enumerate}
For this assignment, you should submit answers to the following questions:
\begin{enumerate}[itemsep=3pt]
\item Discuss three cases, involving (at least) three different dependency relations, where your annotation in Lab 8 differed from that of another group or from the gold standard. Cite definitions or examples from the UD guidelines to support your conclusion about which annotation is correct in each case. Are there any mismatches between your linguistic intuition and the UD guidelines? Are some sentences ambiguous and do you think they could receive different interpretations?
\item Discuss two different errors involving (at least) two different dependency relations in the output of the parser trained in Lab 9 for the \textbf{en10} sentences. Discuss possible reasons the parser made these mistakes.
\item The parser trained in Lab 9 uses the arc-eager transition system for dependency parsing described in K\"{u}bler et al. Briefly describe how transition-based parsing works and write down the transition sequence used to construct the parse tree for sentence 1 to illustrate your description. List the content of the stack ($S$), buffer ($B$) and arc set ($A$) after each transition.
\item List the grammar you designed in Lab 10 after verifying that it does in fact generate sentences 1--5 with reasonable analyses. (By reasonable, we mean that the tree would be acceptable according to some linguistic theory; it does not matter which one.) Draw all the parse trees that the grammar assigns to sentence 3 and describe the essential differences between the trees. Discuss which parts of the trees you think are correct and why your grammar also produces incorrect trees. If your grammar only produces one tree, discuss why this is the case and the possible advantages and drawbacks of your grammar.
\item Using the grammar from Lab 10, draw the completed chart for a parse of sentence 1 using the CKY algorithm described in Jurafsky and Martin and explain briefly how the algorithm works, including the order in which the cells are filled. (Hint: have a serious look at the textbook for this question! It is easy to miss a step!)
\end{enumerate}
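
\noindent For orientation on question 3, the following is a minimal sketch of the four arc-eager transitions (Shift, Left-Arc, Right-Arc, Reduce) applied to a stack, buffer and arc set. It is not a model answer: the function name \texttt{arc\_eager}, the toy sentence ``She reads books'' and the relation labels are assumptions made for illustration only, and the sketch applies a given transition sequence rather than predicting one with a trained classifier.
\begin{lstlisting}
def arc_eager(words, transitions):
    """Apply a transition sequence; return the arc set and a trace."""
    stack = [0]                              # 0 is the artificial ROOT
    buffer = list(range(1, len(words) + 1))  # word positions 1..n
    arcs = set()                             # (head, label, dependent)
    trace = []
    for name, label in transitions:
        if name == "SHIFT":
            # move the front of the buffer onto the stack
            stack.append(buffer.pop(0))
        elif name == "LEFT-ARC":
            # stack top becomes a dependent of the buffer front
            dep = stack.pop()
            assert dep != 0, "ROOT cannot get a head"
            assert all(d != dep for _, _, d in arcs), "already has a head"
            arcs.add((buffer[0], label, dep))
        elif name == "RIGHT-ARC":
            # buffer front becomes a dependent of the stack top
            dep = buffer.pop(0)
            arcs.add((stack[-1], label, dep))
            stack.append(dep)
        elif name == "REDUCE":
            # pop the stack top, allowed only if it already has a head
            assert any(d == stack[-1] for _, _, d in arcs), "no head yet"
            stack.pop()
        else:
            raise ValueError("unknown transition: " + name)
        trace.append((name, list(stack), list(buffer), sorted(arcs)))
    return arcs, trace

# Toy example (hypothetical sentence and labels, not from en10).
words = ["She", "reads", "books"]
sequence = [("SHIFT", None), ("LEFT-ARC", "nsubj"),
            ("RIGHT-ARC", "root"), ("RIGHT-ARC", "obj")]
arcs, trace = arc_eager(words, sequence)
for step in trace:
    print(step)
\end{lstlisting}
Each printed step shows the transition followed by $S$, $B$ and $A$ after it has been applied, which is the kind of listing the question asks for.
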
We expect between half a page and a page in response to each question. Please do not submit more than 5 pages overall.
Note that figures do not count towards the page limit.
Your answers to each question should be self-contained.
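
\noindent For orientation on question 5, the following is a minimal sketch of CKY recognition for a grammar in Chomsky normal form. It fills the chart one column at a time from left to right, and each column from the bottom up. It is not a model answer: the function name \texttt{cky}, the dictionary encoding of the rules, and the toy grammar and sentence are assumptions made for illustration only. The sketch handles only rules of the form $A \rightarrow B\ C$ and $A \rightarrow w$, so a Lab 10 grammar may need converting first.
\begin{lstlisting}
def cky(words, lexical, binary):
    """CKY recognition for a grammar in Chomsky normal form.

    lexical: dict mapping a word to the set of A with A -> word
    binary:  dict mapping (B, C) to the set of A with A -> B C
    table[i][j] holds the nonterminals that span words[i:j].
    """
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):               # columns, left to right
        # cell on the diagonal: lexical rules for word j
        table[j - 1][j] |= lexical.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):      # rows, bottom to top
            for k in range(i + 1, j):       # every split point
                for left in table[i][k]:
                    for right in table[k][j]:
                        table[i][j] |= binary.get((left, right), set())
    return table

# Toy grammar and sentence (hypothetical, not from en10).
lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
binary = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
words = ["she", "eats", "fish"]
table = cky(words, lexical, binary)
print("S" in table[0][len(words)])          # is the sentence recognised?
\end{lstlisting}
The sketch only recognises a sentence; recovering the parse trees would additionally require storing backpointers in each cell.
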
\section{VG: Constituency vs. Dependency}
Describe the concepts of dependency and constituency within linguistics, both
formally and in your own words. What do you perceive to be the advantages and
disadvantages of each formalism, e.g.\ for languages with variable word order?
Is one formalism a better representation for NLP tasks than another? Why? For
which tasks?
\section{Grading Criteria}
\begin{titlemize}{Basic Criteria}
\item Answers are given in understandable English.
\item Answers are stated clearly and coherently.
\item Answers are essentially correct.
\end{titlemize}
\begin{titlemize}{Additional Criteria}
\item Answers are well motivated.
\item Answers are well illustrated.
\item Answers reveal extensive knowledge of the textbook chapter(s).
\end{titlemize}
To pass the assignment, you must meet all the basic criteria on all subparts of the assignment.
To get VG, you must in addition meet some of the additional criteria for most subparts.
\section{Submission}
\noindent
Please submit your assignment through studium as a PDF file without identifying
information (i.e., do not include your name in the report or the file name). It
should follow the style and margins given in the example submission, even if
not created with LaTeX. The deadlines can be found in studium.
%Submit your assignment as a pdf file named
%firstname\_lastname\_assignment\_3.pdf. It should follow the style and
%margins given in the example submission even if not created with
%LaTeX. See deadline on studentportalen.
\end{document}