-
Notifications
You must be signed in to change notification settings - Fork 0
/
master.tex
231 lines (193 loc) · 7.35 KB
/
master.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
\documentclass[a4paper]{scrartcl}
\usepackage{palatino}
%\usepackage{url}
%\usepackage{xcolor}
%\usepackage[colorlinks=true, linkcolor=black, linkbordercolor=blue, urlbordercolor=blue, urlcolor=black, pdfborderstyle={/S/U/W .5}]{hyperref}
%\usepackage{breakurl}
\usepackage{graphicx}
%\usepackage{multirow}
\usepackage{fancyhdr}
%\usepackage{comment}
%\usepackage{printlen} % Print lengths using specified units.
\usepackage[pdfborder=0 0 0]{hyperref}
%\usepackage{booktabs}
\usepackage[utf8]{inputenc} % Umlaute üöä auch normal benutzen und nicht maskieren
%\usepackage[rightcaption]{sidecap} % caption beside figure
\usepackage{amsmath, amssymb}
\usepackage{listings}
%\usepackage{subfigure}
%\usepackage{setspace}
\usepackage{threeparttable}
\usepackage{tabularx}
%\usepackage{todonotes}
\usepackage{paralist}
%\usepackage{siunitx}
\usepackage{amsfonts}
\newcommand{\tickYes}{\checkmark}
\usepackage{pifont}
\newcommand{\tickNo}{\hspace{1pt}\ding{55}}
\usepackage{xspace}
\usepackage{dirtree}
%\usepackage[width=.75\textwidth]{caption}
\usepackage{color}
\definecolor{gray}{rgb}{0.4,0.4,0.4}
\definecolor{darkblue}{rgb}{0.0,0.0,0.6}
\definecolor{cyan}{rgb}{0.0,0.6,0.6}
% custom commands
\newcommand{\wik}{\textit{Wiktionary}\xspace}
% http://www.ctan.org/tex-archive/fonts/ps-type1/cm-super/
\usepackage[T1]{fontenc} % use high quality fonts, please install cm-super
\newcommand{\lemon}{\emph{lemon} }
\newcommand{\lemonnospace}{\emph{lemon}}
\newcommand{\footnoteUrl}[1]{\footnote{\url{#1}}}
%%%% LISTINGS
\usepackage{listings} % Typeset source code listings using LaTeX.
% listing styles
\lstset{
numberbychapter=false,
numbers=left,
numberstyle=\tiny,
basicstyle=\ttfamily\fontsize{8}{8}\selectfont,
% tabsize=2,
% framexleftmargin=2pt,
captionpos=b,
frame=single,
breaklines=true,
columns=fullflexible,
showstringspaces=false,
commentstyle=\color{gray},
literate={ö}{{\"o}}1
{ä}{{\"a}}1
{ü}{{\"u}}1
}
%\lstdefinestyle{rdf}{numberblanklines=true, morekeywords={}}
\lstdefinestyle{sparql}{numberblanklines=true, morekeywords={SELECT, FROM, WHERE, FILTER, GROUP BY, IN, AS, LIMIT, OFFSET, PREFIX, OPTIONAL, UNION}}
\lstdefinestyle{N3}{numberblanklines=true, morekeywords={foaf, prefix, rdf, skos, rdfs, ex, xsd, wr, wt, dc, lemon, doap, @prefix}}
\lstdefinestyle{wikitext}{numberblanklines=true, morekeywords={\{\{, \}\}, =, |, [[, ]]}}
%\usepackage{textcomp}
\lstdefinestyle{XML}
{
morestring=[b]",
morestring=[s]{>}{<},
morecomment=[s]{<?}{?>},
stringstyle=\color{black},
identifierstyle=\color{darkblue},
keywordstyle=\color{cyan},
morekeywords={xmlns,version,type}% list your attributes here
}
%%%% todo
% \newcommand{\todo}[1]{\textbf{[ToDo: #1]}}
% draws PDF notes instead of in-text todos
%\usepackage{pdfmarginpar}\newcommand{\todo}[1]{\pdfmarginpar[Note]{ToDo: #1}}
%\newcommand{\todo}[1]{}
\hypersetup{colorlinks=true,urlcolor=blue,linkcolor=black,citecolor=black}
%resize texttt (doesnt play well with palatino I guess )
%\let\Oldtexttt\texttt
%\renewcommand{\texttt}{\small\Oldtexttt}
\newcommand{\NAME}{Flexible RDF data extraction from Wiktionary}
\newcommand{\TITLE}{\NAME}
\newcommand{\TITLER}{Leveraging the power of community build linguistic Wikis}
\newcommand{\KEYWORDS}{Semantic Web, Natural Language Processing, Information Extraction}
\pdfinfo{
/Title (\TITLE)
/Creator (TeX)
/Keywords (\KEYWORDS)
/Producer (pdfTeX and Document $Revision: 5900 $)
/Author (Jonas Brekle)
%/CreationDate (D:19980212201000)
%/ModDate (D:19980212201000)
/Subject (\NAME)
}
\graphicspath{{images/}} % include-pfad für Grafiken
\DeclareGraphicsExtensions{.pdf,.png}
% \renewcommand{\vec}[1]{\overrightarrow{#1}}
\fancyhf{}
\pagestyle{fancy}
% Seitenzahl bei geraden/linken Seiten nach links/aussen
\fancyhead[R]{\thepage}
\fancyhead[L]{\leftmark}% Kapitel/Abschnitt
\begin{document}
\begin{titlepage}
\vspace{2 cm}
\thispagestyle{empty}
\begin{center}
\Large{
Universität Leipzig\\
Fakultät für Mathematik und Informatik\\
Institut für Informatik\\
}
\vspace{3 cm}
\textbf{
\Large{
\TITLE\\
}
}
\TITLER\\
\vspace{2 cm}
Masterarbeit\\
im Studiengang Master Informatik
\end{center}
\vspace{6 cm}
\begin{flushleft}
\begin{tabular}{lll}
& & \\
\textbf{eingereicht von:} & & Jonas Brekle\\
& & \\
& & \\
\textbf{eingereicht am:} & & 1. August 2012\\
& & \\
& & \\
\textbf{betreuender Professor:} & & Prof. Dr. Ing. habil. Klaus-Peter Fähnrich\\
& & \\
& & \\
\textbf{Betreuer:} & & Dipl. Inf. Sebastian Hellmann
\end{tabular}
\end{flushleft}
\end{titlepage}
\thispagestyle{empty}
\begin{abstract}
We present a declarative approach implemented in a comprehensive open-source framework (based on \textit{\mbox{DBpedia}}) to extract lexical-semantic resources (an ontology about language use) from \wik.
The data currently includes language, part of speech, senses, definitions, synonyms, taxonomies (hyponyms, hyperonyms, synonyms, antonyms) and translations for each lexical word.
Main focus is on flexibility to the loose schema and configurability towards differing language-editions of \wik.
This is achieved by a declarative mediator/wrapper approach.
The goal is to allow the addition of languages just by configuration without the need of programming, thus enabling the swift and resource-conserving adaption of wrappers by domain experts.
%TODO maybe add crowd sourcing and DBpedia againcd
%to new usage scenarios by crowd-sourcing ( proven working by DBpedia \cite{dbpedia_jws_09} and the \textit{Mappings Wiki}\footnote{a community interested in curating the DBpedia dataset collaboratively maintains the core of the extraction configuration --- the infobox label mappings --- at \url{http://mappings.dbpedia.org/}}).
The extracted data is as fine granular as the source data in \wik and additionally follows the \textit{lemon} model.
It enables use cases like disambiguation or machine translation.
By offering a linked data service, we hope to extend DBpedia's central role in the LOD infrastructure to the world of Open Linguistics.
\end{abstract}
\vfill
I thank Sebastian Hellmann for the courageous supervision and a professional research environment; furthermore Theresa for her support. Without them, this work would not have been possible.
\pagebreak
\pagenumbering{Roman}
\setcounter{page}{1}
% Inhaltsverzeichnis
\tableofcontents
\pagebreak
%changing the pagenumbering style (roman->arabic) resets pagecoutner. workaround:
\newcounter{tmppage}
\setcounter{tmppage}{\value{page}}
\pagenumbering{arabic}
\setcounter{page}{\value{tmppage}}
\input{sections/1einleitung.tex}
\input{sections/2grundlagen.tex}
\input{sections/3anforderungen.tex}
\input{sections/4spezifikation.tex}
\input{sections/5implementierung.tex}
\input{sections/6evaluation.tex}
\input{sections/7zusammenfassung+rest.tex}
\appendix
\section{Literature}
\bibliography{master,iswcwikt,dfg,aksw,ref}
\bibliographystyle{alpha}
\newpage
\section*{Erklärung}
\thispagestyle{empty}
Hiermit erkläre ich, dass ich die vorliegende Arbeit selbstständig angefertigt habe. Die aus fremden Quellen direkt oder indirekt übernommenen Inhalte sind als solche kenntlich gemacht.\\
Diese Arbeit wurde bisher in gleicher oder ähnlicher Form keiner anderen Prüfungsbehörde vorgelegt und auch noch nicht veröffentlicht.\\
Ich bin mir bewusst, dass nicht wahrheitsgemäße Angaben rechtlich verfolgt werden können.\\
\vspace{1cm}\\
\makebox[2in]{\hrulefill}\\
Jonas Brekle
\end{document}