Work in progress!!
The Master Thesis Extractor (MTE) is a temporal information extraction system that produces RDF from Wikipedia infoboxes.
It reuses the Mapping Extractor of the DBpedia Extraction Framework to extract values and HeidelTime to extract temporal expressions.
Currently, the main focus of MTE is extracting a time-based dataset about companies. The latest release of the dataset can be downloaded from http://tiny.cc/tmpcompany
To run an extraction yourself, either download a binary release or build MTE from source.
MTE uses MongoDB to cache wiki articles and revisions. By default, MTE connects to the MongoDB instance listening on 127.0.0.1:27017. To override the default, set the environment variable MTE_MONGODB to a valid connection string.
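As a minimal sketch, the variable could be set in the shell before starting MTE. The host name db.example.org is hypothetical, and the mongodb:// URI form is an assumption about what MTE accepts as a valid connection string.

```shell
# Assumption: MTE accepts a standard MongoDB connection URI.
# db.example.org is a placeholder host, not a real MTE server.
export MTE_MONGODB="mongodb://db.example.org:27017"

# Print the value so the setting can be verified before launching MTE.
echo "MTE_MONGODB=$MTE_MONGODB"
```

Unsetting the variable again (unset MTE_MONGODB) should return MTE to the default of 127.0.0.1:27017.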
Run the app using a binary release:
- Unzip the binary release file
- Execute 'bin/mte' (or 'bin/mte.bat' on Windows)
Run your own build:
- Follow the instructions to compile your own build
- Execute 'sbt start' from the MTE project directory
For further details, see the Play documentation.
MTE is a Play Framework application and therefore uses sbt as its build tool. For further information, see the Play documentation.
MTE expects all of its dependencies to be available in an Apache Ivy or Apache Maven repository.
TODO: Provide details on where to download dependencies that are not on Maven Central.
Requirements: Java 8, sbt
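A quick way to check these requirements is to look for both tools on the PATH. The sketch below uses a hypothetical helper, check_tool, which is not part of MTE. It only reports whether a command is found and does not verify the Java version.

```shell
# check_tool is a hypothetical helper, not part of MTE.
# It reports whether the given command is available on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_tool java
check_tool sbt
```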
The source code is published under the terms of the GNU General Public License, version 2.