A Python script to enrich a SKOS list with synonyms taken from different sources.
The script uses the following data sources:
1) Wiktionary:
- downloaded from DBnary (http://kaiko.getalp.org/about-dbnary/download/, the English core)
- uploaded to GraphDB (via the graphdb-import folder in the user's home directory) with graph name "wiktionary"
- filtered with the script getSynonymsWiktionary.py, which generates the file syn_wiktionary.ttl (already included in this repository)
- as syn_wiktionary.ttl is relatively small, it is parsed directly by getSynonyms.py (see the -w option)
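Because syn_wiktionary.ttl is small, its synonyms can be held in memory and looked up without a database round trip. A minimal sketch of such an in-memory index follows, assuming the (term, synonym) pairs have already been extracted from the file (for example with rdflib). The names build_synonym_index and lookup are hypothetical, and both the case-folding and the choice to treat synonymy as symmetric are assumptions, not documented behaviour of getSynonyms.py.

```python
from collections import defaultdict


def build_synonym_index(pairs, symmetric=True):
    """Build a case-insensitive term -> set-of-synonyms index.

    pairs: iterable of (term, synonym) strings, assumed to be already
    extracted from syn_wiktionary.ttl (extraction not shown here).
    symmetric: also store each pair in reverse (an assumption).
    """
    index = defaultdict(set)
    for term, synonym in pairs:
        t, s = term.strip().lower(), synonym.strip().lower()
        if not t or not s or t == s:
            continue  # skip empty labels and self-references
        index[t].add(s)
        if symmetric:
            index[s].add(t)
    return dict(index)


def lookup(index, term):
    """Return the synonyms of term, sorted; an empty list if unknown."""
    return sorted(index.get(term.strip().lower(), ()))
```

For example, after build_synonym_index([("car", "automobile")]) both "car" and "automobile" resolve to each other. Building the index costs one pass over the file; each later lookup is a single dictionary access.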
2) WordNet:
- downloaded from wordnet-rdf (https://wordnet-rdf.princeton.edu/about)
- uploaded to GraphDB (via the graphdb-import folder in the user's home directory) with graph name "wordnet"
- filtered with the script getSynonymsWordnet.py, which generates the file syn_wordnet.ttl (already included in this repository)
- syn_wordnet.ttl has then been uploaded to GraphDB with graph name "wordnet-synonyms"
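The filtering step relies on WordNet grouping synonymous words into synsets. As a hedged sketch, assuming the OntoLex-Lemon modelling of the wordnet-rdf dump (a lexical entry has an ontolex:canonicalForm with an ontolex:writtenRep, and ontolex:sense links pointing to the synset through ontolex:isLexicalizedSenseOf), a SPARQL query returning the written forms that share a synset with a given term could be built as follows. The graph IRI is a placeholder for whatever IRI the "wordnet" graph received on import, the function names are hypothetical, and the query actually used by getSynonymsWordnet.py is not documented here.

```python
def sparql_literal(text):
    """Quote text as a SPARQL string literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def wordnet_synonyms_query(term, graph_iri):
    """Build a SPARQL SELECT for the words that share a synset with term.

    Assumes OntoLex-Lemon modelling (entry -> sense -> synset);
    graph_iri is a placeholder for the IRI of the imported graph.
    """
    literal = sparql_literal(term)
    # Case-insensitive match on the written form; other entries attached
    # to the same synset are returned as synonyms.
    return f"""PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT DISTINCT ?synonym WHERE {{
  GRAPH <{graph_iri}> {{
    ?entry ontolex:canonicalForm/ontolex:writtenRep ?rep ;
           ontolex:sense/ontolex:isLexicalizedSenseOf ?synset .
    FILTER (lcase(str(?rep)) = lcase({literal}))
    ?other ontolex:sense/ontolex:isLexicalizedSenseOf ?synset ;
           ontolex:canonicalForm/ontolex:writtenRep ?synonym .
    FILTER (?other != ?entry)
  }}
}}"""
```

The predicates should be checked against the downloaded dump before relying on the query; the lcase filter is a full scan, which is acceptable for a one-off filtering job but slow for interactive use.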
3) UNESCO:
- downloaded from vocabularies.unesco.org (http://vocabularies.unesco.org/exports/thesaurus/latest/)
- uploaded to GraphDB (via the graphdb-import folder in the user's home directory) with graph name "unesco"
4) FIBO:
- downloaded from https://spec.edmcouncil.org/fibo/vocabulary (production release)
- uploaded to GraphDB (via the graphdb-import folder in the user's home directory) with graph name "fibo"
5) STW:
- downloaded from http://zbw.eu/stw/version/latest/download/about.en.html (version 9.06)
- uploaded to GraphDB (via the graphdb-import folder in the user's home directory) with graph name "stw"
6) LCSH new pilot:
- downloaded from http://id.loc.gov/download/ ("LC Subject Headings (LCSH) NEW Pilot (SKOS/RDF only)")
- uploaded to GraphDB (via the graphdb-import folder in the user's home directory) with graph name "lcsh"
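The UNESCO, STW and LCSH dumps are SKOS thesauri, so one query pattern covers them: find the concept whose skos:prefLabel matches the term and return its skos:altLabel values. The sketch below builds such a query and sends it to a GraphDB repository through the standard SPARQL 1.1 Protocol (an HTTP POST with a form-encoded query parameter to http://localhost:7200/repositories/<repository-id>, asking for application/sparql-results+json). The repository id, the graph IRIs and the function names are hypothetical, the English-only label filter is an assumption, and whether the FIBO vocabulary stores its synonyms as skos:altLabel should be checked against the downloaded file.

```python
import json
import urllib.parse
import urllib.request

SKOS = "http://www.w3.org/2004/02/skos/core#"


def sparql_literal(text):
    """Quote text as a SPARQL string literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def skos_altlabel_query(term, graph_iri, lang="en"):
    """SELECT the altLabels (in lang) of concepts whose prefLabel matches term."""
    return f"""PREFIX skos: <{SKOS}>
SELECT DISTINCT ?alt WHERE {{
  GRAPH <{graph_iri}> {{
    ?concept skos:prefLabel ?pref ;
             skos:altLabel ?alt .
    FILTER (lcase(str(?pref)) = lcase({sparql_literal(term)}))
    FILTER (langMatches(lang(?alt), {sparql_literal(lang)}))
  }}
}}"""


def first_column(results):
    """Extract the values of the first variable from SPARQL JSON results."""
    var = results["head"]["vars"][0]
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]


def run_select(endpoint, query, timeout=30):
    """POST query to a SPARQL 1.1 Protocol endpoint and return the first column."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return first_column(json.load(response))
```

For example, run_select("http://localhost:7200/repositories/synonyms", skos_altlabel_query("inflation", "http://example.org/graph/stw")) would return the English alternative labels of a matching STW concept, if any; both URLs are placeholders.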
If no synonym has been found in any of the six sources above, the script then searches in:
- Datamuse API (max 100,000 requests per day without an API key, see https://www.datamuse.com/api/), accessed through the python-datamuse client (https://github.com/gmarmstrong/python-datamuse)
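For reference, a Datamuse synonym lookup is a single HTTP GET on https://api.datamuse.com/words with the rel_syn parameter, which returns a JSON array of objects carrying a "word" field. The sketch below uses only the standard library rather than python-datamuse; the helper names and the default max_results value are hypothetical choices.

```python
import json
import urllib.parse
import urllib.request

DATAMUSE_WORDS = "https://api.datamuse.com/words"


def datamuse_url(term, max_results=20):
    """Build the request URL; rel_syn asks Datamuse for synonyms of term."""
    query = urllib.parse.urlencode({"rel_syn": term, "max": max_results})
    return f"{DATAMUSE_WORDS}?{query}"


def parse_datamuse(payload):
    """Keep the 'word' field of each object in the JSON array."""
    return [item["word"] for item in payload if "word" in item]


def datamuse_synonyms(term, max_results=20, timeout=10):
    """Fetch synonyms of term from Datamuse (no API key needed)."""
    with urllib.request.urlopen(datamuse_url(term, max_results), timeout=timeout) as response:
        return parse_datamuse(json.load(response))
```

Given the daily request limit, caching results per term avoids repeated calls for labels that recur in the SKOS list.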
If Datamuse returns no synonyms either, the script falls back to:
- Altervista thesaurus API (http://thesaurus.altervista.org/; max 5,000 requests per day; requires an API key, passed to the script with -k as shown below)
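The lookup order described above is a cascade: each stage runs only when the previous ones found nothing. A minimal sketch follows, assuming each stage is wrapped in a callable that returns a list of strings. It is not stated whether the script merges results across the six local sources or stops at the first, so here they could be combined into one callable. A parser for the Altervista response is included; the request URL and the JSON shape ({"response": [{"list": {"category": ..., "synonyms": "a|b|c (antonym)"}}]}) follow our reading of the Altervista service documentation and should be verified. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ALTERVISTA_V1 = "http://thesaurus.altervista.org/thesaurus/v1"


def altervista_url(term, key, language="en_US"):
    """Build the request URL (parameters as we understand the service docs)."""
    query = urllib.parse.urlencode(
        {"word": term, "language": language, "key": key, "output": "json"})
    return f"{ALTERVISTA_V1}?{query}"


def parse_altervista(payload):
    """Flatten the assumed response shape into a de-duplicated synonym list.

    Each "synonyms" string is pipe-separated; items marked "(antonym)" are
    dropped and other annotations such as "(similar term)" are stripped.
    """
    synonyms = []
    for entry in payload.get("response", []):
        for item in entry.get("list", {}).get("synonyms", "").split("|"):
            item = item.strip()
            if not item or item.endswith("(antonym)"):
                continue
            word = item.split(" (")[0]
            if word not in synonyms:
                synonyms.append(word)
    return synonyms


def altervista_synonyms(term, key, timeout=10):
    """Fetch and parse synonyms of term; HTTP errors are left to the caller."""
    with urllib.request.urlopen(altervista_url(term, key), timeout=timeout) as response:
        return parse_altervista(json.load(response))


def first_hit(term, lookups):
    """Run (name, callable) lookups in order; stop at the first non-empty result."""
    for name, lookup in lookups:
        synonyms = lookup(term)
        if synonyms:
            return name, synonyms
    return None, []
```

A call could look like first_hit(term, [("local", local_lookup), ("datamuse", datamuse_lookup), ("altervista", lambda t: altervista_synonyms(t, key))]), where local_lookup and datamuse_lookup stand for hypothetical wrappers around the earlier stages. Returning the source name alongside the synonyms makes it easy to log which stage answered, which helps when watching the daily quotas.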
Usage:
getSynonyms.py [-h] [-k KEY] [-w WIK_FILE] [-i INPUT_FILE] [-o OUTPUT_FILE]
optional arguments:
-h, --help "show this help message and exit"
-k KEY, --apikey KEY "API key for Altervista"
-w WIK_FILE, --wiktionaryfile WIK_FILE "synonym file for Wiktionary"
-i INPUT_FILE, --input INPUT_FILE "input file in RDF/XML"
-o OUTPUT_FILE, --output OUTPUT_FILE "output file in Turtle"
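These options map directly onto Python's argparse module. The sketch below reproduces the documented interface; the dest names, the description and the default for -w are assumptions (only the output default, output.ttl, is stated in this README), and -h/--help is added automatically by argparse.

```python
import argparse


def build_parser():
    """Recreate the documented command-line interface of getSynonyms.py."""
    parser = argparse.ArgumentParser(
        prog="getSynonyms.py",
        description="Enrich a SKOS list with synonyms as skos:altLabel.",
    )
    parser.add_argument("-k", "--apikey", dest="key", metavar="KEY",
                        help="API key for Altervista")
    # Assumption: the real default for -w is not documented in this README.
    parser.add_argument("-w", "--wiktionaryfile", dest="wik_file", metavar="WIK_FILE",
                        default="syn_wiktionary.ttl", help="synonym file for Wiktionary")
    # The default input file is not documented either, so none is set here.
    parser.add_argument("-i", "--input", dest="input_file", metavar="INPUT_FILE",
                        help="input file in RDF/XML")
    parser.add_argument("-o", "--output", dest="output_file", metavar="OUTPUT_FILE",
                        default="output.ttl", help="output file in Turtle")
    return parser
```

build_parser().parse_args() then yields a namespace whose key, wik_file, input_file and output_file attributes feed the rest of the script.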
Example:
python getSynonyms.py -k 1234567890 # replace it with your own key
By default this generates the file "output.ttl", which contains the synonyms as SKOS alternative labels (skos:altLabel).
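The output triples are simple: one skos:altLabel statement per concept, with the synonyms as language-tagged literals. The sketch below serialises a {concept IRI: synonyms} mapping to Turtle using only the standard library. It assumes English (@en) labels and emits only the new triples, whereas the actual script may instead add them to the parsed input graph and serialise everything (for example with rdflib). The function names are hypothetical.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"


def turtle_literal(text, lang="en"):
    """Quote text as a language-tagged Turtle string, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def altlabels_to_turtle(synonyms_by_concept, lang="en"):
    """Serialise {concept IRI: [synonym, ...]} as skos:altLabel triples in Turtle."""
    chunks = [SKOS_PREFIX]
    for concept, synonyms in synonyms_by_concept.items():
        labels = sorted(set(synonyms))  # de-duplicate, stable order
        if not labels:
            continue  # nothing found for this concept
        objects = " ,\n    ".join(turtle_literal(s, lang) for s in labels)
        chunks.append(f"<{concept}> skos:altLabel {objects} .\n")
    return "\n".join(chunks)
```

Writing the result with open("output.ttl", "w", encoding="utf-8") keeps non-ASCII labels intact; concept IRIs are assumed to be valid already and are not escaped.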