Experimental prototypes based on the dataset produced by the Nineteenth-Century Knowledge Project led by Peter M. Logan.
To reproduce the proof of concept (POC) from this repository and the corpus:
- create a new folder poc
- clone this repository into poc/eb-pre
- clone the Encyclopedia repository into a separate folder poc/kp-editions
- link the corpus into poc/eb-pre/data:
cd poc/eb-pre/data
ln -s ../../kp-editions
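Before going further, it can help to confirm that the link really points at the cloned corpus. This is a minimal sketch: `check_corpus_link` is a hypothetical helper, not part of this repository, and it assumes the corpus contains an eb07 folder, as the cleanup step below implies.

```shell
# Hypothetical helper: check that a symlink (default: kp-editions in the
# current directory) exists and resolves to a corpus with an eb07 folder.
check_corpus_link() {
  local link="${1:-kp-editions}"
  if [ ! -L "$link" ]; then
    echo "missing symlink: $link" >&2
    return 1
  fi
  # -d follows the symlink, so this tests the target directory.
  if [ ! -d "$link/eb07" ]; then
    echo "$link does not resolve to a corpus containing eb07" >&2
    return 1
  fi
  echo "ok: $link -> $(readlink "$link")"
}

# Usage, from poc/eb-pre/data:
#   check_corpus_link
```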
Then, still in poc/eb-pre/data, remove the superseded copies of the encyclopedia entries:
rm -rf kp-editions/eb07/TXT_*/ kp-editions/eb07/XML_*/
Note that, as of 2025 Q2, eb07/TXT and eb07/XML always contain the latest version; the other TXT_* and XML_* folders should be ignored. For eb09, however, the latest (and only) version is currently in TXT_v1 and XML_v1, so those folders must be kept.
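The cleanup can also be scripted so that eb09 is never touched and the deletions can be previewed first. This is only a sketch; `prune_eb07` and the `DRY_RUN` variable are hypothetical, not part of the repository.

```shell
# Hypothetical helper: delete superseded eb07 copies (eb07/TXT_*/ and
# eb07/XML_*/) under a corpus folder (default: kp-editions), keeping
# eb07/TXT and eb07/XML. eb09 is never touched, because its only version
# lives in TXT_v1 and XML_v1. With DRY_RUN=1 it only prints the folders.
prune_eb07() {
  local corpus="${1:-kp-editions}" dir
  for dir in "$corpus"/eb07/TXT_*/ "$corpus"/eb07/XML_*/; do
    # An unmatched glob is left as literal text; skip it.
    [ -d "$dir" ] || continue
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "would remove: $dir"
    else
      rm -rf -- "$dir"
    fi
  done
}
```

For example, in bash from poc/eb-pre/data, `DRY_RUN=1 prune_eb07` lists the folders that would go, and a plain `prune_eb07` then deletes them.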
Create a Python virtual environment and install the dependencies:
cd poc/eb-pre
python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install -r build/requirements.txt
Delete the index, then run the preparation script:
cd poc/eb-pre/tools
rm ../data/DOMAINS_SET/index.json # replace DOMAINS_SET with its value in settings.py
python prep.py
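Rather than copying the value of DOMAINS_SET by hand, it can be read from settings.py. A minimal sketch, assuming settings.py assigns DOMAINS_SET on a single line as a quoted string (for example `DOMAINS_SET = 'name'`); `get_domains_set` is a hypothetical helper, and other forms of the setting are not handled.

```shell
# Hypothetical helper: print the value of DOMAINS_SET from settings.py.
# Assumes a single-line assignment to a quoted string, e.g.
#   DOMAINS_SET = 'some_name'
# Other forms (expressions, imports, multi-line values) are not handled.
get_domains_set() {
  local settings="${1:-settings.py}" value
  value=$(sed -n "s/^DOMAINS_SET[[:space:]]*=[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$settings" | head -n 1)
  if [ -z "$value" ]; then
    echo "DOMAINS_SET not found in $settings" >&2
    return 1
  fi
  printf '%s\n' "$value"
}

# Usage, from poc/eb-pre/tools (stops before rm if the value is not found):
#   ds=$(get_domains_set) && rm -f "../data/$ds/index.json" && python prep.py
```

Failing on an empty value matters here: without the check, the path would collapse to ../data//index.json and delete the wrong file.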
Clear the semantic search data, then run the classification and compression scripts:
cd poc/eb-pre/tools
rm ../data/semantic_search/*
python classify.py
python compress.py ../data/semantic_search/semantic_search-edition_7-doc2vec-learn-mc_40-ng_1-tm_0.5-ch_sentence.tv2.json 2
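The long file name passed to compress.py could be looked up instead of typed. A sketch, assuming classify.py leaves exactly one *.tv2.json file in data/semantic_search; `find_tv2` is a hypothetical helper, and the trailing `2` argument is passed through unchanged.

```shell
# Hypothetical helper: print the path of the single *.tv2.json file in a
# directory (default: ../data/semantic_search); fail if there are 0 or 2+.
find_tv2() {
  local dir="${1:-../data/semantic_search}"
  # Replace the positional parameters with the glob matches.
  set -- "$dir"/*.tv2.json
  # With no match the pattern stays literal, so also require a real file.
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one .tv2.json file in $dir" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Usage, from poc/eb-pre/tools after running classify.py:
#   f=$(find_tv2) && python compress.py "$f" 2
```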
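Finally, install the Node.js dependencies in docs, then serve poc/eb-pre (the parent of docs) over HTTP:
cd poc/eb-pre/docs
npm ci
cd ..
python3 -m http.server 8000
- then visit the following URL with your browser: http://localhost:8000/docs/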