This folder contains generic scripts for manipulating, processing, validating individual ELTeC repositories
-
REFRESH Updates your local copy of each repo by doing "git pull"; writes a new version of the file driver.tei which is used by subsequent processes. Run
python3 Scripts/refresh.py
(or bash scriptrefresh
) -
SUMMARIZE Processes the driver.tei file made by REFRESH to create a new summary of the whole collection of repos as an index.html file stored in your local copy of the distantreading.github.io repository. Run
python3 Scripts/summarize.py
(or bash scriptsummarize
) -
REPORT Processes the driver.tei file made by REFRESH to create a new index.html file listing titles etc. for each repository, stored in your local copy of the distantreading.github.io repository. Run
python3 Scripts/report.py
(or bash scriptreport
) -
EXPOSE Processes the driver.tei file made by REFRESH to create HTML link files for each title in each repository, stored in your local copy of the distantreading.github.io repository. These link files transform and display the source XML files direct from the main repository, using CSS and Javascript files stored in the distantreading.github.io repository. Run
python3 Scripts/expose.py
-
CHECKREPO Uses the XSLT script
checkUp.xsl
to validate each text in the level0 and level1 folders of the specified repo and prepare it for release. See below for details of the checks and modifications it carries out. A new version of each file is written to a folder calledOut
in the repo. Runpython3 Scripts/checkRepo.py xxx
to check/update the repository for language code xxx -
REFRESHREPO Does the equivalent of REFRESH, REPORT, EXPOSE for a single repository only. Run
python3 Scripts/refreshRepo.py xxx
to update repository for language code xxx. Note that the index page produced by SUMMARIZE is not updated by this script.
N extremely B if you run any of these except the first, don't forget to commit and push the changes if you want them to be visible at the website https://distantreading.github.io/ELTeC !!
These scripts assume you can run saxon
from the command line, and that you have a local installation of Rscript
.
You will need to edit these scripts:
- to specify path names for your local installation
- if you add a new language repository
The XSLT script checkUp.xsl
checks for some common problems in the way ELTeC texts are encoded, applying fixes wherever possible, and producing a new version of the text with a modified publicationStmt (inter alia) ready for Zenodo. It can be run against a single file for testing purposes, but it's meant to be used on a whole repository. To run it on the zzz language repository, use a command line like this python Scripts/checkRepo.py zzz 2>zzzLog.txt
This will save the output from the script (it's quite chatty) in a file called zzzLog.txt
for you to scan through looking for surprises.
For details of the checks and fixes applied, see the XSLT source code.
Note that this script only validates against the RELAXNG schema; the additional schematron checks defined by our ODD are only implemented when a text is opened in oXygen or Atom.
Makefile
- copy this into the root of your local copy of an ELTEC repo
- edit LOCAL to point to the path for your local copy of the repo
- edit LANG to match the language of your repo (e.g.
eng
for English) - edit the PREFIX to match the prefix of your text files (e.g.
ENG
) - run
make driver
to generate a driver file which will process all available text files - run
make validate
to check validity of each text file individually - run
make report
to generate a balance report using thereporter.xsl
stylesheet
To issue a complete new update run the scripts in the order shown above, i.e.
python refresh.py python report.py python summarize.py python expose.py
These scripts will update your local copy of the distantreading.github.io pages. You still need to push them to the repo to see changes on the web.
If you've added any new texts since the last time, you will also need to add the new files to the distantreading.github.io/ELTeC/xxx language repo.