Karp-S pipeline

This is Språkbanken's tool for turning structured data, mainly lexicons, into uniform JSON data, optionally augmented with UD (Universal Dependencies) tags. It also prepares data for, and installs it into, the backend of Språkbanken's tool Karp-S.

Documentation

The pipeline is centered around:

importers - currently JSONL and some variants of CSV
modifiers - currently tag conversion (to UD), field exclusion and field renaming. Modifiers can change the schema as well as the data; both kinds are currently grouped together.
exporters - for example, JSONL output, and SQL and configuration files for the backend
installers - for example, installing a resource in an instance of the Karp-S backend
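
The importer, modifier and exporter stages can be chained as lazy Python generators, so entries flow through one at a time. This is a hypothetical sketch, not the karpspipeline code: the function names (`import_jsonl`, `exclude_field`, `rename_field`, `add_ud_tag`, `export_jsonl`, `run`), the field names and the small `TAG_TO_UD` mapping are all assumptions made for illustration. Installers are left out because they work on exported files rather than on entries.

```python
import io
import json
from typing import Callable, Iterable, Iterator, TextIO

Entry = dict[str, object]
Modifier = Callable[[Entry], Entry]

# Hypothetical mapping from lowercase source tags to UD part-of-speech tags.
TAG_TO_UD = {"nn": "NOUN", "vb": "VERB", "av": "ADJ", "ab": "ADV"}


def import_jsonl(lines: Iterable[str]) -> Iterator[Entry]:
    """Importer: lazily yield one entry per non-empty JSONL line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def exclude_field(name: str) -> Modifier:
    """Modifier: drop a field from every entry."""
    return lambda entry: {k: v for k, v in entry.items() if k != name}


def rename_field(old: str, new: str) -> Modifier:
    """Modifier: rename a field while keeping the field order."""
    return lambda entry: {(new if k == old else k): v for k, v in entry.items()}


def add_ud_tag(source: str, target: str) -> Modifier:
    """Modifier: add a UD tag field derived from an existing tag field."""
    def modify(entry: Entry) -> Entry:
        tag = entry.get(source)
        if isinstance(tag, str) and tag in TAG_TO_UD:
            return {**entry, target: TAG_TO_UD[tag]}
        return entry  # unknown or missing tag: leave the entry unchanged
    return modify


def export_jsonl(entries: Iterable[Entry], out: TextIO) -> None:
    """Exporter: write each entry as one JSON line."""
    for entry in entries:
        out.write(json.dumps(entry, ensure_ascii=False) + "\n")


def run(lines: Iterable[str], modifiers: list[Modifier], out: TextIO) -> None:
    """Chain importer -> modifiers -> exporter without collecting entries."""
    entries: Iterable[Entry] = import_jsonl(lines)
    for modify in modifiers:
        entries = map(modify, entries)  # map is lazy, so nothing is buffered
    export_jsonl(entries, out)


source = [
    '{"id": 1, "baseform": "hund", "pos": "nn"}\n',
    '{"id": 2, "baseform": "springa", "pos": "vb"}\n',
]
out = io.StringIO()
run(source, [exclude_field("id"), rename_field("baseform", "word"), add_ud_tag("pos", "upos")], out)
print(out.getvalue(), end="")
# {"word": "hund", "pos": "nn", "upos": "NOUN"}
# {"word": "springa", "pos": "vb", "upos": "VERB"}
```

The modifiers are applied in list order, which is why `add_ud_tag` sees the original `pos` field and the output no longer contains `id`.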

The main commands that can be invoked are:

prepare - read the data, infer the schema and output configuration files (importers, modifiers)
run - apply the modifications to each entry and output the data in new formats (modifiers, exporters)
install - run commands and move files, for example adding data to a database or running a command in another tool (installers)

Note: prepare is not yet implemented as a separate step; its tasks are performed when run is called.
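
To illustrate how exporting and installing can stay separate, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the Karp-S backend database. That substitution is an assumption: the backend's actual database and SQL dialect are not described here, and `sql_literal`, `quote_ident`, `export_sql`, `install_sql` and the `lexicon` table are hypothetical names. The installer only executes the exported SQL script and never reads the source data.

```python
import sqlite3
from typing import Iterable, Iterator


def sql_literal(value: object) -> str:
    """Render a value as an SQL literal; unknown types are converted with str()."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"  # double embedded quotes


def quote_ident(name: str) -> str:
    """Quote a table or column name."""
    return '"' + name.replace('"', '""') + '"'


def export_sql(entries: Iterable[dict], table: str, fields: list[str]) -> Iterator[str]:
    """Exporter: yield one SQL statement at a time for the given field list."""
    columns = ", ".join(quote_ident(f) for f in fields)
    yield f"CREATE TABLE {quote_ident(table)} ({columns});"
    for entry in entries:
        values = ", ".join(sql_literal(entry.get(f)) for f in fields)
        yield f"INSERT INTO {quote_ident(table)} ({columns}) VALUES ({values});"


def install_sql(script: str, conn: sqlite3.Connection) -> None:
    """Installer: execute an exported SQL script; it never sees the source data."""
    conn.executescript(script)


entries = [
    {"word": "hund", "upos": "NOUN"},
    {"word": "rock 'n' roll", "upos": "NOUN"},
]
script = "\n".join(export_sql(entries, "lexicon", ["word", "upos"]))
conn = sqlite3.connect(":memory:")
install_sql(script, conn)
rows = conn.execute('SELECT word, upos FROM "lexicon" ORDER BY word').fetchall()
print(rows)  # [('hund', 'NOUN'), ("rock 'n' roll", 'NOUN')]
```

In practice the exported script would be written to a file during run and picked up later by install, which is what lets the two steps be executed independently.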

The pipeline aims to do the following:

Never keep all entries in memory, so that large datasets can be processed
First pass of the data: infer the schema and order of fields
Second pass of the data: run modifiers and exporters
Installers do not read source data
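
The two passes can be sketched with a JSONL file that is simply read twice. This is a hypothetical illustration rather than the pipeline's implementation: `read_entries`, `infer_schema` and `second_pass` are made-up names, the first pass keeps only a field-to-type mapping (field order is the order of first appearance, and a field seen with several types is marked `"mixed"`), and the second pass streams the entries again and writes them with their fields in schema order. At no point is more than one entry held in memory.

```python
import io
import json
import tempfile
from pathlib import Path
from typing import Iterator, TextIO


def read_entries(path: Path) -> Iterator[dict]:
    """Stream entries from a JSONL file, one line at a time."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def infer_schema(path: Path) -> dict[str, str]:
    """First pass: map each field to a type name, in order of first appearance."""
    schema: dict[str, str] = {}
    for entry in read_entries(path):
        for key, value in entry.items():
            type_name = type(value).__name__
            if schema.setdefault(key, type_name) != type_name:
                schema[key] = "mixed"  # the field occurs with more than one type
    return schema


def second_pass(path: Path, schema: dict[str, str], out: TextIO) -> None:
    """Second pass: re-read the file and write entries with fields in schema order."""
    for entry in read_entries(path):
        ordered = {field: entry[field] for field in schema if field in entry}
        out.write(json.dumps(ordered, ensure_ascii=False) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "lexicon.jsonl"
    source.write_text(
        '{"word": "hund", "pos": "nn"}\n'
        '{"pos": "vb", "word": "springa", "freq": 12}\n',
        encoding="utf-8",
    )
    schema = infer_schema(source)
    out = io.StringIO()
    second_pass(source, schema, out)

print(schema)  # {'word': 'str', 'pos': 'str', 'freq': 'int'}
print(out.getvalue(), end="")
# {"word": "hund", "pos": "nn"}
# {"word": "springa", "pos": "vb", "freq": 12}
```

Reading the source twice trades extra I/O for bounded memory, which matches the goal of handling datasets larger than RAM.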

Future work

Dependencies - modifiers may need to be run in a specific order to work
Plugin system
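
For the modifier dependencies listed above as future work, one possible approach, assuming each modifier declares which other modifiers must run before it, is a topological sort with the standard library's `graphlib`. The modifier names and their dependencies below are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical modifiers, each mapped to the modifiers that must run before it:
# tag conversion reads the renamed field, and exclusion removes a field
# that tag conversion still needs.
DEPENDS_ON: dict[str, set[str]] = {
    "rename_fields": set(),
    "convert_tags": {"rename_fields"},
    "exclude_fields": {"convert_tags"},
}


def modifier_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every modifier runs after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


print(modifier_order(DEPENDS_ON))  # ['rename_fields', 'convert_tags', 'exclude_fields']

try:
    modifier_order({"a": {"b"}, "b": {"a"}})
except CycleError as error:
    # Circular dependencies cannot be ordered; report them instead of guessing.
    print("cannot order modifiers:", error.args[0])
```

Because `static_order()` raises `CycleError` for circular dependencies, a misconfigured pipeline would be rejected before any data is read.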

Repository: spraakbanken/karp-s-pipeline

Folders and files

Latest commit

History

Repository files navigation

Karp-S pipeline

Documentation

Future work

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages