kalidasa is a repository of data from the Digital Corpus of Sanskrit (DCS), designed for programmatic text analysis.
Important
This is a work in progress.
Currently, you should use the devtools package to install kalidasa. Once I
sort out the final organization of the datasets, I plan to submit the package to CRAN.
```r
# install.packages("devtools")
devtools::install_github("mghaight/kalidasa")
```

kalidasa includes three datasets and several helper functions. dcs_meta
contains corpus metadata for each text, including the full title, author, time
period, and subject/genre. dcs_raw contains a list of character vectors for
each text, divided by chapter. dcs_rich contains lemma data and grammatical
analysis in a tidy format.
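
A minimal sketch of how the datasets might be inspected after loading the
package, assuming dcs_meta and dcs_rich are data frames and dcs_raw is a list
indexed by text, as described above:

```r
library(kalidasa)

# Corpus metadata: one row per text (title, author, period, subject/genre)
head(dcs_meta)

# Raw text: one list element per text, each split by chapter
str(dcs_raw[[1]], max.level = 1)

# Tidy lemma and grammatical-analysis data
head(dcs_rich)
```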
There are several helper functions to make querying the data easier. Since all
of the texts and titles are transliterated according to the
IAST
scheme and encoded as UTF-8, kalidasa uses unique text_ids
to interface with the package data. These text_ids are consistent with the
IDs used in the DCS API. The function print_titles lists the available
texts and their text_ids for easy lookup. search_texts does a fuzzy search
of text titles and returns the top matches. get_search returns the dcs_raw
data for the top result of a query. get_text (aptly) gets the
dcs_raw data for a specified text_id and an optional chapter range.
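
A rough usage sketch based on the descriptions above; the example query and
the argument names passed to get_text (text_id, chapters) are illustrative
assumptions, not taken from the package documentation:

```r
library(kalidasa)

# List every available text with its text_id
print_titles()

# Fuzzy-match a title (IAST, UTF-8) and see the top candidates
search_texts("meghaduta")

# Grab the dcs_raw data for the best match of a query
megh <- get_search("meghaduta")

# Or fetch by text_id directly, optionally limiting to a chapter range
# (argument names here are assumed for illustration)
text_subset <- get_text(text_id = 1, chapters = 1:2)
```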
There is also a function remove_stopwords, which can be called on dcs_rich
or on the data for any individual text to remove rows that appear in a custom
stoplist. The stopwords were generated using a hybrid approach of TF-IDF
scores, manual selection, and the method described in this
paper. Lastly, the
function dcs_write writes all of the data to a location on disk and
returns a vector of file paths.
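
A hedged sketch of the filtering and export helpers; passing a destination
directory to dcs_write is an assumption for illustration:

```r
library(kalidasa)

# Remove stoplisted rows from the tidy data (or from one text's subset)
rich_filtered <- remove_stopwords(dcs_rich)

# Write all package data to disk and collect the resulting file paths
# (the destination-directory argument is assumed, not documented here)
paths <- dcs_write(tempdir())
head(paths)
```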
All data was scraped from the DCS, which is prepared by Oliver Hellwig and licensed under CC-BY 4.0.
