Companion code for "Toward a Thermodynamics of Meaning," CHR 2020
Official: http://ceur-ws.org/Vol-2723/short40.pdf
arXiv: https://arxiv.org/abs/2009.11963
This repository contains a simple reference implementation of a linguistic partition function as described in the paper, along with some limited documentation.
The repository is pip-installable:
pip install git+https://github.com/senderle/lexpart#egg=lexpart
To train an embedding based on the included test dataset (enwik8), run the following commands:
python -m lexpart vocab vocab.npz -
python -m lexpart corpus corpus.npz vocab.npz -
python -m lexpart embed embed.npz corpus.npz
python -m lexpart wordsim embed.npz paris
This will print out a list of words in the corpus that are similar to "paris."
To train an embedding based on your own corpus, replace the - in the above commands with the path to a folder containing plain text files, as in the example below.
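For example, assuming your documents live in a folder called my_corpus/ (a placeholder path), the first two commands would become:

python -m lexpart vocab vocab.npz my_corpus/
python -m lexpart corpus corpus.npz vocab.npz my_corpus/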
The model described in the paper is based on the grand canonical partition function for multiple species in its standard form:
Z = ∑_i e^{β(μ_1 N_{1,i} + μ_2 N_{2,i} + ... + μ_k N_{k,i} − E_i)}
For computational purposes, however, it's convenient to represent the partition function in another form. Substituting u_k for e^{βμ_k}, we can rewrite the above like so:
Z = ∑_i u_1^{N_{1,i}} u_2^{N_{2,i}} ... u_k^{N_{k,i}} e^{−βE_i}
If we cheat a bit by treating the energy term e^{−βE_i} as a constant for all i, we can treat the partition function as one huge polynomial. Each term in the polynomial represents a sentence as a bag of words, where each exponent is a word count. Since the word counts for individual sentences are sparse, and differentiation is a linear operator, we can calculate values for the Jacobian and Hessian very efficiently. The code that performs this calculation is in sparsehess.py.
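The sketch below illustrates the idea. It is not taken from sparsehess.py, and its function names and API are hypothetical; it simply assumes each sentence is stored as a sparse bag of words (a dict mapping word index to count), that u is the vector of fugacities u_k = e^{βμ_k}, and that the energy factor has been folded into a single constant.

```python
# A minimal sketch of the polynomial view of Z (assumed names and API; the
# repository's actual implementation lives in sparsehess.py and differs).
import numpy as np

def sentence_terms(sentences, u, energy_const=1.0):
    """Yield (counts, term) pairs, where term = energy_const * prod_k u[k]**n_k.

    sentences: list of sparse bags of words, each a dict {word index: count}.
    u: vector of fugacities u_k = e^{beta * mu_k} (assumed nonzero).
    energy_const: the constant standing in for e^{-beta * E_i}.
    """
    for counts in sentences:
        term = energy_const
        for k, n in counts.items():
            term *= u[k] ** n
        yield counts, term

def jacobian_hessian(sentences, u, energy_const=1.0):
    """Accumulate dZ/du_j and d2Z/(du_j du_k), touching only nonzero counts."""
    jac = np.zeros(len(u))
    hess = np.zeros((len(u), len(u)))  # dense here for clarity only
    for counts, term in sentence_terms(sentences, u, energy_const):
        for j, nj in counts.items():
            jac[j] += term * nj / u[j]
            for k, nk in counts.items():
                if j == k:
                    hess[j, j] += term * nj * (nj - 1) / u[j] ** 2
                else:
                    hess[j, k] += term * nj * nk / (u[j] * u[k])
    return jac, hess

# Toy example: two "sentences" over a three-word vocabulary.
sentences = [{0: 2, 1: 1}, {1: 1, 2: 3}]
u = np.array([0.5, 1.0, 0.25])
jac, hess = jacobian_hessian(sentences, u)
```

Because the inner loops run only over the words that actually occur in a given sentence, the cost scales with the number of nonzero counts rather than with the square of the vocabulary size.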
There are some interesting connections between this way of thinking about sentences and contexts in natural language and the view of data types described in Conor McBride's "The Derivative of a Regular Type is its Type of One-Hole Contexts."