Natural Language TODO List

This is a short list of projects that are "ready to go" but have not been started yet.

Tokenization

Best results are obtainable with the "freedom models" (freedom at character transitions) as described in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655800/
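
A minimal sketch of one way to read "freedom at character transitions": count how many distinct characters follow each character n-gram in a corpus, and treat transitions with many possible followers as candidate token boundaries. This is an assumption made for illustration, not necessarily the model described in the linked article. The function names, the toy corpus, and the `n` and `threshold` parameters are all hypothetical.

```python
from collections import defaultdict

def successor_freedom(corpus, n=2):
    """Map each character n-gram to the number of distinct characters seen right after it."""
    followers = defaultdict(set)
    for text in corpus:
        for i in range(len(text) - n):
            followers[text[i:i + n]].add(text[i + n])
    return {gram: len(chars) for gram, chars in followers.items()}

def candidate_boundaries(text, freedom, n=2, threshold=2):
    """Return positions in text where the preceding n-gram has freedom >= threshold."""
    return [i + n for i in range(len(text) - n)
            if freedom.get(text[i:i + n], 0) >= threshold]

# Toy corpus (hypothetical): "the" is followed by three different characters.
corpus = ["thedog", "thecat", "thepig"]
freedom = successor_freedom(corpus, n=3)
boundaries = candidate_boundaries("thedog", freedom, n=3, threshold=2)
```

On this toy corpus the only high-freedom transition in "thedog" falls right after "the", which is where a tokenizer would want to split.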

Morphology

Most European languages have conjugated verbs: a verb consists of a stem plus a varying suffix that indicates tense and number. Effectively all of the syntactic structure is carried by the suffix, whereas the fundamental semantic content is in the stem.
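
As a toy illustration of the stem/suffix split described above, the sketch below strips the longest matching suffix from a word. The suffix list is a hand-written assumption for English. A learning system would have to discover the suffixes from data rather than be given them, and the function name is hypothetical.

```python
def split_stem_suffix(word, suffixes):
    """Split word into (stem, suffix), preferring the longest matching suffix.

    Returns (word, "") when no suffix matches or the match would leave an empty stem.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], suffix
    return word, ""

# Hypothetical, hand-picked suffix list for illustration only.
SUFFIXES = ["s", "ed", "ing"]

for w in ["walks", "walked", "walking", "walk"]:
    print(w, split_stem_suffix(w, SUFFIXES))
```

All four forms share the stem "walk", which carries the meaning, while the suffix alone distinguishes tense and number.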

To deal with morphology, words need to

Chinese

Chinese segmentation can be learned, in the sense of "set phrases".

Translation/parallel texts

This too should work.

Infrastructure dev is needed for parallel texts.
