This is a short list of projects that are "ready to go" but have not been started yet.
The best results are obtained with the "freedom models" (freedom at character transitions), as described in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655800/
Most European languages have conjugated verbs, meaning that there is a verb stem and a varying suffix indicating tense and number. Effectively all of the syntactic structure is carried by the suffix, whereas the fundamental semantic content is in the stem.
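The stem/suffix split can be illustrated with a minimal sketch. It assumes a small hand-written suffix list (`SUFFIXES`) and a hypothetical helper `split_stem_suffix`; neither comes from this project, and a learning system would have to discover the suffixes rather than be handed them.

```python
def split_stem_suffix(word, suffixes):
    """Return (stem, suffix); suffix is '' when no listed suffix matches."""
    # Try longer suffixes first, so that "ing" is preferred over "g".
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Skip empty suffixes and require a non-empty stem.
        if suffix and word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], suffix
    return word, ""


# Toy English suffix list, an assumption made purely for illustration.
SUFFIXES = ["s", "ed", "ing"]

for w in ["walks", "walked", "walking", "walk"]:
    print(w, split_stem_suffix(w, SUFFIXES))
```

Here the stem `walk` stays constant while only the suffix varies, which is the sense in which the suffix carries the syntactic information and the stem carries the meaning.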
To deal with morphology, words need to
Chinese segmentation can be learned, in the sense of "set phrases".
This too should work.
Infrastructure development is needed for parallel texts.