2.0.0-rc3

Pre-release
@eiennohito eiennohito released this 19 Aug 01:56
· 61 commits to master since this release

Juman++ Version: 2.0.0-rc3 / Dictionary: 20190731-356e143 / LM: K:20190430-7d143fb L:20181122-b409be68 F:20171214-9d125cb

What's new

  • WARNING: models are not compatible with binaries of previous versions. On the other hand, they are compatible with the master branch now.
  • Check that statically generated inference code uses a compatible model
  • Protobuf-based output formats (optional, requires protobuf 3.0+ to be installed)
  • Use https://github.com/s-yata/darts-clone as the trie implementation; the trie index is now half its previous size
  • Model definitions can now be written in text files, not just the C++ DSL

Jumandic-specific

  • Escape bad characters for JUMAN/lattice output formats
  • Fix kaomoji problem breaking brackets (#97)
  • Corpus fixes
  • Analysis fixes by partial annotations
  • Added the reading field to the aliasing set (but don't trust the reading results in analysis very much; our corpora are not clean for those annotations)

The JUMAN output format now escapes the following characters: <, >, ", space, and tab. Tabs are replaced by \t; the other characters are replaced by their full-width counterparts.
For the replaced characters, we output the 元半角 tag in the feature field.
The lattice output format escapes only tabs. Protobuf output formats don't escape anything.

Example:

スペース が好きだ
スペース すぺーす スペース 名詞 6 普通名詞 1 * 0 * 0 "代表表記:スペース/すぺーす カテゴリ:場所-その他"
      特殊 1 空白 6 * 0 * 0 "代表表記:S/* 元半角"
が が が 助詞 9 格助詞 1 * 0 * 0 NIL
好きだ すきだ 好きだ 形容詞 3 * 0 ナ形容詞 21 基本形 2 "代表表記:好きだ/すきだ 反義:形容詞:嫌いだ/きらいだ 動詞派生:好く/すく"
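
The escaping rules described above can be sketched roughly as follows. This is only an illustration in Python, not the Juman++ implementation: the function names `escape_juman` and `escape_lattice` are hypothetical, and the sketch assumes that the half-width space is among the escaped characters (as the example suggests) and that the 元半角 tag is emitted only for full-width replacements, not for tabs.

```python
# Illustrative sketch only; names are hypothetical, not Juman++ APIs.

# ASCII '!'..'~' map to the full-width forms U+FF01..U+FF5E by a fixed offset.
FULLWIDTH_OFFSET = 0xFEE0

# Characters replaced by full-width counterparts in JUMAN output.
# Including the space is an assumption based on the example above.
ESCAPED = {"<", ">", '"', " "}


def to_fullwidth(ch):
    """Return the full-width counterpart of a single ASCII character."""
    if ch == " ":
        return "\u3000"  # ideographic (full-width) space
    return chr(ord(ch) + FULLWIDTH_OFFSET)


def escape_juman(text):
    """Escape text for JUMAN-style output.

    Returns (escaped_text, replaced), where `replaced` is True if any
    character was swapped for its full-width form. A caller could use
    that flag to decide whether to add the 元半角 tag (assumption).
    """
    out = []
    replaced = False
    for ch in text:
        if ch == "\t":
            out.append("\\t")  # literal backslash followed by 't'
        elif ch in ESCAPED:
            out.append(to_fullwidth(ch))
            replaced = True
        else:
            out.append(ch)
    return "".join(out), replaced


def escape_lattice(text):
    """Lattice-style output escapes only tabs."""
    return text.replace("\t", "\\t")
```

For instance, `escape_juman(" ")` returns a full-width space together with `True`, which corresponds to the 空白 token carrying the 元半角 tag in the example output.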

When is the final release?

  • We need to clean up training corpora somewhat and update our RNN model