PikaPikaGems/kanji-heatmap-data

Kanji Heatmap Data

Usage

Requirements

Python 3.x:

python --version
# Python 3.13.2

Overrides

Edit the following files to specify the values you want to override:

kanji_parts.json
kanji_vocab.json
keywords.json
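
As a rough illustration of what an override step can look like, here is a minimal sketch. It assumes each override file is a JSON object whose keys replace matching keys in the generated data; this README does not document the actual format, so treat `apply_overrides` and the sample values as hypothetical and check the build script for the real behaviour.

```python
def apply_overrides(base, overrides):
    """Return a copy of `base` with every key in `overrides` replaced.

    Assumption: a shallow, key-by-key replacement. The real build script
    may merge differently.
    """
    merged = dict(base)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    generated = {"日": "sun", "月": "moon"}
    overrides = {"日": "day"}
    print(apply_overrides(generated, overrides))
```

Keys that are absent from the override object keep their generated value, so an override file only needs to list the entries you want to change.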

Input Files

Download aggregated kanji information from Kanji Data Releases

curl --output-dir input -OL https://github.com/PikaPikaGems/kanji-data-releases/releases/latest/download/kanji-data.tar.gz
tar -xzf ./input/kanji-data.tar.gz -C ./input/

Download the map of vocabulary to its components from JMdict Furigana Map

curl --output-dir input -OL https://github.com/PikaPikaGems/jmdict-furigana-map/releases/latest/download/jmdict-furigana-map.json.tar.gz
tar -xzf ./input/jmdict-furigana-map.json.tar.gz -C ./input/

Download and prepare the Simplified JMdict JSON file from Jmdict Simplified. Choose one of the two variants below; both end up as input/scriptin-jmdict-eng.json.

# If all words
curl --output-dir input -OL https://github.com/scriptin/jmdict-simplified/releases/download/3.6.1%2B20250324123350/jmdict-eng-3.6.1+20250324123350.json.tgz
tar -xzf ./input/jmdict-eng-3.6.1+20250324123350.json.tgz -C ./input/
mv input/jmdict-eng-3.6.1.json input/scriptin-jmdict-eng.json

# If common words only
curl --output-dir input -OL https://github.com/scriptin/jmdict-simplified/releases/download/3.6.1%2B20250324123350/jmdict-eng-common-3.6.1+20250324123350.json.tgz
tar -xzf ./input/jmdict-eng-common-3.6.1+20250324123350.json.tgz -C ./input/
mv input/jmdict-eng-common-3.6.1.json input/scriptin-jmdict-eng.json
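
The commands above pin one specific jmdict-simplified release. If you want to point them at a different release, the sketch below rebuilds the URL and file names from a version and build stamp, following the naming pattern visible in those commands. The helper name `jmdict_asset` and its parameters are made up for this example, and other releases may not follow the same pattern.

```python
from urllib.parse import quote

RELEASES = "https://github.com/scriptin/jmdict-simplified/releases/download"


def jmdict_asset(version, build, common_only=False):
    """Return (url, archive_name, extracted_name) for one release asset.

    Assumption: the naming pattern matches the 3.6.1 commands above.
    """
    tag = f"{version}+{build}"
    variant = "jmdict-eng-common" if common_only else "jmdict-eng"
    archive_name = f"{variant}-{tag}.json.tgz"
    # The "+" in the tag is percent-encoded in the URL path (%2B).
    url = f"{RELEASES}/{quote(tag, safe='')}/{archive_name}"
    extracted_name = f"{variant}-{version}.json"
    return url, archive_name, extracted_name


if __name__ == "__main__":
    url, archive_name, extracted_name = jmdict_asset("3.6.1", "20250324123350")
    print(url)
    print(archive_name)
    print(extracted_name)
```

The extracted name is the file that the `mv` step renames to input/scriptin-jmdict-eng.json.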

Remove the archives you no longer need, to reduce clutter:

rm ./input/kanji-data.tar.gz
rm ./input/jmdict-furigana-map.json.tar.gz

# Run only the line that matches the variant you downloaded
rm ./input/jmdict-eng-common-3.6.1+20250324123350.json.tgz
rm ./input/jmdict-eng-3.6.1+20250324123350.json.tgz

This leaves the input directory with the following files:

cum_use.json
jmdict-furigana-map.json # From: JMdict Furigana Map
kanji_vocab.json
merged_kanji.json
missing_components.json
phonetic_components.json
scriptin-jmdict-eng.json # From: Jmdict Simplified
vocab_furigana.json
vocab_meaning.json
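
Before running the transform, it can help to confirm that the input directory really contains these files. A minimal sketch, using the file names from the listing above (the helper name `missing_files` is made up for this example):

```python
from pathlib import Path

# File names copied from the listing above.
EXPECTED_INPUT_FILES = [
    "cum_use.json",
    "jmdict-furigana-map.json",
    "kanji_vocab.json",
    "merged_kanji.json",
    "missing_components.json",
    "phonetic_components.json",
    "scriptin-jmdict-eng.json",
    "vocab_furigana.json",
    "vocab_meaning.json",
]


def missing_files(directory, expected):
    """Return the expected file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("input", EXPECTED_INPUT_FILES)
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All expected input files are present.")
```

Run it from the repository root so that the relative path `input` resolves correctly.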

Transform Data

./src/kanji_build_output_jsons.py

The following output files should be generated in the output directory:

  • component_keyword.json
  • cum_use.json
  • kanji_extended.json
  • kanji_main.json
  • phonetic.json
  • vocabulary_meaning.json
  • vocabulary_furigana.json
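
To sanity-check the generated files without assuming anything about their schema, you could load each one and print its top-level JSON type and size. This is only a sketch and is independent of ./src/kanji_inspect.py; the helper name `summarize_json` is made up for this example.

```python
import json
from pathlib import Path

# File names copied from the list above.
EXPECTED_OUTPUT_FILES = [
    "component_keyword.json",
    "cum_use.json",
    "kanji_extended.json",
    "kanji_main.json",
    "phonetic.json",
    "vocabulary_meaning.json",
    "vocabulary_furigana.json",
]


def summarize_json(path):
    """Return a one-line summary: top-level JSON type and entry count."""
    p = Path(path)
    if not p.is_file():
        return f"{p.name}: missing"
    with p.open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Scalars count as a single entry; dicts and lists report their length.
    size = len(data) if isinstance(data, (dict, list)) else 1
    return f"{p.name}: {type(data).__name__} with {size} top-level entries"


if __name__ == "__main__":
    for name in EXPECTED_OUTPUT_FILES:
        print(summarize_json(Path("output") / name))
```

A file that fails to parse raises `json.JSONDecodeError`, which is itself a useful signal that the transform step did not finish cleanly.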

Additionally, running the script above creates the following file in the input directory. It is not part of the release file.

jmdict-vocab-meaning.json

Inspect Data

./src/kanji_inspect.py

Prepare release

See RELEASE.md

License and Credits

The software is distributed under the MIT License.

The input data comes from:

  1. Dmitry Shpika's jmdict-simplified project, which uses the JMdict/EDICT files. These files are the property of the Electronic Dictionary Research and Development Group (https://www.edrdg.org/) and are used in conformance with the Group's license.
  2. Kanji Data Releases and JMdict Furigana Map, both under CC BY-SA 4.0.

JMdict and JMnedict

The original XML files (JMdict.xml, JMdict_e.xml, JMdict_e_examp.xml, and JMnedict.xml) are the property of the Electronic Dictionary Research and Development Group, and are used in conformance with the Group's license. All derived files are distributed under the same license, as the original license requires.
