Kanji Heatmap Data

Usage

Requirements

Python 3.x:

python --version
# Python 3.13.2

Overrides

Edit the following files to specify the values you want to override:

kanji_parts.json
kanji_vocab.json
keywords.json
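
How the build script applies these files is not described here. As a rough sketch only, assuming each override file is a JSON object keyed by kanji and that an override entry replaces the base entry with the same key (both are assumptions, and the helper name `apply_overrides` is hypothetical), the merge could look like this:

```python
def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a new dict in which keys present in overrides replace those in base."""
    merged = dict(base)       # shallow copy, so the caller's base is not mutated
    merged.update(overrides)  # override values win on key collisions
    return merged


if __name__ == "__main__":
    # Hypothetical sample data; the real file schema may differ.
    base_keywords = {"日": "sun", "月": "moon"}
    keyword_overrides = {"日": "day"}
    print(apply_overrides(base_keywords, keyword_overrides))
```

A shallow merge like this replaces whole entries. If the real files hold nested objects, a field-by-field merge may be closer to what is needed.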

Input Files

Download aggregated kanji information from Kanji Data Releases

curl --output-dir input -OL https://github.com/PikaPikaGems/kanji-data-releases/releases/latest/download/kanji-data.tar.gz
tar -xzf ./input/kanji-data.tar.gz -C ./input/

Download the map of vocabulary to its components from JMdict Furigana Map

curl --output-dir input -OL https://github.com/PikaPikaGems/jmdict-furigana-map/releases/latest/download/jmdict-furigana-map.json.tar.gz
tar -xzf ./input/jmdict-furigana-map.json.tar.gz -C ./input/

Download and prepare the Simplified JMdict JSON file from JMdict Simplified

# If all words
curl --output-dir input -OL https://github.com/scriptin/jmdict-simplified/releases/download/3.6.1%2B20250324123350/jmdict-eng-3.6.1+20250324123350.json.tgz
tar -xzf ./input/jmdict-eng-3.6.1+20250324123350.json.tgz -C ./input/
mv input/jmdict-eng-3.6.1.json input/scriptin-jmdict-eng.json

# If common words only
curl --output-dir input -OL https://github.com/scriptin/jmdict-simplified/releases/download/3.6.1%2B20250324123350/jmdict-eng-common-3.6.1+20250324123350.json.tgz
tar -xzf ./input/jmdict-eng-common-3.6.1+20250324123350.json.tgz -C ./input/
mv input/jmdict-eng-common-3.6.1.json input/scriptin-jmdict-eng.json
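
The extract-and-rename step above can also be done from Python, which may help on systems without `tar` or `mv`. This is a minimal sketch rather than part of the project: the helper name `extract_and_rename` is hypothetical, and it assumes the archive has already been downloaded into the input directory.

```python
import tarfile
from pathlib import Path


def extract_and_rename(archive: Path, dest_dir: Path, extracted_name: str, final_name: str) -> Path:
    """Extract a gzip-compressed tar archive into dest_dir, then rename one extracted file."""
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python version supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    target = dest_dir / final_name
    # replace() overwrites an existing target, like `mv` does by default.
    (dest_dir / extracted_name).replace(target)
    return target


if __name__ == "__main__":
    input_dir = Path("input")
    archive = input_dir / "jmdict-eng-common-3.6.1+20250324123350.json.tgz"
    if archive.exists():
        print(extract_and_rename(archive, input_dir, "jmdict-eng-common-3.6.1.json", "scriptin-jmdict-eng.json"))
    else:
        print(f"{archive} not found; download it first")
```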

Remove the archives you no longer need, to reduce clutter

rm ./input/kanji-data.tar.gz
rm ./input/jmdict-furigana-map.json.tar.gz

# Remove whichever JMdict archive you downloaded
rm ./input/jmdict-eng-common-3.6.1+20250324123350.json.tgz
rm ./input/jmdict-eng-3.6.1+20250324123350.json.tgz

This leaves the input directory with the following files:

cum_use.json
jmdict-furigana-map.json # From: JMdict Furigana Map
kanji_vocab.json
merged_kanji.json
missing_components.json
phonetic_components.json
scriptin-jmdict-eng.json # From: JMdict Simplified
vocab_furigana.json
vocab_meaning.json
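
Before running the build, it can be useful to confirm that the input directory matches the listing above. The following is a small sketch, not part of the project. The file names are taken from the list above, while the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names copied from the listing above.
EXPECTED_INPUT_FILES = [
    "cum_use.json",
    "jmdict-furigana-map.json",
    "kanji_vocab.json",
    "merged_kanji.json",
    "missing_components.json",
    "phonetic_components.json",
    "scriptin-jmdict-eng.json",
    "vocab_furigana.json",
    "vocab_meaning.json",
]


def missing_files(directory: Path, expected: list[str]) -> list[str]:
    """Return the expected file names that are not regular files in directory."""
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path("input"), EXPECTED_INPUT_FILES)
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All expected input files are present.")
```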

Transform Data

./src/kanji_build_output_jsons.py

The following output files should be generated in the output directory:

component_keyword.json
cum_use.json
kanji_extended.json
kanji_main.json
phonetic.json
vocabulary_meaning.json
vocabulary_furigana.json
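
As a quick sanity check after the build, each generated file can be loaded and summarized. This sketch is independent of `kanji_inspect.py` and makes no claim about what that script does. The file names come from the list above, and the helper name `summarize_json_files` is hypothetical.

```python
import json
from pathlib import Path

# File names copied from the listing above.
EXPECTED_OUTPUT_FILES = [
    "component_keyword.json",
    "cum_use.json",
    "kanji_extended.json",
    "kanji_main.json",
    "phonetic.json",
    "vocabulary_meaning.json",
    "vocabulary_furigana.json",
]


def summarize_json_files(directory: Path, names: list[str]) -> dict[str, str]:
    """Map each file name to a short status: missing, invalid JSON, or a size summary."""
    summary = {}
    for name in names:
        path = directory / name
        if not path.is_file():
            summary[name] = "missing"
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                data = json.load(handle)
        except ValueError:  # covers JSON syntax errors and bad UTF-8
            summary[name] = "invalid JSON"
            continue
        if isinstance(data, (dict, list)):
            summary[name] = f"{type(data).__name__} with {len(data)} entries"
        else:
            summary[name] = type(data).__name__
    return summary


if __name__ == "__main__":
    for name, status in summarize_json_files(Path("output"), EXPECTED_OUTPUT_FILES).items():
        print(f"{name}: {status}")
```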

Additionally, running the script above creates the following file in the input directory. It is not part of the release file.

jmdict-vocab-meaning.json

Inspect Data

./src/kanji_inspect.py

Prepare release

See RELEASE.md

License and Credits

The software is distributed under the MIT License.

The input data comes from:

Dmitry Shpika's jmdict-simplified. That project uses the JMdict/EDICT file, which is the property of the Electronic Dictionary Research and Development Group (https://www.edrdg.org/) and is used in conformance with the Group's license.
Kanji Data Releases and JMdict Furigana Map, both under CC BY-SA 4.0.

JMdict and JMnedict

The original XML files (JMdict.xml, JMdict_e.xml, JMdict_e_examp.xml, and JMnedict.xml) are the property of the Electronic Dictionary Research and Development Group, and are used in conformance with the Group's license. All derived files are distributed under the same license, as the original license requires.
