UNICODE PLUS

Unicode Plus is a set of Unicode, Unihan & emoji utilities wrapped into one single app, built with Electron.

This app works on Mac OS X, Linux and Windows operating systems.

Utilities

The following utilities are currently available:

CJK Font Variants
JavaScript Runner
Regex Properties
Emoji Data Finder
Emoji Picture Book
Emoji References
Unicode Data Finder
Unicode Inspector
Unicode References
Unihan Data Finder
Unihan Inspector
Unihan References

CJK Font Variants

The CJK Font Variants utility simultaneously displays any string of CJK (Chinese/Japanese/Korean) characters in several different typefaces:
- Japanese (JP)
- Korean (KR)
- Simplified Chinese (SC)
- Traditional Chinese (TC)
- Hong Kong Chinese (HK)
The typefaces belong to the open-source set of Google Noto CJK Fonts:
- Noto Sans CJK JP Regular
- Noto Sans CJK KR Regular
- Noto Sans CJK SC Regular
- Noto Sans CJK TC Regular
- Noto Sans CJK HK Regular
Additionally, it is possible to specify a set of logographic glyph variants for display by using the East Asian Variant drop-down menu.
CJK characters can be entered either directly in the "Characters" input field, or using a series of code points in hexadecimal format in the "Code Points" input field.
It is also possible to input predefined strings of CJK characters selected from the Samples ▾ pop-up menu; some of them make use of the information found in the StandardizedVariants.txt or IVD_Sequences.txt data files.
As a convenience, the input fields can be emptied using the Clear button.
In output, the standard Unicode code point format U+7ADC is used, i.e. "U+" directly followed by 4 or 5 hex digits.
In input, more hexadecimal formats are allowed, including Unicode escape sequences, such as \u9F8D or \u{20B9F}. Moving out of the field or pressing the Enter key converts all valid codes to standard Unicode code point format.
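The input conversion described above can be sketched as follows. This is a minimal illustration, not the app's actual code: the helper name normalizeCodePoints and the exact set of accepted formats (\u{…}, \uXXXX, U+…, 0x…, bare hex digits) are assumptions based on the examples given.

```javascript
// Hypothetical helper, not the app's code: convert loosely formatted hex
// codes into the standard "U+XXXX" form.
function normalizeCodePoints(input) {
  // Accept \u{20B9F}, \u9F8D, U+7ADC, 0x7ADC, or a bare run of 1 to 6 hex digits.
  const pattern =
    /\\u\{([0-9a-f]{1,6})\}|\\u([0-9a-f]{4})|(?:U\+|0x|\b)([0-9a-f]{1,6})\b/gi;
  const result = [];
  let match;
  while ((match = pattern.exec(input)) !== null) {
    const hex = match[1] || match[2] || match[3];
    const value = parseInt(hex, 16);
    // Silently skip values beyond the Unicode code space.
    if (value <= 0x10FFFF) {
      // Pad to at least 4 hex digits, as in U+0041.
      result.push('U+' + value.toString(16).toUpperCase().padStart(4, '0'));
    }
  }
  return result.join(' ');
}

console.log(normalizeCodePoints('\\u9F8D \\u{20B9F} u+7adc'));
// prints "U+9F8D U+20B9F U+7ADC"
```

Note that accepting bare hex digits means ordinary words made of the letters a to f would also be read as codes; a real input field would probably need stricter rules.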

JavaScript Runner

The JavaScript Runner utility lets you execute JavaScript code, and comes with several sample scripts related to Unicode, Unihan and emoji; it is useful for quick testing/prototyping or data processing.

Regex Properties

The Regex Properties utility displays all the Unicode properties available in this app for regular expressions, used in particular by the Emoji Data Finder, Unicode Data Finder and Unihan Data Finder utilities.
These properties are suitable for building Unicode-aware regular expressions in JavaScript using the 'u' flag (introduced in ECMAScript 6; the \p{…} property escapes themselves were standardized in ECMAScript 2018).
Unicode properties fall into four groups, which can be displayed individually using the Category drop-down menu:
- General Category properties
- Binary properties
- Script properties
- Script Extensions properties
For General Category properties, prefixing with General_Category= (Canonical) or gc= (Alias) is optional. Use the Optional Prefix checkbox to control whether the prefix is included or not.
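As a short illustration of the four groups and of the optional prefix, the following sketch writes one property escape of each kind. It assumes a JavaScript engine that supports \p{…} with the 'u' flag; the variable names are only for illustration.

```javascript
// General Category: the prefix is optional, and aliases are accepted.
const upper = /\p{Uppercase_Letter}/u;
const upperPrefixed = /\p{General_Category=Uppercase_Letter}/u;
const upperAlias = /\p{gc=Lu}/u;

// Binary property: no prefix and no value.
const alphabetic = /\p{Alphabetic}/u;

// Script and Script Extensions: the prefix is required.
const han = /\p{Script=Han}/u;
const hanExtended = /\p{Script_Extensions=Han}/u;

console.log(upper.test('A'), upperPrefixed.test('A'), upperAlias.test('A'));
console.log(alphabetic.test('a'));

// U+3001 IDEOGRAPHIC COMMA has Script=Common, but Han is listed among its
// Script Extensions, so only the second test succeeds.
console.log(han.test('\u3001'), hanExtended.test('\u3001'));
```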

Groupings:

Property	Description
Cased_Letter	Uppercase_Letter | Lowercase_Letter | Titlecase_Letter
Letter	Uppercase_Letter | Lowercase_Letter | Titlecase_Letter | Modifier_Letter | Other_Letter
Mark	Nonspacing_Mark | Spacing_Mark | Enclosing_Mark
Number	Decimal_Number | Letter_Number | Other_Number
Punctuation	Connector_Punctuation | Dash_Punctuation | Open_Punctuation | Close_Punctuation | Initial_Punctuation | Final_Punctuation | Other_Punctuation
Symbol	Math_Symbol | Currency_Symbol | Modifier_Symbol | Other_Symbol
Separator	Space_Separator | Line_Separator | Paragraph_Separator
Other	Control | Format | Surrogate | Private_Use | Unassigned

\P{…} is the negated form of \p{…}. Use the Negated checkbox to toggle between the two forms.
Notes:
- \p{Any} is equivalent to [\u{0}-\u{10FFFF}]
- \p{ASCII} is equivalent to [\u{0}-\u{7F}]
- \p{Assigned} is equivalent to \P{Unassigned}
Information pertaining to this list has been gathered from several sources (see References), and slightly refined through trial and error.
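The negated form and the three equivalences listed in the notes can be spot-checked with a few lines of JavaScript. This is only a sketch: it compares the patterns on a handful of sample code points chosen for illustration, which demonstrates the equivalences without proving them.

```javascript
// \P{…} matches exactly the code points that \p{…} rejects.
const letter = /^\p{Letter}$/u;
const notLetter = /^\P{Letter}$/u;

// The three special properties from the notes, next to their stated equivalents.
const pairs = [
  [/^\p{Any}$/u, /^[\u{0}-\u{10FFFF}]$/u],
  [/^\p{ASCII}$/u, /^[\u{0}-\u{7F}]$/u],
  [/^\p{Assigned}$/u, /^\P{Unassigned}$/u],
];

// Sample code points: ASCII boundary, an unassigned code point (U+0378),
// a CJK ideograph, an emoji, and the last code point of the code space.
const samples = ['A', '\u007F', '\u0080', '\u0378', '\u9F8D', '\u{1F600}', '\u{10FFFF}'];

const allAgree = pairs.every(([a, b]) => samples.every((s) => a.test(s) === b.test(s)));
console.log(allAgree);
console.log(letter.test('A'), notLetter.test('A'));
```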

Emoji Data Finder

Find by Name

The Find by Name feature of the Emoji Data Finder utility displays a list of basic data (symbol, short name, keywords, code) of matching Unicode emoji searched by name or keyword, including through regular expressions.
After entering a query, click on the Search button to display a list of all relevant matches, if any.
Fully-qualified (keyboard/palette) emoji are presented in a standard way, while non-fully-qualified (display/process) emoji are shown in a distinctive muted (grayed out) style.
This feature deals with the 3,570 emoji defined in the Emoji 11.0 version of the emoji-test.txt data file; the 12 keycap bases and the 26 singleton Regional Indicator characters are not included.
Click on the ✕ button to clear all results.
Various examples of regular expressions are provided for quick copy-and-paste.
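For readers curious about where the fully-qualified status comes from, here is a hedged sketch of how lines of emoji-test.txt could be parsed. It assumes the Emoji 11.0 line layout (code points, a semicolon, a status, then "#" followed by the symbol and its name); the function name parseEmojiTest and the sample lines are illustrative, and this is not presented as the app's own parser.

```javascript
// Hypothetical parser for the emoji-test.txt layout (Emoji 11.0 style).
function parseEmojiTest(text) {
  const entries = [];
  let group = null;
  let subgroup = null;
  for (const line of text.split(/\r?\n/)) {
    let m;
    if ((m = line.match(/^# group: (.+)$/))) {
      group = m[1];
    } else if ((m = line.match(/^# subgroup: (.+)$/))) {
      subgroup = m[1];
    } else if ((m = line.match(/^([0-9A-F ]+?)\s*;\s*([a-z-]+)\s*#\s*(\S+)\s+(.+)$/))) {
      entries.push({
        codes: m[1].split(' '), // one hex string per code point
        status: m[2],           // "fully-qualified" or "non-fully-qualified"
        symbol: m[3],
        name: m[4],
        group,
        subgroup,
      });
    }
  }
  return entries;
}

// A few sample lines typed in the assumed layout.
const sampleEmojiTest = [
  '# group: Smileys & People',
  '# subgroup: face-positive',
  '1F600                                      ; fully-qualified     # \u{1F600} grinning face',
  '263A FE0F                                  ; fully-qualified     # \u263A\uFE0F smiling face',
  '263A                                       ; non-fully-qualified # \u263A smiling face',
].join('\n');

const emojiEntries = parseEmojiTest(sampleEmojiTest);
for (const entry of emojiEntries) {
  // Non-fully-qualified entries are the ones a UI would show in a muted style.
  const muted = entry.status !== 'fully-qualified';
  console.log(entry.codes.join(' '), entry.name, muted ? '(muted)' : '');
}
```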

Match Symbol

The Match Symbol feature of the Emoji Data Finder utility displays a list of basic data (symbol, short name, keywords, code) of Unicode emoji matching a symbol, or a regular expression using Unicode properties.
After entering a query, click on the Search button to display a list of all relevant matches, if any.
Fully-qualified (keyboard/palette) emoji are presented in a standard way, while non-fully-qualified (display/process) emoji are shown in a distinctive muted (grayed out) style.
This feature deals with the 3,570 emoji defined in the Emoji 11.0 version of the emoji-test.txt data file; the 12 keycap bases and the 26 singleton Regional Indicator characters are not included.
Click on the ✕ button to clear all results.
Various examples of regular expressions are provided for quick copy-and-paste.

Filter Text

The Filter Text feature of the Emoji Data Finder utility displays in real time a list of basic data (symbol, short name, keywords, code) of all the Unicode emoji contained in a text string.
Text can be typed directly or pasted from the clipboard into the main input field. Click on the Filter button to strip out all non-emoji characters.
It is also possible to input predefined sets of emoji selected from the Samples ▾ pop-up menu.
As a convenience, the input field can be emptied using the Clear button.
Fully-qualified (keyboard/palette) emoji are presented in a standard way, while non-fully-qualified (display/process) emoji are shown in a distinctive muted (grayed out) style.
This feature deals with the 3,570 emoji defined in the Emoji 11.0 version of the emoji-test.txt data file; the 12 keycap bases and the 26 singleton Regional Indicator characters are not included.
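The idea of stripping non-emoji characters can be approximated in a few lines. The sketch below is deliberately naive and is an assumption, not the app's method: it keeps only code points with the Emoji_Presentation property, so it drops text-presentation emoji, variation selectors, keycaps and ZWJ joiners that a lookup against emoji-test.txt would handle.

```javascript
// Naive, hypothetical filter: keep code points that default to emoji presentation.
function filterEmoji(text) {
  return Array.from(text) // iterate by code point, not by UTF-16 code unit
    .filter((ch) => /\p{Emoji_Presentation}/u.test(ch))
    .join('');
}

console.log(filterEmoji('Hi \u{1F600}, bye \u{1F389}!'));
```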

Emoji Picture Book

The Emoji Picture Book utility displays lists of Unicode emoji in a color picture book fashion.
Any group of pictures can be displayed by selecting its name in the Category drop-down menu, among:
"Smileys & People", "Animals & Nature", "Food & Drink", "Travel & Places", "Activities", "Objects", "Symbols", "Flags".
The size of all emoji pictures (from 32 to 128 pixels) can be adjusted by moving the dedicated slider left and right.
The groups and subgroups of emoji are those defined in the Emoji 11.0 version of the emoji-test.txt data file; the 12 keycap bases and the 26 singleton Regional Indicator characters are not included.
Only the 2,789 fully-qualified (keyboard/palette) encodings of the emoji are used, unless they cannot be displayed properly, depending on the emoji support level of the operating system.
Emoji that cannot be rendered as proper color pictures are simply discarded.

Emoji References

The Emoji References utility provides a list of reference links to emoji-related web pages.

Unicode Data Finder

Find by Name

The Find by Name feature of the Unicode Data Finder utility displays a list of basic data (symbol, code point, name, block) of matching Unicode characters searched by name (or alias name), including through regular expressions.
After entering a query, click on the Search button to display a list of all relevant matches, if any, ordered by code point value.
When available, name aliases are also displayed (in smaller typeface) after the unique and immutable Unicode name. A correction alias is indicated by a leading reference mark ※.
It is possible to choose how many characters are shown one page at a time.
The search is performed on the 276,955 assigned characters (or code points) defined in the Unicode 11.0 version of the UnicodeData.txt data file.
Click on the ✕ button to clear all results.
Various examples of regular expressions are provided for quick copy-and-paste.
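A name search of this kind can be sketched against the semicolon-separated layout of UnicodeData.txt, where the first three fields are the code point, the name and the general category. The function names and the three sample lines below are illustrative; name aliases, ranges such as CJK ideographs, and block lookup are left out.

```javascript
// Hypothetical reader for UnicodeData.txt lines (only the first three fields are used).
function parseUnicodeData(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line !== '')
    .map((line) => {
      const fields = line.split(';');
      const value = parseInt(fields[0], 16);
      return {
        value,
        codePoint: 'U+' + fields[0],
        symbol: String.fromCodePoint(value),
        name: fields[1],
        category: fields[2],
      };
    });
}

// Return the records whose name matches the pattern, ordered by code point value.
// The pattern should not carry the 'g' flag, since test() would then be stateful.
function findByName(records, pattern) {
  return records
    .filter((record) => pattern.test(record.name))
    .sort((a, b) => a.value - b.value);
}

const unicodeRecords = parseUnicodeData([
  '1F600;GRINNING FACE;So;0;ON;;;;;N;;;;;',
  '0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041',
  '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
].join('\n'));

for (const record of findByName(unicodeRecords, /letter a$/i)) {
  console.log(record.symbol, record.codePoint, record.name);
}
```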

Match Symbol

The Match Symbol feature of the Unicode Data Finder utility displays a list of basic data (symbol, code point, name, block) of Unicode characters matching a symbol, or a regular expression using Unicode properties.
After entering a query, click on the Search button to display a list of all relevant matches, if any, ordered by code point value.
It is possible to choose how many characters are shown one page at a time.
The search is performed on the 276,955 assigned characters (or code points) defined in the Unicode 11.0 version of the UnicodeData.txt data file.
Click on the ✕ button to clear all results.
Various examples of regular expressions are provided for quick copy-and-paste.

List by Block

The List by Block feature of the Unicode Data Finder utility displays in real time a list of basic data (symbol, code point, name, block) of Unicode characters belonging to the same block range.
It is possible to choose how many characters are shown one page at a time.
A block can be selected either by Block Range or by Block Name, as defined in the Unicode 11.0 version of the Blocks.txt data file.
It is also possible to directly enter a code point (or character) in the Specimen field, then click on the Go button to automatically select the block containing the code point, scroll its basic data into view, and highlight its hexadecimal code value.
You can quickly reuse a previously entered code point by using the Alt+↑ and Alt+↓ keyboard shortcuts to navigate up and down through the history stack in the Specimen field.
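Selecting the block that contains a given code point amounts to a range lookup. The following sketch assumes the "first..last; name" layout of Blocks.txt; the helper names and the four sample lines are illustrative only.

```javascript
// Hypothetical reader for Blocks.txt lines such as "0000..007F; Basic Latin".
function parseBlocks(text) {
  const blocks = [];
  for (const line of text.split(/\r?\n/)) {
    const m = line.match(/^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+)$/);
    if (m) {
      blocks.push({ first: parseInt(m[1], 16), last: parseInt(m[2], 16), name: m[3] });
    }
  }
  return blocks;
}

// Return the block whose range contains the code point, or null if none does.
function findBlock(blocks, codePoint) {
  return blocks.find((b) => codePoint >= b.first && codePoint <= b.last) || null;
}

const sampleBlocks = parseBlocks([
  '# comment lines are ignored',
  '0000..007F; Basic Latin',
  '0080..00FF; Latin-1 Supplement',
  '4E00..9FFF; CJK Unified Ideographs',
  '1F600..1F64F; Emoticons',
].join('\n'));

// A specimen can be given as a character or as a code point value.
console.log(findBlock(sampleBlocks, '\u9F8D'.codePointAt(0)).name);
console.log(findBlock(sampleBlocks, 0x1F600).name);
```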

Unicode Inspector

The Unicode Inspector utility displays code point information in real time for each Unicode character of a text string.
Characters can be entered either directly in the "Characters" input field, or using a series of code points in hexadecimal format in the "Code Points" input field.
It is also possible to input predefined sets of characters selected from the Samples ▾ pop-up menu.
As a convenience, the input fields can be emptied using the Clear button.
In output, the standard Unicode code point format U+0041 is used, i.e. "U+" directly followed by 4 or 5 hex digits.
In input, more hexadecimal formats are allowed, including Unicode escape sequences, such as \u611B or \u{1F49C}. Moving out of the field or pressing the Enter key converts all valid codes to standard Unicode code point format.
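The correspondence between the "Characters" and "Code Points" fields can be sketched with two small functions. The names toCodePoints and fromCodePoints are hypothetical, and the second function only accepts the standard U+ format for brevity.

```javascript
// Characters -> code points: iterate by code point so that characters outside
// the Basic Multilingual Plane are not split into surrogate halves.
function toCodePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Code points -> characters: expects whitespace-separated "U+XXXX" items.
function fromCodePoints(codes) {
  return codes
    .split(/\s+/)
    .filter((item) => item !== '')
    .map((item) => String.fromCodePoint(parseInt(item.replace(/^U\+/i, ''), 16)))
    .join('');
}

console.log(toCodePoints('A\u611B\u{1F49C}').join(' '));
// prints "U+0041 U+611B U+1F49C"
```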
Information is provided for the 276,955 assigned characters (or code points) defined in the Unicode 11.0 version of the UnicodeData.txt data file.
Extra information is also obtained from the following data files:

Unicode References

The Unicode References utility provides a list of reference links to Unicode-related web pages.

Unihan Data Finder

Find by Tag Value

The Find by Tag Value feature of the Unihan Data Finder utility displays a list of basic data (symbol, code point, Unihan tag, value, block) of matching Unihan characters searched by tag value, including through regular expressions.
Use the Unihan Tag drop-down menu to select the tag whose value you wish to search.
Use the Categories checkbox to toggle between: all Unihan tags ordered alphabetically, or grouped by categories in the drop-down menu.
After entering a query, click on the Search button to display a list of all relevant matches, if any, ordered by code point value.
It is possible to choose how many characters are shown one page at a time.
The search is performed on the 88,889 Unihan characters (or code points) defined in the set of data files contained in the Unihan.zip archive file:
- Unihan_DictionaryIndices.txt
- Unihan_DictionaryLikeData.txt
- Unihan_IRGSources.txt
- Unihan_NumericValues.txt
- Unihan_OtherMappings.txt
- Unihan_RadicalStrokeCounts.txt
- Unihan_Readings.txt
- Unihan_Variants.txt
Click on the ✕ button to clear all results.
Various examples of regular expressions are provided for quick copy-and-paste.
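A tag value search can be sketched against the tab-separated layout of the Unihan data files (code point, tag, value). The function name findByTagValue is hypothetical and the sample lines are illustrative values typed for this example, not an extract checked against a specific Unihan release.

```javascript
// Hypothetical search over Unihan-style lines: "U+XXXX<TAB>tag<TAB>value".
function findByTagValue(text, tag, pattern) {
  const results = [];
  for (const line of text.split(/\r?\n/)) {
    if (line === '' || line.startsWith('#')) continue; // skip blanks and comments
    const [codePoint, lineTag, value] = line.split('\t');
    if (lineTag === tag && pattern.test(value)) {
      const number = parseInt(codePoint.slice(2), 16);
      results.push({ codePoint, symbol: String.fromCodePoint(number), tag, value, number });
    }
  }
  // Order matches by code point value.
  return results.sort((a, b) => a.number - b.number);
}

// Illustrative sample lines in the assumed layout.
const unihanSample = [
  '# sample data',
  'U+9F8D\tkDefinition\tdragon; symbolic of emperor',
  'U+9F8D\tkMandarin\tl\u00F3ng',
  'U+4E00\tkDefinition\tone; a, an; alone',
  'U+4E00\tkMandarin\ty\u012B',
].join('\n');

for (const hit of findByTagValue(unihanSample, 'kDefinition', /\b(one|dragon)\b/i)) {
  console.log(hit.symbol, hit.codePoint, hit.tag, hit.value);
}
```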

Radical/Strokes

The Radical/Strokes feature of the Unihan Data Finder utility displays all the Unihan characters searched by KangXi radical and additional stroke count.
Use the Unihan Full Set checkbox to perform the search on the full set of 88,889 Unihan characters, or limit it to the IICore set of 9,810 CJK unified ideographs in common usage.
Use the Allow Extra Sources checkbox to extend the search to all radical/strokes source tags, or use only the IRG-defined source tag common to all Unihan characters.
Use the Radical and Strokes drop-down menus to select the KangXi radical and the additional stroke count of the Unihan characters you are looking for, then click on the Search button.
Selecting All from the Strokes menu lets you display all the Unihan characters sharing the same KangXi radical, sorted by additional stroke count.
Click on the ✕ button to clear all results.
A complete list of the 214 KangXi radicals is available for reference, showing also CJK variants as well as simplified forms.
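To make the radical/strokes search concrete, here is a hedged sketch that parses values in the kRSUnicode style, i.e. a radical number, an optional apostrophe marking a simplified radical, a dot, and the additional stroke count. That syntax is an assumption taken from the Unihan documentation of this period, and the helper names and sample records are illustrative.

```javascript
// Hypothetical parser for radical/strokes values such as "212.0" or "120'.3".
function parseRadicalStrokes(value) {
  const m = /^([1-9][0-9]{0,2})('?)\.(-?[0-9]{1,2})$/.exec(value);
  if (!m) return null;
  return { radical: Number(m[1]), simplified: m[2] === "'", strokes: Number(m[3]) };
}

// Select records by radical and additional stroke count; passing 'All' as the
// stroke count keeps every character of the radical, sorted by stroke count.
function searchByRadical(records, radical, strokes) {
  return records
    .map((record) => ({ ...record, rs: parseRadicalStrokes(record.value) }))
    .filter((r) => r.rs !== null && r.rs.radical === radical)
    .filter((r) => strokes === 'All' || r.rs.strokes === strokes)
    .sort((a, b) => a.rs.strokes - b.rs.strokes || a.number - b.number);
}

// Small sample: code point number and its radical/strokes value.
const radicalSample = [
  { number: 0x4E03, value: '1.1' },
  { number: 0x4E00, value: '1.0' },
  { number: 0x9F8D, value: '212.0' },
  { number: 0x4E01, value: '1.1' },
];

for (const r of searchByRadical(radicalSample, 1, 'All')) {
  console.log(String.fromCodePoint(r.number), r.rs.strokes);
}
```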

View by Grid

The View by Grid feature of the Unihan Data Finder utility displays in real time a grid view of the blocks containing the 88,889 Unihan characters.
It is possible to choose how many characters are shown one page at a time.
A block can be selected either by Block Name or by Block Range.
It is also possible to directly enter a Unihan character or code point in the Specimen field, then click on the Go button to automatically select the block containing the character, scroll it into view, and highlight it.
You can quickly reuse a previously entered Unihan character by using the Alt+↑ and Alt+↓ keyboard shortcuts to navigate up and down through the history stack in the Specimen field.
A list of all the Unihan blocks is available for quick reference.

Unihan Inspector

The Unihan Inspector utility displays all available Unihan tags for each of the 88,889 Unihan characters defined in the set of data files contained in the Unihan.zip archive file:
- Unihan_DictionaryIndices.txt
- Unihan_DictionaryLikeData.txt
- Unihan_IRGSources.txt
- Unihan_NumericValues.txt
- Unihan_OtherMappings.txt
- Unihan_RadicalStrokeCounts.txt
- Unihan_Readings.txt
- Unihan_Variants.txt
Any Unihan character can be entered in the Unihan input field either as a character or a code point. Click on the Lookup button to display the list of Unihan tags.
In addition, the utility provides, for each Unihan character:
- basic Unicode information: name, age, plane, block, script, script extensions, general category, decomposition, binary properties, equivalent unified ideograph;
- basic Unihan information: radical/strokes, definition, variant characters, IICore set.
Previously looked up Unihan characters are kept in a history stack; use the Alt+↑ and Alt+↓ keyboard shortcuts to navigate through them up and down inside the input field.
It is also possible to look up a randomly selected Unihan character by clicking on the Random button; use the Full Set checkbox to perform the draw on the full set of 88,889 Unihan characters, or restrict it to the IICore set of 9,810 CJK unified ideographs in common usage.
The currently looked up Unihan character is displayed at a large scale, followed by its code point; click on ◀ or ▶ to step through several different CJK typefaces, among: JP (Japanese), KR (Korean), SC (Simplified Chinese), TC (Traditional Chinese), HK (Hong Kong Chinese). Double-click on the two-letter language tag to toggle between these five CJK typefaces and the system default typeface.
Use the Categories checkbox to toggle between: all Unihan tags ordered alphabetically, or grouped by categories.
Notes:
- The top Radical/Strokes fields display data obtained from the only informative IRG Source, kRSUnicode, while the bottom ones (in grayed-out style, if any) use the provisional sources: kRSKangXi, kRSJapanese, kRSKanWa, kRSKorean and kRSAdobe_Japan1_6.
- IICore (International Ideographs Core) represents a set of 9,810 important Unihan characters in everyday use throughout East Asia; it has been developed by the IRG.
- IRG stands for Ideographic Rapporteur Group, a committee advising the Unicode Consortium about Asian language characters.
- The Yasuoka Variants information is drawn from the "Variants table for Unicode" data file UniVariants.txt provided by Prof. Kōichi Yasuoka.

Unihan References

The Unihan References utility provides a list of reference links to Unihan-related web pages.

Building

You'll need Node.js installed on your computer in order to build this app.

git clone https://github.com/tonton-pixel/unicode-plus
cd unicode-plus
npm install
npm start

If you don't wish to clone, you can download the source code.

Several scripts are also defined in the package.json file to build OS-specific bundles of the app, using the simple yet powerful Electron Packager Node module.
For instance, running the following command will create a Unicode Plus.app version for Mac OS X:

npm run build-darwin

Using

You can download the latest release for Mac OS X.

License

The MIT License (MIT).
