GitHub - benkasminbullock/unicode-c: A C library for handling Unicode, UTF-8, surrogate pairs, etc.

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 78 Commits
doc		doc
.gitignore		.gitignore
Changes		Changes
Makefile		Makefile
README		README
UTF-8-test.txt		UTF-8-test.txt
build.pl		build.pl
c-tap-test.h		c-tap-test.h
makefile-copy		makefile-copy
makeitfile		makeitfile
unicode-character-class.c		unicode-character-class.c
unicode-character-class.h		unicode-character-class.h
unicode.c		unicode.c
unicode.h		unicode.h
utf-8-test-data.c		utf-8-test-data.c
utf-8-test.c		utf-8-test.c
utf-8-test.pl		utf-8-test.pl

Repository files navigation

This is a Unicode library in the C programming language which deals
with conversions to and from the UTF-8 format.

* Author:

Ben Bullock <[email protected]>, <[email protected]>

* Repository:

https://github.com/benkasminbullock/unicode-c

* Licence:

You can use this C code under the BSD three-clause licence, the GNU
General Public Licence, either version 2 or later, or the Perl
artistic licence.

* Version:

There is no version for this, please use the git commit numbers.

* Documentation:

Documentation consists of the comments in the source code, a manual
page in doc/unicode.3, and an HTML version in doc/unicode.html.

The mdoc manual page unicode.3 and the html page doc/unicode.html are
both generated from the comments using Perl scripts in the "doc"
subdirectory of the repository. Running these scripts may require you
to install some Perl modules from CPAN. The following command will
download and install all the necessary modules:

curl -L https://cpanmin.us | perl - App::cpanminus
cpanm C::Tokenize Convert::Moji File::Slurper HTML::Make JSON::Create JSON::Parse List::UtilsBy Template Text::LineNumber

* Conventions

All of the functions except "unicode_code_to_error" return values of
type int32_t (32-bit signed integers). All of the UTF-8 inputs are of
the type "unsigned char".

* Testing:

Compile with -DTEST or use "make test" to run the tests. The tests are
contained in "unicode.c" itself. Please refer to the source
code. Running the tests requires the "prove" utility which is part of
Perl.

* Dependencies

This C code uses the definitions from the standard header file
"stdint.h" to get consistent integer types, and functions from
"string.h" to get lengths of strings, and for testing.

* Bugs:

Either send email or use the github "issues" pages to report bugs.

* Known problems:

** The library uses UCS2 where it should have said UTF-16. There are a
few similar misnamings.

** 0xFF is regarded as a valid UTF-8 first byte by some routines.

* Online version

There is an online web version of this software here:

http://www.lemoda.net/tools/uniconvert/index.html