Character counts in the concatenated USFM files for the 73 Bible books #17

@DavidHaslam

The attached tab-delimited text file may be useful for analysis:

merged.usfm.character.frequency.txt

Observe the difference in counts for characters that are usually in pairs.

U+0028	(	5,803	LEFT PARENTHESIS
U+0029	)	5,800	RIGHT PARENTHESIS

U+2018	‘	3,771	LEFT SINGLE QUOTATION MARK
U+2019	’	5,085	RIGHT SINGLE QUOTATION MARK

U+201C	“	6,206	LEFT DOUBLE QUOTATION MARK
U+201D	”	6,188	RIGHT DOUBLE QUOTATION MARK

The counts differ by 3 for parentheses and by 18 for double quotation marks, which suggests there are some unpaired characters worth checking.
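A check like this could be scripted. The following is only a minimal sketch, not the method used to produce the attached file: the function names `pair_differences` and `unbalanced_lines` are hypothetical, and the text is assumed to be already loaded as a Python string.

```python
from collections import Counter

# Characters that normally occur in matching numbers (opener, closer).
PAIRS = [
    ("(", ")"),
    ("\u2018", "\u2019"),  # left/right single quotation marks
    ("\u201C", "\u201D"),  # left/right double quotation marks
]


def pair_differences(text):
    """Return (opener, closer, opener_count, closer_count, difference) per pair."""
    counts = Counter(text)
    return [
        (opener, closer, counts[opener], counts[closer],
         counts[opener] - counts[closer])
        for opener, closer in PAIRS
    ]


def unbalanced_lines(text, opener, closer):
    """Yield (line_number, line) where opener and closer counts differ."""
    for number, line in enumerate(text.splitlines(), start=1):
        if line.count(opener) != line.count(closer):
            yield number, line
```

The per-line check is probably most useful for parentheses. Quotations often run across several verses, so a line-level mismatch for quotation marks does not by itself mean a mark is missing.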

The right single quotation mark is also used as the typographical apostrophe, which helps explain the large difference (1,314) between the two single quotation mark counts.
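To get a rough idea of how many U+2019 characters are apostrophes rather than closing quotes, one could count those that sit directly between two letters. This is a heuristic sketch under that assumption, and `split_right_single_quotes` is a hypothetical name. It will miss word-final apostrophes (as in a possessive ending in s), so the second number is an upper bound on closing quotes.

```python
import re

# Assumption: U+2019 with a letter on both sides is an apostrophe.
# [^\W\d_] matches any Unicode letter in Python's re module.
APOSTROPHE = re.compile(r"(?<=[^\W\d_])\u2019(?=[^\W\d_])")


def split_right_single_quotes(text):
    """Return (likely_apostrophes, remaining_right_single_quotes)."""
    total = text.count("\u2019")
    apostrophes = len(APOSTROPHE.findall(text))
    return apostrophes, total - apostrophes
```

Comparing the second number with the U+2018 count gives a fairer picture of whether the single quotation marks are balanced.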
