| path | title |
|---|---|
| /learnings/javascript_unicode_everything_you_never_wanted_to_know | Learnings: Javascript: Unicode Everything you never wanted to know |
JavaScript exposes strings with UCS-2 semantics at the language level, even though the text they hold is UTF-16 encoded. Each code unit is 16 bits, so a single unit runs out after the first 65,536 code points (U+0000 to U+FFFF). Characters above that range are stored as a surrogate pair, so you have to look at the second code unit to get the whole character.
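A minimal sketch of what a surrogate pair looks like, using U+1F600 (😀) as the example character. The helper `combineSurrogates` is a hypothetical name of my own; it applies the standard UTF-16 decoding arithmetic by hand.

```javascript
// '😀' is U+1F600, outside the 16-bit range, so it is stored as two code units.
const smiley = '\u{1F600}';

const high = smiley.charCodeAt(0); // 0xD83D, the high (lead) surrogate
const low = smiley.charCodeAt(1);  // 0xDE00, the low (trail) surrogate

// Hypothetical helper: standard UTF-16 decoding of a surrogate pair.
// The high surrogate carries the top 10 bits, the low one the bottom 10 bits,
// and 0x10000 is added back because astral code points start there.
function combineSurrogates(hi, lo) {
  return (hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000;
}

console.log(smiley.length);                             // 2
console.log(high.toString(16), low.toString(16));       // d83d de00
console.log(combineSurrogates(high, low).toString(16)); // 1f600
```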
So: if you need a character's numeric value, use codePointAt() instead of charCodeAt(), because codePointAt() understands surrogate pairs and returns the full code point for these higher characters.
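A short comparison of the two methods on the same astral character. Note that `codePointAt` still takes a code-unit index, so pointing it at the second half of a pair returns only that half.

```javascript
const smiley = '\u{1F600}'; // 😀

// charCodeAt only ever sees one 16-bit code unit: here, the high surrogate.
console.log(smiley.charCodeAt(0));  // 55357 (0xD83D)

// codePointAt reads the whole surrogate pair starting at that index.
console.log(smiley.codePointAt(0)); // 128512 (0x1F600)

// Caveat: index 1 is the low surrogate, so codePointAt returns just that unit.
console.log(smiley.codePointAt(1)); // 56832 (0xDE00)

// String.fromCodePoint is the matching inverse for full code points.
console.log(String.fromCodePoint(0x1f600) === smiley); // true
```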
This also means that if you are trying to get the .length of a string containing a high Unicode character, it might not match what you expected (i.e. one character), because JS "followed the spec and gave you UCS-2 semantics". Source
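A quick illustration of the `.length` mismatch, plus the related hazard that code-unit based methods such as `slice` can cut a surrogate pair in half. The sample string is my own choice.

```javascript
// A string mixing BMP characters with one astral symbol (U+1F600).
const s = 'I \u2764 \u{1F600}'; // "I ❤ 😀"

console.log(s.length);      // 6: .length counts UTF-16 code units; the emoji takes 2
console.log([...s].length); // 5: the string iterator walks code points

// slice works on code units, so it can split a pair and leave a lone surrogate.
const firstUnit = '\u{1F600}'.slice(0, 1);
console.log(firstUnit === '\uD83D'); // true: half a character, not a character
```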
First, read "JavaScript has a Unicode problem".
To count symbols (code points) rather than code units, you could implement it like so:
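```javascript
// Array.from iterates the string by code point, so a surrogate pair counts as one symbol.
function countSymbols(string) {
  return Array.from(string).length;
}
```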
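A usage sketch of `countSymbols`, alongside a hypothetical variant (`countSymbolsLoop`, my own name) that uses `for...of` directly and so avoids building an intermediate array. Both rely on the string iterator yielding one code point at a time; neither groups multi-code-point sequences such as emoji with modifiers.

```javascript
function countSymbols(string) {
  return Array.from(string).length;
}

// Hypothetical variant: same counting rule, no temporary array.
function countSymbolsLoop(string) {
  let count = 0;
  for (const _ of string) {
    count++; // one iteration per code point
  }
  return count;
}

console.log('\u{1F600}'.length);              // 2
console.log(countSymbols('\u{1F600}'));       // 1
console.log(countSymbolsLoop('a\u{1F600}b')); // 3
```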
Key words:
- color modifiers
- zero width joiners
- https://eng.getwisdom.io/emoji-modifiers-and-sequence-combinations/
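Color (skin tone) modifiers and zero width joiners are why even code-point counting can disagree with what a reader sees: one visible emoji can be several code points. A sketch, assuming an engine with `Intl.Segmenter` (e.g. Node 16+) for grapheme counting; `countGraphemes` is a hypothetical helper name and falls back to code points when the API is missing.

```javascript
// 👍🏽 = U+1F44D THUMBS UP SIGN + U+1F3FD skin tone modifier (Fitzpatrick type-4)
const thumbs = '\u{1F44D}\u{1F3FD}';
// 👩‍💻 = U+1F469 WOMAN + U+200D ZERO WIDTH JOINER + U+1F4BB PERSONAL COMPUTER
const technologist = '\u{1F469}\u200D\u{1F4BB}';

for (const s of [thumbs, technologist]) {
  console.log(s.length, Array.from(s).length); // code units, code points
}
// prints "4 2" for thumbs and "5 3" for technologist

// Hypothetical helper: count user-perceived characters (grapheme clusters).
function countGraphemes(s) {
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    return Array.from(s).length; // fallback: code points, overcounts sequences
  }
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(s)).length;
}

console.log(countGraphemes(thumbs), countGraphemes(technologist));
```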