added micro_blog #3268
base: main
Are you still working on this @meatball133? I would be happy to help if you want any.
Hi @vaeng 👋🏽 Thank you for your interest. 😄 The open PRs here are drafts of work that have been pre-agreed. @meatball133 and I are still working through them. Overall, wider community contributions have been paused for this track until at least May/June. But if you have issues or proposals, we will be happy to discuss them in the exercism forum.
- **ASCII** can encode English language characters.
All characters are precisely 1 byte long.
- **UTF-8** is a Unicode text encoding.
- **UTF-8** is a Unicode text encoding.
- **UTF-8** is a variable-length Unicode text encoding.

All characters are precisely 1 byte long.
- **UTF-8** is a Unicode text encoding.
Characters take between 1 and 4 bytes.
- **UTF-16** is a Unicode text encoding.
- **UTF-16** is a Unicode text encoding.
- **UTF-16** is also a variable-length Unicode text encoding.

- **UTF-16** is a Unicode text encoding.
Characters are either 2 or 4 bytes long.

UTF-8 and UTF-16 are both Unicode encodings which means they're capable of representing a massive range of characters including:
UTF-8 and UTF-16 are both Unicode encodings which means they're capable of representing a massive range of characters including:
UTF-8 and UTF-16 are both capable of representing a massive range of reader-perceived 'characters' or [graphemes][grapheme] including:

Consider the letter 'a' and the emoji '😛'.
In UTF-16 the letter takes 2 bytes but the emoji takes 4 bytes.
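
The byte counts in this example can be checked directly. The snippet below is only a sketch for discussion, not part of the PR: `encoded_length` is a hypothetical helper, and it assumes the `utf-16-le` codec so that Python does not prepend a byte order mark to the output.

```python
letter = "a"
emoji = "😛"


def encoded_length(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies in the given encoding (hypothetical helper)."""
    return len(text.encode(encoding))


# (letter, emoji) sizes in bytes for each encoding.
utf8_sizes = (encoded_length(letter, "utf-8"), encoded_length(emoji, "utf-8"))
# "utf-16-le" is used instead of "utf-16" so no 2-byte BOM is added.
utf16_sizes = (encoded_length(letter, "utf-16-le"), encoded_length(emoji, "utf-16-le"))

print(utf8_sizes)   # (1, 4)
print(utf16_sizes)  # (2, 4)
```

Both strings have `len(...) == 1` in Python 3, since `str` counts code points rather than bytes.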

The trick to this exercise is to use APIs designed around Unicode characters (codepoints) instead of Unicode codeunits.
The trick to this exercise is to use APIs designed around Unicode characters (codepoints) instead of Unicode codeunits.
The trick to this exercise is to use APIs designed around Unicode characters (codepoints) instead of Unicode codeunits.
[grapheme]: https://dictionary.cambridge.org/us/dictionary/english/grapheme
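
To make the code point versus code unit distinction concrete, here is a minimal sketch, assuming a Python 3 solution. The function name `truncate` and its `limit` parameter are illustrative and not taken from the PR. It also shows why "grapheme" and "code point" are not interchangeable: a combining sequence is one reader-perceived character but two code points, and the standard library has no grapheme segmentation, so this sketch truncates by code point only.

```python
def truncate(text: str, limit: int = 5) -> str:
    """Return at most `limit` Unicode code points from `text` (illustrative name)."""
    # Python 3 strings are sequences of code points, so slicing never
    # cuts through the middle of an encoded character.
    return text[:limit]


sample = "a😛bcdefg"
by_code_point = truncate(sample)

# Slicing the encoded bytes instead can split a multi-byte character.
broken = sample.encode("utf-8")[:3]  # 'a' plus the first 2 of the emoji's 4 bytes
try:
    broken.decode("utf-8")
    bytes_survived = True
except UnicodeDecodeError:
    bytes_survived = False

# A code point is not always a reader-perceived character: "e" followed by
# a combining acute accent renders as one grapheme but is two code points.
combining = "e\u0301"

print(by_code_point)    # a😛bcd
print(bytes_survived)   # False
print(len(combining))   # 2
```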

- Text in most of the world's languages and scripts
- Historic text
- Emoji
- Emoji
- Emoji
- Symbols used in Physics and Mathematics

- Historic text
- Emoji

UTF-8 and UTF-16 are both variable length encodings, which means that different characters take up different amounts of space.
UTF-8 and UTF-16 are both variable length encodings, which means that different characters take up different amounts of space.
UTF-8 and UTF-16 are both variable length encodings, which means that different graphemes can take up different amounts of space.

@@ -0,0 +1,19 @@
{
"blurb": "Given an input string, truncate it to 5 characters.",
"blurb": "Given an input string, truncate it to 5 characters.",
"blurb": "Given a Unicode input string, truncate it to 5 grapheme clusters.",
No description provided.