Flashcards: Unicode And UTF-8
1 What is ASCII?
An older 7-bit coded character set with 128 values, covering basic Latin letters, digits, punctuation, and control codes.
2 What is Unicode?
A universal character encoding standard for modern text processing, storage, and interchange.
3 What is a Unicode code point?
A value in the Unicode code space, written like U+0041.
4 What is UTF-8?
The byte-oriented encoding form of Unicode.
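As a small illustration of cards 3 and 4, the sketch below takes one character, shows its code point in U+ notation, and then shows its UTF-8 bytes. It assumes Python 3, where `str` holds Unicode code points; the variable names are only illustrative.

```python
# Assumption: Python 3, where a str is a sequence of Unicode code points.
ch = "A"

code_point = ord(ch)                # integer value of the code point
notation = f"U+{code_point:04X}"    # conventional U+ hexadecimal notation
utf8_bytes = ch.encode("utf-8")     # the same code point encoded as UTF-8

print(notation)            # U+0041
print(list(utf8_bytes))    # [65]
print(chr(0x0041))         # A
```

The code point is an abstract number; the bytes are one particular way of storing it.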
5 Is Unicode the same as UTF-8?
No. Unicode is the standard; UTF-8 is an encoding form.
6 Is modern Unicode simply 16-bit?
No. Modern Unicode code points range from U+0000 to U+10FFFF.
7 Why does ASCII still matter for UTF-8?
UTF-8 preserves ASCII byte values for ASCII characters.
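One way to check this claim is to encode an ASCII-only string both ways and compare the bytes. This is a minimal sketch assuming Python 3; the sample string is arbitrary.

```python
# An ASCII-only sample string (arbitrary choice for illustration).
text = "Hello, ASCII!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For ASCII characters the two encodings produce identical bytes,
# and every byte value stays below 0x80.
same = ascii_bytes == utf8_bytes
all_below_128 = all(b < 0x80 for b in utf8_bytes)

print(same, all_below_128)   # True True
```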
8 How many bytes can UTF-8 use per code point?
1 to 4 bytes.
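The byte count depends on which range the code point falls in. The hand-written encoder below is a teaching sketch, not a replacement for the built-in codec; `utf8_encode` is a hypothetical helper name, and the result is checked against Python's own `str.encode`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# U+0041, U+00E9, U+20AC, U+1F600: one sample per byte length.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")   # agrees with the built-in codec
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

The four samples produce 1, 2, 3, and 4 bytes respectively.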
9 What is a code unit in UTF-8?
An 8-bit unit, effectively a byte.
10 Why can byte length differ from character count?
Some Unicode code points require multiple UTF-8 bytes.
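A quick way to see the difference is to compare the length of a string with the length of its encoded bytes. This is a sketch assuming Python 3, where `len` on a `str` counts code points; the sample text is arbitrary.

```python
# "naïve €5" written with escapes so the source stays ASCII-only.
text = "na\u00efve \u20ac5"

code_points = len(text)                   # counts code points
byte_count = len(text.encode("utf-8"))    # counts UTF-8 bytes

print(code_points)   # 8
print(byte_count)    # 11
```

The three extra bytes come from U+00EF (2 bytes) and U+20AC (3 bytes).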
11 What is U+0041?
The Unicode code point for A.
12 What are UTFs?
Unicode transformation formats: algorithmic mappings from Unicode code points to byte sequences and back.
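The "and back" part can be illustrated with a round trip. The sketch below uses three encodings that Python's standard codecs provide (UTF-8, plus little-endian UTF-16 and UTF-32); choosing those three is an assumption made for illustration, since the cards themselves focus on UTF-8.

```python
# U+0041, U+20AC, U+1F600 in one string.
text = "A\u20ac\U0001F600"

lengths = {}
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(encoding)            # code points -> bytes
    assert data.decode(encoding) == text    # bytes -> the same code points
    lengths[encoding] = len(data)
    print(encoding, len(data), data.hex(" "))
```

The same three code points come back from each encoding, even though the byte sequences differ.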
13 What does invalid UTF-8 mean?
A byte sequence that does not follow UTF-8’s well-formed sequence rules.
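To make "not well-formed" concrete, the sketch below feeds a few byte sequences to Python's strict UTF-8 decoder. `is_well_formed_utf8` is a hypothetical helper name, and the example byte sequences are chosen only for illustration.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8")   # "strict" is the default error handler
        return True
    except UnicodeDecodeError:
        return False


examples = {
    "valid euro sign": b"\xe2\x82\xac",
    "lone continuation byte": b"\x80",
    "truncated sequence": b"\xe2\x82",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded surrogate": b"\xed\xa0\x80",
}

results = {label: is_well_formed_utf8(data) for label, data in examples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'invalid'}")
```

Only the first example is accepted; the other four are rejected by the decoder.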
14 Is Unicode a font?
No. Fonts draw glyphs; Unicode encodes characters.
15 Why might a valid Unicode character fail to display?
The font, operating system, application, or language/script support needed to render it may be missing.
16 What beginner rule should you retire after learning UTF-8?
One character equals one byte.
17 What later rule is even more precise?
One visible character may not equal one code point.
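A common case is an accented letter written as a base letter plus a combining mark. The sketch below assumes Python 3 and uses the standard `unicodedata` module only to print code point names.

```python
import unicodedata

composed = "\u00e9"       # one code point: U+00E9
decomposed = "e\u0301"    # two code points: U+0065 followed by U+0301

# Both typically render as the same visible character,
# yet they differ in code point count and in UTF-8 byte length.
print(len(composed), len(decomposed))    # 1 2
print(len(composed.encode("utf-8")),
      len(decomposed.encode("utf-8")))   # 2 3

names = [unicodedata.name(c) for c in decomposed]
print(names)   # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
```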
18 What does this topic add to from-bits-to-meaning?
Modern text encoding: characters to code points to UTF-8 bytes.
Source Anchors
- unicode-standard-about: “About the Unicode Standard”, Characters for the World.
- unicode-faq-basic-questions: Q: What is Unicode? Q: What is the scope of Unicode? Q: Where can I purchase the Unicode software or the Unicode font? Q: My computer cannot display some of the latest Unicode symbols...
- unicode-faq-utf-bom: Q: Is Unicode a 16-bit encoding? Q: What is a UTF? Q: What are some of the differences between the UTFs? Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream? Q: What is the definition of UTF-8?
- petzold-code-hidden-language-computer-hardware-software-2e: Chapter 20, “ASCII and a Cast of Characters”, near page 271 through page 285.
Open Questions
- Add separate cards later for grapheme clusters, normalization, bidirectional text, and script shaping.