Unicode And UTF-8

type: flashcards
status: draft
id: flashcards.unicode-and-utf-8

Flashcards: Unicode And UTF-8

1 What is ASCII?
An older 7-bit coded character set with 128 values, covering basic Latin letters, digits, punctuation, and control codes.
2 What is Unicode?
A universal character encoding standard for modern text processing, storage, and interchange.
3 What is a Unicode code point?
A value in the Unicode code space, written like U+0041.
4 What is UTF-8?
The byte-oriented encoding form of Unicode.
5 Is Unicode the same as UTF-8?
No. Unicode is the standard; UTF-8 is an encoding form.
6 Is modern Unicode simply 16-bit?
No. Modern Unicode code points range from U+0000 to U+10FFFF, which is more than 16 bits can hold.
7 Why does ASCII still matter for UTF-8?
UTF-8 preserves ASCII byte values for ASCII characters.
8 How many bytes can UTF-8 use per code point?
1 to 4 bytes.
9 What is a code unit in UTF-8?
An 8-bit unit, effectively a byte.
10 Why can byte length differ from character count?
Some Unicode code points require multiple UTF-8 bytes.
11 What is U+0041?
The Unicode code point for A.
12 What are UTFs?
Unicode Transformation Formats: algorithmic mappings from Unicode code points to byte sequences and back.
13 What does invalid UTF-8 mean?
A byte sequence that does not follow UTF-8’s well-formed sequence rules.
14 Is Unicode a font?
No. Fonts draw glyphs; Unicode encodes characters.
15 Why might a valid Unicode character fail to display?
The font may be missing, or the OS, the app, or its language/script support may not handle the character.
16 What beginner rule should you retire after learning UTF-8?
One character equals one byte.
17 What related rule should you also retire later?
One visible character equals one code point. A single visible character may be built from several code points.
18 What does this topic add to from-bits-to-meaning?
Modern text encoding: characters to code points to UTF-8 bytes.
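
Worked example for cards 3, 8, 10, 13, and 17. This is a minimal Python sketch, not part of the source material: the helper names describe and is_well_formed_utf8 and the sample strings are assumptions chosen for illustration. It relies only on Python's built-in str.encode and bytes.decode.

```python
# Illustrative sketch; helper names and sample strings are assumptions.

def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, UTF-8 bytes as hex) for each code point in text."""
    return [(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ")) for ch in text]


def is_well_formed_utf8(data: bytes) -> bool:
    """Report whether Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Four code points that need 1, 2, 3, and 4 UTF-8 bytes respectively.
samples = "A\u00e9\u20ac\U0001F600"
for code_point, utf8_hex in describe(samples):
    print(code_point, "->", utf8_hex)

# Code point count and byte length differ (cards 10 and 16).
print(len(samples), "code points,", len(samples.encode("utf-8")), "bytes")

# One visible character built from two code points (card 17):
# the letter e followed by U+0301 COMBINING ACUTE ACCENT.
combined = "e\u0301"
print(len(combined), "code points for one visible character")

# 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte (card 13).
print(is_well_formed_utf8(b"\xc3\xa9"), is_well_formed_utf8(b"\xc3\x28"))
```

The first sample, A, encodes to the single byte 41, the same value it has in ASCII, which is the point of card 7.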

Source Anchors

  • unicode-standard-about
    • “About the Unicode Standard”, Characters for the World.
  • unicode-faq-basic-questions
    • Q: What is Unicode?
    • Q: What is the scope of Unicode?
    • Q: Where can I purchase the Unicode software or the Unicode font?
    • Q: My computer cannot display some of the latest Unicode symbols...
  • unicode-faq-utf-bom
    • Q: Is Unicode a 16-bit encoding?
    • Q: What is a UTF?
    • Q: What are some of the differences between the UTFs?
    • Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream?
    • Q: What is the definition of UTF-8?
  • petzold-code-hidden-language-computer-hardware-software-2e
    • Chapter 20. ASCII and a Cast of Characters, near page 271 through page 285.

Open Questions