Diagram: Unicode And UTF-8
Conceptual Path
flowchart TD
A["Human text idea<br/>letter, mark, symbol, punctuation"] --> B["Character identity<br/>what character is this?"]
B --> C["Unicode code point<br/>example: U+0041"]
C --> D["Encoding form<br/>UTF-8, UTF-16, UTF-32"]
D --> E["Byte sequence<br/>example in UTF-8: 41"]
E --> F["Storage or transmission<br/>file, network, database, memory"]
F --> G["Decoder reads bytes<br/>using the expected encoding"]
G --> H["Code points again"]
H --> I["Rendering system<br/>font + layout + display"]
I --> J["Visible text"]
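The path in the flowchart can be walked through in a few lines of Python. This is a minimal sketch, not part of the original diagram: the variable names are illustrative, and the big-endian codec variants (`utf-16-be`, `utf-32-be`) are chosen only so that no byte order mark is added to the output.

```python
# Walk the conceptual path for the example character U+0041.
text = "A"                               # character identity
code_point = ord(text)                   # Unicode code point (an integer)

# Encoding form: the same code point becomes different byte sequences.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")   # big-endian, no byte order mark
utf32_bytes = text.encode("utf-32-be")   # big-endian, no byte order mark

# Decoder reads bytes using the expected encoding and recovers the code point.
decoded = utf8_bytes.decode("utf-8")

print(f"code point: U+{code_point:04X}")    # code point: U+0041
print("UTF-8 :", utf8_bytes.hex())          # UTF-8 : 41
print("UTF-16:", utf16_bytes.hex())         # UTF-16: 0041
print("UTF-32:", utf32_bytes.hex())         # UTF-32: 00000041
print("round trip ok:", decoded == text)    # round trip ok: True
```

Rendering (font, layout, display) happens after decoding and is outside what this sketch covers.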
ASCII And UTF-8
flowchart LR
A["ASCII character A"] --> B["ASCII byte 41h"]
A --> C["Unicode code point U+0041"]
C --> D["UTF-8 byte 41h"]
B -. "same byte for ASCII range" .- D
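The "same byte for ASCII range" link can be checked directly. The sketch below assumes Python's built-in `ascii` and `utf-8` codecs and compares their output for every code point from U+0000 through U+007F; the variable names are illustrative.

```python
# Encode the example character with both codecs.
ascii_byte = "A".encode("ascii")
utf8_byte = "A".encode("utf-8")
print(ascii_byte.hex(), utf8_byte.hex())   # 41 41

# Check the whole ASCII range, not just one character.
same_for_ascii_range = all(
    chr(cp).encode("ascii") == chr(cp).encode("utf-8")
    for cp in range(0x80)
)
print(same_for_ascii_range)                # True
```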
Non-ASCII Example
flowchart LR
A["Character: e with acute"] --> B["Unicode code point U+00E9"]
B --> C["UTF-8 bytes C3 A9"]
C --> D["Stored as two bytes"]
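To see where the bytes C3 A9 come from, the two-byte UTF-8 pattern (`110xxxxx 10xxxxxx`) can be applied by hand and compared with the built-in codec. This is a sketch for this one code point only; `lead` and `continuation` are illustrative names, and the arithmetic applies only to code points in the two-byte range U+0080 through U+07FF.

```python
char = "\u00e9"                 # e with acute
cp = ord(char)                  # 0xE9

# Two-byte form: the top 5 bits go into the lead byte,
# the low 6 bits go into the continuation byte.
lead = 0xC0 | (cp >> 6)
continuation = 0x80 | (cp & 0x3F)
manual = bytes([lead, continuation])

builtin = char.encode("utf-8")

print(f"U+{cp:04X}")                            # U+00E9
print(" ".join(f"{b:02X}" for b in manual))     # C3 A9
print(manual == builtin, len(builtin))          # True 2
```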
Source Anchors
- unicode-standard-about: “About the Unicode Standard”, Characters for the World.
- unicode-faq-basic-questions: Q: What is Unicode?; Q: What is the scope of Unicode?; Q: My computer cannot display some of the latest Unicode symbols...
- unicode-faq-utf-bom: Q: Is Unicode a 16-bit encoding?; Q: What is a UTF?; Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream?; Q: Which method of packing Unicode characters into an 8-bit stream is the best?; Q: What is the definition of UTF-8?
- petzold-code-hidden-language-computer-hardware-software-2e: Chapter 20, “ASCII and a Cast of Characters”, near pages 271 through 285.
Open Questions
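- This diagram treats a character as one code point for simplicity. That is often useful, but it does not always hold for user-visible text: a single visible character can be made of several code points, such as a base letter followed by a combining mark.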
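The caveat can be made concrete with the same "e with acute" used in the non-ASCII example. The sketch below is an illustration rather than part of the original diagrams. It assumes Python's standard `unicodedata` module and compares the single code point U+00E9 with the two-code-point sequence U+0065 U+0301 (letter e followed by a combining acute accent). Both normally display as the same visible character.

```python
import unicodedata

precomposed = "\u00e9"      # one code point: U+00E9
decomposed = "e\u0301"      # two code points: U+0065 + U+0301

# Python's len() counts code points, not user-visible characters.
print(len(precomposed), len(decomposed))     # 1 2

# The byte sequences in UTF-8 differ as well.
print(" ".join(f"{b:02X}" for b in precomposed.encode("utf-8")))   # C3 A9
print(" ".join(f"{b:02X}" for b in decomposed.encode("utf-8")))    # 65 CC 81

# The strings are not equal as code point sequences,
# but NFC normalization composes the pair into U+00E9.
print(precomposed == decomposed)                                   # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)     # True
```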