Unicode And UTF-8

type: diagram
status: draft
id: diagram.unicode-and-utf-8

Diagram: Unicode And UTF-8

Conceptual Path

source
flowchart TD
    A["Human text idea<br/>letter, mark, symbol, punctuation"] --> B["Character identity<br/>what character is this?"]
    B --> C["Unicode code point<br/>example: U+0041"]
    C --> D["Encoding form<br/>UTF-8, UTF-16, UTF-32"]
    D --> E["Byte sequence<br/>example in UTF-8: 41"]
    E --> F["Storage or transmission<br/>file, network, database, memory"]
    F --> G["Decoder reads bytes<br/>using the expected encoding"]
    G --> H["Code points again"]
    H --> I["Rendering system<br/>font + layout + display"]
    I --> J["Visible text"]
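
The path in the diagram can be traced in a few lines of Python. This is a minimal sketch that uses only built-in string and bytes methods; the variable names are illustrative and not part of the diagram.

```python
# Trace the conceptual path: character -> code point -> bytes -> code point -> character.
text = "A"

# Character identity -> Unicode code point
code_point = ord(text)

# Code point -> byte sequence, once per encoding form
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")   # big-endian, no byte order mark
utf32_bytes = text.encode("utf-32-be")   # big-endian, no byte order mark

# Decoder reads the bytes using the expected encoding -> code points again
decoded = utf8_bytes.decode("utf-8")

print(f"U+{code_point:04X}")   # U+0041
print(utf8_bytes.hex())        # 41
print(utf16_bytes.hex())       # 0041
print(utf32_bytes.hex())       # 00000041
print(decoded == text)         # True

# If the decoder assumes a different encoding than the writer used,
# the same bytes produce different text.
mismatched = "é".encode("utf-8").decode("latin-1")
print(mismatched)              # Ã©
```

The last two lines show why the decoder step in the diagram says "using the expected encoding": the bytes alone do not say how they should be read.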

ASCII And UTF-8

source
flowchart LR
    A["ASCII character A"] --> B["ASCII byte 41h"]
    A --> C["Unicode code point U+0041"]
    C --> D["UTF-8 byte 41h"]
    B -. "same byte for ASCII range" .- D
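
The "same byte for ASCII range" link can be checked directly. The sketch below compares Python's built-in `ascii` and `utf-8` codecs for every code point from U+0000 to U+007F; the variable names are illustrative.

```python
# The letter A has the same single byte in ASCII and in UTF-8.
ascii_bytes = "A".encode("ascii")
utf8_bytes = "A".encode("utf-8")
print(ascii_bytes.hex(), utf8_bytes.hex())   # 41 41

# The same holds for every code point in the ASCII range (0 to 127):
# the UTF-8 encoding is one byte whose value equals the code point.
all_match = all(
    chr(cp).encode("ascii") == chr(cp).encode("utf-8") == bytes([cp])
    for cp in range(128)
)
print(all_match)                             # True
```

A practical consequence is that a file containing only ASCII characters is already valid UTF-8, byte for byte.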

Non-ASCII Example

source
flowchart LR
    A["Character: e with acute"] --> B["Unicode code point U+00E9"]
    B --> C["UTF-8 bytes C3 A9"]
    C --> D["Stored as two bytes"]
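
To show where the bytes C3 A9 come from, the sketch below builds the two-byte UTF-8 form by hand and compares it with Python's built-in encoder. The helper `encode_two_byte` is hypothetical, written only for this illustration, and covers only code points from U+0080 to U+07FF.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Hand-built two-byte UTF-8 encoding (hypothetical helper for illustration).

    The two-byte layout is 110xxxxx 10xxxxxx: the first byte carries the
    upper 5 bits of the code point, the second byte carries the lower 6 bits.
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers only U+0080 to U+07FF")
    first = 0b1100_0000 | (code_point >> 6)           # 110 prefix + upper 5 bits
    second = 0b1000_0000 | (code_point & 0b0011_1111)  # 10 prefix + lower 6 bits
    return bytes([first, second])


code_point = 0x00E9                       # e with acute
manual = encode_two_byte(code_point)
builtin = chr(code_point).encode("utf-8")

print(manual.hex(" ").upper())            # C3 A9
print(manual == builtin)                  # True
print(len(builtin))                       # 2
```

One character is stored as two bytes here, so the byte length of UTF-8 text can be larger than its number of code points.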

Source Anchors

Open Questions