Computing Foundations

Unicode And UTF-8

status: draft
id: lesson.unicode-and-utf-8

The Short Version

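ASCII, Unicode, and UTF-8 are related, but they are not the same thing. ASCII is a set of 128 characters, each small enough to fit in one byte. Unicode assigns a number, called a code point, to each character across the world's writing systems. UTF-8 is one way of encoding those code points as bytes.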

The key practical idea:

character -> Unicode code point -> UTF-8 bytes

For ASCII characters, UTF-8 produces the same single-byte values as ASCII. Every other character takes two to four bytes.
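
The chain from character to code point to bytes can be sketched in a few lines of Python. This is only an illustration: the helper name `describe` and the four sample characters are choices made for this example, not part of the lesson's source material. The characters are written as escape sequences so the snippet does not depend on how the file itself is encoded.

```python
def describe(ch):
    """Return (code point label, UTF-8 bytes as hex) for one character.

    `describe` is a hypothetical helper written for this illustration.
    """
    code_point = f"U+{ord(ch):04X}"         # character -> Unicode code point
    utf8_hex = ch.encode("utf-8").hex(" ")  # code point -> UTF-8 bytes
    return code_point, utf8_hex


# Sample characters: A, e with acute accent, euro sign, grinning face emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    code_point, utf8_hex = describe(ch)
    print(ch, code_point, utf8_hex)

# Expected output:
# A U+0041 41
# é U+00E9 c3 a9
# € U+20AC e2 82 ac
# 😀 U+1F600 f0 9f 98 80
```

The first line shows the ASCII case: `A` is the single byte `41` (hexadecimal) in both ASCII and UTF-8. The other three characters need two, three, and four bytes.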

Why This Matters

In from-bits-to-meaning, ASCII was enough to show how text can become bytes. But real software handles names, accents, symbols, non-Latin scripts, and emoji. That requires Unicode.

The most important beginner correction is:

one character is not always one byte

And soon after:

one visible character is not always one Unicode code point

That second sentence is an open door for later, not the main burden of this topic.
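
As a small preview of that second point, the sketch below builds the same visible character, é, in two ways: as a single code point, and as a plain `e` followed by a combining accent. The variable names and the helper `code_points` are assumptions made for this example. Note that Python's `len` on a string counts code points, not visible characters and not bytes.

```python
def code_points(text):
    """List the code points in `text` as U+XXXX labels (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


composed = "\u00e9"     # e with acute accent as one code point
decomposed = "e\u0301"  # plain e followed by COMBINING ACUTE ACCENT

for label, text in [("composed", composed), ("decomposed", decomposed)]:
    print(label, code_points(text), len(text), len(text.encode("utf-8")))

# Expected output:
# composed ['U+00E9'] 1 2
# decomposed ['U+0065', 'U+0301'] 2 3
```

Both strings normally render as one visible character, yet they differ in code point count (1 versus 2) and in UTF-8 byte count (2 versus 3).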

Source-Grounded Claims

Source Anchors

Open Questions