Unicode And UTF-8
- type: quiz
- status: draft
- id: quiz.unicode-and-utf-8
Quiz: Unicode And UTF-8
Questions
- What is ASCII?
- What is Unicode?
- What is a Unicode code point?
- What is UTF-8?
- Is Unicode the same thing as UTF-8?
- Is modern Unicode simply a 16-bit encoding?
- Why does UTF-8 preserve ASCII?
- How many bytes can UTF-8 use for one Unicode code point?
- Why can character count and byte count differ?
- Why is Unicode not the same thing as a font?
- Name two reasons a Unicode character may not display correctly even if the bytes decode.
- What does it mean for a byte sequence to be invalid UTF-8?
- What is the relationship between this topic and from-bits-to-meaning?
Answers
- ASCII is an older coded character set that maps a limited set of characters and control codes to numeric values.
- Unicode is a universal character encoding standard for processing, storage, and interchange of text across modern software and protocols.
- A code point is any value in the Unicode code space, written in hexadecimal with a U+ prefix, like U+0041.
- UTF-8 is the byte-oriented encoding form of Unicode: it represents each code point as a sequence of 8-bit code units.
- No. Unicode is the standard; UTF-8 is one encoding form for representing Unicode text as bytes.
- No. Early Unicode was designed as a 16-bit encoding, but modern Unicode uses the code space U+0000 through U+10FFFF, which does not fit in 16 bits.
- UTF-8 uses the same single-byte values as ASCII for ASCII-range characters, so ASCII text is already valid UTF-8 and keeps working in software and protocols built around ASCII.
- One to four bytes.
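- Because UTF-8 uses one byte for each ASCII-range character and two to four bytes for each non-ASCII character, so the byte count exceeds the character count whenever the text contains non-ASCII characters.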
- Unicode identifies encoded characters; fonts provide glyphs for displaying them.
- Possible reasons include a missing font, an outdated operating system, application limitations, or missing language or script support.
- It means the bytes do not follow UTF-8’s well-formed byte-sequence rules and must not be interpreted as valid characters by a conforming process.
- It extends the earlier encoding idea from ASCII and bytes into modern text: characters become code points, and code points become bytes through UTF-8.
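Several of the answers above can be checked directly. The following is a minimal Python sketch, not part of the quiz sources: the sample characters, the word used for counting, and the helper names describe and is_valid_utf8 are all illustrative choices. It shows the code point notation, the one-to-four-byte range, the ASCII overlap, the gap between character count and byte count, and the rejection of an ill-formed byte sequence.

```python
# Sample characters chosen for illustration (assumption: not from the quiz sources).
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, grinning face


def describe(ch: str) -> tuple:
    """Return (code point in U+XXXX notation, UTF-8 bytes) for one character."""
    return f"U+{ord(ch):04X}", ch.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for ch in SAMPLES:
    code_point, encoded = describe(ch)
    print(code_point, encoded.hex(" "), len(encoded), "byte(s)")
# U+0041 41 1 byte(s)
# U+00E9 c3 a9 2 byte(s)
# U+20AC e2 82 ac 3 byte(s)
# U+1F600 f0 9f 98 80 4 byte(s)

# ASCII-range characters keep the same byte value under UTF-8.
assert "A".encode("utf-8") == "A".encode("ascii") == b"\x41"

# Character (code point) count and byte count can differ.
text = "na\u00efve"                      # five code points
char_count = len(text)                   # 5
byte_count = len(text.encode("utf-8"))   # 6, because U+00EF takes two bytes

# A lead byte followed by a non-continuation byte is ill-formed.
print(is_valid_utf8(b"\xc3\x28"))        # False
print(is_valid_utf8(b"\xe2\x82\xac"))    # True (the euro sign)
```

Note that Python's len on a str counts code points, which is the sense of "character count" used in this quiz; user-perceived characters (grapheme clusters) are a separate matter.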
Source Anchors
unicode-standard-about
- “About the Unicode Standard”, Characters for the World.
unicode-faq-basic-questions
Q: What is Unicode?
Q: What is the scope of Unicode?
Q: Where can I purchase the Unicode software or the Unicode font?
Q: My computer cannot display some of the latest Unicode symbols...
unicode-faq-utf-bom
Q: Is Unicode a 16-bit encoding?
Q: What is a UTF?
Q: What are some of the differences between the UTFs?
Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream?
Q: What is the definition of UTF-8?
petzold-code-hidden-language-computer-hardware-software-2e
Chapter 20. ASCII and a Cast of Characters, near page 271 through page 285.
Open Questions
- This quiz avoids grapheme clusters, normalization, bidirectional text, and script shaping.