Unicode And UTF-8

type: quiz
status: draft
id: quiz.unicode-and-utf-8

Quiz: Unicode And UTF-8

Questions

  1. What is ASCII?
  2. What is Unicode?
  3. What is a Unicode code point?
  4. What is UTF-8?
  5. Is Unicode the same thing as UTF-8?
  6. Is modern Unicode simply a 16-bit encoding?
  7. Why does UTF-8 preserve ASCII?
  8. How many bytes can UTF-8 use for one Unicode code point?
  9. Why can character count and byte count differ?
  10. Why is Unicode not the same thing as a font?
  11. Name two reasons a Unicode character may not display correctly even if the bytes decode.
  12. What does it mean for a byte sequence to be invalid UTF-8?
  13. What is the relationship between this topic and from-bits-to-meaning?

Answers

  1. ASCII is an older 7-bit coded character set that maps 128 characters (printable characters and control codes) to the numeric values 0 through 127.
  2. Unicode is a universal character encoding standard for processing, storage, and interchange of text across modern software and protocols.
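  3. A code point is any integer value in the Unicode codespace, U+0000 through U+10FFFF, written in hexadecimal with a U+ prefix, such as U+0041 for "A". Not every code point is assigned to a character.
  4. UTF-8 is a variable-width, byte-oriented encoding form of Unicode that represents each code point as a sequence of one to four bytes.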
  5. No. Unicode is the standard; UTF-8 is one encoding form for representing Unicode text as bytes.
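  6. No. Early Unicode was designed as a 16-bit code, but modern Unicode uses the codespace U+0000 through U+10FFFF (1,114,112 code points), which does not fit in 16 bits.
  7. UTF-8 encodes each ASCII-range code point (U+0000 through U+007F) as a single byte with the same value it has in ASCII, so ASCII text is already valid UTF-8 and software that expects ASCII can often handle it unchanged.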
  8. One to four bytes.
  9. Because UTF-8 uses one byte for each ASCII-range character and two to four bytes for every non-ASCII character, so text containing any non-ASCII character has more bytes than characters.
  10. Unicode identifies encoded characters; fonts provide glyphs for displaying them.
  11. Possible reasons include missing font support, old OS support, application limitations, or missing language/script support.
  12. It means the bytes do not follow UTF-8’s well-formed byte-sequence rules and must not be interpreted as valid characters by a conforming process.
  13. It extends the earlier encoding idea from ASCII and bytes into modern text: characters become code points, and code points become bytes through UTF-8.
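
Worked example for answers 3, 7, 8, 9, and 12. This is a minimal Python sketch, not part of the quiz source: the sample characters, the sample byte strings, and the helper name is_valid_utf8 are illustrative assumptions, and validity is checked by relying on Python's built-in strict UTF-8 decoder rather than by implementing the well-formedness rules by hand.

```python
# Illustrative sketch: the samples and the helper name are assumptions.

# One character from each UTF-8 length class, written as escapes so the
# source file's own encoding cannot change what is being tested.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji


def describe(ch: str) -> str:
    """Return the code point and UTF-8 bytes of a one-character string."""
    encoded = ch.encode("utf-8")
    return f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}"


for ch in samples:
    print(describe(ch))

# Character count versus byte count: "naive" with i-diaeresis (U+00EF).
text = "na\u00efve"
char_count = len(text)                  # code points in the string
byte_count = len(text.encode("utf-8"))  # bytes after encoding


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


well_formed = b"\xe2\x82\xac"  # complete three-byte sequence for U+20AC
bad_continuation = b"\xc3\x28"  # lead byte followed by a non-continuation byte
truncated = b"\xe2\x82"         # three-byte sequence missing its last byte
```

Running the loop prints U+0041 with 1 byte (41), U+00E9 with 2 bytes (c3a9), U+20AC with 3 bytes (e282ac), and U+1F600 with 4 bytes (f09f9880). The string in text has 5 code points but 6 bytes. is_valid_utf8 returns True for well_formed and False for bad_continuation and truncated.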

Source Anchors

Open Questions