UTF-1

UTF-1

UTF-1

Obsolete multibyte encoding for Unicode


UTF-1 is a method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes searching for substrings and error recovery difficult. It reuses the ASCII printing characters for multi-byte encodings, making it unsuited for some uses (for instance Unix filenames cannot contain the byte value used for forward slash). UTF-1 is also slow to encode or decode due to its use of division and multiplication by a number which is not a power of 2. Due to these issues, it did not gain acceptance and was quickly replaced by UTF-8.

Quick Facts MIME / IANA, Language(s) ...

Design

Similar to UTF-8, UTF-1 is a variable-width encoding that is backwards-compatible with ASCII. Every Unicode code point is represented by either a single byte, or a sequence of two, three, or five bytes. All ASCII code points are a single byte (the code points U+0080 through U+009F are also single bytes).

UTF-1 does not use the C0 and C1 control codes or the space character in multi-byte encodings: a byte in the range 0–0x20 or 0x7F–0x9F always stands for the corresponding code point. This design with 66 protected characters tried to be ISO/IEC 2022 compatible.

UTF-1 uses "modulo 190" arithmetic (256 − 66 = 190). For comparison, UTF-8 protects all 128 ASCII characters and needs one bit for this, and a second bit to make it self-synchronizing, resulting in "modulo 64" arithmetic (8 − 2 = 6; 26 = 64). BOCU-1 protects only the minimal set required for MIME-compatibility (0x00, 0x07–0x0F, 0x1A–0x1B, and 0x20), resulting in "modulo 243" arithmetic (256 − 13 = 243).

More information code point ...

Although modern Unicode ends at U+10FFFF, both UTF-1 and UTF-8 were designed to encode the complete 31 bits of the original Universal Character Set (UCS-4), and the last entry in this table shows this original final code point.

See also

References

  • "The Unicode Standard: Appendix F FSS-UTF" (PDF) (PDF, 768 KiB). Version 1.1. Unicode, Inc.
  • ISO/IEC JTC 1/SC2/WG2 (1993-01-21). "ISO IR 178: UCS Transformation Format One (UTF-1)" (PDF) (PDF, 256 KiB) (1 ed.). Registration number 178. Archived from the original (PDF) on 2015-03-18.{{cite web}}: CS1 maint: numeric names: authors list (link)
  • Czyborra, Roman (1998-11-30). "Unicode Transformation Formats: UTF-8 & Co". Archived from the original on 2016-06-07. Retrieved 2016-06-07.
  • F. Yergeau, F. (November 2003). "UTF-8, a transformation format of ISO 10646". doi:10.17487/RFC3629. {{cite journal}}: Cite journal requires |journal= (help)

Share this article:

This article uses material from the Wikipedia article UTF-1, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.