Device_Control_2

C0 and C1 control codes

Control characters, ranging from U+0000 to U+001F (C0) and U+0080 to U+009F (C1) in Unicode


The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.

C0 codes are the range 0x00–0x1F and the default C0 set was originally defined in ISO 646 (ASCII). C1 codes are the range 0x80–0x9F and the default C1 set was originally defined in ECMA-48 (harmonized later with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used.
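As a minimal sketch of the two ranges just described, the following Python function classifies a code point as belonging to the C0 range, the C1 range, or neither. The function name `control_set` is hypothetical and chosen only for illustration.

```python
from typing import Optional


def control_set(code_point: int) -> Optional[str]:
    """Return "C0" or "C1" for a code point in those ranges, else None.

    Hypothetical helper for illustration; it only checks the numeric ranges.
    """
    if 0x00 <= code_point <= 0x1F:
        return "C0"
    if 0x80 <= code_point <= 0x9F:
        return "C1"
    # Everything else, including DEL (0x7F), lies outside both ranges.
    return None
```

For example, `control_set(0x1B)` (ESC) returns `"C0"`, while `control_set(0x7F)` (DEL) returns `None` because DEL lies outside both ranges.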

C0 controls

ASCII defined 32 control characters, plus the DEL character at 0x7F (binary 01111111), which was needed to punch out all the holes on a paper tape and so erase it.

This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals.

Only a few codes have maintained their use: BEL, ESC, and the "Format Effector" (FEn) characters BS, TAB, LF, VT, FF, and CR. Others are unused or have acquired different meanings, such as NUL being the C string terminator. Some data transfer protocols such as ANPA-1312, Kermit, and XMODEM do make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions. The "Information Separators" (ISn) are used by some file formats, such as the Unix info format,[1] and are recognized by Python's splitlines string method.[2]
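The Python behaviour mentioned above can be observed directly. In this short sketch (the sample string is invented for illustration), `str.splitlines` treats FS (0x1C), GS (0x1D) and RS (0x1E) as line boundaries, while US (0x1F) is not in its boundary list and is left in place.

```python
# Invented sample: five fields joined by FS, GS, RS and US in turn.
record = "a\x1cb\x1dc\x1ed\x1fe"

# splitlines() breaks on FS, GS and RS, but not on US.
parts = record.splitlines()
print(parts)  # ['a', 'b', 'c', 'd\x1fe']
```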

The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for the renamed controls (the old name is the one matching the abbreviation).

[Table of C0 control codes; columns include caret notation and decimal value.]
  1. Teletype labelled the key WRU for 'who are you?'[6]
  2. The name BELL is assigned by Unicode to the unrelated emoji character 🔔 (U+1F514). While C0 and C1 control characters were not formally named by the Unicode standard itself at the time, this collided with existing use of BELL as the name of this control character in software following the previous versions of UTS#18 (the Unicode Regular Expressions standard),[7] e.g. in Perl.[8] Unicode now accepts ALERT and BEL (but not BELL) as formal aliases for the control character,[9] although the code chart still lists BELL as the ISO 6429 alias,[10] and the corresponding control picture code point is called SYMBOL FOR BELL. Perl subsequently switched to using BELL for the emoji in version 5.18.[11]
  3. ISO/IEC 2022 (ECMA-35) refers to these as LS0 and LS1 in 8-bit environments, and as SI and SO in 7-bit environments.[12]
  4. The first, 1963 edition of ASCII classified DLE as a device control, rather than a transmission control, and gave it the abbreviation DC0 ("device control reserved for data link escape").[13]
  5. The '\e' escape sequence is not part of ISO C and many other language specifications. However, it is understood by several compilers, including GCC.

C1 controls

In 1973, ECMA-35 and ISO 2022[17] attempted to define a method by which an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and vice versa.[18] In a 7-bit environment, the Shift Out (SO) would change the meaning of the 96 bytes 0x20 through 0x7F[lower-alpha 1][20] (i.e. all but the C0 control codes), to be the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range 0x80 through 0x9F could not be printed in a 7-bit environment,[18] so it was decided that no alternative character set could use them, and that these codes should be additional control codes, which became known as the C1 control codes. To allow a 7-bit environment to use these new controls, the sequences ESC @ through ESC _ were to be considered equivalent.[18] The later ISO 8859 standards abandoned support for 7-bit codes, but preserved this range of control characters.
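The equivalence between the 8-bit C1 bytes and the two-byte sequences ESC @ through ESC _ amounts to adding or subtracting 0x40. The sketch below illustrates that arithmetic on raw single-byte data; the function names `c1_to_7bit` and `c1_from_7bit` are hypothetical, and the code assumes an unstructured byte stream (it is not suitable for UTF-8 text, where bytes 0x80 to 0x9F occur as continuation bytes).

```python
ESC = 0x1B


def c1_to_7bit(data: bytes) -> bytes:
    """Replace each 8-bit C1 byte (0x80-0x9F) with ESC plus a byte in 0x40-0x5F."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += bytes([ESC, b - 0x40])
        else:
            out.append(b)
    return bytes(out)


def c1_from_7bit(data: bytes) -> bytes:
    """Replace each ESC @ .. ESC _ pair with the corresponding 8-bit C1 byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data) and 0x40 <= data[i + 1] <= 0x5F:
            out.append(data[i + 1] + 0x40)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

For instance, the 8-bit CSI byte 0x9B corresponds to ESC [ (0x1B 0x5B), so `c1_to_7bit(b"\x9b31m")` yields `b"\x1b[31m"`.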

The first C1 control code set to be registered for use with ISO 2022 was DIN 31626,[21] a specialised set for bibliographic use which was registered in 1979.[22]

The more common general-use ISO/IEC 6429 set was registered in 1983,[23] although the ECMA-48 specification upon which it was based had first been published in 1976.[24] The set is also specified by JIS X 0211 (formerly JIS C 6323).[25] Symbolic names defined by RFC 1345 and early drafts of ISO 10646, but not in ISO/IEC 6429 (PAD, HOP and SGC), are also used.[8][26]

Except for SS2 and SS3 in EUC-JP text, and NEL in text transcoded from EBCDIC, the 8-bit forms of these codes were almost never used. CSI, DCS and OSC are used to control text terminals and terminal emulators, but almost always by using their 7-bit escape code representations. Nowadays if these codes are encountered it is far more likely they are intended to be printing characters from that position of Windows-1252 or Mac OS Roman.
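The ambiguity described above can be illustrated with Python's built-in codecs. In this sketch the sample bytes are invented for illustration: the same two bytes in the 0x80 to 0x9F range decode to C1 control characters under ISO 8859-1 (`latin-1`) but to curly quotation marks under Windows-1252 (`cp1252`).

```python
import unicodedata

# Invented sample: "Hi" wrapped in bytes 0x93 and 0x94, both in the C1 range.
raw = bytes([0x93, 0x48, 0x69, 0x94])

as_latin1 = raw.decode("latin-1")  # U+0093 'H' 'i' U+0094 (C1 controls)
as_cp1252 = raw.decode("cp1252")   # U+201C 'H' 'i' U+201D (curly quotes)

print(unicodedata.category(as_latin1[0]))  # Cc
print(unicodedata.name(as_cp1252[0]))      # LEFT DOUBLE QUOTATION MARK
```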

[Table of C1 control codes; columns include the ESC+ (7-bit escape) form and decimal value.]
  1. In early versions the range excluded SP and DEL[19]
  2. Not part of ISO/IEC 6429 (ECMA-48)[8][26]
  3. Not part of the first edition of ISO/IEC 6429.[23]
  4. Deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429 (1986 and 1991 respectively for ECMA-48).[citation needed]

Other control code sets

The ISO/IEC 2022 (ECMA-35) extension mechanism allowed escape sequences to change the C0 and C1 sets. The standard C0 control character set shown above is chosen with the sequence ESC ! @ and the above C1 set chosen with the sequence ESC " C.[23]

Several official and unofficial alternatives have been defined, but they are now essentially obsolete. Most were forced to retain a good deal of compatibility with the ASCII controls for interoperability. The standard makes ESC,[34][35] SP and DEL[lower-alpha 1] "fixed" coded characters, which are available in their ASCII locations in all encodings that conform to the standard.[37] It also specifies that if a C0 set included transmission control (TCn) codes, they must be encoded at their ASCII locations[34] and could not be put in a C1 set,[38] and any new transmission controls must be in a C1 set.[34]

Other C0 control code sets

  • ANPA-1312, a text markup language used for news transmission, replaces several C0 control characters.
  • IPTC 7901, the newer international version of the above, has its own variations.
  • Videotex has a completely different set.
  • Teletext also defines a set similar to Videotex.
  • T.61/T.51,[39] and others[40] replaced EM and GS with SS2 and SS3 so these functions could be used in a 7-bit environment.
  • Some sets replaced FS with SS2[41] (the same as ANPA-1312).
  • The now-withdrawn JIS C 6225, designated JIS X 0207 in later sources,[42] replaced FS with CEX or "Control Extension",[43] which introduces control sequences for vertical text behaviour, superscripts and subscripts[44] and for transmitting custom character graphics.[42]

Replacement C1 character sets

  • A specialized C1 control code set is registered for bibliographic use (including string collation), such as by MARC-8.[22][45][46]
  • Various specialised C1 control code sets are registered for use by Videotex formats.[21]
  • EBCDIC defines up to 29 additional control codes besides those present in ASCII. When translating EBCDIC to Unicode (or to ISO 8859), these codes are mapped to C1 control characters in a manner specified by IBM's Character Data Representation Architecture (CDRA).[47][48] The New Line (NL) does translate to the ISO/IEC 6429 NEL (although it is often swapped with LF, following the UNIX line ending convention),[47] but the remainder of the control codes do not correspond. For example, the EBCDIC control SPS and the ECMA-48 control PLU are both used to begin a superscript or end a subscript, but are not mapped to one another. Extended-ASCII-mapped EBCDIC can therefore be regarded as having its own C1 set, although it is not registered with the ISO-IR registry for ISO/IEC 2022.[21]

Unicode

Unicode inherits its first 256 code points from ISO 8859-1, and with them the 65 control code points described above (the 32 C0 codes, DEL, and the 32 C1 codes), to which it gives the general category Cc (control).

Unicode only specifies semantics for the C0 format controls HT, LF, VT, FF, and CR (note that BS is missing); the C0 information separators FS, GS, RS, US (and SP); and the C1 control NEL.[49] The rest of the codes are transparent to Unicode and their meanings are left to higher-level protocols, with ISO/IEC 6429 suggested as a default.[49]

Unicode includes many additional format effector characters besides these, such as marks, embeds, isolates and pops for explicit bidirectional formatting, and the zero-width joiner and non-joiner for controlling ligature use. However these are given the general category Cf (format) rather than Cc.
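The distinction between the Cc and Cf general categories can be checked with Python's standard `unicodedata` module. The following sketch collects every code point whose category is Cc and compares it with a few of the format characters mentioned above; the variable names are chosen for illustration only.

```python
import sys
import unicodedata

# Collect every code point whose general category is Cc (control).
cc_points = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Cc"
]

# The C0 range, DEL, and the C1 range together account for all of them.
expected = list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0))
print(len(cc_points), cc_points == expected)  # 65 True

# Zero-width joiner, zero-width non-joiner and an explicit bidirectional
# embedding are format characters (Cf), not controls (Cc).
for ch in ("\u200d", "\u200c", "\u202a"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))  # each prints Cf
```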

Footnotes

  1. ISO/IEC 4873 extends this requirement to the C1 SS2 and SS3,[36] although ISO/IEC 2022 itself does not.

References

  1. Fox, Brian. "Adding a new node to Info". Info: The online, menu-driven GNU documentation system. GNU Project.
  2. ISO/TC 97/SC 2 (1975). The set of control characters of the ISO 646 (PDF). ITSCJ/IPSJ. ISO-IR-1.
  3. IPTC (1995). The IPTC Recommended Message Format (PDF) (5th ed.). IPTC TEC 7901.
  4. Robert McConnell; James Haynes; Richard Warren (December 2002). "Understanding ASCII Codes". NADCOMM.
  5. "Name Aliases". Unicode Character Database. Unicode Consortium.
  6. "C0 Controls and Basic Latin" (PDF). Unicode Consortium.
  7. "charnames". Perl Programming Documentation.
  8. ECMA (1994). "7.3: Invocation of character-set code elements". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 14. ECMA-35.
  9. "What is the point of Ctrl-S?". Unix and Linux Stack exchange. Retrieved 14 February 2019.
  10. ECMA/TC 1 (1973). "Brief History". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. ECMA-6:1973.
  11. ECMA/TC 1 (1971). "8.2: Correspondence between the 7-bit Code and an 8-bit Code". Extension of the 7-bit Coded Character Set (PDF) (1st ed.). ECMA. pp. 21–24. ECMA-35:1971.
  12. ECMA/TC 1 (1973). "4.2: Specific Control Characters". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. p. 16. ECMA-6:1973.
  13. ECMA/TC 1 (1985). "5.3.8: Sets of 96 graphic characters". Code Extension Techniques (PDF) (4th ed.). ECMA. pp. 17–18. ECMA-35:1985.
  14. ISO/TC97/SC2 (1983-10-01). C1 Control Set of ISO 6429:1983 (PDF). ITSCJ/IPSJ. ISO-IR-77.
  15. ECMA/TC 1 (1979). "Brief History". Additional Control Functions for Character-Imaging I/O Devices (PDF) (2nd ed.). ECMA. ECMA-48:1979.
  16. Ken Whistler (2015-10-05). "Why Nothing Ever Goes Away". Unicode Mailing List.
  17. ECMA (1991). Control Functions for Coded Character Sets. Standard ECMA-48.
  18. Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Device-Control functions". XTerm Control Sequences.
  19. Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Operating System Commands". XTerm Control Sequences.
  20. Frank da Cruz; Christine Gianone (1997). Using C-Kermit. Digital Press. p. 278. ISBN 978-1-55558-164-0.
  21. ECMA (1994). "6.4.2: Primary sets of coded control functions". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 11. ECMA-35.
  22. ISO/TC97/SC2/WG-7; ECMA (1985-08-01). Minimum C0 set for ISO 4873 (PDF). ITSCJ/IPSJ. ISO-IR-104.
  23. ISO/TC97/SC2/WG-7; ECMA (1985-08-01). Minimum C1 Set for ISO 4873 (PDF). ITSCJ/IPSJ. ISO-IR-105.
  24. ECMA (1994). "6.2: Fixed coded characters". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 7. ECMA-35.
  25. ECMA (1994). "6.4.3: Supplementary sets of coded control functions". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 11. ECMA-35.
  26. Úřad pro normalizaci a měřeni (1987). The set of control characters of ISO 646, with EM replaced by SS2 (PDF). ITSCJ/IPSJ. ISO-IR-140.
  27. ISO/TC97/SC2/WG6. "Liaison statement to ISO/TC97/SC2/WG8 and ISO/TC97/SC18/WG8" (PDF). ISO/TC97/SC2/WG6 N317.rev. Archived from the original (PDF) on 2020-10-26.
  28. ISO/TC 97/SC 2 (1982). The C0 set of Control Characters of Japanese Standard JIS C 6225-1979 (PDF). ITSCJ/IPSJ. ISO-IR-74.
  29. Printronix (2012). OKI® Programmer's Reference Manual (PDF). p. 26.
  30. ISO/TC 46 (1983-06-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-67.
  31. ISO/TC 46 (1986-02-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-124.
  32. Umamaheswaran, V.S. (1999-11-08). "3.3 Step 2: Byte Conversion". UTF-EBCDIC. Unicode Consortium. Unicode Technical Report #16. The 64 control characters […], the ASCII DELETE character (U+007F)[…] are mapped respecting EBCDIC conventions, as defined in IBM Character Data Representation Architecture, CDRA, with one exception -- the pairing of EBCDIC Line Feed and New Line control characters are swapped from their CDRA default pairings to ISO/IEC 6429 Line Feed (U+000A) and Next Line (U+0085) control characters
  33. "23.1: Control Codes" (PDF). The Unicode Standard (12.0.0 ed.). Unicode Consortium. 2019. pp. 868–870. ISBN 978-1-936213-22-1.

This article uses material from the Wikipedia article Device_Control_2, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.