Basic Latin (Unicode block)

Unicode character block


The Basic Latin Unicode block,[3] sometimes informally called C0 Controls and Basic Latin,[4] is the first block of the Unicode Standard and the only block whose characters are encoded in one byte in UTF-8. The block contains all the characters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F and contains 128 characters: the C0 controls, ASCII punctuation and symbols, ASCII digits, the uppercase and lowercase letters of the English alphabet, and one further control character (Delete).
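The single-byte property can be checked directly. The following is a minimal Python sketch, not part of the Unicode Standard itself; the helper name `utf8_length` and the constant `BASIC_LATIN` are illustrative. It encodes every code point in the block and compares the result with the code point's own value, then looks at the first code point beyond the block.

```python
# Illustrative helper (hypothetical name): byte length of one code point in UTF-8.
def utf8_length(code_point: int) -> int:
    return len(chr(code_point).encode("utf-8"))


BASIC_LATIN = range(0x0000, 0x0080)  # U+0000..U+007F inclusive, 128 code points

# Each Basic Latin code point should encode to a single byte equal to its value,
# which is what makes ASCII text valid UTF-8 without change.
single_byte = all(chr(cp).encode("utf-8") == bytes([cp]) for cp in BASIC_LATIN)

print(single_byte)          # True
print(utf8_length(0x007F))  # 1
print(utf8_length(0x0080))  # 2 (first code point outside the block)
```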

The Basic Latin block has been included in its present form since version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.[5] Its block name in Unicode 1.0 was ASCII.[6]

Table of characters

A. The character U+005C (\) may be displayed as a yen sign (¥) or a won sign (₩) by Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which the backslash was replaced by these signs.[7]

Subheadings

The C0 Controls and Basic Latin block contains six subheadings.[8]

C0 controls

The C0 controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.[8]
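As a small illustration, the C0 controls can be inspected with Python's standard `unicodedata` module. This is a sketch under the assumption of a reasonably recent Python 3 (name-alias lookup was added in 3.3); the constant `C0_CONTROLS` is an illustrative name. Control characters carry no formal character name in the database, so they are reachable by alias only.

```python
import unicodedata

C0_CONTROLS = range(0x00, 0x20)  # U+0000..U+001F

# Every C0 control has the general category "Cc" (Other, control).
categories = {unicodedata.category(chr(cp)) for cp in C0_CONTROLS}
print(categories)  # {'Cc'}

# unicodedata.name() raises ValueError for characters without a formal name,
# so a default is supplied here.
line_feed_name = unicodedata.name("\n", "<no name>")
print(line_feed_name)  # <no name>

# lookup() accepts name aliases, so the alias still resolves to U+000A.
line_feed = unicodedata.lookup("LINE FEED")
print(f"U+{ord(line_feed):04X}")  # U+000A
```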

ASCII punctuation and symbols

This subheading refers to standard punctuation characters, simple mathematical operators, and symbols like the dollar sign, percent, ampersand, underscore, and pipe.[8]

ASCII digits

The ASCII digits subheading contains the standard European digits 0–9.[8]

Uppercase Latin alphabet

The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in majuscule (capital) form.[8]

Lowercase Latin alphabet

The Lowercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in minuscule (small) form.[8]

Control character

The Control character subheading contains a single character, "Delete" (U+007F).[8]

Number of symbols, letters and control codes

The table below shows the number of letters, symbols and control codes in each of the subheadings in the C0 Controls and Basic Latin block.
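The per-subheading counts can also be derived programmatically. The sketch below is illustrative only: the function name `subheading` is hypothetical, and the boundaries are taken from the ranges of the code chart, with the space character (U+0020) counted under punctuation and symbols as the chart does.

```python
from collections import Counter


def subheading(cp: int) -> str:
    """Classify a Basic Latin code point by chart subheading (illustrative helper)."""
    if not 0x00 <= cp <= 0x7F:
        raise ValueError("code point is outside the Basic Latin block")
    if cp <= 0x1F:
        return "C0 controls"
    if cp == 0x7F:
        return "Control character"
    ch = chr(cp)
    if "0" <= ch <= "9":
        return "ASCII digits"
    if "A" <= ch <= "Z":
        return "Uppercase Latin alphabet"
    if "a" <= ch <= "z":
        return "Lowercase Latin alphabet"
    # Everything left over: space, punctuation, and symbols.
    return "ASCII punctuation and symbols"


counts = Counter(subheading(cp) for cp in range(0x80))
for name, n in counts.items():
    print(f"{name}: {n}")
```

Run as written, this reports 32 C0 controls, 33 punctuation and symbol characters, 10 digits, 26 uppercase letters, 26 lowercase letters, and 1 control character, for a total of 128.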


Chart

C0 Controls and Basic Latin[a]
Official Unicode Consortium code chart (PDF)
        0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
U+000x  NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
U+001x  DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
U+002x  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
U+003x  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
U+004x  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
U+005x  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
U+006x  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
U+007x  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
  a. As of Unicode version 15.1
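The layout of the chart follows directly from the code point values: the row is the high hexadecimal digit and the column is the low one. A minimal sketch that regenerates the grid is given below; the names `C0_ABBR`, `label`, and `chart_rows` are illustrative, and the control abbreviations are simply those shown in the chart.

```python
# Abbreviations for U+0000..U+001F, in code point order, as shown in the chart.
C0_ABBR = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()


def label(cp: int) -> str:
    """Return the chart label for a Basic Latin code point (illustrative helper)."""
    if cp < 0x20:
        return C0_ABBR[cp]
    if cp == 0x20:
        return "SP"
    if cp == 0x7F:
        return "DEL"
    return chr(cp)


def chart_rows() -> list[str]:
    """Build the eight chart rows, one per high hexadecimal digit."""
    rows = []
    for high in range(8):
        cells = [label(high * 16 + low) for low in range(16)]
        row = f"U+00{high:X}x  " + "  ".join(f"{cell:<3}" for cell in cells)
        rows.append(row.rstrip())
    return rows


for row in chart_rows():
    print(row)
```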

Variants

Several of the characters are defined to render as a standardized variant if followed by a variation selector.

A variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0).[9][10]

Twelve characters (#, *, and the digits) can be followed by U+FE0E VS15 or U+FE0F VS16 to create emoji variants.[11][12][13][14] They are keycap base characters, for example #️⃣ (U+0023 NUMBER SIGN U+FE0F VS16 U+20E3 COMBINING ENCLOSING KEYCAP). The VS15 version is "text presentation" while the VS16 version is "emoji-style".[10]

Emoji variation sequences
U+                0023  002A  0030  0031  0032  0033  0034  0035  0036  0037  0038  0039
base              #     *     0     1     2     3     4     5     6     7     8     9
base+VS15+keycap  #︎⃣    *︎⃣    0︎⃣    1︎⃣    2︎⃣    3︎⃣    4︎⃣    5︎⃣    6︎⃣    7︎⃣    8︎⃣    9︎⃣
base+VS16+keycap  #️⃣    *️⃣    0️⃣    1️⃣    2️⃣    3️⃣    4️⃣    5️⃣    6️⃣    7️⃣    8️⃣    9️⃣
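The sequences described in this section are plain concatenations of code points. The sketch below shows how they can be assembled in Python; the constant names and the `keycap` helper are illustrative, and whether a given sequence is actually drawn as an emoji or as text depends on the font and renderer.

```python
VS1 = "\uFE00"     # VARIATION SELECTOR-1
VS15 = "\uFE0E"    # VARIATION SELECTOR-15 (text presentation)
VS16 = "\uFE0F"    # VARIATION SELECTOR-16 (emoji presentation)
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

KEYCAP_BASES = "#*0123456789"  # the twelve keycap base characters


def keycap(base: str, emoji: bool = True) -> str:
    """Build a keycap sequence for one base character (illustrative helper)."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError("not a keycap base character")
    return base + (VS16 if emoji else VS15) + KEYCAP


# Slashed-zero variant: DIGIT ZERO followed by VS1.
slashed_zero = "0" + VS1

emoji_hash = keycap("#")
print([f"U+{ord(c):04X}" for c in emoji_hash])    # ['U+0023', 'U+FE0F', 'U+20E3']
print([f"U+{ord(c):04X}" for c in slashed_zero])  # ['U+0030', 'U+FE00']
```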

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:


References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. "block.txt". The Unicode Consortium. Retrieved 2023-03-23.
  4. "C0 Controls and Basic Latin" (PDF). The Unicode Standard, Version 15.0. Unicode, Inc. 2022. Retrieved 2023-03-22.
  5. The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
  6. "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. version 1.0. Unicode Consortium.
  7. Michael S. Kaplan (2005-09-17). "When is a backslash not a backslash?". Sorting it all Out. Microsoft. Archived from the original on 2010-06-12. Also available at: http://archives.miloush.net/michkap/archive/2005/09/17/469941.doc
  8. "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 2013-04-01.
  9. Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30). "L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set" (PDF).
  10. "UTR #51: Unicode Emoji". Unicode Consortium. 2023-09-05.
  11. "UCD: Emoji Data for UTR #51". Unicode Consortium. 2023-02-01.

This article uses material from the Wikipedia article Basic_Latin_(Unicode_block), and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.