Universal Character Set characters

The Unicode Consortium (UC) and the International Organization for Standardization (ISO) collaborate on the Universal Character Set (UCS). The UCS is an international standard that maps characters used in natural language, mathematics, music, and other domains to machine-readable values. This mapping lets software from different vendors interoperate and exchange UCS-encoded text strings. Because the map is universal, it can represent multiple languages at the same time. This avoids the confusion of working with multiple legacy character encodings, in which the same sequence of codes can have several meanings and is decoded incorrectly if the wrong encoding is chosen.
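The ambiguity of legacy encodings can be illustrated with a minimal Python sketch. The byte value and the two code pages below are chosen only for illustration and are not taken from the article.

```python
# One byte value, two legacy code pages, two different characters.
raw = bytes([0xE9])

as_latin1 = raw.decode("latin-1")  # Western European: U+00E9, e with acute
as_cp1251 = raw.decode("cp1251")   # Windows Cyrillic: U+0439, short i

# The decoded characters differ even though the stored byte is identical.
print(hex(ord(as_latin1)), hex(ord(as_cp1251)))  # prints: 0xe9 0x439

# With UCS code points both characters can sit in one string unambiguously.
mixed = as_latin1 + as_cp1251
round_trip = mixed.encode("utf-8").decode("utf-8")
```

Decoding the same byte with the wrong code page raises no error; it silently yields a different character, which is the failure mode a single universal map avoids.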

The UCS has a potential capacity of over 1 million characters. Each UCS character is abstractly represented by a code point, an integer between 0 and 1,114,111 that stands for the character within the internal logic of text-processing software. The code space therefore holds 1,114,112 code points (1,114,112 = 2^20 + 2^16 = 17 × 2^16, or hexadecimal 110000). As of Unicode 14.0, released in September 2021, 288,512 (26%) of these code points are allocated, including 144,762 (13%) assigned characters, 137,468 (12.3%) reserved for private use, 2,048 for surrogates, and 66 designated noncharacters, leaving 825,600 (74%) unallocated. The number of encoded characters is made up as follows:

  • 144,532 graphical characters (some of which do not have a visible glyph, but are still counted as graphical)
  • 230 special-purpose characters for control and formatting.
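The arithmetic behind these figures can be checked with a short Python sketch. The variable names and the helper pct are illustrative, and the counts are simply the Unicode 14.0 numbers quoted above.

```python
import sys

# 17 planes of 2**16 code points each.
CODE_SPACE = 17 * 2**16
assert CODE_SPACE == 2**20 + 2**16 == 0x110000 == 1_114_112

# Python exposes the highest code point, 1,114,111, as sys.maxunicode.
assert sys.maxunicode == CODE_SPACE - 1

# Figures quoted in the text for Unicode 14.0.
allocated = 288_512
unallocated = 825_600
assigned = 144_762
private_use = 137_468
graphic, special_purpose = 144_532, 230

assert allocated + unallocated == CODE_SPACE
assert graphic + special_purpose == assigned

def pct(count: int) -> float:
    """Share of the whole code space, in percent, to one decimal place."""
    return round(100 * count / CODE_SPACE, 1)

print(pct(allocated), pct(assigned), pct(private_use), pct(unallocated))
# prints: 25.9 13.0 12.3 74.1
```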

ISO maintains the basic mapping of characters from character name to code point. The terms "character" and "code point" are often used interchangeably. When a distinction is made, a code point refers to the integer assigned to the character: what one might think of as its address. While a character in the UCS (ISO/IEC 10646) consists of the combination of the code point and its name, Unicode adds many other useful properties to the character set, such as block, category, script, and directionality.
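The distinction between code point, name, and the additional Unicode properties can be sketched with Python's standard unicodedata module. The sample character is an arbitrary choice. As far as the standard library goes, category and directionality are exposed, while block and script are not, so they are left out here.

```python
import unicodedata

ch = "\u05D0"  # sample character, chosen only for illustration

code_point = ord(ch)                       # the integer "address": 0x5D0
name = unicodedata.name(ch)                # 'HEBREW LETTER ALEF'
category = unicodedata.category(ch)        # 'Lo' (Letter, other)
direction = unicodedata.bidirectional(ch)  # 'R' (right-to-left)

# The name maps back to the same character, and so does the code point.
assert unicodedata.lookup(name) == ch
assert chr(code_point) == ch

print(f"U+{code_point:04X} {name} category={category} bidi={direction}")
```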

In addition to the UCS, Unicode also provides other implementation details such as:

  1. transcoding mappings between UCS and other character sets
  2. different collations of characters and character strings for different languages
  3. an algorithm for laying out bidirectional text, where text on the same line may shift between left-to-right and right-to-left
  4. a case-folding algorithm
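Some of the items listed above can be tried out from Python's standard library; the sketch below is illustrative rather than a reference implementation. It shows transcoding through a legacy codec, the bidirectional class of individual characters (which is only an input to the bidirectional algorithm, not the algorithm itself), and case folding via str.casefold. Language-specific collation is omitted because it depends on locale data outside this sketch. The sample strings are assumptions made for the example.

```python
import unicodedata

# 1. Transcoding: ISO 8859-7 (Greek) bytes -> code points -> UTF-8 bytes.
legacy_bytes = bytes([0xE1])               # Greek small alpha in ISO 8859-7
text = legacy_bytes.decode("iso8859_7")    # U+03B1
utf8_bytes = text.encode("utf-8")          # b'\xce\xb1'

# 3. Bidirectional classes of single characters.
latin_class = unicodedata.bidirectional("a")        # 'L'
hebrew_class = unicodedata.bidirectional("\u05D0")  # 'R'

# 4. Case folding for caseless comparison.
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after case folding."""
    return a.casefold() == b.casefold()

print(utf8_bytes, latin_class, hebrew_class)
print(same_ignoring_case("Stra\u00dfe", "STRASSE"))  # prints: True
```

Case folding is used here instead of lower() because folding maps the sharp s to "ss", so both spellings compare equal.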

End users enter these characters into programs through various input methods, such as a keyboard or a graphical character palette.

The UCS can be divided in various ways, such as by plane, block, character category, or character property.[1]
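Two of these divisions can be sketched in Python. The plane number follows from the 17 × 2^16 layout of the code space (the code point divided by 65,536), and the general category comes from the standard unicodedata module. The helper name plane and the sample characters are illustrative. Block lookup is not shown because the standard library does not expose it.

```python
import unicodedata

def plane(ch: str) -> int:
    """Return the plane (0 to 16) that contains the character's code point."""
    return ord(ch) >> 16  # each plane spans 2**16 code points

samples = {
    "A": "Latin capital letter, plane 0",
    "\U0001F600": "emoji, plane 1",
    "\uE000": "private use, plane 0",
    "\U0010FFFF": "last code point, plane 16",
}

for ch, note in samples.items():
    cp = ord(ch)
    print(f"U+{cp:04X} plane={plane(ch)} category={unicodedata.category(ch)} ({note})")
```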


This article uses material from the Wikipedia article Universal Character Set characters, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.