Unicode font


A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard.[1] The vast majority of modern computer fonts use Unicode mappings, including fonts that contain glyphs for only a single writing system or support only the basic Latin alphabet. Fonts that support a wide range of Unicode scripts and symbols are sometimes referred to as "pan-Unicode fonts". However, because a TrueType font can define at most 65,535 glyphs, no single font can provide individual glyphs for all defined Unicode characters (149,813 characters as of Unicode 15.1). This article lists some widely used Unicode fonts (shipped with an operating system or produced by a well-known commercial font company) that support a comparatively large number and broad range of Unicode characters.

Background

The Unicode Standard does not itself specify or create any font (typeface), that is, a collection of graphical shapes called glyphs. Rather, it identifies each abstract character by a specific number (known as a code point) and also defines the required changes of shape depending on the context in which the glyph is used (e.g., combining characters, precomposed characters and letter-diacritic combinations). The choice of font, which governs how the abstract characters in the Universal Coded Character Set (UCS) are converted into a bitmap or vector output that can then be viewed on a screen or printed, is left up to the user. If the chosen font does not contain a glyph for a code point used in the document, the software typically displays a question mark, a box, or some other substitute character.
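The distinction between code points and glyphs can be illustrated with a short sketch using Python's standard `unicodedata` module. The helper name `describe` is an assumption made for this example; the snippet only shows that a precomposed character and a letter-diacritic combination are different code point sequences that normalization treats as equivalent, and says nothing about how any particular font draws them.

```python
import unicodedata

# Two encodings of the same visible letter, independent of any font.
precomposed = "\u00e9"   # one code point: e with acute
combining = "e\u0301"    # base letter followed by a combining accent


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print(describe(precomposed))
print(describe(combining))

# NFC normalization composes the pair into the precomposed code point.
print(unicodedata.normalize("NFC", combining) == precomposed)
```

A font needs a glyph (or shaping rules) for whichever form appears in the text, which is why software often normalizes text before looking up glyphs.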

Computer fonts use various techniques to display characters or glyphs. A bitmap font contains a grid of dots known as pixels forming an image of each glyph in each face and size. Outline fonts (also known as vector fonts) use drawing instructions or mathematical formulæ to describe each glyph. Stroke fonts use a series of specified lines (for the glyph's border) and additional information to define the profile, or size and shape of the line in a specific face and size, which together describe the appearance of the glyph.

Fonts also include embedded orthographic rules that cause certain combinations of letterforms (alternative symbols for the same letter) to be combined into special ligature forms (mixed characters). Operating systems, web browsers (user agents), and other software that makes extensive use of typography use a font to display text on screen or in print, and can be programmed to use those embedded rules. Alternatively, they may use external script-shaping technologies (a rendering technology or "smart font" engine). They can also be programmed to use either a single large Unicode font or multiple different fonts for different characters or languages.
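The approach of using multiple fonts for different characters can be sketched roughly as follows. This is a simplified illustration, not how any specific operating system or browser implements fallback: the font names, their coverage sets, and the functions `pick_font` and `font_runs` are all hypothetical.

```python
from itertools import groupby

# Hypothetical fonts with made-up coverage sets, in priority order.
FONTS = [
    ("LatinFont", set(range(0x0020, 0x0180))),
    ("CJKFont", set(range(0x4E00, 0xA000))),
]
FALLBACK = "FallbackFont"  # hypothetical font that draws a substitute box


def pick_font(ch: str) -> str:
    """Return the first font whose coverage includes the character."""
    cp = ord(ch)
    for name, coverage in FONTS:
        if cp in coverage:
            return name
    return FALLBACK


def font_runs(text: str) -> list[tuple[str, str]]:
    """Split text into consecutive runs that share the same chosen font."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=pick_font)]


print(font_runs("abc\u9aa8x"))
```

Grouping characters into runs means each run can be handed to the shaping engine with a single font, so the embedded rules of that font still apply within the run.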

No single "Unicode font" includes all the characters defined in the current revision of the ISO 10646 (Unicode) standard: more languages and characters are continually added to it, and common font formats cannot contain more than 65,535 glyphs (fewer than half the number of characters encoded in Unicode). As a result, font developers and foundries incorporate new characters in newer versions or revisions of a font, or in separate auxiliary fonts intended specifically for particular languages.
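The arithmetic behind this limit can be checked with a few lines of Python. The figures are the ones quoted in this article (65,535 glyphs per font; 149,813 characters in Unicode 15.1). The calculation assumes, as a simplification, exactly one glyph per character, whereas real fonts often need several glyphs for a single character.

```python
import math

MAX_GLYPHS_PER_FONT = 65_535       # glyph limit quoted in the article
UNICODE_15_1_CHARACTERS = 149_813  # character count quoted in the article

# Share of all characters that one font could hold, assuming one glyph each.
fraction = MAX_GLYPHS_PER_FONT / UNICODE_15_1_CHARACTERS

# Smallest number of fonts needed to cover every character under that assumption.
min_fonts = math.ceil(UNICODE_15_1_CHARACTERS / MAX_GLYPHS_PER_FONT)

print(f"{fraction:.1%} of characters fit in one font; at least {min_fonts} fonts needed")
```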

UCS has over 1.1 million code points, but only the first 65,536 (the Plane 0: Basic Multilingual Plane, or BMP) had entered into common use before 2000.

See the Unicode planes article for more information on other planes, including: Plane 1: Supplementary Multilingual Plane (SMP), Plane 2: Supplementary Ideographic Plane (SIP), Plane 14: Supplementary Special-purpose Plane (SSP), Plane 15 and 16: reserved for Private Use Areas (PUA).
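Because every plane holds 65,536 code points, the plane of a code point is simply its value divided by 65,536. The sketch below illustrates this; the function names and the abbreviated plane labels are choices made for this example, and only the planes named above are labelled.

```python
PLANE_SIZE = 0x10000   # 65,536 code points per plane
PLANE_COUNT = 17       # planes 0 through 16

# Labels for the planes mentioned in the text; other planes are left unnamed.
PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP", 14: "SSP", 15: "PUA", 16: "PUA"}


def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that contains the code point."""
    if not 0 <= code_point < PLANE_SIZE * PLANE_COUNT:
        raise ValueError(f"not a valid code point: {code_point:#x}")
    return code_point >> 16  # same as code_point // 0x10000


def plane_label(code_point: int) -> str:
    """Return a short label for the plane, or 'unnamed' if none is listed."""
    return PLANE_NAMES.get(plane_of(code_point), "unnamed")


print(PLANE_SIZE * PLANE_COUNT)           # total size of the code space
print(plane_of(0x9AA8), plane_label(0x9AA8))
print(plane_of(0x1F600), plane_label(0x1F600))
```

The product of 17 planes and 65,536 code points per plane gives 1,114,112, which is the "over 1.1 million" figure.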

The first Unicode fonts (with very large character sets and supporting many Unicode blocks) were Lucida Sans Unicode (released March 1993), Unihan font (1993), and Everson Mono (1995).

Issues

There are typographical ambiguities in Unicode: some of the unified Han characters (used in Chinese, Japanese, and Korean) are drawn differently in different regions. For example, the code point U+9AA8 CJK UNIFIED IDEOGRAPH-9AA8 is typographically different between simplified Chinese and traditional Chinese. This has implications for the idea that a single typeface can satisfy the needs of all locales.[2] The design of Unicode ensures that such differences do not create semantic ambiguity, but native readers of East Asian languages often consider the use of incorrect forms visually awkward or aesthetically inappropriate.

Application of Unicode fonts

Unicode is now the standard encoding for many new standards and protocols. It is built into the architecture of operating systems (Microsoft Windows, Apple Mac OS, and many versions of Unix and Linux), programming languages (Ada, Perl, Python, Java, Common LISP, APL), libraries (IBM International Components for Unicode (ICU), along with the Pango, Graphite, Scribe, Uniscribe, and ATSUI rendering engines), and font formats (TrueType and OpenType). Many other standards are also being upgraded to be Unicode-compliant.

Utility software

Here is a selection of some of the utility software that can identify the characters present in a font file:

List of Unicode fonts

Of the many Unicode fonts available, those listed below are the most commonly used worldwide on mainstream computing platforms.

Note
^‡ OTF+TTO: OpenType font with TrueType outlines.
^‡ OpenType fonts sometimes don't contain a one-by-one kernpair table but a kern-by-classes table where groups of similar characters are seen as one kern group. For instance, V and W have nearly the same left and right geometry. So “0” doesn't mean that no kerning is supported.
^⸶ Register after "reasonable" period (author's words).
^⸷ Includes more than 27,000 Hanzi glyphs from WenQuanYi Bitmap Song font.
^⸸ Han Nom A covers mainly CJK U Ideographs Ext A, and Han Nom B covers mostly Ext B.
Sun-Ext A covers 102 blocks of different languages. Sun-ExtB covers mostly CJK Supplement, CJK U Ideographs Ext B, C, TaiXuan Jing.
^⸹ Zen Hei, Zen Hei Mono and Zen Hei Sharp co-exist in a single TTC file; also with embedded bitmaps. Latin/Hangul derived from UnDotum, Bopomofo derived from cwTeX, mono-spaced Latin from M+ M2 Light. Full CJK coverage. Included with Fedora Linux, Ubuntu Linux.

Comparison of fonts

The number of characters that the font versions listed above include in each Unicode block is given below. A heading such as "Basic Latin (128: 0000–007F)" means that the block called Basic Latin has 128 assigned codes, numbered 0000 to 007F. Each cell then shows how many of those codes a given font covers. The Unicode blocks listed are valid for Unicode version 8.0.

Cells shaded green indicate complete coverage.
Cells shaded blue are not complete, but are the most complete of the fonts listed.
Empty cells indicate that the font contains no characters from that block.
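The per-block counts described above could be computed along the following lines. This is a minimal sketch: the function names and the sample font coverage are hypothetical, and "assigned" is taken from the Unicode version bundled with the Python interpreter rather than Unicode 8.0, so counts for some blocks may differ from the tables. In practice the set of covered code points might be read from a font's character map (for example with the fontTools library's `TTFont(path).getBestCmap()`), which is not exercised here.

```python
import unicodedata


def assigned_in_block(start: int, end: int) -> set[int]:
    """Code points in [start, end] that are assigned (general category is not Cn)."""
    return {
        cp for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    }


def block_coverage(font_code_points: set[int], start: int, end: int) -> tuple[int, int, bool]:
    """Return (covered, assigned, complete) for one block and one font."""
    assigned = assigned_in_block(start, end)
    covered = len(assigned & font_code_points)
    return covered, len(assigned), covered == len(assigned)


# Hypothetical font that covers only printable ASCII (U+0020 to U+007E).
sample_font = set(range(0x20, 0x7F))

# Basic Latin block: 0000-007F.
print(block_coverage(sample_font, 0x0000, 0x007F))
```

The boolean in the result corresponds to a green cell (complete coverage); finding the blue cell for a block would additionally require comparing the covered counts of all listed fonts.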

0000–077F


0780–139F


13A0–1DBF


1DC0–257F


2580–2DFF


2E00–4DBF


4DC0–FAFF


FB00–FFFF


List of SMP Unicode fonts


10000–1F9FF

Unicode blocks listed are valid for Unicode version 8.0.


List of SIP Unicode fonts


20000–2FFFF

Unicode blocks listed are valid for Unicode version 8.0.


List of SSP Unicode fonts

E0000–EFFFF

Unicode blocks listed are valid for Unicode version 8.0.


References

  1. "Fonts and keyboards". Unicode Consortium. 2017-06-28. Archived from the original on 2019-10-18. Retrieved 2019-10-13.
  2. Ken Lunde, CJKV Information Processing, O'Reilly Inc, 1999. Page 128, "CJKV character form differences"
  3. "Arial Unicode MS". Microsoft. Archived from the original on 2010-01-08. Retrieved 2010-01-15.
  4. "STI Pub Companies Explained". STIX Fonts. Archived from the original on 2012-04-13. Retrieved 2012-08-21.
  5. "Microsoft's TrueType core fonts for the Web". Archived from the original on 2015-06-01. Retrieved 2010-04-21.
  6. "Wen Quan Yi – Open Source Chinese: BitmapSong en". Wenq.org. 2012-05-14. Retrieved 2012-08-21.
  7. "First STIX and now XITS | خالد حسني". Khaledhosny.org. Archived from the original on 2012-03-25. Retrieved 2012-08-21.

This article uses material from the Wikipedia article Unicode_fonts, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.