Arabic script in Unicode


Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain sequences of letterforms to be joined into ligatures. In English, the common ampersand (&) developed from a ligature in which the handwritten Latin letters e and t (spelling et, Latin for and) were combined.[1] The rules governing ligature formation in Arabic can be quite complex, requiring special script-shaping technologies such as the Arabic Calligraphic Engine by Thomas Milo's DecoType.[2]

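As of Unicode 15.1, the Arabic script is contained in the following blocks:[3]

  • Arabic (U+0600–U+06FF)
  • Arabic Supplement (U+0750–U+077F)
  • Arabic Extended-B (U+0870–U+089F)
  • Arabic Extended-A (U+08A0–U+08FF)
  • Arabic Presentation Forms-A (U+FB50–U+FDFF)
  • Arabic Presentation Forms-B (U+FE70–U+FEFF)
  • Rumi Numeral Symbols (U+10E60–U+10E7F)
  • Arabic Extended-C (U+10EC0–U+10EFF)
  • Indic Siyaq Numbers (U+1EC70–U+1ECBF)
  • Ottoman Siyaq Numbers (U+1ED00–U+1ED4F)
  • Arabic Mathematical Alphabetic Symbols (U+1EE00–U+1EEFF)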

The basic Arabic range encodes the standard letters, the most common diacritics and the Arabic-Indic digits, but does not encode contextual forms; U+0621–U+0652 are based directly on ISO 8859-6. The Arabic Supplement range encodes letter variants mostly used for writing African (non-Arabic) languages. The Arabic Extended-B and Arabic Extended-A ranges encode additional Qur'anic annotations and letter variants used for various non-Arabic languages. The Arabic Presentation Forms-A range encodes contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. The Arabic Presentation Forms-B range encodes spacing forms of Arabic diacritics and further contextual letter forms. The presentation forms are present only for compatibility with older standards and are not currently needed for encoding text.[4] The Arabic Mathematical Alphabetic Symbols block encodes characters used in Arabic mathematical expressions. The Indic Siyaq Numbers block contains a specialized subset of Arabic script that was in use for accounting in India under the Mughal Empire by the 17th century and remained in use through the middle of the 20th century.[5][6] The Ottoman Siyaq Numbers block contains a specialized subset of Arabic script, also known as Siyakat numbers, used for accounting in Ottoman Turkish documents.[6]
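The division into blocks can be illustrated with a small lookup. The following is a minimal sketch, not an official API: the name `arabic_block` and the table `ARABIC_BLOCKS` are hypothetical, and the ranges are simply read off the row labels of the code charts in this article.

```python
from typing import Optional

# (first code point, last code point, block name); ranges taken from the
# row labels of the code charts in this article (an assumption of this sketch).
ARABIC_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0870, 0x089F, "Arabic Extended-B"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms-B"),
    (0x10E60, 0x10E7F, "Rumi Numeral Symbols"),
    (0x10EC0, 0x10EFF, "Arabic Extended-C"),
    (0x1EC70, 0x1ECBF, "Indic Siyaq Numbers"),
    (0x1ED00, 0x1ED4F, "Ottoman Siyaq Numbers"),
    (0x1EE00, 0x1EEFF, "Arabic Mathematical Alphabetic Symbols"),
]


def arabic_block(ch: str) -> Optional[str]:
    """Return the name of the Arabic-script block containing ch, or None."""
    cp = ord(ch)
    for first, last, name in ARABIC_BLOCKS:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for ch in ("\u0628", "\uFDFC", "\U0001EE00", "A"):
        print(f"U+{ord(ch):04X}", arabic_block(ch))
```

A block range says only where a code point lives; it does not say whether that code point is assigned, so unassigned positions inside a block are still reported under the block's name.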

Contextual forms

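The basic alphabet used in Modern Standard Arabic demonstrates contextual forms: each letter has a single code point in the basic Arabic block, and its isolated, final, medial and initial shapes are selected when the text is rendered.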
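The relationship between a basic letter and its contextual shapes can be inspected through the compatibility characters in Arabic Presentation Forms-B. The sketch below uses Python's standard `unicodedata` module; the function name `contextual_forms` is hypothetical, and the approach (scanning U+FE70–U+FEFF for compatibility decompositions that point back to a single base letter) is one possible illustration rather than how a shaping engine works.

```python
import unicodedata


def contextual_forms(letter: str) -> dict:
    """Map form tags (isolated, final, initial, medial) to the
    Presentation Forms-B character that decomposes to `letter`."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        # A decomposition looks like "<initial> 0628"; unassigned code
        # points and characters without a mapping give an empty string.
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) != 2 or not parts[0].startswith("<"):
            continue  # skip ligatures and diacritic forms (several targets)
        if int(parts[1], 16) == ord(letter):
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


if __name__ == "__main__":
    for tag, ch in contextual_forms("\u0628").items():  # ARABIC LETTER BEH
        print(f"{tag:9} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

For BEH (U+0628) this finds four forms, while a letter that joins only to the preceding letter, such as ALEF (U+0627), has just an isolated and a final form.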

Punctuation and ornaments

Only the Arabic question mark ⟨؟⟩ and the Arabic comma ⟨،⟩ are used in regular Arabic script typing, and the Arabic comma is itself often replaced by the Latin-script comma ⟨,⟩, which also serves as the decimal separator when Eastern Arabic numerals are used (e.g. ⟨100.6⟩ compared to ⟨١٠٠,٦⟩).

  • U+060C ، ARABIC COMMA
  • U+060D ؍ ARABIC DATE SEPARATOR
  • U+060E ؎ ARABIC POETIC VERSE SIGN
  • U+060F ؏ ARABIC SIGN MISRA
  • U+061B ؛ ARABIC SEMICOLON
  • U+061E ؞ ARABIC TRIPLE DOT PUNCTUATION MARK
  • U+061F ؟ ARABIC QUESTION MARK
  • U+066D ٭ ARABIC FIVE POINTED STAR
  • U+06D4 ۔ ARABIC FULL STOP
  • U+06DD ۝ ARABIC END OF AYAH
  • U+06DE ۞ ARABIC START OF RUB EL HIZB
  • U+06E9 ۩ ARABIC PLACE OF SAJDAH
  • U+06FD ۽ ARABIC SIGN SINDHI AMPERSAND
  • U+FD3E ﴾ ORNATE LEFT PARENTHESIS
  • U+FD3F ﴿ ORNATE RIGHT PARENTHESIS
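The decimal-separator convention noted at the start of this section can be handled in code when reading numbers. The following is a minimal sketch under stated assumptions: the function name `parse_arabic_number` is hypothetical, the input is assumed to contain no thousands separators, and both the Latin comma and U+066B ARABIC DECIMAL SEPARATOR are treated as the decimal mark.

```python
import unicodedata

# Characters treated as a decimal mark in this sketch (an assumption):
# the Latin comma and U+066B ARABIC DECIMAL SEPARATOR.
DECIMAL_MARKS = {",", "\u066B"}


def parse_arabic_number(text: str) -> float:
    """Parse a number written with Eastern Arabic (or ASCII) digits."""
    out = []
    for ch in text:
        if ch in DECIMAL_MARKS:
            out.append(".")
            continue
        value = unicodedata.decimal(ch, None)  # None for non-digits
        if value is None:
            raise ValueError(f"unexpected character U+{ord(ch):04X}")
        out.append(str(value))
    return float("".join(out))


if __name__ == "__main__":
    # The digits one, zero, zero, then a comma, then the digit six
    print(parse_arabic_number("\u0661\u0660\u0660,\u0666"))
```

Because a comma may also act as a thousands separator in other conventions, a real parser would need to know the locale of the text; this sketch deliberately rejects anything it does not recognize.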

Word ligatures

Arabic Presentation Forms-A defines a few characters as "word ligatures" for terms frequently used in formulaic expressions in Arabic. They are rarely used outside professional liturgical typing. The rial is likewise normally written out in full rather than with the RIAL SIGN ligature.

  • U+FDF0 ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM (صلى, stylized as صلے)
  • U+FDF1 ARABIC LIGATURE QALA USED AS KORANIC STOP SIGN ISOLATED FORM (قلى, stylized as قلے)
  • U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM (اللّٰه)
  • U+FDF3 ARABIC LIGATURE AKBAR ISOLATED FORM (اكبر), as in the phrase الله اكبر Allāhu akbar
  • U+FDF4 ARABIC LIGATURE MOHAMMAD ISOLATED FORM (محمد)
  • U+FDF5 ARABIC LIGATURE SALAM ISOLATED FORM (صلعم, the abbreviation for صلى الله عليه وسلم "peace be upon him")
  • U+FDF6 ARABIC LIGATURE RASOUL ISOLATED FORM (رسول)
  • U+FDF7 ARABIC LIGATURE ALAYHE ISOLATED FORM (عليه)
  • U+FDF8 ARABIC LIGATURE WASALLAM ISOLATED FORM (وسلم)
  • U+FDF9 ARABIC LIGATURE SALLA ISOLATED FORM (صلى)
  • U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM (صلى الله عليه وسلم "peace be upon him")
  • U+FDFB ARABIC LIGATURE JALLAJALALOUHOU (جل جلاله)
  • U+FDFC RIAL SIGN (ريال)
  • U+FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM (بسم الله الرحمن الرحيم bism-i llāh-i r-raḥmān-i r-raḥīm)
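Since the word ligatures listed above are compatibility characters, their spelled-out equivalents can be recovered with Unicode compatibility normalization. The sketch below uses `unicodedata.normalize` from the Python standard library; the function name `expand_word_ligatures` is hypothetical, and restricting the expansion to U+FDF0–U+FDFD is a choice made for this example.

```python
import unicodedata


def expand_word_ligatures(text: str) -> str:
    """Replace word ligatures U+FDF0..U+FDFD by their NFKC expansions,
    leaving every other character unchanged."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if 0xFDF0 <= ord(ch) <= 0xFDFD else ch
        for ch in text
    )


if __name__ == "__main__":
    for cp in range(0xFDF0, 0xFDFE):
        expanded = expand_word_ligatures(chr(cp))
        codes = " ".join(f"{ord(c):04X}" for c in expanded)
        print(f"U+{cp:04X} -> {codes}")
```

For example, U+FDF2 expands to the four letters U+0627 U+0644 U+0644 U+0647. The expansion contains only base letters, so any vowel marks shown in a font's rendering of the ligature are not part of the result.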

Code blocks

Arabic

Compact table

Arabic[1][2]
Official Unicode Consortium code chart (PDF)
        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+060x  ؀   ؁   ؂   ؃   ؄   ؅  ؆ ؇ ؈ ؉ ؊ ؋ ، ؍ ؎ ؏
U+061x ؐ ؑ ؒ ؓ ؔ ؕ ؖ ؗ ؘ ؙ ؚ ؛ ALM ؝ ؞ ؟
U+062x ؠ ء آ أ ؤ إ ئ ا ب ة ت ث ج ح خ د
U+063x ذ ر ز س ش ص ض ط ظ ع غ ػ ؼ ؽ ؾ ؿ
U+064x ـ ف ق ك ل م ن ه و ى ي ً ٌ ٍ َ ُ
U+065x ِ ّ ْ ٓ ٔ ٕ ٖ ٗ ٘ ٙ ٚ ٛ ٜ ٝ ٞ ٟ
U+066x ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩ ٪ ٫ ٬ ٭ ٮ ٯ
U+067x ٰ ٱ ٲ ٳ ٴ ٵ ٶ ٷ ٸ ٹ ٺ ٻ ټ ٽ پ ٿ
U+068x ڀ ځ ڂ ڃ ڄ څ چ ڇ ڈ ډ ڊ ڋ ڌ ڍ ڎ ڏ
U+069x ڐ ڑ ڒ ړ ڔ ڕ ږ ڗ ژ ڙ ښ ڛ ڜ ڝ ڞ ڟ
U+06Ax ڠ ڡ ڢ ڣ ڤ ڥ ڦ ڧ ڨ ک ڪ ګ ڬ ڭ ڮ گ
U+06Bx ڰ ڱ ڲ ڳ ڴ ڵ ڶ ڷ ڸ ڹ ں ڻ ڼ ڽ ھ ڿ
U+06Cx ۀ ہ ۂ ۃ ۄ ۅ ۆ ۇ ۈ ۉ ۊ ۋ ی ۍ ێ ۏ
U+06Dx ې ۑ ے ۓ ۔ ە ۖ ۗ ۘ ۙ ۚ ۛ ۜ  ۝  ۞ ۟
U+06Ex ۠ ۡ ۢ ۣ ۤ ۥ ۦ ۧ ۨ ۩ ۪ ۫ ۬ ۭ ۮ ۯ
U+06Fx ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ ۺ ۻ ۼ ۽ ۾ ۿ
Notes
1.^ As of Unicode version 15.1
2.^ Unicode code point U+0673 is deprecated as of Unicode version 6.0
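The deprecation in note 2 has a practical consequence: the Unicode Standard recommends the sequence <U+0627, U+065F> in place of U+0673, and because the two are not canonically equivalent, normalization alone will not convert one into the other. A minimal sketch of an explicit replacement follows; the function name `replace_deprecated_alef` is hypothetical.

```python
DEPRECATED = "\u0673"           # ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
RECOMMENDED = "\u0627\u065F"    # ALEF followed by ARABIC WAVY HAMZA BELOW


def replace_deprecated_alef(text: str) -> str:
    """Rewrite the deprecated U+0673 as the recommended two-character sequence."""
    return text.replace(DEPRECATED, RECOMMENDED)


if __name__ == "__main__":
    sample = "\u0628\u0673\u0628"
    print([f"U+{ord(c):04X}" for c in replace_deprecated_alef(sample)])
```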

Arabic Supplement

Arabic Supplement[1]
Official Unicode Consortium code chart (PDF)
        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+075x ݐ ݑ ݒ ݓ ݔ ݕ ݖ ݗ ݘ ݙ ݚ ݛ ݜ ݝ ݞ ݟ
U+076x ݠ ݡ ݢ ݣ ݤ ݥ ݦ ݧ ݨ ݩ ݪ ݫ ݬ ݭ ݮ ݯ
U+077x ݰ ݱ ݲ ݳ ݴ ݵ ݶ ݷ ݸ ݹ ݺ ݻ ݼ ݽ ݾ ݿ
Notes
1.^ As of Unicode version 15.1

Arabic Extended-B

Arabic Extended-B[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart rows U+087x to U+089x; glyphs not preserved]
Notes
1.^ As of Unicode version 15.1
2.^ Grey areas indicate non-assigned code points

Arabic Extended-A

Arabic Extended-A[1]
Official Unicode Consortium code chart (PDF)
[Code chart rows U+08Ax to U+08Fx; glyphs not preserved]
Notes
1.^ As of Unicode version 15.1

Arabic Presentation Forms-A

The characters in this block are mostly ligatures that can be composed from characters in the preceding charts, with the exception of the bracket-like graphemes ﴾ and ﴿; some are ligatures of common liturgical phrases.

Arabic Presentation Forms-A[1][2][3]
Official Unicode Consortium code chart (PDF)
[Code chart rows U+FB5x to U+FDFx; glyphs not preserved]
Notes
1.^ As of Unicode version 15.1
2.^ Grey areas indicate non-assigned code points
3.^ Black areas indicate noncharacters (code points that are guaranteed never to be assigned as encoded characters in the Unicode Standard)

Arabic Presentation Forms-B

All characters in this block can be composed from characters in the basic Arabic chart.

Arabic Presentation Forms-B[1][2]
Official Unicode Consortium code chart (PDF)
[Code chart rows U+FE7x to U+FEFx, ending with U+FEFF ZWNBSP; glyphs not preserved]
Notes
1.^ As of Unicode version 15.1
2.^ Grey areas indicate non-assigned code points
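Because the characters in the two presentation-forms blocks can be composed from basic letters, legacy text that stores them directly can be folded back to ordinary Arabic letters. The following is a minimal sketch using `unicodedata.normalize`; the function name `fold_presentation_forms` is hypothetical, and the character-by-character approach is chosen here so that compatibility characters outside the two blocks (for example Latin ligatures) are left alone.

```python
import unicodedata


def fold_presentation_forms(text: str) -> str:
    """Apply NFKC only to characters in Arabic Presentation Forms-A/B."""
    def fold(ch: str) -> str:
        cp = ord(ch)
        in_forms_a = 0xFB50 <= cp <= 0xFDFF
        in_forms_b = 0xFE70 <= cp <= 0xFEFE  # U+FEFF (ZWNBSP) is left alone
        if in_forms_a or in_forms_b:
            return unicodedata.normalize("NFKC", ch)
        return ch

    return "".join(fold(ch) for ch in text)


if __name__ == "__main__":
    legacy = "\uFE91\uFE8E\uFE8F"  # BEH initial, ALEF final, BEH isolated
    print([f"U+{ord(c):04X}" for c in fold_presentation_forms(legacy)])
```

Characters in these blocks that have no compatibility mapping, such as the ornate parentheses U+FD3E and U+FD3F, pass through unchanged. Folding each character separately does not recompose sequences across character boundaries, so a final normalization pass over the whole string may still be wanted.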

Rumi Numeral Symbols

Rumi Numeral Symbols[1][2]
Official Unicode Consortium code chart (PDF)
         0 1 2 3 4 5 6 7 8 9 A B C D E F
U+10E6x 𐹠 𐹡 𐹢 𐹣 𐹤 𐹥 𐹦 𐹧 𐹨 𐹩 𐹪 𐹫 𐹬 𐹭 𐹮 𐹯
U+10E7x 𐹰 𐹱 𐹲 𐹳 𐹴 𐹵 𐹶 𐹷 𐹸 𐹹 𐹺 𐹻 𐹼 𐹽 𐹾
Notes
1.^ As of Unicode version 15.1
2.^ Grey area indicates non-assigned code point
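The Rumi numeral characters carry numeric values in the Unicode Character Database, which can be read with `unicodedata.numeric`. The sketch below is illustrative only: the function name `rumi_value` is hypothetical, and summing the values of the individual signs assumes the notation is read additively.

```python
import unicodedata


def rumi_value(text: str) -> float:
    """Sum the Unicode numeric values of the characters in text.

    Assumes an additive reading of the signs (an assumption of this sketch).
    """
    return sum(unicodedata.numeric(ch) for ch in text)


if __name__ == "__main__":
    for cp in (0x10E60, 0x10E69, 0x10E72):
        ch = chr(cp)
        print(f"U+{cp:05X} {unicodedata.name(ch)} = {unicodedata.numeric(ch)}")
```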

Arabic Extended-C

Arabic Extended-C[1][2]
Official Unicode Consortium code chart (PDF)
         0 1 2 3 4 5 6 7 8 9 A B C D E F
U+10ECx
U+10EDx
U+10EEx
U+10EFx 𐻽 𐻾 𐻿
Notes
1.^ As of Unicode version 15.1
2.^ Grey areas indicate non-assigned code points

Indic Siyaq Numbers

Indic Siyaq Numbers[1][2]
Official Unicode Consortium code chart (PDF)
         0 1 2 3 4 5 6 7 8 9 A B C D E F
U+1EC7x 𞱱 𞱲 𞱳 𞱴 𞱵 𞱶 𞱷 𞱸 𞱹 𞱺 𞱻 𞱼 𞱽 𞱾 𞱿
U+1EC8x 𞲀 𞲁 𞲂 𞲃 𞲄 𞲅 𞲆 𞲇 𞲈 𞲉 𞲊 𞲋 𞲌 𞲍 𞲎 𞲏
U+1EC9x 𞲐 𞲑 𞲒 𞲓 𞲔 𞲕 𞲖 𞲗 𞲘 𞲙 𞲚 𞲛 𞲜 𞲝 𞲞 𞲟
U+1ECAx 𞲠 𞲡 𞲢 𞲣 𞲤 𞲥 𞲦 𞲧 𞲨 𞲩 𞲪 𞲫 𞲬 𞲭 𞲮 𞲯
U+1ECBx 𞲰 𞲱 𞲲 𞲳 𞲴
Notes
1.^ As of Unicode version 15.1
2.^ Grey areas indicate non-assigned code points

Ottoman Siyaq Numbers

Ottoman Siyaq Numbers[1][2]
Official Unicode Consortium code chart (PDF)
         0 1 2 3 4 5 6 7 8 9 A B C D E F
U+1ED0x 𞴁 𞴂 𞴃 𞴄 𞴅 𞴆 𞴇 𞴈 𞴉 𞴊 𞴋 𞴌 𞴍 𞴎 𞴏
U+1ED1x 𞴐 𞴑 𞴒 𞴓 𞴔 𞴕 𞴖 𞴗 𞴘 𞴙 𞴚 𞴛 𞴜 𞴝 𞴞 𞴟
U+1ED2x 𞴠 𞴡 𞴢 𞴣 𞴤 𞴥 𞴦 𞴧 𞴨 𞴩 𞴪 𞴫 𞴬 𞴭 𞴮 𞴯
U+1ED3x 𞴰 𞴱 𞴲 𞴳 𞴴 𞴵 𞴶 𞴷 𞴸 𞴹 𞴺 𞴻 𞴼 𞴽
U+1ED4x
Notes
1.^ As of Unicode version 15.1
2.^ Grey areas indicate non-assigned code points

Arabic Mathematical Alphabetic Symbols

Arabic Mathematical Alphabetic Symbols[1][2]
Official Unicode Consortium code chart (PDF)
         0 1 2 3 4 5 6 7 8 9 A B C D E F
U+1EE0x 𞸀 𞸁 𞸂 𞸃 𞸅 𞸆 𞸇 𞸈 𞸉 𞸊 𞸋 𞸌 𞸍 𞸎 𞸏
U+1EE1x 𞸐 𞸑 𞸒 𞸓 𞸔 𞸕 𞸖 𞸗 𞸘 𞸙 𞸚 𞸛 𞸜 𞸝 𞸞 𞸟
U+1EE2x 𞸡 𞸢 𞸤 𞸧 𞸩 𞸪 𞸫 𞸬 𞸭 𞸮 𞸯
U+1EE3x 𞸰 𞸱 𞸲 𞸴 𞸵 𞸶 𞸷 𞸹 𞸻
U+1EE4x 𞹂 𞹇 𞹉 𞹋 𞹍 𞹎 𞹏
U+1EE5x 𞹑 𞹒 𞹔 𞹗 𞹙 𞹛 𞹝 𞹟
U+1EE6x 𞹡 𞹢 𞹤 𞹧 𞹨 𞹩 𞹪 𞹬 𞹭 𞹮 𞹯
U+1EE7x 𞹰 𞹱 𞹲 𞹴 𞹵 𞹶 𞹷 𞹹 𞹺 𞹻 𞹼 𞹾
U+1EE8x 𞺀 𞺁 𞺂 𞺃 𞺄 𞺅 𞺆 𞺇 𞺈 𞺉 𞺋 𞺌 𞺍 𞺎 𞺏
U+1EE9x 𞺐 𞺑 𞺒 𞺓 𞺔 𞺕 𞺖 𞺗 𞺘 𞺙 𞺚 𞺛
U+1EEAx 𞺡 𞺢 𞺣 𞺥 𞺦 𞺧 𞺨 𞺩 𞺫 𞺬 𞺭 𞺮 𞺯
U+1EEBx 𞺰 𞺱 𞺲 𞺳 𞺴 𞺵 𞺶 𞺷 𞺸 𞺹 𞺺 𞺻
U+1EECx
U+1EEDx
U+1EEEx
U+1EEFx 𞻰 𞻱
Notes
1.^ As of Unicode version 15.1
2.^ Grey areas indicate non-assigned code points

References

  1. "UAX #24: Script data file". Unicode Character Database. The Unicode Consortium.
  2. "Section 9.2: Arabic, Arabic Presentation Forms-B" (PDF). The Unicode Standard. The Unicode Consortium. September 2022.
  3. Pandey, Anshuman (2015-11-05). "L2/15-121R2: Proposal to Encode Indic Siyaq Numbers" (PDF).
  4. "Chapter 22: Symbols". The Unicode Standard, Version 15.0 (PDF). Mountain View, CA: Unicode, Inc. September 2022. ISBN 978-1-936213-32-0.
  5. Deprecated as of Unicode version 6.0 (UCD Change History). "The particular combination of an alef with this vowel mark should be written with the sequence <U+0627 ARABIC LETTER ALEF, U+065F ARABIC WAVY HAMZA BELOW>, rather than with the character U+0673 ARABIC LETTER ALEF WITH WAVY HAMZA BELOW, which has been deprecated and which is not canonically equivalent." "Section 9.2: Arabic, Additional Vowel Marks" (PDF). The Unicode Standard. The Unicode Consortium. September 2022.

This article uses material from the Wikipedia article Arabic_script_in_Unicode, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.