| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
| <title>Unicode character categories: HarfBuzz Manual</title> |
| <meta name="generator" content="DocBook XSL Stylesheets V1.79.1"> |
| <link rel="home" href="index.html" title="HarfBuzz Manual"> |
| <link rel="up" href="shaping-concepts.html" title="Shaping concepts"> |
| <link rel="prev" href="shaping-operations.html" title="Shaping operations"> |
| <link rel="next" href="text-runs.html" title="Text runs"> |
| <meta name="generator" content="GTK-Doc V1.25 (XML mode)"> |
| <link rel="stylesheet" href="style.css" type="text/css"> |
| </head> |
| <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
| <table class="navigation" id="top" width="100%" summary="Navigation header" cellpadding="2" cellspacing="5"><tr valign="middle"> |
| <td width="100%" align="left" class="shortcuts"></td> |
| <td><a accesskey="h" href="index.html"><img src="home.png" width="16" height="16" border="0" alt="Home"></a></td> |
| <td><a accesskey="u" href="shaping-concepts.html"><img src="up.png" width="16" height="16" border="0" alt="Up"></a></td> |
| <td><a accesskey="p" href="shaping-operations.html"><img src="left.png" width="16" height="16" border="0" alt="Prev"></a></td> |
| <td><a accesskey="n" href="text-runs.html"><img src="right.png" width="16" height="16" border="0" alt="Next"></a></td> |
| </tr></table> |
| <div class="section"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="unicode-character-categories"></a>Unicode character categories</h2></div></div></div> |
| <p> |
| Shaping models are typically specified in terms of the character |
| properties that the Unicode standard defines for each script. |
| </p> |
| <p> |
| Every codepoint in the Unicode Character Database (UCD) is |
| assigned a <span class="emphasis"><em>Unicode General Category</em></span> (UGC), |
| which provides the most fundamental information about the |
| codepoint: whether the codepoint represents a |
| <span class="emphasis"><em>Letter</em></span>, a <span class="emphasis"><em>Mark</em></span>, a |
| <span class="emphasis"><em>Number</em></span>, <span class="emphasis"><em>Punctuation</em></span>, a |
| <span class="emphasis"><em>Symbol</em></span>, a <span class="emphasis"><em>Separator</em></span>, |
| or something else (<span class="emphasis"><em>Other</em></span>). |
| </p> |
| <p> |
| These seven values are the "Major" categories. Each codepoint is |
| further assigned to a "minor" category within its Major |
| category, such as "Letter, uppercase" (<code class="literal">Lu</code>) or |
| "Letter, modifier" (<code class="literal">Lm</code>). In the two-letter |
| code, the first letter names the Major category and the second |
| letter names the minor category. |
| </p> |
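<p>
The Major and minor categories described above can be inspected with
a short script. This is only an illustrative sketch: it uses Python's
standard <code class="literal">unicodedata</code> module rather than any
HarfBuzz API, and the <code class="literal">MAJOR_NAMES</code> table and
<code class="literal">describe</code> helper are names invented for this example.
</p>

```python
import unicodedata

# Hypothetical helper table: first letter of the two-letter UGC code
# mapped to the name of its Major category.
MAJOR_NAMES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def describe(ch):
    """Return (two-letter UGC code, Major category name) for one codepoint."""
    code = unicodedata.category(ch)
    return code, MAJOR_NAMES[code[0]]


if __name__ == "__main__":
    # A few sample codepoints covering several Major categories.
    for ch in ["A", "a", "\u02b0", "\u0301", "7", "!", "+", " "]:
        code, major = describe(ch)
        print(f"U+{ord(ch):04X}  {code}  {major}")
```

<p>
The Major category is recovered from the first letter of the code alone,
so a caller that only needs to tell Letters from Marks does not have to
enumerate every minor category.
</p>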
| <p> |
| Shaping models are concerned primarily with Letter and Mark |
| codepoints. The minor categories of Mark codepoints are |
| particularly important for shaping. Marks can be nonspacing |
| (<code class="literal">Mn</code>), spacing combining |
| (<code class="literal">Mc</code>), or enclosing (<code class="literal">Me</code>). |
| </p> |
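<p>
As a rough illustration of the three Mark categories, the following
sketch labels each codepoint in a string. It again relies on Python's
standard <code class="literal">unicodedata</code> module, not on HarfBuzz;
<code class="literal">mark_kind</code> and <code class="literal">list_marks</code>
are hypothetical helper names.
</p>

```python
import unicodedata

# The three minor categories of the Mark Major category.
MARK_KINDS = {
    "Mn": "nonspacing",
    "Mc": "spacing combining",
    "Me": "enclosing",
}


def mark_kind(ch):
    """Return the kind of Mark for a codepoint, or None if it is not a Mark."""
    return MARK_KINDS.get(unicodedata.category(ch))


def list_marks(text):
    """Return (codepoint, kind) pairs for every Mark found in text."""
    return [(ord(ch), mark_kind(ch)) for ch in text if mark_kind(ch)]


if __name__ == "__main__":
    # Devanagari KA + vowel sign AA, then Latin e + combining acute accent.
    sample = "\u0915\u093e e\u0301"
    for cp, kind in list_marks(sample):
        print(f"U+{cp:04X} is a {kind} mark")
```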
| <p> |
| In addition to the UGC property, codepoints in the Indic and |
| Southeast Asian scripts are also assigned |
| <span class="emphasis"><em>Unicode Indic Syllabic Category</em></span> (UISC) and |
| <span class="emphasis"><em>Unicode Indic Positional Category</em></span> (UIPC) |
| properties that provide more detailed information needed for |
| shaping. |
| </p> |
| <p> |
| The UISC property sub-categorizes Letters and Marks according to |
| common script-shaping behaviors. For example, UISC distinguishes |
| between consonant letters, vowel letters, and vowel marks. The |
| UIPC property sub-categorizes Mark codepoints by the relative visual |
| position that they occupy (above, below, right, left, or in |
| multiple positions). |
| </p> |
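<p>
To make the relationship between the three properties concrete, the
sketch below pairs the UGC of a few Devanagari codepoints with their
UISC and UIPC values. Python's standard library does not expose UISC or
UIPC, so the <code class="literal">INDIC_PROPS</code> table is a small
hand-written excerpt assumed to match the UCD files
<code class="literal">IndicSyllabicCategory.txt</code> and
<code class="literal">IndicPositionalCategory.txt</code>; it should be
checked against those files before being relied on. The helper name
<code class="literal">indic_props</code> is hypothetical.
</p>

```python
import unicodedata

# Hand-written excerpt (an assumption, not generated from the UCD):
# codepoint -> (Indic_Syllabic_Category, Indic_Positional_Category).
INDIC_PROPS = {
    0x0902: ("Bindu", "Top"),                # DEVANAGARI SIGN ANUSVARA
    0x0905: ("Vowel_Independent", "NA"),     # DEVANAGARI LETTER A
    0x0915: ("Consonant", "NA"),             # DEVANAGARI LETTER KA
    0x093C: ("Nukta", "Bottom"),             # DEVANAGARI SIGN NUKTA
    0x093E: ("Vowel_Dependent", "Right"),    # DEVANAGARI VOWEL SIGN AA
    0x093F: ("Vowel_Dependent", "Left"),     # DEVANAGARI VOWEL SIGN I
    0x0941: ("Vowel_Dependent", "Bottom"),   # DEVANAGARI VOWEL SIGN U
    0x0947: ("Vowel_Dependent", "Top"),      # DEVANAGARI VOWEL SIGN E
    0x094D: ("Virama", "Bottom"),            # DEVANAGARI SIGN VIRAMA
}

# Defaults used for codepoints that are not listed in the excerpt.
DEFAULT_PROPS = ("Other", "NA")


def indic_props(ch):
    """Return (UGC, UISC, UIPC) for one codepoint."""
    uisc, uipc = INDIC_PROPS.get(ord(ch), DEFAULT_PROPS)
    return unicodedata.category(ch), uisc, uipc


if __name__ == "__main__":
    for cp in sorted(INDIC_PROPS):
        ugc, uisc, uipc = indic_props(chr(cp))
        print(f"U+{cp:04X}  UGC={ugc}  UISC={uisc}  UIPC={uipc}")
```

<p>
Note that two vowel signs can share the same UISC value while differing
in both UGC and UIPC, which is why a shaping model needs all three
properties.
</p>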
| <p> |
| Some complex scripts require that the text run be split into |
| syllables. What constitutes a valid syllable in these |
| scripts is specified by regular expressions that are written over |
| the categories of the Letter and Mark codepoints, taking the UISC |
| and UIPC properties into account. |
| </p> |
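<p>
The idea of matching a regular expression against character categories
can be sketched as follows. This is a deliberately simplified toy
grammar for a handful of Devanagari codepoints, not the grammar that
HarfBuzz or any shaping specification actually uses. The class letters,
the <code class="literal">classify</code> ranges, and the
<code class="literal">SYLLABLE</code> pattern are all assumptions made
for this example.
</p>

```python
import re


def classify(ch):
    """Map a codepoint to a one-letter class (simplified, Devanagari only)."""
    cp = ord(ch)
    if 0x0915 <= cp <= 0x0939:
        return "C"  # consonant letter
    if 0x0904 <= cp <= 0x0914:
        return "V"  # independent vowel letter
    if 0x093E <= cp <= 0x094C:
        return "M"  # dependent vowel sign (matra)
    if cp == 0x094D:
        return "H"  # virama (halant)
    if cp == 0x093C:
        return "N"  # nukta
    if cp in (0x0901, 0x0902):
        return "B"  # candrabindu or anusvara
    return "X"      # anything else


# Toy syllable grammar over the class letters:
#   consonant cluster: (C N? H)* C N? followed by a final H or M? B?
#   vowel syllable:    V B?
#   fallback:          any single codepoint on its own
SYLLABLE = re.compile(r"(?:CN?H)*CN?(?:H|M?B?)|VB?|.")


def split_syllables(text):
    """Split text into syllables by matching the grammar on its class string."""
    # One class letter per codepoint, so match offsets index text directly.
    classes = "".join(classify(ch) for ch in text)
    return [text[m.start():m.end()] for m in SYLLABLE.finditer(classes)]


if __name__ == "__main__":
    word = "\u0928\u092e\u0938\u094d\u0924\u0947"  # NA MA SA VIRAMA TA E
    for syllable in split_syllables(word):
        print(" ".join(f"U+{ord(ch):04X}" for ch in syllable))
```

<p>
Matching on a string of class letters rather than on the raw text keeps
the expression short and makes it independent of the specific
codepoints, which is the same reason syllable grammars are written in
terms of UISC and UIPC values.
</p>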
| </div> |
| <div class="footer"> |
| <hr>Generated by GTK-Doc V1.25</div> |
| </body> |
| </html> |