 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
 <title>Unicode character categories: HarfBuzz Manual</title>
 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
 <link rel="home" href="index.html" title="HarfBuzz Manual">
 <link rel="up" href="shaping-concepts.html" title="Shaping concepts">
 <link rel="prev" href="shaping-operations.html" title="Shaping operations">
 <link rel="next" href="text-runs.html" title="Text runs">
 <meta name="generator" content="GTK-Doc V1.25 (XML mode)">
 <link rel="stylesheet" href="style.css" type="text/css">
 </head>
 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
 <table class="navigation" id="top" width="100%" summary="Navigation header" cellpadding="2" cellspacing="5"><tr valign="middle">
 <td width="100%" align="left" class="shortcuts"></td>
 <td><a accesskey="h" href="index.html"><img src="home.png" width="16" height="16" border="0" alt="Home"></a></td>
 <td><a accesskey="u" href="shaping-concepts.html"><img src="up.png" width="16" height="16" border="0" alt="Up"></a></td>
 <td><a accesskey="p" href="shaping-operations.html"><img src="left.png" width="16" height="16" border="0" alt="Prev"></a></td>
 <td><a accesskey="n" href="text-runs.html"><img src="right.png" width="16" height="16" border="0" alt="Next"></a></td>
 </tr></table>
 <div class="section">
 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
 <a name="unicode-character-categories"></a>Unicode character categories</h2></div></div></div>
 <p>
       Shaping models are typically specified with respect to how
       scripts are defined in the Unicode standard.
     </p>
 <p>
       Every codepoint in the Unicode Character Database (UCD) is
       assigned a <span class="emphasis"><em>Unicode General Category</em></span> (UGC),
       which provides the most fundamental information about the
       codepoint: whether the codepoint represents a
       <span class="emphasis"><em>Letter</em></span>, a <span class="emphasis"><em>Mark</em></span>, a
       <span class="emphasis"><em>Number</em></span>, <span class="emphasis"><em>Punctuation</em></span>, a
       <span class="emphasis"><em>Symbol</em></span>, a <span class="emphasis"><em>Separator</em></span>,
       or something else (<span class="emphasis"><em>Other</em></span>).
     </p>
 <p>
       These seven groupings are the "Major" categories. Each codepoint is
       further assigned to a "minor" category within its Major
       category, identified by a two-letter code such as
       "Letter, uppercase" (<code class="literal">Lu</code>) or
       "Letter, modifier" (<code class="literal">Lm</code>).
     </p>
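 <p>
       As a rough illustration of the Major/minor split, the sketch below
       looks up General Category codes with Python's standard
       <code class="literal">unicodedata</code> module rather than with
       HarfBuzz itself. The <code class="literal">MAJOR_NAMES</code> table
       and the <code class="literal">general_category</code> helper are
       names invented for this example, and the results reflect whichever
       UCD version ships with the Python interpreter.
     </p>

```python
import unicodedata

# Names for the seven Major categories, keyed by the first letter of the
# two-letter General Category code. (Illustrative table, not a HarfBuzz API.)
MAJOR_NAMES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def general_category(ch):
    """Return (Major category name, two-letter code) for one character."""
    code = unicodedata.category(ch)
    return MAJOR_NAMES[code[0]], code


if __name__ == "__main__":
    # An uppercase letter, a modifier letter, a digit, punctuation,
    # a math symbol, and a space.
    for ch in ["A", "\u02b0", "5", "!", "+", " "]:
        major, code = general_category(ch)
        print(f"U+{ord(ch):04X} {code} {major}")
```

 <p>
       The first letter of the code always names the Major category, so a
       shaper-like consumer can test for "any Letter" or "any Mark" by
       inspecting that letter alone.
     </p>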
 <p>
       Shaping models are concerned primarily with Letter and Mark
       codepoints. The minor categories of Mark codepoints are
       particularly important for shaping. Marks can be nonspacing
       (<code class="literal">Mn</code>), spacing combining
       (<code class="literal">Mc</code>), or enclosing (<code class="literal">Me</code>).
     </p>
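 <p>
       The three Mark subcategories can be told apart in the same way. The
       following is a minimal sketch, again using Python's
       <code class="literal">unicodedata</code> module as a stand-in for a
       UCD lookup; <code class="literal">MARK_KINDS</code> and
       <code class="literal">mark_kind</code> are hypothetical names chosen
       for the example.
     </p>

```python
import unicodedata

# Map the three Mark subcategory codes to the descriptions used in the text.
MARK_KINDS = {
    "Mn": "nonspacing",
    "Mc": "spacing combining",
    "Me": "enclosing",
}


def mark_kind(ch):
    """Return the kind of Mark that ch is, or None if ch is not a Mark."""
    return MARK_KINDS.get(unicodedata.category(ch))


if __name__ == "__main__":
    samples = [
        "\u0301",  # COMBINING ACUTE ACCENT
        "\u093e",  # DEVANAGARI VOWEL SIGN AA
        "\u20dd",  # COMBINING ENCLOSING CIRCLE
        "a",       # LATIN SMALL LETTER A (not a Mark)
    ]
    for ch in samples:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mark_kind(ch)}")
```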
 <p>
       In addition to the UGC property, codepoints in the Indic and
       Southeast Asian scripts are also assigned
       <span class="emphasis"><em>Unicode Indic Syllabic Category</em></span> (UISC) and
       <span class="emphasis"><em>Unicode Indic Positional Category</em></span> (UIPC)
       properties that provide more detailed information needed for
       shaping.
     </p>
 <p>
       The UISC property sub-categorizes Letters and Marks according to
       common script-shaping behaviors. For example, UISC distinguishes
       between consonant letters, vowel letters, and vowel marks. The
       UIPC property sub-categorizes Mark codepoints by the relative visual
       position that they occupy (above, below, right, left, or in
       multiple positions).
     </p>
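 <p>
       Python's standard library does not expose the UISC or UIPC
       properties, so the sketch below uses a small hand-copied table for a
       few Devanagari codepoints to show what the two properties look like
       side by side. The table, the <code class="literal">INDIC_PROPS</code>
       name, and the <code class="literal">indic_props</code> helper are
       assumptions made for illustration; the authoritative values live in
       the UCD files <code class="literal">IndicSyllabicCategory.txt</code>
       and <code class="literal">IndicPositionalCategory.txt</code>, where
       the positions described above as "above" and "below" are spelled
       <code class="literal">Top</code> and
       <code class="literal">Bottom</code>.
     </p>

```python
# Illustrative subset only: (syllabic category, positional category) for a
# handful of Devanagari codepoints, hand-copied for this example. Check the
# UCD data files before relying on any of these values.
INDIC_PROPS = {
    0x0905: ("Vowel_Independent", "NA"),   # DEVANAGARI LETTER A
    0x0915: ("Consonant", "NA"),           # DEVANAGARI LETTER KA
    0x093E: ("Vowel_Dependent", "Right"),  # DEVANAGARI VOWEL SIGN AA
    0x093F: ("Vowel_Dependent", "Left"),   # DEVANAGARI VOWEL SIGN I
    0x0941: ("Vowel_Dependent", "Bottom"), # DEVANAGARI VOWEL SIGN U
    0x0947: ("Vowel_Dependent", "Top"),    # DEVANAGARI VOWEL SIGN E
    0x094D: ("Virama", "Bottom"),          # DEVANAGARI SIGN VIRAMA
}


def indic_props(cp):
    """Return (syllabic, positional) for cp, or ("Other", "NA") if unlisted."""
    return INDIC_PROPS.get(cp, ("Other", "NA"))


if __name__ == "__main__":
    # KA followed by VOWEL SIGN I: the vowel mark is stored after the
    # consonant but is categorized as sitting to its left.
    for ch in "\u0915\u093f":
        syllabic, positional = indic_props(ord(ch))
        print(f"U+{ord(ch):04X} syllabic={syllabic} positional={positional}")
```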
 <p>
       Some complex scripts require that the text run be split into
       syllables. What constitutes a valid syllable in these
       scripts is specified by regular expressions over the
       Letter and Mark codepoints, which take the UISC and UIPC
       properties into account.
     </p>
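 <p>
       The idea of matching syllables with a regular expression can be
       sketched as follows. This is a deliberately simplified toy, not the
       grammar used by HarfBuzz or by any published shaping model: it
       assigns each Devanagari codepoint a one-letter class from rough
       codepoint ranges (an assumption standing in for real UISC data) and
       then matches the pattern "consonant, any number of virama plus
       consonant pairs, optional vowel sign or final virama" over the class
       string. All names in the block are hypothetical.
     </p>

```python
import re

# One-letter classes derived from rough Devanagari codepoint ranges.
# A real implementation would use the UISC property instead.
CLASS_RANGES = [
    ("V", range(0x0905, 0x0915)),  # independent vowel letters
    ("C", range(0x0915, 0x093A)),  # consonant letters
    ("M", range(0x093E, 0x094D)),  # dependent vowel signs
    ("H", range(0x094D, 0x094E)),  # virama
]

# Toy syllable pattern over the class string; "." catches anything else
# so that every codepoint ends up in some segment.
SYLLABLE = re.compile(r"C(?:HC)*(?:M|H)?|V|.")


def classify(ch):
    """Return the one-letter class for ch, or "X" if it is unclassified."""
    cp = ord(ch)
    for letter, cps in CLASS_RANGES:
        if cp in cps:
            return letter
    return "X"


def split_syllables(text):
    """Split text into segments by matching SYLLABLE over its classes."""
    classes = "".join(classify(ch) for ch in text)
    # The class string has one letter per codepoint, so match offsets can
    # be used directly as offsets into the original text.
    return [text[m.start():m.end()] for m in SYLLABLE.finditer(classes)]


if __name__ == "__main__":
    # KA VIRAMA SSA, TA VIRAMA RA VOWEL-SIGN-I, YA
    word = "\u0915\u094d\u0937\u0924\u094d\u0930\u093f\u092f"
    print(split_syllables(word))
```

 <p>
       Matching over a string of class letters, rather than over the
       codepoints themselves, keeps the pattern short and mirrors how such
       syllable grammars are usually written down.
     </p>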
 </div>
 <div class="footer">
 <hr>Generated by GTK-Doc V1.25</div>
 </body>
 </html>