| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
| <title>Working with HarfBuzz clusters: HarfBuzz Manual</title> |
| <meta name="generator" content="DocBook XSL Stylesheets V1.79.1"> |
| <link rel="home" href="index.html" title="HarfBuzz Manual"> |
| <link rel="up" href="clusters.html" title="Clusters"> |
| <link rel="prev" href="clusters.html" title="Clusters"> |
| <link rel="next" href="a-clustering-example-for-levels-0-and-1.html" title="A clustering example for levels 0 and 1"> |
| <meta name="generator" content="GTK-Doc V1.25 (XML mode)"> |
| <link rel="stylesheet" href="style.css" type="text/css"> |
| </head> |
| <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
| <table class="navigation" id="top" width="100%" summary="Navigation header" cellpadding="2" cellspacing="5"><tr valign="middle"> |
| <td width="100%" align="left" class="shortcuts"></td> |
| <td><a accesskey="h" href="index.html"><img src="home.png" width="16" height="16" border="0" alt="Home"></a></td> |
| <td><a accesskey="u" href="clusters.html"><img src="up.png" width="16" height="16" border="0" alt="Up"></a></td> |
| <td><a accesskey="p" href="clusters.html"><img src="left.png" width="16" height="16" border="0" alt="Prev"></a></td> |
| <td><a accesskey="n" href="a-clustering-example-for-levels-0-and-1.html"><img src="right.png" width="16" height="16" border="0" alt="Next"></a></td> |
| </tr></table> |
| <div class="section"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="working-with-harfbuzz-clusters"></a>Working with HarfBuzz clusters</h2></div></div></div> |
| <p> |
| When you add text to a HarfBuzz buffer, each code point must be |
| assigned a <span class="emphasis"><em>cluster value</em></span>. |
| </p> |
| <p> |
| This cluster value is an arbitrary number; HarfBuzz uses it only |
| to distinguish between clusters. Many client programs will use |
| the index of each code point in the input text stream as the |
| cluster value. This is for the sake of convenience; the actual |
| value does not matter. |
| </p> |
| <p> |
| Some of the shaping operations performed by HarfBuzz — |
| such as reordering, composition, decomposition, and substitution |
| — may alter the cluster values of some characters. The |
| final cluster values in the buffer at the end of the shaping |
| process will indicate to client programs which subsequences of |
| glyphs represent a cluster and, therefore, must not be |
| separated. |
| </p> |
| <p> |
| In addition, client programs can query the final cluster values |
| to discern other potentially important information about the |
| glyphs in the output buffer (such as whether or not a ligature |
| was formed). |
| </p> |
| <p> |
| For example, if the initial sequence of cluster values was: |
| </p> |
| <pre class="programlisting"> |
| 0,1,2,3,4 |
| </pre> |
| <p> |
| and the final sequence of cluster values is: |
| </p> |
| <pre class="programlisting"> |
| 0,0,3,3 |
| </pre> |
| <p> |
| then there are two clusters in the output buffer: the first |
| cluster includes the first two glyphs, and the second cluster |
| includes the third and fourth glyphs. It is also evident that a |
| ligature or conjunct has been formed, because there are fewer |
| glyphs in the output buffer (four) than there were code points |
| in the input buffer (five). |
| </p> |
| <p> |
| Although client programs using HarfBuzz are free to assign |
| initial cluster values in any manner they choose to, HarfBuzz |
| does offer some useful guarantees if the cluster values are |
| assigned in a monotonic (either non-decreasing or non-increasing) |
| order. |
| </p> |
| <p> |
| For left-to-right scripts (LTR) and top-to-bottom scripts (TTB), |
| HarfBuzz will preserve the monotonic property: client programs |
| are guaranteed that monotonically increasing initial clulster |
| values will be returned as monotonically increasing final |
| cluster values. |
| </p> |
| <p> |
| For right-to-left scripts (RTL) and bottom-to-top scripts (BTT), |
| the directionality of the buffer itself is reversed for final |
| output as a matter of design. Therefore, HarfBuzz inverts the |
| monotonic property: client programs are guaranteed that |
| monotonically increasing initial clulster values will be |
| returned as monotonically <span class="emphasis"><em>decreasing</em></span> final |
| cluster values. |
| </p> |
| <p> |
| Client programs can adjust how HarfBuzz handles clusters during |
| shaping by setting the |
| <code class="literal">cluster_level</code> of the |
| buffer. HarfBuzz offers three <span class="emphasis"><em>levels</em></span> of |
| clustering support for this property: |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| <li class="listitem"> |
| <p><span class="emphasis"><em>Level 0</em></span> is the default and |
| reproduces the behavior of the old HarfBuzz library. |
| </p> |
| <p> |
| The distinguishing feature of level 0 behavior is that, at |
| the beginning of processing the buffer, all code points that |
| are categorized as <span class="emphasis"><em>marks</em></span>, |
| <span class="emphasis"><em>modifier symbols</em></span>, or |
| <span class="emphasis"><em>Emoji extended pictographic</em></span> modifiers, |
| as well as the <span class="emphasis"><em>Zero Width Joiner</em></span> and |
| <span class="emphasis"><em>Zero Width Non-Joiner</em></span> code points, are |
| assigned the cluster value of the closest preceding code |
| point from <span class="emphasis"><em>different</em></span> category. |
| </p> |
| <p> |
| In essence, whenever a base character is followed by a mark |
| character or a sequence of mark characters, those marks are |
| reassigned to the same initial cluster value as the base |
| character. This reassignment is referred to as |
| "merging" the affected clusters. This behavior is based on |
| the Grapheme Cluster Boundary specification in <a class="ulink" href="https://www.unicode.org/reports/tr29/#Regex_Definitions" target="_top">Unicode |
| Technical Report 29</a>. |
| </p> |
| <p> |
| Client programs can specify level 0 behavior for a buffer by |
| setting its <code class="literal">cluster_level</code> to |
| <code class="literal">HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</code>. |
| </p> |
| </li> |
| <li class="listitem"> |
| <p> |
| <span class="emphasis"><em>Level 1</em></span> tweaks the old behavior |
| slightly to produce better results. Therefore, level 1 |
| clustering is recommended for code that is not required to |
| implement backward compatibility with the old HarfBuzz. |
| </p> |
| <p> |
| Level 1 differs from level 0 by not merging the |
| clusters of marks and other modifier code points with the |
| preceding "base" code point's cluster. By preserving the |
| separate cluster values of these marks and modifier code |
| points, script shapers can perform additional operations |
| that might lead to improved results (for example, reordering |
| a sequence of marks). |
| </p> |
| <p> |
| Client programs can specify level 1 behavior for a buffer by |
| setting its <code class="literal">cluster_level</code> to |
| <code class="literal">HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</code>. |
| </p> |
| </li> |
| <li class="listitem"> |
| <p> |
| <span class="emphasis"><em>Level 2</em></span> differs significantly in how it |
| treats cluster values. In level 2, HarfBuzz never merges |
| clusters. |
| </p> |
| <p> |
| This difference can be seen most clearly when HarfBuzz processes |
| ligature substitutions and glyph decompositions. In level 0 |
| and level 1, ligatures and glyph decomposition both involve |
| merging clusters; in level 2, neither of these operations |
| triggers a merge. |
| </p> |
| <p> |
| Client programs can specify level 2 behavior for a buffer by |
| setting its <code class="literal">cluster_level</code> to |
| <code class="literal">HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</code>. |
| </p> |
| </li> |
| </ul></div> |
| <p> |
| As mentioned earlier, client programs using HarfBuzz often |
| assign initial cluster values in a buffer by reusing the indices |
| of the code points in the input text. This gives a sequence of |
| cluster values that is monotonically increasing (for example, |
| 0,1,2,3,4). |
| </p> |
| <p> |
| It is not <span class="emphasis"><em>required</em></span> that the cluster values |
| in a buffer be monotonically increasing. However, if the initial |
| cluster values in a buffer are monotonic and the buffer is |
| configured to use cluster level 0 or 1, then HarfBuzz |
| guarantees that the final cluster values in the shaped buffer |
| will also be monotonic. No such guarantee is made for cluster |
| level 2. |
| </p> |
| <p> |
| In levels 0 and 1, HarfBuzz implements the following conceptual |
| model for cluster values: |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; "> |
| <li class="listitem"><p> |
| If the sequence of input cluster values is monotonic, the |
| sequence of cluster values will remain monotonic. |
| </p></li> |
| <li class="listitem"><p> |
| Each cluster value represents a single cluster. |
| </p></li> |
| <li class="listitem"><p> |
| Each cluster contains one or more glyphs and one or more |
| characters. |
| </p></li> |
| </ul></div> |
| <p> |
| In practice, this model offers several benefits. Assuming that |
| the initial cluster values were monotonically increasing |
| and distinct before shaping began, then, in the final output: |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; "> |
| <li class="listitem"><p> |
| All adjacent glyphs having the same final cluster |
| value belong to the same cluster. |
| </p></li> |
| <li class="listitem"><p> |
| Each character belongs to the cluster that has the highest |
| cluster value <span class="emphasis"><em>not larger than</em></span> its |
| initial cluster value. |
| </p></li> |
| </ul></div> |
| </div> |
| <div class="footer"> |
| <hr>Generated by GTK-Doc V1.25</div> |
| </body> |
| </html> |