Name: icu
URL: http://site.icu-project.org/
Version: 56.1
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 56.1 for C/C++.

A. How to update ICU

1. Run "scripts/update.sh <version>" (e.g. 56-1).
   This downloads ICU from the upstream svn repository while preserving
   the Fuchsia-specific build files (*local.mk) and converter files
   (see section C).

2. Update the source file lists for i18n and common
   in icu.gypi and BUILD.gn. See the comments in the files.

3. Review and apply patches/changes in "D. Local Modifications" if
   necessary/applicable. Update patch files in patches/.

4. Follow the instructions in section B on building ICU data files.
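
The version appears in three spellings in this file: dotted in the header
(56.1), dashed as the update.sh argument (56-1), and major-only in the data
file name (icudt56l.dat). The following is a minimal sketch of converting
between them. The helper names are hypothetical and are not part of
scripts/update.sh.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of scripts/update.sh.

# Convert a dotted ICU version ("56.1") to the dashed form ("56-1").
icu_dashed_version() {
  printf '%s\n' "$1" | tr '.' '-'
}

# Extract the major version ("56") from either "56.1" or "56-1".
icu_major_version() {
  printf '%s\n' "${1%%[.-]*}"
}

# Build the data file name for a version and an endianness letter
# ("l" for little endian, "b" for big endian).
icu_data_file_name() {
  printf 'icudt%s%s.dat\n' "$(icu_major_version "$1")" "$2"
}

icu_dashed_version 56.1      # prints 56-1
icu_data_file_name 56-1 l    # prints icudt56l.dat
```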


B. How to build ICU data files


Pre-built data files are generated and checked in with the following steps.
These steps should be run on a Linux host.

1. icu data files

  a. Make an ICU data build directory outside the Fuchsia source tree
     and cd to that directory (say, $ICUBUILDIR).

  b. Run

    ${FUCHSIA_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout

  c. Run make

     'make' will fail when pkgdata looks for css3transform.res. This
     is expected. To work around this, run this after you see the error:

     sed -i 's/css3transform.res/root.res/' data/out/tmp/icudata.lst
     make

     See http://bugs.icu-project.org/trac/ticket/10570

  d. Run
       ${FUCHSIA_ICU_TREE_TOP}/scripts/trim_data.sh

     This keeps the full locale data for Fuchsia's UI languages and their
     selected variants, and only the bare minimum locale data for other
     locales.

  e. Run
       ${FUCHSIA_ICU_TREE_TOP}/scripts/make_data.sh

     This makes icudt${version}l.dat.

  f. Run
       ${FUCHSIA_ICU_TREE_TOP}/scripts/copy_data.sh

     This copies the ICU data files (both Little and Big Endian)
     to the following locations:

     common/icudtl.dat
     common/icudtb.dat

  g. Run
       ${FUCHSIA_ICU_TREE_TOP}/scripts/clean_up_data_source.sh

     This reverts the results of trim_data.sh and patch_locale.sh and
     makes the tree ready for committing updated ICU data files.

  h. Whenever data is updated (e.g. a timezone update), repeat steps d-g,
     provided the ICU build directory used in steps a-c has been kept.
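
Steps d-g above can be rerun as a unit. The following is a minimal sketch,
not a script that ships in scripts/. The function name is hypothetical, and
it assumes each script takes no arguments and is run from the ICU build
directory, as the steps above suggest.

```shell
#!/bin/sh
# Hypothetical wrapper: run steps d-g in order and stop at the first failure.
# Assumption: the scripts take no arguments and are run from $ICUBUILDIR.
refresh_icu_data() {
  tree_top=$1
  for step in trim_data.sh make_data.sh copy_data.sh clean_up_data_source.sh; do
    echo "Running ${step}"
    "${tree_top}/scripts/${step}" || {
      echo "Step ${step} failed" >&2
      return 1
    }
  done
}

# Usage (from the build directory created in step a):
#   cd "$ICUBUILDIR" && refresh_icu_data "$FUCHSIA_ICU_TREE_TOP"
```

Stopping at the first failure matters because clean_up_data_source.sh
reverts the trimmed sources; running it after a failed make_data.sh would
discard the state needed to investigate the failure.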

2. Note on the locale data customization

  - scripts/trim_data.sh
      a. Trim the locale data for Fuchsia's UI languages:
         locales, lang, region, currency, zone
      b. Trim the locale data for non-UI languages to the bare minimum:
         ExemplarCharacters, LocaleScript, layout, and the name of the
         language for a locale in its native language.
      c. Remove the legacy Chinese character set-based collation
         (big5han/gb2312han), which makes little sense and which nobody uses.
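
For the expected 'make' failure in step 1.c, the workaround can be wrapped so
that the list file is patched only after the first failure. This is a sketch
under assumptions: the function names are hypothetical, the list path is the
one given in step 1.c, and the edit goes through a temporary file rather than
GNU 'sed -i' so that it does not depend on GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: replace the css3transform.res entry with root.res
# in a pkgdata list file, writing through a temporary file.
patch_icudata_lst() {
  lst=$1
  [ -f "$lst" ] || return 1
  sed 's/css3transform\.res/root.res/' "$lst" > "${lst}.tmp" &&
    mv "${lst}.tmp" "$lst"
}

# Hypothetical helper: run make; if it fails, patch the list once and retry.
# Must be run from the ICU build directory.
make_with_workaround() {
  make && return 0
  patch_icudata_lst data/out/tmp/icudata.lst && make
}
```

If 'make' fails for an unrelated reason, the retry fails as well and the
function returns a non-zero status.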

C. Fuchsia-specific data build files and converters

They're preserved in step A.1 above. In general, there's no need to touch
them when updating ICU.

1. source/data/mappings
  - convrtrs.txt: Lists the encodings and aliases required by the WHATWG
    Encoding spec plus a few extras (see the file for the rationale).

  - ucmlocal.txt: Lists only the converters we need.

  - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP,
    Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
    They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.

  - gb18030.ucm and windows-936.ucm
    gb_table.patch was applied for the following changes.
    a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
    the encoding spec (one-way mapping in toUnicode direction).
    b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
    from U+1E3F to \xA8\xBC (windows-936/GBK).
       See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3

2. source/data/*/*local.mk
  - List locales of interest to Fuchsia
   a. Fuchsia's UI languages
   b. Variants of UI languages
   c. Other locales in Accept-Language list : will only have bare minimum
   locale data

  - brklocal.mk drops all *loose.brk to save space (~370 kB) for now.

3. source/data/brkitr
  - khmerdict.txt: Abridged Khmer dictionary. See
    http://bugs.icu-project.org/trac/ticket/9451
  - word_ja.txt (used only on Android)
    Added for Japanese-specific word-breaking without the C+J dictionary.

4. source/data/trnslit/css3transform.txt
  - Handle Greek case conversion with a transliterator

5. Add {an,ast,ckb,ku,tg,wa}.txt to source/data/{locales,lang}
   with the minimal locale data necessary for the spellchecker
   and language menus. Also change the English display name
   for ckb to 'Kurdish (Arabic)'.

D. Local Modifications

1. Applied locale data patches from Google obtained by diff'ing
   the upstream copy and Google's internal copy for source/data

  - patches/locale_google.patch:
    * Google's internal ICU locale changes
    * Simpler region names for Hong Kong and Macau in all locales
    * Currency signs in ru and uk locales (do not include 'tr' locale changes)
    * AM/PM, midnight, noon formatting for a few Indian locales
    * Timezone name changes in Korean and Chinese locales

  - patches/locale1.patch: Minor fixes for Korean


2. Applied post-56 fixes from the upstream for measure/date format bugs

  - patches/measure_format.patch: combined patch of 12 CLs taken
    from bugs below.
  - upstream bugs
    http://bugs.icu-project.org/trac/ticket/11986
    http://bugs.icu-project.org/trac/ticket/12031
    http://bugs.icu-project.org/trac/ticket/12030
    http://bugs.icu-project.org/trac/ticket/12041

  - patches/relative_date.patch from Android
    https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21

3. Breakiterator patches
  - patches/linebrk.patch
    a. Drop *_loose.txt for all locales and use the corresponding normal.txt
    b. Drop local patches we used to have for the following issues. They'll
       be dealt with in the upstream (Unicode/CLDR).
       http://unicode.org/cldr/trac/ticket/6557
       http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)

  - patches/wordbrk.patch for word.txt
    a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
       FQDN labels can be split at '.'
    b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
       See http://unicode.org/cldr/trac/ticket/6555

  - patches/khmer-dictbe.patch
    Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
    http://bugs.icu-project.org/trac/ticket/9451

  - Add several common Chinese words that were dropped previously to
    source/data/cjdict/brkitr/cjdict.txt
    patch: patches/cjdict.patch
    upstream bug: http://bugs.icu-project.org/trac/ticket/10888

4. Timezone data update
  Run scripts/update_tz.sh to grab the latest version of the
  following timezone data files and put them in source/data/misc

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

  As of July 26, 2016, the latest version is 2016f and the above files
  are available at
  http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2016f/44/
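
As an illustration of where the four files live, the sketch below prints
their URLs for a given tzdata release. It is not the content of
scripts/update_tz.sh. The function name is hypothetical, and reusing the "44"
path component from the 2016f URL above for other releases is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the URL of each timezone data file.
# Assumption: other releases use the same layout (including "44") as the
# 2016f URL quoted above.
tz_file_urls() {
  tz_version=$1
  base="http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/${tz_version}/44"
  for f in metaZones.txt timezoneTypes.txt windowsZones.txt zoneinfo64.txt; do
    printf '%s/%s\n' "$base" "$f"
  done
}

tz_file_urls 2016f
```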

5. Build-related changes

  - patches/wpo.patch
    upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
                    http://bugs.icu-project.org/trac/ticket/5701

  - patches/data.build.patch :
      Remove unnecessary resources : unames, collator rule source
  - patches/data_symb.patch :
      Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
      the icu data file or icudt.dll

6. Apply a timezone detection API fix
  - patches/tzdetect.patch
  - upstream bugs
    http://bugs.icu-project.org/trac/ticket/11623

7. Fix a 'bad cast' found in Transliterator with a CFI build
  - patches/xlit_badcast.patch
  - upstream bug (fixed in the upstream. Will be in ICU 57 release)
    http://bugs.icu-project.org/trac/ticket/11937

8. Add back UTF-32 converters temporarily even when
   UCONFIG_ONLY_HTML_CONVERSION is defined until UTF-32 is
   removed from Blink. See
   http://www.icu-project.org/trac/ticket/11296 and
   http://crbug.com/417850

   - patches/utf32.patch

9. Fix a UText bug found in uregex_open fuzzer.
  - patches/utext.patch
  - upstream bug (fixed in trunk in Jan, 2016. Will be in ICU 57 release)
    http://bugs.icu-project.org/trac/ticket/12130

10. Fix a bug in regex compiler.
  - patches/regexcmp.patch
  - upstream bug (fixed in the upstream. Will be in ICU 57 release)
    http://bugs.icu-project.org/trac/ticket/12138

11. Remove an unnecessary static initializer
  - patches/remove_si.patch
  - upstream bug (fixed in trunk. Will be in ICU 57 release)
    http://bugs.icu-project.org/trac/ticket/12408

12. Cherry pick locale data fixes from the upstream and Android
  - patches/locale_extra.patch
  - upstream bugs
        http://unicode.org/cldr/trac/ticket/9045 (en-AU date format)
        http://unicode.org/cldr/trac/ticket/7969 (percent sign in ar and fa)
  - Android patch for the 2nd bug
        https://android.googlesource.com/platform/external/icu/+/56b2b8b

13. Add Emoji properties support by cherry-picking from 57.1
  - patches/emoji_props.patch
  - Upstream change cherry-picked
        http://bugs.icu-project.org/trac/changeset/38183
  - source/data/in/{pnames,uprops}.icu were copied from the upstream,
    but they're just for the record. Their contents are hard-coded
    in the source files patched by the above patch.
