| <!-- ##### SECTION Title ##### --> |
| Character Set Conversion |
| |
| <!-- ##### SECTION Short_Description ##### --> |
| convert strings between different character sets using iconv() |
| |
| <!-- ##### SECTION Long_Description ##### --> |
| <para> |
| |
| </para> |
| |
| <refsect2 id="file-name-encodings"> |
| <title>File Name Encodings</title> |
| |
| <para> |
| Historically, Unix has not had a defined encoding for file |
| names: a file name is valid as long as it does not have path |
| separators in it ("/"). However, displaying file names may |
| require conversion: from the character set in which they were |
| created, to the character set in which the application |
| operates. Consider the Spanish file name |
| "<filename>Presentación.sxi</filename>". If the |
| application which created it uses ISO-8859-1 for its encoding, |
| then the actual file name on disk would look like this: |
| </para> |
| |
| <programlisting id="filename-iso8859-1"> |
| Character: P r e s e n t a c i ó n . s x i |
| Hex code: 50 72 65 73 65 6e 74 61 63 69 f3 6e 2e 73 78 69 |
| </programlisting> |
| |
| <para> |
| However, if the application use UTF-8, the actual file name on |
| disk would look like this: |
| </para> |
| |
| <programlisting id="filename-utf-8"> |
| Character: P r e s e n t a c i ó n . s x i |
| Hex code: 50 72 65 73 65 6e 74 61 63 69 c3 b3 6e 2e 73 78 69 |
| </programlisting> |
| |
| <para> |
| Glib uses UTF-8 for its strings, and GUI toolkits like GTK+ |
| that use Glib do the same thing. If you get a file name from |
| the file system, for example, from |
| <function>readdir(3)</function> or from <link |
| linkend="g_dir_read_name"><function>g_dir_read_name()</function></link>, |
| and you wish to display the file name to the user, you |
| <emphasis>will</emphasis> need to convert it into UTF-8. The |
| opposite case is when the user types the name of a file he |
| wishes to save: the toolkit will give you that string in |
| UTF-8 encoding, and you will need to convert it to the |
| character set used for file names before you can create the |
| file with <function>open(2)</function> or |
| <function>fopen(3)</function>. |
| </para> |
| |
| <para> |
| By default, Glib assumes that file names on disk are in UTF-8 |
| encoding. This is a valid assumption for file systems which |
| were created relatively recently: most applications use UTF-8 |
| encoding for their strings, and that is also what they use for |
| the file names they create. However, older file systems may |
| still contain file names created in "older" encodings, such as |
| ISO-8859-1. In this case, for compatibility reasons, you may |
| want to instruct Glib to use that particular encoding for file |
| names rather than UTF-8. You can do this by specifying the |
| encoding for file names in the <link |
| linkend="G_FILENAME_ENCODING"><envar>G_FILENAME_ENCODING</envar></link> |
| environment variable. For example, if your installation uses |
| ISO-8859-1 for file names, you can put this in your |
| <filename>~/.profile</filename>: |
| </para> |
| |
| <programlisting> |
| export G_FILENAME_ENCODING=ISO-8859-1 |
| </programlisting> |
| |
| <para> |
| Glib provides the functions <link |
| linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link> |
| and <link |
| linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link> |
| to perform the necessary conversions. These functions convert |
| file names from the encoding specified in |
| <envar>G_FILENAME_ENCODING</envar> to UTF-8 and vice-versa. |
| <xref linkend="file-name-encodings-diagram"/> illustrates how |
| these functions are used to convert between UTF-8 and the |
| encoding for file names in the file system. |
| </para> |
| |
| <figure id="file-name-encodings-diagram"> |
| <title>Conversion between File Name Encodings</title> |
| <graphic fileref="file-name-encodings.png" format="PNG"/> |
| </figure> |
| |
| <refsect3 id="file-name-encodings-checklist"> |
| <title>Checklist for Application Writers</title> |
| |
| <para> |
| This section is a practical summary of the detailed |
| description above. You can use this as a checklist of |
| things to do to make sure your applications process file |
| name encodings correctly. |
| </para> |
| |
| <orderedlist> |
| <listitem> |
| <para> |
| If you get a file name from the file system from a |
| function such as <function>readdir(3)</function> or |
| <function>gtk_file_chooser_get_filename()</function>, |
| you do not need to do any conversion to pass that |
| file name to functions like <function>open(2)</function>, |
| <function>rename(2)</function>, or |
| <function>fopen(3)</function> — those are "raw" |
| file names which the file system understands. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| If you need to display a file name, convert it to UTF-8 |
| first by using <link |
| linkend="g_filename_to_utf8"><function>g_filename_to_utf8()</function></link>. |
| If conversion fails, display a string like |
| "<literal>Unknown file name</literal>". <emphasis>Do |
| not</emphasis> convert this string back into the |
| encoding used for file names if you wish to pass it to |
| the file system; use the original file name instead. |
| For example, the document window of a word processor |
| could display "Unknown file name" in its title bar but |
| still let the user save the file, as it would keep the |
| raw file name internally. This can happen if the user |
| has not set the <envar>G_FILENAME_ENCODING</envar> |
| environment variable even though he has files whose |
| names are not encoded in UTF-8. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| If your user interface lets the user type a file name |
| for saving or renaming, convert it to the encoding used |
| for file names in the file system by using <link |
| linkend="g_filename_from_utf8"><function>g_filename_from_utf8()</function></link>. |
| Pass the converted file name to functions like |
| <function>fopen(3)</function>. If conversion fails, ask |
| the user to enter a different file name. This can |
| happen if the user types Japanese characters when |
| <envar>G_FILENAME_ENCODING</envar> is set to |
| <literal>ISO-8859-1</literal>, for example. |
| </para> |
| </listitem> |
| </orderedlist> |
| </refsect3> |
| </refsect2> |
| |
| <!-- ##### SECTION See_Also ##### --> |
| <para> |
| |
| </para> |
| |
| <!-- ##### SECTION Stability_Level ##### --> |
| |
| |
| <!-- ##### SECTION Image ##### --> |
| |
| |
| <!-- ##### FUNCTION g_convert ##### --> |
| <para> |
| |
| </para> |
| |
| @str: |
| @len: |
| @to_codeset: |
| @from_codeset: |
| @bytes_read: |
| @bytes_written: |
| @error: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_convert_with_fallback ##### --> |
| <para> |
| |
| </para> |
| |
| @str: |
| @len: |
| @to_codeset: |
| @from_codeset: |
| @fallback: |
| @bytes_read: |
| @bytes_written: |
| @error: |
| @Returns: |
| |
| |
| <!-- ##### STRUCT GIConv ##### --> |
| <para> |
| The <structname>GIConv</structname> struct wraps an |
| <function>iconv()</function> conversion descriptor. It contains private data |
| and should only be accessed using the following functions. |
| </para> |
| |
| |
| <!-- ##### FUNCTION g_convert_with_iconv ##### --> |
| <para> |
| |
| </para> |
| |
| @str: |
| @len: |
| @converter: |
| @bytes_read: |
| @bytes_written: |
| @error: |
| @Returns: |
| |
| |
| <!-- ##### MACRO G_CONVERT_ERROR ##### --> |
| <para> |
| Error domain for character set conversions. Errors in this domain will |
| be from the #GConvertError enumeration. See #GError for information on |
| error domains. |
| </para> |
| |
| |
| |
| <!-- ##### FUNCTION g_iconv_open ##### --> |
| <para> |
| |
| </para> |
| |
| @to_codeset: |
| @from_codeset: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_iconv ##### --> |
| <para> |
| |
| </para> |
| |
| @converter: |
| @inbuf: |
| @inbytes_left: |
| @outbuf: |
| @outbytes_left: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_iconv_close ##### --> |
| <para> |
| |
| </para> |
| |
| @converter: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_locale_to_utf8 ##### --> |
| <para> |
| |
| </para> |
| |
| @opsysstring: |
| @len: |
| @bytes_read: |
| @bytes_written: |
| @error: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_filename_to_utf8 ##### --> |
| <para> |
| |
| </para> |
| |
| @opsysstring: |
| @len: |
| @bytes_read: |
| @bytes_written: |
| @error: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_filename_from_utf8 ##### --> |
| <para> |
| |
| </para> |
| |
| @utf8string: |
| @len: |
| @bytes_read: |
| @bytes_written: |
| @error: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_filename_from_uri ##### --> |
| <para> |
| |
| </para> |
| |
| @uri: |
| @hostname: |
| @error: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_filename_to_uri ##### --> |
| <para> |
| |
| </para> |
| |
| @filename: |
| @hostname: |
| @error: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_get_filename_charsets ##### --> |
| <para> |
| |
| </para> |
| |
| @charsets: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_filename_display_name ##### --> |
| <para> |
| |
| </para> |
| |
| @filename: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_filename_display_basename ##### --> |
| <para> |
| |
| </para> |
| |
| @filename: |
| @Returns: |
| |
| |
| <!-- ##### FUNCTION g_locale_from_utf8 ##### --> |
| <para> |
| |
| </para> |
| |
| @utf8string: |
| @len: |
| @bytes_read: |
| @bytes_written: |
| @error: |
| @Returns: |
| |
| |
| <!-- ##### ENUM GConvertError ##### --> |
| <para> |
| Error codes returned by character set conversion routines. |
| </para> |
| |
| @G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character sets |
| is not supported. |
| @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input. |
| @G_CONVERT_ERROR_FAILED: Conversion failed for some reason. |
| @G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input. |
| @G_CONVERT_ERROR_BAD_URI: URI is invalid. |
| @G_CONVERT_ERROR_NOT_ABSOLUTE_PATH: Pathname is not an absolute path. |
| |
| <!-- ##### FUNCTION g_get_charset ##### --> |
| <para> |
| |
| </para> |
| |
| @charset: |
| @Returns: |
| |
| |
| <!-- |
| Local variables: |
| mode: sgml |
| sgml-parent-document: ("../glib-docs.sgml" "book" "refentry") |
| End: |
| --> |
| |
| |