| <chapter> |
| <title>Introduction</title> |
| |
| <para> |
| GObject, and its lower-level type system, GType, are used by GTK+ and most Gnome libraries to |
| provide: |
| <itemizedlist> |
| <listitem><para>object-oriented C-based APIs and</para></listitem> |
| <listitem><para>automatic transparent API bindings to other compiled |
| or interpreted languages.</para></listitem> |
| </itemizedlist> |
| </para> |
| |
| <para>A lot of programmers are used to work with compiled-only or dynamically interpreted-only |
| languages and do not understand the challenges associated with cross-language interoperability. |
| This introduction tries to provide an insight into these challenges and describes briefly |
| the solution choosen by GLib. |
| </para> |
| |
| <sect1> |
| <title>Data types and programming</title> |
| |
| <para> |
| One could say (I have seen such definitions used in some textbooks on programming language theory) |
| that a programming language is merely a way to create data types and manipulate them. Most languages |
| provide a number of language-native types and a few primitives to create more complex types based |
| on these primitive types. |
| </para> |
| |
| <para> |
| In C, the language provides types such as <emphasis>char</emphasis>, <emphasis>long</emphasis>, |
| <emphasis>pointer</emphasis>. During compilation of C code, the compiler maps these |
| language types to the compiler's target architecture machine types. If you are using a C interpreter |
| (I have never seen one myself but it is possible :), the interpreter (the program which interprets |
| the source code and executes it) maps the language types to the machine types of the target machine at |
| runtime, during the program execution (or just before execution if it uses a Just In Time compiler engine). |
| </para> |
| |
| <para>Perl and Python which are interpreted languages do not really provide type definitions similar |
| to those used by C. Perl and Python programmers manipulate variables and the type of the variables |
| is decided only upon the first assignment or upon the first use which forces a type on the variable. |
| The interpreter also often provides a lot of automatic conversions from one type to the other. For example, |
| in Perl, a variable which holds an integer can be automatically converted to a string given the |
| required context: |
| <programlisting> |
| my $tmp = 10; |
| print "this is an integer converted to a string:" . $tmp . "\n"; |
| </programlisting> |
| Of course, it is also often possible to explicitely specify conversions when the default conversions provided |
| by the language are not intuitive. |
| </para> |
| |
| </sect1> |
| |
| <sect1> |
| <title>Exporting a C API</title> |
| |
| <para>C APIs are defined by a set of functions and global variables which are usually exported from a |
| binary. C functions have an arbitrary number of arguments and one return value. Each function is thus |
| uniquely identified by the function name and the set of C types which describe the function arguments |
| and return value. The global variables exported by the API are similarly identified by their name and |
| their type. |
| </para> |
| |
| <para> |
| A C API is thus merely defined by a set of names to which a set of types are associated. If you know the |
| function calling convention and the mapping of the C types to the machine types used by the platform you |
| are on, you can resolve the name of each function to find where the code associated to this function |
| is located in memory, and then construct a valid argument list for the function. Finally, all you have to |
| do is triger a call to the target C function with the argument list. |
| </para> |
| |
| <para> |
| For the sake of discussion, here is a sample C function and the associated 32 bit x86 |
| assembly code generated by gcc on my linux box: |
| <programlisting> |
| static void function_foo (int foo) |
| {} |
| |
| int main (int argc, char *argv[]) |
| { |
| |
| function_foo (10); |
| |
| return 0; |
| } |
| |
| push $0xa |
| call 0x80482f4 <function_foo> |
| </programlisting> |
| The assembly code shown above is pretty straightforward: the first instruction pushes |
| the hexadecimal value 0xa (decimal value 10) as a 32 bit integer on the stack and calls |
| <function>function_foo</function>. As you can see, C function calls are implemented by |
| gcc by native function calls (this is probably the fastest implementation possible). |
| </para> |
| |
| <para> |
| Now, let's say we want to call the C function <function>function_foo</function> from |
| a python program. To do this, the python interpreter needs to: |
| <itemizedlist> |
| <listitem><para>Find where the function is located. This means probably find the binary generated by the C compiler |
| which exports this functions.</para></listitem> |
| <listitem><para>Load the code of the function in executable memory.</para></listitem> |
| <listitem><para>Convert the python parameters to C-compatible parameters before calling |
| the function.</para></listitem> |
| <listitem><para>Call the function with the right calling convention</para></listitem> |
| <listitem><para>Convert the return values of the C function to python-compatible |
| variables to return them to the python code.</para></listitem> |
| </itemizedlist> |
| </para> |
| |
| <para>The process described above is pretty complex and there are a lot of ways to make it entirely automatic |
| and transparent to the C and the Python programmers: |
| <itemizedlist> |
| <listitem><para>The first solution is to write by hand a lot of glue code, once for each function exported or imported, |
| which does the python to C parameter conversion and the C to python return value conversion. This glue code is then |
| linked with the interpreter which allows python programs to call a python functions which delegates the work to the |
| C function.</para></listitem> |
| <listitem><para>Another nicer solution is to automatically generate the glue code, once for each function exported or |
| imported, with a special compiler which |
| reads the original function signature.</para></listitem> |
| <listitem><para>The solution used by GLib is to use the GType library which holds at runtime a description of |
| all the objects manipulated by the programmer. This so-called <emphasis>dynamic type</emphasis><footnote> |
| <para> |
| There are numerous different implementations of dynamic type systems: all C++ |
| compilers have one, Java and .NET have one too. A dynamic type system allows you |
| to get information about every instantiated object at runtime. It can be implemented |
| by a process-specific database: every new object created registers the characteristics |
| of its associated type in the type system. It can also be implemented by introspection |
| interfaces. The common point between all these different type systems and implementations |
| is that they all allow you to query for object metadata at runtime. |
| </para> |
| </footnote> |
| |
| library is then |
| used by special generic glue code to automatically convert function parameters and function caling conventions |
| between different runtime domains.</para></listitem> |
| </itemizedlist> |
| The greatest advantage of the solution implemented by GType is that the glue code sitting at the runtime domain |
| boundaries is written once: the figure below states this more clearly. |
| <figure> |
| <mediaobject> |
| <imageobject> <!-- this is for HTML output --> |
| <imagedata fileref="glue.png" format="png" align="center"/> |
| </imageobject> |
| <imageobject> <!-- this is for PDF output --> |
| <imagedata fileref="glue.jpg" format="jpg" align="center"/> |
| </imageobject> |
| </mediaobject> |
| </figure> |
| |
| Currently, there exist at least Python and Perl glue code which makes it possible to use |
| C objects written with GType directly in Python or Perl, without any further work. |
| </para> |
| |
| |
| </sect1> |
| |
| |
| </chapter> |