
 <chapter>
 <title>Introduction</title>

 <para>
 GObject, and its lower-level type system, GType, are used by GTK+ and most GNOME libraries to
 provide:
 <itemizedlist>
 <listitem><para>object-oriented C-based APIs and</para></listitem>
 <listitem><para>automatic transparent API bindings to other compiled
 or interpreted languages.</para></listitem>
 </itemizedlist>
 </para>

 <para>Many programmers are used to working with compiled-only or dynamically interpreted-only
 languages and do not understand the challenges associated with cross-language interoperability.
 This introduction tries to provide an insight into these challenges and briefly describes
 the solution chosen by GLib.
 </para>

 <sect1>
 <title>Data types and programming</title>

 <para>
 One could say (I have seen such definitions used in some textbooks on programming language theory)
 that a programming language is merely a way to create data types and manipulate them. Most languages
 provide a number of language-native types and a few primitives to create more complex types based
 on these primitive types.
 </para>

 <para>
 In C, the language provides types such as <emphasis>char</emphasis>, <emphasis>long</emphasis>,
 <emphasis>pointer</emphasis>. During compilation of C code, the compiler maps these
 language types to the compiler's target architecture machine types. If you are using a C interpreter
 (I have never seen one myself but it is possible :), the interpreter (the program which interprets
 the source code and executes it) maps the language types to the machine types of the target machine at
 runtime, during the program execution (or just before execution if it uses a Just In Time compiler engine).
 </para>

 <para>Perl and Python, which are interpreted languages, do not really provide type definitions similar
 to those used by C. Perl and Python programmers manipulate variables, and the type of a variable
 is decided only upon the first assignment or upon the first use which forces a type on the variable.
 The interpreter also often provides a lot of automatic conversions from one type to the other. For example,
 in Perl, a variable which holds an integer can be automatically converted to a string given the
 required context:
 <programlisting>
 my $tmp = 10;
 print "this is an integer converted to a string:" . $tmp . "\n";
 </programlisting>
 Of course, it is also often possible to explicitly specify conversions when the default conversions provided
 by the language are not intuitive.
 </para>

 </sect1>

 <sect1>
 <title>Exporting a C API</title>

 <para>C APIs are defined by a set of functions and global variables which are usually exported from a
 binary. C functions have an arbitrary number of arguments and one return value. Each function is thus
 uniquely identified by the function name and the set of C types which describe the function arguments
 and return value. The global variables exported by the API are similarly identified by their name and
 their type.
 </para>

 <para>
 A C API is thus merely defined by a set of names to which a set of types are associated. If you know the
 function calling convention and the mapping of the C types to the machine types used by the platform you
 are on, you can resolve the name of each function to find where the code associated with this function
 is located in memory, and then construct a valid argument list for the function. Finally, all you have to
 do is trigger a call to the target C function with the argument list.
 </para>

 <para>
 For the sake of discussion, here is a sample C function and the associated 32-bit x86
 assembly code generated by gcc on my Linux box:
 <programlisting>
 static void function_foo (int foo)
 {}

 int main (int argc, char *argv[])
 {

         function_foo (10);

         return 0;
 }

 push   $0xa
 call   0x80482f4 &lt;function_foo>
 </programlisting>
 The assembly code shown above is pretty straightforward: the first instruction pushes
 the hexadecimal value 0xa (decimal value 10) as a 32-bit integer on the stack and the second calls
 <function>function_foo</function>. As you can see, gcc implements C function calls
 as native function calls (this is probably the fastest implementation possible).
 </para>

 <para>
 Now, let's say we want to call the C function <function>function_foo</function> from
 a Python program. To do this, the Python interpreter needs to:
 <itemizedlist>
 <listitem><para>Find where the function is located. This probably means finding the binary generated by the C compiler
 which exports this function.</para></listitem>
 <listitem><para>Load the code of the function in executable memory.</para></listitem>
 <listitem><para>Convert the Python parameters to C-compatible parameters before calling
 the function.</para></listitem>
 <listitem><para>Call the function with the right calling convention.</para></listitem>
 <listitem><para>Convert the return values of the C function to Python-compatible
 variables to return them to the Python code.</para></listitem>
 </itemizedlist>
 </para>

 <para>The process described above is pretty complex, and there are several ways to make it entirely automatic
 and transparent to both the C and the Python programmers:
 <itemizedlist>
 <listitem><para>The first solution is to write a lot of glue code by hand, once for each function exported or imported,
 which does the Python to C parameter conversion and the C to Python return value conversion. This glue code is then
 linked with the interpreter, which allows Python programs to call a Python function which delegates the work to the
 C function.</para></listitem>
 <listitem><para>Another, nicer solution is to automatically generate the glue code, once for each function exported or
 imported, with a special compiler which
 reads the original function signature.</para></listitem>
 <listitem><para>The solution used by GLib is to use the GType library which holds at runtime a description of
 all the objects manipulated by the programmer. This so-called <emphasis>dynamic type</emphasis><footnote>
 <para>
 	There are numerous different implementations of dynamic type systems: all C++
 	compilers have one, Java and .NET have one too. A dynamic type system allows you
 	to get information about every instantiated object at runtime. It can be implemented
 	by a process-specific database: every new object created registers the characteristics
 	of its associated type in the type system. It can also be implemented by introspection
 	interfaces. The common point between all these different type systems and implementations
 	is that they all allow you to query for object metadata at runtime.
 </para>
 </footnote>

  library is then
 used by special generic glue code to automatically convert function parameters and function calling conventions
 between different runtime domains.</para></listitem>
 </itemizedlist>
 The greatest advantage of the solution implemented by GType is that the glue code sitting at the runtime domain
 boundaries is written once: the figure below states this more clearly.
 <figure>
   <mediaobject>
     <imageobject> <!-- this is for HTML output -->
       <imagedata fileref="glue.png" format="png" align="center"/>
     </imageobject>
     <imageobject> <!-- this is for PDF output -->
       <imagedata fileref="glue.jpg" format="jpg" align="center"/>
     </imageobject>
   </mediaobject>
 </figure>

 Currently, such glue code exists for at least Python and Perl, which makes it possible to use
 C objects written with GType directly in Python or Perl, without any further work.
 </para>


 </sect1>


 </chapter>