blob: 8bd9e032de488f04f5a01cb8227711d142055412 [file] [log] [blame]
<html>
<title>
Using SWIG to Control, Prototype, and Debug C Programs with Python
</title>
<body BGCOLOR="ffffff">
<h1> Using SWIG to Control, Prototype, and Debug C Programs with Python </h1>
<b>
(Submitted to the 4th International Python Conference) <br> <br>
David M. Beazley <br>
Department of Computer Science <br>
University of Utah <br>
Salt Lake City, Utah 84112 <br>
<a href="http://www.cs.utah.edu/~beazley"> beazley@cs.utah.edu </a> <br>
</b>
<h2> Abstract </h2>
<em>
I discuss early results in using SWIG to interact with C and C++ programs
using Python. Examples of using SWIG are provided along with a
discussion of the benefits and limitations of this approach. This
paper describes work in progress with the hope of getting feedback and
ideas from the Python community.
</em> <br> <br>
Note : SWIG is freely available at <a href="http://www.cs.utah.edu/~beazley/SWIG"> http://www.cs.utah.edu/~beazley/SWIG </a>
<h2> Introduction </h2>
SWIG (Simplified Wrapper and Interface Generator) is a code
development
tool designed to make it easy for scientists and engineers to add scripting
language interfaces to programs and libraries written in C and C++.
The first version of SWIG was developed in July 1995
for use with
very large scale scientific applications running on the Connection
Machine 5 and Cray T3D at Los Alamos National Laboratory.
Since that time, the system has been extended to support several
interface languages including Python, Tcl, Perl4, Perl5, and Guile [1].
In this paper, I will describe how SWIG can be used to interact with
C and C++ code from Python. I'll also discuss some
applications and limitations of this approach. It is important to
emphasize that SWIG was primarily designed for scientists and engineers
who would like to have a nice interface, but who would rather
work on more interesting problems than figuring out how to build
a user interface or using a complicated interface generation tool.
<h2> Introducing SWIG </h2>
The idea behind SWIG is really quite simple---most interface
languages such as Python provide some mechanism for making
calls to functions written in C or C++. However, this almost
always requires the user to write special "wrapper" functions
that provide the glue between the scripting language and the
C functions. As an example, if you wanted to add the
<tt> getenv() </tt> function to Python, you would need to write
the following wrapper code [4] :
<blockquote>
<tt>
<pre>
static PyObject *wrap_getenv(PyObject *self, PyObject *args) {
char * result;
char * arg0;
if(!PyArg_ParseTuple(args, "s",&arg0))
return NULL;
result = getenv(arg0);
return Py_BuildValue("s", result);
}
</pre> </tt>
</blockquote>
While writing a single wrapper function isn't too difficult,
it quickly becomes tedious and error prone as the number
of functions increases.
SWIG automates this process by generating
wrapper code from a list ANSI C function and variable declarations.
As a result, adding scripting languages to C applications can become
an almost trivial exercise. <br> <br>
The core of the SWIG consists of a YACC parser and a collection
of general purpose utility functions. The output of the parser
is sent to two different modules--a language module for writing
wrapper functions and a documentation module for producing a simple
reference guide. Each of the target languages and documentation methods
is implemented as C++ class that can be plugged in to the system.
Currently, Python, Tcl, Perl4, Perl5, and Guile are supported as
target languages while documentation can be produced in ASCII, HTML,
and LaTeX. Different target languages can be implemented as a new
C++ class that is linked with a general purpose SWIG library file.
<h2> Previous work </h2>
Most Python users are probably aware that automatic wrapper generation is
not a new idea. In particular, packages such as ILU are capable
of providing bindings between C, C++, and Python using specifications
given in IDL (Interface Definition Language) [3]. SWIG is not really meant to compete with this
approach, but is designed to be a no-nonsense, easy to use tool
that scientists and engineers can use to build interesting interfaces
without having to worry about grungy programming details or learning
a complicated interface specification language.
With that said, SWIG is certainly not designed to turn Python into
a C or C++ clone (I'm not sure that turning Python into interpreted
C/C++ compiler would be a good idea anyways). However, SWIG can allow
Python to interface with a surprisingly wide variety of C functions.
</body>
</html>
<h2> A SWIG example </h2>
While it is not my intent to provide a tutorial, I hope to
illustrate how SWIG works by building a module from part of the
UNIX socket library.
While there is probably no need to do this in practice (since Python
already has a socket module), it illustrates most of SWIG's capabilities
on a moderately complex example and provides a basis for
comparison between an existing module and one created by SWIG. <br> <br>
As input, SWIG takes an input file referred to as an "interface file."
The file starts with a preamble containing the name of the
module and a code block where special header files, additional C
code, and other things can be added. After that, ANSI C function
and variable declarations are listed in any order. Since SWIG
uses C syntax, it's usually fairly easy to build an interface
from scratch or simply by copying an existing header file.
The following SWIG interface file will be used for our socket example :
<blockquote>
<tt>
<pre>
// socket.i
// SWIG Interface file to play with some sockets
%init sock // Name of our module
%{
#include &lt;sys/types.h&gt
#include &lt;sys/socket.h&gt
#include &lt;netinet/in.h&gt
#include &lt;arpa/inet.h&gt
#include &lt;netdb.h&gt
/* Set some values in the sockaddr_in structure */
struct sockaddr *new_sockaddr_in(short family, unsigned long hostid, int port) {
struct sockaddr_in *addr;
addr = (struct sockaddr_in *) malloc(sizeof(struct sockaddr_in));
bzero((char *) addr, sizeof(struct sockaddr_in));
addr-&gt;sin_family = family;
addr-&gt;sin_addr.s_addr = hostid;
addr-&gt;sin_port = htons(port);
return (struct sockaddr *) addr;
}
/* Get host address, but return as a string instead of hostent */
char *my_gethostbyname(char *hostname) {
struct hostent *h;
h = gethostbyname(hostname);
if (h) return h-&gt;h_name;
else return "";
}
%}
// Add these constants
enum {AF_UNIX, AF_INET, SOCK_STREAM, SOCK_DGRAM, SOCK_RAW,
IPPROTO_UDP, IPPROTO_TCP, INADDR_ANY};
#define SIZEOF_SOCKADDR sizeof(struct sockaddr)
// Wrap these functions
int socket(int family, int type, int protocol);
int bind(int sockfd, struct sockaddr *myaddr, int addrlen);
int connect(int sockfd, struct sockaddr *servaddr, int addrlen);
int listen(int sockfd, int backlog);
int accept(int sockfd, struct sockaddr *peer, %val int *addrlen);
int close(int fd);
struct sockaddr *new_sockaddr_in(short family, unsigned long, int port);
%name gethostbyname { char *my_gethostbyname(char *); }
unsigned long inet_addr(const char *ip);
%include unixio.i
</pre>
</tt>
</blockquote>
There are several things to notice about this file
<ul>
<li> The module name is specified with the <tt> %init </tt> directive.
<li> Header files and supporting C code is placed in <tt> %{,%} </tt>.
<li> Special support functions may be necessary. For example, <tt>
new_sockaddr_in() </tt> creates a new <tt> struct sockaddr_in </tt> type
and sets some of its values.
<li> Constants are created with <tt> enum </tt> or <tt> #define. </tt>
<li> Functions can take most C basic datatypes as well as pointers
to complex types.
<li> Functions can be renamed with the <tt> %name </tt> directive.
<li> The <tt> %val </tt> directive forces a function argument to be called by
value.
<li> The <tt> %include </tt> directive can be used to include other
SWIG interface files. In this case, the file <tt> unixio.i </tt>
might look like the following :
<blockquote>
<tt>
<pre>
// File : unixio.i
// Some file I/O and memory functions
%{
%}
int read(int fd, void *buffer, int n);
int write(int fd, void *buffer, int n);
typedef unsigned int size_t;
void *malloc(size_t nbytes);
</pre>
</tt>
</blockquote>
<li> Since interface files can include other interface files, it can be very
easy to build up libraries and collections of useful functions.
<li> <tt> typedef </tt> can be used to provide mapping between datatypes.
<li> C and C++ comments are allowed and like C, SWIG ignores whitespace.
</ul>
The key thing to notice about SWIG interface files is that they support
real C functions, constants, and datatypes including C pointers. As
a general rule these files are also independent of the target scripting
language--the primary reason why it's easy to support different
languages.
<h2> Building a Python module </h2>
Building a Python module with SWIG is a simple process that proceeds as
follows (assuming you're running Solaris 2.x) :
<blockquote>
<tt>
<pre>
unix > wrap -python socket.i
unix > gcc -c socket_wrap.c -I/usr/local/include/Py
unix > ld -G socket_wrap.o -lsocket -lnsl -o sockmodule.so
</pre>
</tt>
</blockquote>
The <tt> wrap </tt> command takes the interface file and produces a C file
called <tt> socket_wrap.c </tt>. This file is then compiled and built into
a shared object file.
<h2> A sample Python script </h2>
Our new socket module can be used normally within a Python script.
For example, we could write a simple server process to echo all
data received back to a client :
<blockquote>
<tt>
<pre>
# Echo server program using SWIG module
from sock import *
PORT = 5000
sockfd = socket(AF_INET, SOCK_STREAM, 0)
if sockfd < 0:
raise IOError, 'server : unable to open stream socket'
addr = new_sockaddr_in(AF_INET,INADDR_ANY,PORT)
if (bind(sockfd, addr, SIZEOF_SOCKADDR)) < 0 :
raise IOError, 'server : unable to bind local address'
listen(sockfd,5)
client_addr = new_sockaddr_in(AF_INET, 0, 0)
newsockfd = accept(sockfd, client_addr, SIZEOF_SOCKADDR)
buffer = malloc(1024)
while 1:
nbytes = read(newsockfd, buffer, 1024)
if nbytes > 0 :
write(newsockfd,buffer,nbytes)
else : break
close(newsockfd)
close(sockfd)
</pre>
</tt>
</blockquote>
In our Python module, we are manipulating a socket, in almost
exactly the same way as would be done in a C program. We'll take
a look at a client written using the Python socket library shortly.
<h2> Pointers and run-time type checking </h2>
SWIG imports C pointers as ASCII strings containing both the
address and pointer type. Thus, a typical SWIG pointer might look
like the following : <br> <br>
<center>
<tt> _fd2a0_struct_sockaddr_p </tt> <br> <br>
</center>
Some may view the idea of importing C pointers into Python as unsafe or even
truly insane. However, handling pointers seems to be needed in order
for SWIG to handle most interesting types of C code. To provide some safety,
the wrapper functions generated by SWIG perform pointer-type checking
(since importing C pointers into
Python has in effect, bypassed type-checking in the C compiler).
When incompatible pointer types are used, this is what happens :
<blockquote>
<tt>
<pre>
unix > python
Python 1.3 (Apr 12 1996) [GCC 2.5.8]
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> from sock import *
>>> sockfd = socket(AF_INET, SOCK_STREAM,0);
>>> buffer = malloc(8192);
>>> bind(sockfd,buffer,SIZEOF_SOCKADDR);
Traceback (innermost last):
File "<stdin>", line 1, in ?
TypeError: Type error in argument 2 of bind. Expected _struct_sockaddr_p.
>>>
</pre>
</tt>
</blockquote>
SWIG's run-time type checking provides a certain degree of safety from
using invalid parameters and making simple mistakes. Of course, any system
supporting pointers can be abused, but SWIG's implementation has proven
to be quite reliable under normal use.
<h2> Comparison with a real Python module </h2>
As a point of comparison, we can now write a client script using the
Python socket module (this example courtesy of the Python library
reference manual) [5].
<blockquote>
<tt>
<pre>
# Echo client program
from socket import *
HOST = 'tjaze.lanl.gov'
PORT = 5000
s = socket(AF_INET, SOCK_STREAM)
s.connect(HOST,PORT)
s.send('Hello world')
data = s.recv(1024)
s.close()
print 'Received', `data`
</pre>
</tt>
</blockquote>
As one might expect, the two scripts look fairly similar. However
there are certain obvious differences. First, the SWIG generated
module is clearly a quick and dirty approach. We must explicitly
check for errors and create a few C data structures. Furthermore,
while Python functions such as <tt> socket() </tt> can take optional
arguments, SWIG generated functions can not (since the underlying
C function can't). Of course, the apparent ugliness
of the SWIG module compared to the Python equivalent is not the
point here.
The real idea to emphasize is that SWIG
can take a set of real C functions with moderate complexity and
produce a fully functional Python module in a very short period of
time. In this case, about 10 minutes worth of effort.
(I'm
guessing that the Python socket module required more time than that).
With a bit of work, the SWIG generated module could probably be
used to build a more user-friendly version that hid many of the details.
<h2> Controlling C programs </h2>
Of course, the real goal of SWIG is not to produce quirky
replacements for modules in the Python library. Instead, it
better suited as a mechanism for
controlling a variety of C programs. <br> <br>
In the SPaSM molecular dynamics code used at Los Alamos National
Laboratory, SWIG
is used to build an interface out of about 200 C functions [2].
This happens at compile time so most users don't
notice that it has occurred. However, as a result, it is now
extremely easy for the physicists using the code to
extend it with new functions.
Whenever new functionality is added, the new
functions are put into a SWIG interface file and the functions
become available when the code is recompiled (a process
which usually only involves the new C functions and SWIG generated
wrapper file). Of course, when running under Python, it is possible
to build extensions and dynamically load them as needed. <br> <br>
Another point, not to be overlooked, is the fact that SWIG makes it
very easy for a user to combine bits and pieces of completely different
software packages without waiting for someone to write a special
purpose module. For example, SWIG can be used to import the C API
of Matlab 4.2 into Python. This combined with a simulation module
can allow a scientist to perform both simulation and data analysis in
interactively or from a script. One could even build a Tkinter interface to control
the entire system. The ease of using SWIG makes it possible for
scientists to build applications out of components that
might not have been considered earlier.
<h2> Prototyping and Debugging </h2>
When debugging new modules or libraries, it would be nice to be able
to test out the functions without having to repeatedly recompile the
module with the rest of a large application. With SWIG, it is often
possible to prototype and debug modules independently. SWIG can be
used to build Python wrappers around the functions and scripts
can be used for testing. In many cases, it is possible to create
the C data structures and other information that would have been generated
by the larger application all from within a Python script. Since
SWIG requires no changes to the C code, integrating the new C code into
the large package is usually pretty easy. <br> <br>
A related benefit of the SWIG approach is that it is now possible
to easily experiment with large software libraries and packages.
Peter-Pike Sloan at the
University of Utah, used SWIG to wrap the entire contents of the Open-GL
library (including more than 560 constants and 340 C functions).
This ``interpreted'' Open-GL could then be used interactively to
determine various image parameters and to draw simple figures.
Is this case, it proved to be a remarkably effective way of
figuring out the effects of different parameters and functions without
having to recompile after every modification.
<h2> Living dangerously with datatypes </h2>
One of the reasons SWIG is easy to use for prototyping and control
applications is its extremely forgiving treatment of datatypes.
<tt> typedef </tt> can be used to map datatypes, but whenever
SWIG encounters an unknown datatype, it simply assumes that it's
some sort of complex datatype. As a result, it's possible to
wrap some fairly complex C code with almost no effort. For example,
one could create a module allowing Python to create a Tcl interpreter
and execute Tcl commands as follows :
<blockquote>
<tt>
<pre>
// tcl.i
// Wrap Tcl's C interface
%init tcl
%{
#include &lt;tcl.h&gt;
%}
Tcl_Interp *Tcl_CreateInterp(void);
void Tcl_DeleteInterp(Tcl_Interp *interp);
int Tcl_Eval(Tcl_Interp *interp, char *script);
int Tcl_EvalFile(Tcl_Interp *interp, char *file);
</pre>
</tt>
</blockquote>
SWIG has no idea what <tt> Tcl_Interp </tt> is, but it doesn't
really care as long as it's used as a pointer. When used in a
Python script, this module would work as follows :
<blockquote>
<tt>
<pre>
unix > python
Python 1.3 (Mar 26 1996) [GCC 2.7.0]
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import tcl
>>> interp = tcl.Tcl_CreateInterp();
>>> tcl.Tcl_Eval(interp,"for {set i 0} {$i < 5} {incr i 1} {puts $i}");
0
1
2
3
4
0
>>>
</pre>
</tt>
</blockquote>
<h2> Limitations </h2>
This approach represents a balance of flexibility and ease of use.
I have always felt that the tool shouldn't be more complicated than
the original problem. Therefore, SWIG certainly won't do everything.
Some of SWIG's limitations include it's limited understanding of
complex datatypes. Often the user must write special functions
to extract information from structures even though passing complex
datatypes around by reference works remarkably well in practice.
SWIG's support for C++ is also quite limited. At the moment SWIG
can be used to pass pointers to C++ objects around, but actually
transforming a C++ object into some sort of Python object is far
beyond the capabilities of the current implementation. Finally, SWIG
lacks an exception model which may be of concern to some users.
<h2> Limitations in the Python implementation </h2>
SWIG's support for Python is in many ways, limited by the fact that
SWIG was originally developed for use with other languages. For example,
representing C pointers as a character string can be seen as a carry-over
from an earlier Tcl implementation. With a little work, I believe
that one could create a Python "pointer" datatype while still retaining
type-checking and other features. <br> <br>
Another limitation in the current Python implementation is the apparent lack
of variable linking support. Many applications, particularly scientific ones,
have global variables that a user may want to access. SWIG currently
creates Python functions for accessing these variables, but other
scripting languages provide a more natural mechanism.
For example, in Perl, it is
possible to attach "set" and "get" functions to a Perl object. These
are evaluated whenever that variable is accessed in a script.
For all practical purposes this special variable looks like any other Perl
variable, but is really linked to an C global variable (of course,
maybe there is a reason why Perl calls these "magic" variables). <br> <br>
If an equivalent variable linking mechanism is available in Python, I
couldn't find it [4].
If not, it might be possible to create a new kind of Python datatype
that supports variable linking, but which interacts seamlessly
with existing Python integer, floating point, and character string types.
<h2> Summary and future work </h2>
This paper has described work in progress with the SWIG system and
its support for Python. I believe that this approach shows great
promise for future work--especially within the scientific and
engineering community. Most of the people introduced to SWIG and
Python have
found both systems extremely easy and straightforward to use.
I am currently
quite interested in improving SWIG's interface to Python and exploring
its use with numerical Python in order to build interesting
and flexible scientific applications [6].
<h2> Acknowledgments </h2>
This project would not have been possible without the support of a number
of people. Peter Lomdahl, Shujia Zhou, Brad Holian, Tim Germann, and Niels
Jensen at Los Alamos National Laboratory were the first users and were
instrumental in testing out the first designs. Patrick Tullmann,
John Schmidt,Peter-Pike Sloan, and Kurtis Bleeker at the University of Utah have also
been instrumental in testing out some of the newer versions and
providing feedback.
John Buckman has suggested many interesting
improvements and is currently working on porting SWIG to non-Unix
platforms including NT, OS/2 and Macintosh.
Finally
I'd like to thank Chris Johnson and the members of the Scientific
Computing and Imaging group at the University of Utah for their continued
support. Some of this work was performed under the auspices of the
Department of Energy.
<h2> References </h2>
[1] D.M. Beazley. <em> SWIG : An Easy to Use Tool for Integrating
Scripting Languages with C and C++ </em>. Proceedings of the 4th
USENIX Tcl/Tk Workshop (1996) (to appear). <br> <br>
[2] D.M. Beazley and P.S. Lomdahl <em> Message-Passing Multi-Cell
Molecular Dynamics on the Connection Machine 5 </em>, Parallel
Computing 20 (1994), p. 173-195. <br> <br>
[3] Bill Janssen and Mike Spreitzer, <em> ILU : Inter-Language
Unification via Object Modules</em>, OOPSLA 94 Workshop on
Multi-Language Object Models.
<br> <br>
[4] Guido van Rossum, <em> Extending and Embedding the Python Interpreter,
</em> October 1995. <br> <br>
[5] Guido van Rossum, <em> Python Library Reference </em>, October
1995. <br> <br>
[6] Paul F. Dubois, Konrad Hinsen, and James Hugunin, <em> Numerical
Python, </em> Computers in Physics, (1996) (to appear)
</body>
</html>