| <html> |
| <title> |
| Using SWIG to Control, Prototype, and Debug C Programs with Python |
| </title> |
| <body BGCOLOR="ffffff"> |
| <h1> Using SWIG to Control, Prototype, and Debug C Programs with Python </h1> |
| <b> |
| (Submitted to the 4th International Python Conference) <br> <br> |
| David M. Beazley <br> |
| Department of Computer Science <br> |
| University of Utah <br> |
| Salt Lake City, Utah 84112 <br> |
| <a href="http://www.cs.utah.edu/~beazley"> beazley@cs.utah.edu </a> <br> |
| </b> |
| <h2> Abstract </h2> |
| <em> |
| I discuss early results in using SWIG to interact with C and C++ programs |
| using Python. Examples of using SWIG are provided along with a |
| discussion of the benefits and limitations of this approach. This |
| paper describes work in progress with the hope of getting feedback and |
| ideas from the Python community. |
| </em> <br> <br> |
| |
| Note : SWIG is freely available at <a href="http://www.cs.utah.edu/~beazley/SWIG"> http://www.cs.utah.edu/~beazley/SWIG </a> |
| |
| <h2> Introduction </h2> |
| |
| SWIG (Simplified Wrapper and Interface Generator) is a code |
| development |
| tool designed to make it easy for scientists and engineers to add scripting |
| language interfaces to programs and libraries written in C and C++. |
| The first version of SWIG was developed in July 1995 |
| for use with |
| very large scale scientific applications running on the Connection |
| Machine 5 and Cray T3D at Los Alamos National Laboratory. |
| Since that time, the system has been extended to support several |
| interface languages including Python, Tcl, Perl4, Perl5, and Guile [1]. |
| |
| In this paper, I will describe how SWIG can be used to interact with |
| C and C++ code from Python. I'll also discuss some |
| applications and limitations of this approach. It is important to |
| emphasize that SWIG was primarily designed for scientists and engineers |
| who would like to have a nice interface, but who would rather |
| work on more interesting problems than figuring out how to build |
| a user interface or using a complicated interface generation tool. |
| |
| <h2> Introducing SWIG </h2> |
| |
| The idea behind SWIG is really quite simple---most interface |
| languages such as Python provide some mechanism for making |
| calls to functions written in C or C++. However, this almost |
| always requires the user to write special "wrapper" functions |
| that provide the glue between the scripting language and the |
| C functions. As an example, if you wanted to add the |
| <tt> getenv() </tt> function to Python, you would need to write |
| the following wrapper code [4] : |
| |
| <blockquote> |
| <tt> |
| <pre> |
| static PyObject *wrap_getenv(PyObject *self, PyObject *args) { |
| char * result; |
| char * arg0; |
| if(!PyArg_ParseTuple(args, "s",&arg0)) |
| return NULL; |
| result = getenv(arg0); |
| return Py_BuildValue("s", result); |
| } |
| |
| </pre> </tt> |
| </blockquote> |
| While writing a single wrapper function isn't too difficult, |
| it quickly becomes tedious and error prone as the number |
| of functions increases. |
| SWIG automates this process by generating |
| wrapper code from a list ANSI C function and variable declarations. |
| As a result, adding scripting languages to C applications can become |
| an almost trivial exercise. <br> <br> |
| |
| The core of the SWIG consists of a YACC parser and a collection |
| of general purpose utility functions. The output of the parser |
| is sent to two different modules--a language module for writing |
| wrapper functions and a documentation module for producing a simple |
| reference guide. Each of the target languages and documentation methods |
| is implemented as C++ class that can be plugged in to the system. |
| Currently, Python, Tcl, Perl4, Perl5, and Guile are supported as |
| target languages while documentation can be produced in ASCII, HTML, |
| and LaTeX. Different target languages can be implemented as a new |
| C++ class that is linked with a general purpose SWIG library file. |
| |
| <h2> Previous work </h2> |
| |
| Most Python users are probably aware that automatic wrapper generation is |
| not a new idea. In particular, packages such as ILU are capable |
| of providing bindings between C, C++, and Python using specifications |
| given in IDL (Interface Definition Language) [3]. SWIG is not really meant to compete with this |
| approach, but is designed to be a no-nonsense, easy to use tool |
| that scientists and engineers can use to build interesting interfaces |
| without having to worry about grungy programming details or learning |
| a complicated interface specification language. |
| With that said, SWIG is certainly not designed to turn Python into |
| a C or C++ clone (I'm not sure that turning Python into interpreted |
| C/C++ compiler would be a good idea anyways). However, SWIG can allow |
| Python to interface with a surprisingly wide variety of C functions. |
| |
| </body> |
| </html> |
| |
| <h2> A SWIG example </h2> |
| |
| While it is not my intent to provide a tutorial, I hope to |
| illustrate how SWIG works by building a module from part of the |
| UNIX socket library. |
| While there is probably no need to do this in practice (since Python |
| already has a socket module), it illustrates most of SWIG's capabilities |
| on a moderately complex example and provides a basis for |
| comparison between an existing module and one created by SWIG. <br> <br> |
| |
| As input, SWIG takes an input file referred to as an "interface file." |
| The file starts with a preamble containing the name of the |
| module and a code block where special header files, additional C |
| code, and other things can be added. After that, ANSI C function |
| and variable declarations are listed in any order. Since SWIG |
| uses C syntax, it's usually fairly easy to build an interface |
| from scratch or simply by copying an existing header file. |
| The following SWIG interface file will be used for our socket example : |
| |
| <blockquote> |
| <tt> |
| <pre> |
| // socket.i |
| // SWIG Interface file to play with some sockets |
| %init sock // Name of our module |
| %{ |
| #include <sys/types.h> |
| #include <sys/socket.h> |
| #include <netinet/in.h> |
| #include <arpa/inet.h> |
| #include <netdb.h> |
| |
| /* Set some values in the sockaddr_in structure */ |
| struct sockaddr *new_sockaddr_in(short family, unsigned long hostid, int port) { |
| struct sockaddr_in *addr; |
| addr = (struct sockaddr_in *) malloc(sizeof(struct sockaddr_in)); |
| bzero((char *) addr, sizeof(struct sockaddr_in)); |
| addr->sin_family = family; |
| addr->sin_addr.s_addr = hostid; |
| addr->sin_port = htons(port); |
| return (struct sockaddr *) addr; |
| } |
| /* Get host address, but return as a string instead of hostent */ |
| char *my_gethostbyname(char *hostname) { |
| struct hostent *h; |
| h = gethostbyname(hostname); |
| if (h) return h->h_name; |
| else return ""; |
| } |
| %} |
| |
| // Add these constants |
| enum {AF_UNIX, AF_INET, SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, |
| IPPROTO_UDP, IPPROTO_TCP, INADDR_ANY}; |
| |
| #define SIZEOF_SOCKADDR sizeof(struct sockaddr) |
| |
| // Wrap these functions |
| int socket(int family, int type, int protocol); |
| int bind(int sockfd, struct sockaddr *myaddr, int addrlen); |
| int connect(int sockfd, struct sockaddr *servaddr, int addrlen); |
| int listen(int sockfd, int backlog); |
| int accept(int sockfd, struct sockaddr *peer, %val int *addrlen); |
| int close(int fd); |
| struct sockaddr *new_sockaddr_in(short family, unsigned long, int port); |
| %name gethostbyname { char *my_gethostbyname(char *); } |
| unsigned long inet_addr(const char *ip); |
| |
| %include unixio.i |
| </pre> |
| </tt> |
| </blockquote> |
| |
| There are several things to notice about this file |
| |
| <ul> |
| <li> The module name is specified with the <tt> %init </tt> directive. |
| <li> Header files and supporting C code is placed in <tt> %{,%} </tt>. |
| <li> Special support functions may be necessary. For example, <tt> |
| new_sockaddr_in() </tt> creates a new <tt> struct sockaddr_in </tt> type |
| and sets some of its values. |
| <li> Constants are created with <tt> enum </tt> or <tt> #define. </tt> |
| <li> Functions can take most C basic datatypes as well as pointers |
| to complex types. |
| <li> Functions can be renamed with the <tt> %name </tt> directive. |
| <li> The <tt> %val </tt> directive forces a function argument to be called by |
| value. |
| |
| <li> The <tt> %include </tt> directive can be used to include other |
| SWIG interface files. In this case, the file <tt> unixio.i </tt> |
| might look like the following : |
| |
| <blockquote> |
| <tt> |
| <pre> |
| // File : unixio.i |
| // Some file I/O and memory functions |
| %{ |
| %} |
| int read(int fd, void *buffer, int n); |
| int write(int fd, void *buffer, int n); |
| typedef unsigned int size_t; |
| void *malloc(size_t nbytes); |
| </pre> |
| </tt> |
| </blockquote> |
| <li> Since interface files can include other interface files, it can be very |
| easy to build up libraries and collections of useful functions. |
| <li> <tt> typedef </tt> can be used to provide mapping between datatypes. |
| <li> C and C++ comments are allowed and like C, SWIG ignores whitespace. |
| </ul> |
| |
| The key thing to notice about SWIG interface files is that they support |
| real C functions, constants, and datatypes including C pointers. As |
| a general rule these files are also independent of the target scripting |
| language--the primary reason why it's easy to support different |
| languages. |
| |
| <h2> Building a Python module </h2> |
| |
| Building a Python module with SWIG is a simple process that proceeds as |
| follows (assuming you're running Solaris 2.x) : |
| |
| <blockquote> |
| <tt> |
| <pre> |
| unix > wrap -python socket.i |
| unix > gcc -c socket_wrap.c -I/usr/local/include/Py |
| unix > ld -G socket_wrap.o -lsocket -lnsl -o sockmodule.so |
| </pre> |
| </tt> |
| </blockquote> |
| |
| The <tt> wrap </tt> command takes the interface file and produces a C file |
| called <tt> socket_wrap.c </tt>. This file is then compiled and built into |
| a shared object file. |
| |
| <h2> A sample Python script </h2> |
| |
| Our new socket module can be used normally within a Python script. |
| For example, we could write a simple server process to echo all |
| data received back to a client : |
| |
| <blockquote> |
| <tt> |
| <pre> |
| # Echo server program using SWIG module |
| from sock import * |
| PORT = 5000 |
| sockfd = socket(AF_INET, SOCK_STREAM, 0) |
| if sockfd < 0: |
| raise IOError, 'server : unable to open stream socket' |
| addr = new_sockaddr_in(AF_INET,INADDR_ANY,PORT) |
| if (bind(sockfd, addr, SIZEOF_SOCKADDR)) < 0 : |
| raise IOError, 'server : unable to bind local address' |
| listen(sockfd,5) |
| client_addr = new_sockaddr_in(AF_INET, 0, 0) |
| newsockfd = accept(sockfd, client_addr, SIZEOF_SOCKADDR) |
| buffer = malloc(1024) |
| while 1: |
| nbytes = read(newsockfd, buffer, 1024) |
| if nbytes > 0 : |
| write(newsockfd,buffer,nbytes) |
| else : break |
| close(newsockfd) |
| close(sockfd) |
| </pre> |
| </tt> |
| </blockquote> |
| |
| In our Python module, we are manipulating a socket, in almost |
| exactly the same way as would be done in a C program. We'll take |
| a look at a client written using the Python socket library shortly. |
| |
| <h2> Pointers and run-time type checking </h2> |
| |
| SWIG imports C pointers as ASCII strings containing both the |
| address and pointer type. Thus, a typical SWIG pointer might look |
| like the following : <br> <br> |
| |
| <center> |
| <tt> _fd2a0_struct_sockaddr_p </tt> <br> <br> |
| </center> |
| |
| Some may view the idea of importing C pointers into Python as unsafe or even |
| truly insane. However, handling pointers seems to be needed in order |
| for SWIG to handle most interesting types of C code. To provide some safety, |
| the wrapper functions generated by SWIG perform pointer-type checking |
| (since importing C pointers into |
| Python has in effect, bypassed type-checking in the C compiler). |
| When incompatible pointer types are used, this is what happens : |
| |
| <blockquote> |
| <tt> |
| <pre> |
| unix > python |
| Python 1.3 (Apr 12 1996) [GCC 2.5.8] |
| Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam |
| >>> from sock import * |
| >>> sockfd = socket(AF_INET, SOCK_STREAM,0); |
| >>> buffer = malloc(8192); |
| >>> bind(sockfd,buffer,SIZEOF_SOCKADDR); |
| Traceback (innermost last): |
| File "<stdin>", line 1, in ? |
| TypeError: Type error in argument 2 of bind. Expected _struct_sockaddr_p. |
| >>> |
| </pre> |
| </tt> |
| </blockquote> |
| |
| |
| SWIG's run-time type checking provides a certain degree of safety from |
| using invalid parameters and making simple mistakes. Of course, any system |
| supporting pointers can be abused, but SWIG's implementation has proven |
| to be quite reliable under normal use. |
| |
| <h2> Comparison with a real Python module </h2> |
| |
| As a point of comparison, we can now write a client script using the |
| Python socket module (this example courtesy of the Python library |
| reference manual) [5]. |
| |
| <blockquote> |
| <tt> |
| <pre> |
| # Echo client program |
| from socket import * |
| HOST = 'tjaze.lanl.gov' |
| PORT = 5000 |
| s = socket(AF_INET, SOCK_STREAM) |
| s.connect(HOST,PORT) |
| s.send('Hello world') |
| data = s.recv(1024) |
| s.close() |
| print 'Received', `data` |
| </pre> |
| </tt> |
| </blockquote> |
| |
| As one might expect, the two scripts look fairly similar. However |
| there are certain obvious differences. First, the SWIG generated |
| module is clearly a quick and dirty approach. We must explicitly |
| check for errors and create a few C data structures. Furthermore, |
| while Python functions such as <tt> socket() </tt> can take optional |
| arguments, SWIG generated functions can not (since the underlying |
| C function can't). Of course, the apparent ugliness |
| of the SWIG module compared to the Python equivalent is not the |
| point here. |
| The real idea to emphasize is that SWIG |
| can take a set of real C functions with moderate complexity and |
| produce a fully functional Python module in a very short period of |
| time. In this case, about 10 minutes worth of effort. |
| (I'm |
| guessing that the Python socket module required more time than that). |
| With a bit of work, the SWIG generated module could probably be |
| used to build a more user-friendly version that hid many of the details. |
| |
| <h2> Controlling C programs </h2> |
| |
| Of course, the real goal of SWIG is not to produce quirky |
| replacements for modules in the Python library. Instead, it |
| better suited as a mechanism for |
| controlling a variety of C programs. <br> <br> |
| |
| In the SPaSM molecular dynamics code used at Los Alamos National |
| Laboratory, SWIG |
| is used to build an interface out of about 200 C functions [2]. |
| This happens at compile time so most users don't |
| notice that it has occurred. However, as a result, it is now |
| extremely easy for the physicists using the code to |
| extend it with new functions. |
| Whenever new functionality is added, the new |
| functions are put into a SWIG interface file and the functions |
| become available when the code is recompiled (a process |
| which usually only involves the new C functions and SWIG generated |
| wrapper file). Of course, when running under Python, it is possible |
| to build extensions and dynamically load them as needed. <br> <br> |
| |
| Another point, not to be overlooked, is the fact that SWIG makes it |
| very easy for a user to combine bits and pieces of completely different |
| software packages without waiting for someone to write a special |
| purpose module. For example, SWIG can be used to import the C API |
| of Matlab 4.2 into Python. This combined with a simulation module |
| can allow a scientist to perform both simulation and data analysis in |
| interactively or from a script. One could even build a Tkinter interface to control |
| the entire system. The ease of using SWIG makes it possible for |
| scientists to build applications out of components that |
| might not have been considered earlier. |
| |
| <h2> Prototyping and Debugging </h2> |
| |
| When debugging new modules or libraries, it would be nice to be able |
| to test out the functions without having to repeatedly recompile the |
| module with the rest of a large application. With SWIG, it is often |
| possible to prototype and debug modules independently. SWIG can be |
| used to build Python wrappers around the functions and scripts |
| can be used for testing. In many cases, it is possible to create |
| the C data structures and other information that would have been generated |
| by the larger application all from within a Python script. Since |
| SWIG requires no changes to the C code, integrating the new C code into |
| the large package is usually pretty easy. <br> <br> |
| |
| A related benefit of the SWIG approach is that it is now possible |
| to easily experiment with large software libraries and packages. |
| Peter-Pike Sloan at the |
| University of Utah, used SWIG to wrap the entire contents of the Open-GL |
| library (including more than 560 constants and 340 C functions). |
| This ``interpreted'' Open-GL could then be used interactively to |
| determine various image parameters and to draw simple figures. |
| Is this case, it proved to be a remarkably effective way of |
| figuring out the effects of different parameters and functions without |
| having to recompile after every modification. |
| |
| <h2> Living dangerously with datatypes </h2> |
| |
| One of the reasons SWIG is easy to use for prototyping and control |
| applications is its extremely forgiving treatment of datatypes. |
| <tt> typedef </tt> can be used to map datatypes, but whenever |
| SWIG encounters an unknown datatype, it simply assumes that it's |
| some sort of complex datatype. As a result, it's possible to |
| wrap some fairly complex C code with almost no effort. For example, |
| one could create a module allowing Python to create a Tcl interpreter |
| and execute Tcl commands as follows : |
| |
| <blockquote> |
| <tt> |
| <pre> |
| // tcl.i |
| // Wrap Tcl's C interface |
| %init tcl |
| %{ |
| #include <tcl.h> |
| %} |
| |
| Tcl_Interp *Tcl_CreateInterp(void); |
| void Tcl_DeleteInterp(Tcl_Interp *interp); |
| int Tcl_Eval(Tcl_Interp *interp, char *script); |
| int Tcl_EvalFile(Tcl_Interp *interp, char *file); |
| </pre> |
| </tt> |
| </blockquote> |
| |
| SWIG has no idea what <tt> Tcl_Interp </tt> is, but it doesn't |
| really care as long as it's used as a pointer. When used in a |
| Python script, this module would work as follows : |
| <blockquote> |
| <tt> |
| <pre> |
| unix > python |
| Python 1.3 (Mar 26 1996) [GCC 2.7.0] |
| Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam |
| >>> import tcl |
| >>> interp = tcl.Tcl_CreateInterp(); |
| >>> tcl.Tcl_Eval(interp,"for {set i 0} {$i < 5} {incr i 1} {puts $i}"); |
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 0 |
| >>> |
| </pre> |
| </tt> |
| </blockquote> |
| |
| <h2> Limitations </h2> |
| |
| This approach represents a balance of flexibility and ease of use. |
| I have always felt that the tool shouldn't be more complicated than |
| the original problem. Therefore, SWIG certainly won't do everything. |
| Some of SWIG's limitations include it's limited understanding of |
| complex datatypes. Often the user must write special functions |
| to extract information from structures even though passing complex |
| datatypes around by reference works remarkably well in practice. |
| SWIG's support for C++ is also quite limited. At the moment SWIG |
| can be used to pass pointers to C++ objects around, but actually |
| transforming a C++ object into some sort of Python object is far |
| beyond the capabilities of the current implementation. Finally, SWIG |
| lacks an exception model which may be of concern to some users. |
| |
| <h2> Limitations in the Python implementation </h2> |
| |
| SWIG's support for Python is in many ways, limited by the fact that |
| SWIG was originally developed for use with other languages. For example, |
| representing C pointers as a character string can be seen as a carry-over |
| from an earlier Tcl implementation. With a little work, I believe |
| that one could create a Python "pointer" datatype while still retaining |
| type-checking and other features. <br> <br> |
| |
| Another limitation in the current Python implementation is the apparent lack |
| of variable linking support. Many applications, particularly scientific ones, |
| have global variables that a user may want to access. SWIG currently |
| creates Python functions for accessing these variables, but other |
| scripting languages provide a more natural mechanism. |
| For example, in Perl, it is |
| possible to attach "set" and "get" functions to a Perl object. These |
| are evaluated whenever that variable is accessed in a script. |
| For all practical purposes this special variable looks like any other Perl |
| variable, but is really linked to an C global variable (of course, |
| maybe there is a reason why Perl calls these "magic" variables). <br> <br> |
| |
| If an equivalent variable linking mechanism is available in Python, I |
| couldn't find it [4]. |
| If not, it might be possible to create a new kind of Python datatype |
| that supports variable linking, but which interacts seamlessly |
| with existing Python integer, floating point, and character string types. |
| |
| <h2> Summary and future work </h2> |
| |
| This paper has described work in progress with the SWIG system and |
| its support for Python. I believe that this approach shows great |
| promise for future work--especially within the scientific and |
| engineering community. Most of the people introduced to SWIG and |
| Python have |
| found both systems extremely easy and straightforward to use. |
| I am currently |
| quite interested in improving SWIG's interface to Python and exploring |
| its use with numerical Python in order to build interesting |
| and flexible scientific applications [6]. |
| |
| <h2> Acknowledgments </h2> |
| |
| This project would not have been possible without the support of a number |
| of people. Peter Lomdahl, Shujia Zhou, Brad Holian, Tim Germann, and Niels |
| Jensen at Los Alamos National Laboratory were the first users and were |
| instrumental in testing out the first designs. Patrick Tullmann, |
| John Schmidt,Peter-Pike Sloan, and Kurtis Bleeker at the University of Utah have also |
| been instrumental in testing out some of the newer versions and |
| providing feedback. |
| John Buckman has suggested many interesting |
| improvements and is currently working on porting SWIG to non-Unix |
| platforms including NT, OS/2 and Macintosh. |
| Finally |
| I'd like to thank Chris Johnson and the members of the Scientific |
| Computing and Imaging group at the University of Utah for their continued |
| support. Some of this work was performed under the auspices of the |
| Department of Energy. |
| |
| <h2> References </h2> |
| |
| [1] D.M. Beazley. <em> SWIG : An Easy to Use Tool for Integrating |
| Scripting Languages with C and C++ </em>. Proceedings of the 4th |
| USENIX Tcl/Tk Workshop (1996) (to appear). <br> <br> |
| |
| [2] D.M. Beazley and P.S. Lomdahl <em> Message-Passing Multi-Cell |
| Molecular Dynamics on the Connection Machine 5 </em>, Parallel |
| Computing 20 (1994), p. 173-195. <br> <br> |
| |
| [3] Bill Janssen and Mike Spreitzer, <em> ILU : Inter-Language |
| Unification via Object Modules</em>, OOPSLA 94 Workshop on |
| Multi-Language Object Models. |
| <br> <br> |
| |
| [4] Guido van Rossum, <em> Extending and Embedding the Python Interpreter, |
| </em> October 1995. <br> <br> |
| |
| [5] Guido van Rossum, <em> Python Library Reference </em>, October |
| 1995. <br> <br> |
| |
| [6] Paul F. Dubois, Konrad Hinsen, and James Hugunin, <em> Numerical |
| Python, </em> Computers in Physics, (1996) (to appear) |
| |
| </body> |
| </html> |