
 <html>
 <head>
 <title>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</title>
 </head>
 <body bgcolor="#ffffff">
 <center>

 <h2>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</h2>
 <h6>David M. Beazley <br>
 Department of Computer Science<br>
 University of Chicago<br>
 Chicago, IL  60637<br>
 beazley@cs.uchicago.edu<br>
 </h6>
 </center>

 <h3>Abstract</h3>
 <em>
 One of the more popular uses of Python is as an extension language for
 applications written in compiled languages such as C, C++, and
 Fortran.  Unfortunately, one of the biggest drawbacks of this approach
 is the lack of a useful debugging and error handling facility for
 identifying problems in extension code. In part, this limitation is
 due to the fact that Python does not know anything about the internal
 implementation of an extension module.  A more difficult problem is
 that compiled extensions sometimes fail with catastrophic errors such
 as memory access violations, failed assertions, and floating point
 exceptions.  These types of errors fall outside the realm of normal
 Python exception handling and are particularly difficult to identify
 and debug.  Although traditional debuggers can find the location of a
 fatal error, they are unable to report the context in which such an
 error has occurred with respect to a Python script.  This paper describes
 an experimental system that converts fatal extension errors
 into Python exceptions.  In particular, a dynamically
 loadable module, WAD (Wrapped Application Debugger), has been developed which catches
 fatal errors, unwinds the call stack, and generates Python exceptions
 with debugging information.  WAD requires no modifications to Python,
 works with all extension modules, and introduces no performance
 overhead.  An initial implementation of the system is currently
 available for Sun SPARC Solaris and i386-Linux.

 </em>

 <h3>1. Introduction</h3>

 One of the primary reasons C, C++, and Fortran programmers are
 attracted to Python is its ability to serve as an extension language
 for compiled programs.  Furthermore, tools such as SIP, CXX, Pyfort, FPIG,
 and SWIG make it extremely easy for a programmer to ``wrap'' existing
 software into an extension module [1,2,3,4,5]. Although this approach is
 extremely attractive in terms of providing a highly usable and
 flexible environment for users, extension modules suffer from
 problems not normally associated with Python
 scripts---especially when they don't work.

 <p>
 Normally, Python programming errors result in an exception like this:

 <blockquote><pre>
 % python foo.py
 Traceback (innermost last):
   File "foo.py", line 11, in ?
     foo()
   File "foo.py", line 8, in foo
     bar()
   File "foo.py", line 5, in bar
     spam()
   File "foo.py", line 2, in spam
     doh()
 NameError: doh
 %
 </pre></blockquote>

 Unfortunately for compiled extensions, the following situation sometimes occurs:

 <blockquote><pre>
 % python foo.py
 Segmentation Fault (core dumped)
 %
 </pre></blockquote>

 Needless to say, this isn't very informative--well,
 other than indicating that something ``very bad'' happened.

 <p>
 In order to identify the source of a fatal error, a programmer can run a
 debugger on the Python executable or on a core file like this:

 <blockquote><pre>
 % gdb /usr/local/bin/python
 (gdb) run foo.py
 Starting program: /usr/local/bin/python foo.py

 Program received signal SIGSEGV, Segmentation fault.
 0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
 (gdb) where
 #0  0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so
 #1  0xff082f34 in _wrap_doh ()
    from /u0/beazley/Projects/WAD/Python/./dohmodule.so
 #2  0x2777c in call_builtin (func=0x1984b8, arg=0x1a1ccc, kw=0x0)
     at ceval.c:2650
 #3  0x27648 in PyEval_CallObjectWithKeywords (func=0x1984b8, arg=0x1a1ccc,
     kw=0x0) at ceval.c:2618
 #4  0x25d18 in eval_code2 (co=0x19acf8, globals=0x0, locals=0x1c7844,
     args=0x1984b8, argcount=1625472, kws=0x0, kwcount=0, defs=0x0, defcount=0,
     owner=0x0) at ceval.c:1951
 #5  0x25954 in eval_code2 (co=0x199620, globals=0x0, locals=0x1984b8,
     args=0x196654, argcount=1862720, kws=0x197788, kwcount=0, defs=0x0,
 #6  0x25954 in eval_code2 (co=0x19ad38, globals=0x0, locals=0x196654,
     args=0x1962fc, argcount=1862800, kws=0x198e90, kwcount=0, defs=0x0,
     defcount=0, owner=0x0) at ceval.c:1850
 #7  0x25954 in eval_code2 (co=0x1b6c60, globals=0x0, locals=0x1962fc,
     args=0x1a1eb4, argcount=1862920, kws=0x0, kwcount=0, defs=0x0, defcount=0,
     owner=0x0) at ceval.c:1850
 #8  0x22da4 in PyEval_EvalCode (co=0x1b6c60, globals=0x1962c4, locals=0x1962c4)
     at ceval.c:319
 #9  0x3adb4 in run_node (n=0x18abf8, filename=0x1b6c60 "", globals=0x1962c4,
     locals=0x1962c4) at pythonrun.c:886
 #10 0x3ad64 in run_err_node (n=0x18abf8, filename=0x1b6c60 "",
     globals=0x1962c4, locals=0x1962c4) at pythonrun.c:874
 #11 0x3ad38 in PyRun_FileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
     start=1616888, globals=0x1962c4, locals=0x1962c4, closeit=1)
     at pythonrun.c:866
 #12 0x3a1d8 in PyRun_SimpleFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
     closeit=1) at pythonrun.c:579
 #13 0x39d84 in PyRun_AnyFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py",
     closeit=1) at pythonrun.c:459
 #14 0x1f498 in Py_Main (argc=2, argv=0xffbefc84) at main.c:289
 #15 0x1eec0 in main (argc=2, argv=0xffbefc84) at python.c:10
 </pre></blockquote>

 Unfortunately, even though the debugger identifies the location where the fault occurred, it
 mostly provides information about the internals of the
 interpreter.  The debugger certainly doesn't reveal anything about the Python
 program that led to the error (i.e., it doesn't reveal the
 same information that would be contained in a Python traceback).  As a result,
 the debugger is of limited use when it comes to debugging an application that
 consists of both compiled and Python code.

 <p>
 Normally, extension developers try to avoid catastrophic errors by
 adding error handling. If
 an application is small or customized for use with Python, it can be
 modified to raise Python exceptions.
 Automated tools such as SWIG can also convert C++
 exceptions and C-related error handling mechanisms into Python
 exceptions. However, no matter how much error checking is added,
 there is always a chance that an extension will fail in an unexpected
 manner.  This is especially true for large applications that have been wrapped
 into an extension module. In addition, certain types of errors such as floating
 point exceptions (e.g., division by zero) are especially difficult to find
 and eliminate. Finally, rigorous error checking may be omitted to improve
 performance.

 <p>
 To address these problems, an experimental module known as WAD (Wrapped
 Application Debugger) has been developed.
 WAD is able to
 convert fatal errors into Python exceptions that include information
 from the call stack as well as debugging
 information.  By turning such errors into Python exceptions, fatal
 errors now result in a traceback that crosses the boundary between
 Python code and compiled extension code.  This makes it much
 easier to identify and correct extension-related programming errors.
 WAD requires no modifications to Python and is compatible with all
 extension modules.  However, it is also highly platform specific
 and currently runs only on Sun SPARC
 Solaris and i386-Linux.  The primary goal of this paper is to motivate the problem
 and to describe one possible solution.  In addition, many of the
 implementation issues
 associated with providing an integrated error reporting mechanism are described.

 <h3>2. An Example</h3>

 WAD can either be imported as a Python extension module or linked to an
 extension module.  To illustrate, consider the earlier example:

 <blockquote><pre>
 % python foo.py
 Segmentation Fault (core dumped)
 %
 </pre></blockquote>

 To identify the problem, a programmer can run Python interactively and import WAD as follows:

 <blockquote><pre>
 % python
 Python 2.0 (#1, Oct 27 2000, 14:34:45)
 [GCC 2.95.2 19991024 (release)] on sunos5
 Type "copyright", "credits" or "license" for more information.
 >>> import libwadpy
 WAD Enabled
 >>> execfile("foo.py")
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "foo.py", line 16, in ?
     foo()
   File "foo.py", line 13, in foo
     bar()
   File "foo.py", line 10, in bar
     spam()
   File "foo.py", line 7, in spam
     doh.doh(a,b,c)
 SegFault: [ C stack trace ]

 #2   0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0)
 #1   0xff022f7c in _wrap_doh(0x0,0x1a1ccc,0x160ef4,0x9c,0x56b44,0x1aa3d8)
 #0   0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28

 /u0/beazley/Projects/WAD/Python/foo.c, line 28

     int doh(int a, int b, int *c) {
  =>   *c = a + b;
       return *c;
     }

 >>>
 </pre></blockquote>

 In this case, we can
 see that the program has tried to assign a value to a
 NULL pointer (indicated by the value "c=0x0" in the last function call). Furthermore, we obtain a Python traceback that shows the
 entire sequence of functions leading to the problem.  Finally, since
 control returned to the interpreter, it is possible to interactively
 inspect various aspects of the application or to continue with the computation
 (although this clearly depends on the severity of the error and the nature of the application).

 <p>
 In certain applications, it may be difficult to run Python
 interactively or to modify the code to explicitly import a special
 debugging module.  In these cases, WAD can be attached to an extension module with the
 linker.  For example:

 <blockquote><pre>
 % ld -G $(OBJS) -o dohmodule.so -lwadpy
 </pre></blockquote>

 This requires no recompilation of any source code--only a relinking of the
 extension module.  When Python loads the relinked extension module, WAD is automatically
 initialized before Python invokes the module initialization function.

 <h3>3. Design Considerations for Embedded Error Recovery</h3>

 The primary design goal of WAD is to provide an error reporting mechanism
 for extension modules that is a natural extension of normal Python
 exception handling.  There are two primary motivations for
 handling fatal errors in this manner: first, in the context of Python
 programming, it is simply unnatural to run a separate debugging
 application to identify a problem in an extension module when no such
 requirement exists for scripts.  Thus, an embedded error reporting
 mechanism is simply more convenient.  Second, the target users
 of an extension module may not know how to use a debugger or even have
 a development environment installed on their machine.  Therefore,
 the ability to produce an informative traceback within the
 confines of the Python interpreter can be of tremendous value to an
 extension developer.  This is because users who report a problem will
 be able to include an informative traceback as opposed to simply
 saying ``the code crashed.''

 <p>
 A secondary design goal is to provide a system that is as non-invasive
 as possible.  The system should not require modifications to Python or
 any extension modules and it should be easy to integrate
 into the runtime environment of an application. In addition, it shouldn't
 introduce any performance overhead.

 <p>
 Finally, since WAD co-exists with the Python interpreter (i.e., in the same
 process), there are a number of technical issues that have to be
 addressed.  First, fatal errors can theoretically occur anywhere in
 the interpreter as well as in extension modules.  Therefore, WAD needs
 to know about Python's internal organization if it is going to provide
 a graceful recovery back to the interpreter.  Second, in order to
 implement this recovery scheme, the system has to perform direct
 manipulation of the CPU context and call stack.  Last, but not least,
 since the recovery code lives in the same address space as the
 interpreter and extension modules, it should not depend on the process
 stack and heap (since both could have been corrupted by the faulting
 application).

 <h3>4. Catching Fatal Errors</h3>

 WAD catches catastrophic errors by installing a
 reliable signal handler for SIGSEGV, SIGBUS, SIGABRT, SIGILL, and SIGFPE [9].  Unlike the
 more familiar BSD-style signal interface (as provided by the Python
 signal module), reliable signal handlers are installed using the <tt>sigaction()</tt> system call and have a few notable properties:

 <ul>
 <li> The signal handler can be configured to run on its own dedicated stack.

 <p>
 <li> Handler functions can receive a structure containing the CPU context
 including the CPU registers, program counter, and stack pointer.

 <p>
 <li> Changes to the CPU context take effect immediately after the signal handler returns.
 </ul>

 Therefore, the high level implementation of WAD is relatively straightforward:  when a fatal signal occurs,
 a handler function runs on an isolated signal handling stack.
 The CPU context is then used to unwind the call stack and to inspect the process state.  Finally,
 if possible, the CPU context is modified in a manner that allows the signal handler to
 return to Python with a raised exception.

 <h3>5. A Detailed Description of the Recovery Mechanism</h3>

 In this section, a more detailed description of the error recovery
 scheme is presented.  The precise implementation details of this are
 highly platform specific and involve a number of advanced topics including
 the Unix process file system (/proc), the ELF object file format, and the
 Stabs compiler debugging format [6,7,8]. The details of these topics are
 beyond the scope of this paper.  However, this section hopes to
 give the reader a small taste of the steps involved in implementing the recovery mechanism.

 <P>
 The services of WAD are only invoked upon the reception of a fatal
 signal. This triggers a signal handling function that results in a return to Python
 as illustrated in the following figure:

 <center>
 <img src="fig1.png">
 <h6>Control flow of the error recovery mechanism</h6>
 </center>

 <p>
 The steps required to implement this recovery are as follows:

 <ol>
 <li>  The values of the program counter and stack pointer are obtained from the CPU
        context structure passed to the WAD signal handler.

 <p>
 <li> The virtual memory map of the process is inspected to identify all of
 the shared libraries, dynamically loaded modules, and valid memory regions.
 This information is obtained by reading from the Unix /proc filesystem.
 The following table illustrates the nature of this data:


 <blockquote><pre>
 Address     Size    Permissions        File
 ----------  -----   -----------------  ---------------------------------
 00010000    1264K   read/exec         /usr/local/bin/python
 0015A000     184K   read/write/exec   /usr/local/bin/python
 00188000     296K   read/write/exec     [ heap ]
 FE7C0000      32K   read/exec         /u0/beazley/Projects/dohmodule.so
 FE7D6000       8K   read/write/exec   /u0/beazley/Projects/dohmodule.so
 ...
 FF100000     664K   read/exec         /usr/lib/libc.so.1
 FF1B6000      24K   read/write/exec   /usr/lib/libc.so.1
 FF1BC000       8K   read/write/exec   /usr/lib/libc.so.1
 FF2C0000     120K   read/exec         /usr/lib/libthread.so.1
 FF2EE000       8K   read/write/exec   /usr/lib/libthread.so.1
 FF2F0000      48K   read/write/exec   /usr/lib/libthread.so.1
 FF310000      40K   read/exec         /usr/lib/libsocket.so.1
 FF32A000       8K   read/write/exec   /usr/lib/libsocket.so.1
 FF330000      24K   read/exec         /usr/lib/libpthread.so.1
 FF346000       8K   read/write/exec   /usr/lib/libpthread.so.1
 FF350000       8K   read/write/exec    [ anon ]
 FF3B0000       8K   read/exec         /usr/lib/libdl.so.1
 FF3C0000     128K   read/exec         /usr/lib/ld.so.1
 FF3E0000       8K   read/write/exec   /usr/lib/ld.so.1
 FFBEA000      24K   read/write/exec    [ stack ]
 </pre></blockquote>

 <p>
 <li>  The call stack is unwound to produce a traceback of the
 calling sequence that led to the error.  The unwinding process is just a simple
 loop that is similar to the following:

 <blockquote><pre>
 long *pc = get_pc(context);
 long *sp = get_sp(context);
 while (sp) {
     /* Record the frame described by (pc, sp) in the traceback here */

     /* Move to previous stack frame */
     pc = (long *) sp[15];      /* %i7 register on SPARC */
     sp = (long *) sp[14];      /* %i6 register on SPARC */
 }
 </pre></blockquote>

 <li> For each stack frame, symbol table and debugging information
 is gathered and stored in a WAD exception frame object.
 Obtaining this information is the most complicated part of WAD and involves
 the following steps: first, the current program counter is mapped to an object file
 using the virtual memory map obtained in step 2.  Next, the object file is loaded
 using mmap().  Once loaded, the ELF symbol table
 is searched for an address match.  The symbol table contains a collection of records
 containing memory offsets, sizes, and names such as this:

 <blockquote><pre>
 Offset    Size    Name
 --------  ------  ---------
 0x1280    324     wrap_foo
 0x1600    128     foo
 0x2408    192     bar
 ...
 </pre></blockquote>

 To find a match for a virtual memory address <em>addr</em>, WAD simply
 searches for a symbol <em>s</em> such that <em>base</em> +
 <em>s</em>.offset &lt;= <em>addr</em> &lt; <em>base</em> +
 <em>s</em>.offset + <em>s</em>.size, where <em>base</em> is the base
 virtual address of the object file in the virtual memory map.

 <p>
 Debugging information, if available, is scanned to identify a source
 file, function name, and line number.  This involves scanning object files for a
 table of debugging information stored in a format
 known as ``stabs.'' Stabs is a relatively simple but highly extensible format that
 is language independent and capable of encoding almost every aspect of the
 original source code.   For the purposes of WAD, only a small subset of this
 data is actually used.

 <p>
 The following table shows a small fragment of relevant stabs data:
 <blockquote><pre>
 type    desc   value        string                        description
 ------  -----  ---------    ---------------------------   -----------
 0x64      0        0         /u0/beazley/Projects/foo/    Pathname
 0x64      0        0        foo.c                         Filename
 ...
 0x24      0        0        foo:F(0,3);(0,3)              Function
 0xa0      4        68       n:p(0,3)                      Parameter
 ...
 0x44      6        8                                      Line number
 0x44      7        12                                     Line number
 0x44      8        44                                     Line number
 0x44      9        56                                     Line number
 ...
 </pre></blockquote>

 In the table, the type field indicates the type of debugging information.  For
 example, 0x64 specifies the source file, 0x24 is a function
 definition, 0xa0 is a function parameter, and 0x44 is line number
 information. Associated with each stab is a collection of parameters
 and an optional string.  The string usually contains symbol names and
 other information.  The <tt>desc</tt> and <tt>value</tt> fields are numbers
 that usually contain byte offsets and line number data.
 Therefore, to collect debugging information, WAD simply walks through the debugging
 tables until it finds the function of interest.  Once found, parameter and line
 number specifiers are inspected to determine the location and values of the function
 arguments as well as the source line at which the error occurred.

 <p>
 <li> After the complete traceback has been obtained, it is examined to see if
 there are any ``safe'' return points to which control can be returned.
 This is accomplished by maintaining an internal table of predefined symbolic return
 points as shown in the following table:

 <blockquote><pre>
 Python symbol                     Return value
 -----------------------------     ------------------
 call_builtin                      NULL
 _PyImport_LoadDynamicModule       NULL
 PyObject_Repr                     NULL
 PyObject_Print                    -1
 PyObject_CallFunction             NULL
 PyObject_CallMethod               NULL
 PyObject_CallObject               NULL
 PyObject_Cmp                      -1
 PyObject_Compare                  -1
 PyObject_DelAttrString            -1
 PyObject_DelItem                  -1
 PyObject_GetAttrString            NULL
 PyObject_GetItem                  NULL
 PyObject_HasAttrString            -1
 PyObject_Hash                     -1
 PyObject_Length                   -1
 PyObject_SetAttrString            -1
 PyObject_SetItem                  -1
 PyObject_Str                      NULL
 PyObject_Type                     NULL
 ...
 PyEval_EvalCode                   NULL
 </pre></blockquote>

 The symbols in this table correspond to functions within the Python interpreter that
 might execute extension code and include the parts of the interpreter that invoke builtin functions
 as well as the functions from the abstract object interface.
 If any of these symbols appear on the call stack,
 a handler function is invoked to raise a Python exception.
 This handler function
 is given a WAD-specific traceback object that contains a copy of the
 call stack and CPU registers as well as any symbolic and debugging
 information that was obtained.  If none of the symbolic return points
 are encountered, WAD invokes a default handler that simply prints the
 full C stack trace and generates a core file.

 <P>
 <li>  If a return point is found, the CPU context is modified in a manner that allows the signal handler to return
        with a suitable Python error.
        This last step is the trickiest part of the recovery process, but the general
        idea is that the CPU context is modified in a way that makes Python think that
        an extension function simply raised an exception and returned an error.  Currently, this
        is implemented by having the signal handler return to a small
        handler function written in assembly language which arranges to return the
        desired value back to the specified return point.

 <p>
        The most complicated part of modifying the CPU context is that of restoring
        previously saved CPU registers.   By manually unwinding the call stack, the
        WAD exception handler effectively performs the same operation as a longjmp() call in C.
        However, unlike longjmp(), no previously saved set of CPU registers is available from which to resume
        execution in the Python interpreter.  The solution to this problem depends entirely on the
        underlying architecture.  On the SPARC, register values are saved in register windows
        which WAD manually unwinds to restore the proper state. On the Intel architecture, the solution is much
        more interesting.  To restore the register values, WAD must manually inspect the
        machine instructions of each function on the call stack in order to find out where the
        registers might have been saved. This information is then used to restore the registers from their
        saved locations before returning to the Python interpreter.

 <p>
 <li> Python receives the exception and produces a traceback.
 </ol>
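 <p>
 The address lookups in steps 2 and 4 can be sketched in Python as follows.
 This is only an illustration of the search logic, not the WAD implementation
 (which is written in C and works on mmap()'d object files).  The names
 <tt>Segment</tt>, <tt>Symbol</tt>, <tt>find_segment</tt>, <tt>find_symbol</tt>,
 <tt>find_line</tt>, and <tt>resolve</tt> are hypothetical.  The sketch also
 assumes that the sample symbol table and stabs data above both describe
 <tt>dohmodule.so</tt>, and that each line-number stab stores the source line in
 <tt>desc</tt> and the byte offset from the start of the function in <tt>value</tt>.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    base: int    # starting virtual address of the mapping
    size: int    # length of the mapping in bytes
    path: str    # file backing the mapping


class Symbol(NamedTuple):
    offset: int  # offset of the function from the object's base address
    size: int    # size of the function in bytes
    name: str


# Stab type codes that appear in the sample stabs table.
N_FUN, N_SLINE, N_SO = 0x24, 0x44, 0x64

# Sample rows copied from the tables in steps 2 and 4.  Treating all of
# them as a description of dohmodule.so is an assumption for illustration.
SEGMENTS = [
    Segment(0x00010000, 1264 * 1024, "/usr/local/bin/python"),
    Segment(0xFE7C0000, 32 * 1024, "/u0/beazley/Projects/dohmodule.so"),
]
SYMBOLS = [
    Symbol(0x1280, 324, "wrap_foo"),
    Symbol(0x1600, 128, "foo"),
    Symbol(0x2408, 192, "bar"),
]
STABS = [  # (type, desc, value, string)
    (N_SO, 0, 0, "/u0/beazley/Projects/foo/"),
    (N_SO, 0, 0, "foo.c"),
    (N_FUN, 0, 0, "foo:F(0,3);(0,3)"),
    (0xA0, 4, 68, "n:p(0,3)"),
    (N_SLINE, 6, 8, ""),
    (N_SLINE, 7, 12, ""),
    (N_SLINE, 8, 44, ""),
    (N_SLINE, 9, 56, ""),
]


def find_segment(addr, segments):
    """Return the mapping whose half-open address range contains addr."""
    for seg in segments:
        if addr in range(seg.base, seg.base + seg.size):
            return seg
    return None


def find_symbol(addr, base, symbols):
    """Linear search for the symbol whose code range contains addr."""
    for sym in symbols:
        start = base + sym.offset
        if addr in range(start, start + sym.size):
            return sym
    return None


def find_line(stabs, function, offset):
    """Return (source path, line) for a byte offset inside function."""
    directory = path = ""
    in_function = False
    line = None
    for kind, desc, value, string in stabs:
        if kind == N_SO:
            if string.endswith("/"):
                directory = string          # pathname stab
            else:
                path = directory + string   # filename stab
        elif kind == N_FUN:
            if in_function:
                break                       # reached the next function
            in_function = string.split(":", 1)[0] == function
        elif kind == N_SLINE and in_function:
            if value > offset:
                break                       # past the address of interest
            line = desc
    return (path, line) if in_function else None


def resolve(addr):
    """Map a virtual address to (object file, function, source file, line)."""
    seg = find_segment(addr, SEGMENTS)
    if seg is None:
        return None
    sym = find_symbol(addr, seg.base, SYMBOLS)
    if sym is None:
        return (seg.path, None, None, None)
    source = find_line(STABS, sym.name, addr - seg.base - sym.offset)
    if source is None:
        return (seg.path, sym.name, None, None)
    return (seg.path, sym.name) + source
```

 <p>
 With this sample data, <tt>resolve(0xFE7C0000 + 0x1600 + 20)</tt> falls inside
 <tt>foo</tt> and is attributed to line 7 of <tt>foo.c</tt>, whereas an address
 inside <tt>bar</tt> yields a function name but no source location because the
 sample stabs data describes only <tt>foo</tt>.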

 <h3>6. Initialization and Loading</h3>

 In the earlier example, it was shown that WAD could be either
 loaded as an extension module or simply attached to an existing module
 with the linker.  This latter case is implemented by
 wrapping the WAD initialization function inside the constructor of a
 statically allocated C++ object like this:

 <blockquote>
 <pre>
 class WadInit {
 public:
     WadInit() {
         wad_init();   /* Call the real initialization function */
     }
 };
 static WadInit wad_initializer;
 </pre></blockquote>

 When the dynamic loader brings WAD into memory, it automatically
 executes the constructors of all statically allocated C++ objects.
 Therefore, this initialization code executes immediately after
 loading, but before Python actually calls the module initialization
 function.  As a result, when an extension module is linked with WAD,
 the debugging capability is enabled before any other operations occur---this
 allows WAD to respond to fatal errors that might occur during module
 initialization.

 The rest of the initialization process consists of the following:
 <ul>
 <li> The WAD signal handler is installed.
 <li> A collection of return symbols is registered with the signal handler (see the previous section).
 <li> Four new Python exception objects, <tt>SegFault</tt>, <tt>BusError</tt>, <tt>AbortError</tt>,
 and <tt>IllegalInstruction</tt>, are added
 to the <tt>__builtin__</tt> module.
 </ul>
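 <p>
 The effect of the last initialization step can be pictured with a short Python
 sketch.  WAD itself performs this registration from C.  The helper
 <tt>register_fault_exceptions</tt> is hypothetical, and the sketch targets
 current Python versions, where the <tt>__builtin__</tt> module is named
 <tt>builtins</tt>.

```python
import builtins

# Exception names taken from the list above.
FAULT_NAMES = ("SegFault", "BusError", "AbortError", "IllegalInstruction")


def register_fault_exceptions(namespace=builtins):
    """Create one exception class per fault type and publish it by name.

    Hypothetical helper: it only mimics the visible effect of the
    initialization step, namely that the names become globally available.
    """
    created = {}
    for name in FAULT_NAMES:
        cls = type(name, (Exception,), {})
        setattr(namespace, name, cls)
        created[name] = cls
    return created
```

 <p>
 Once the classes live in the builtin namespace, a script can write
 <tt>except SegFault:</tt> without importing anything, just as it would for
 <tt>NameError</tt>.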

 Although the use of a C++ static constructor has the potential to
 conflict with C++ extension code that also uses static constructors,
 it is always possible to enable WAD prior to loading a C++ extension
 (e.g., WAD could be loaded separately).

 <h3>7. Implementation Details</h3>

 Currently, WAD is written in ANSI C with a small amount of C++
 and a small amount of assembly code (to assist in the return to the interpreter).
 The entire implementation contains approximately 2000 semicolons and most of the code
 relates to the gathering of source code information (symbol tables,
 debugging information, etc.).

 <p>
 Although there are libraries such as GNU bfd that can assist with the
 reading of object files, none of these are used in the implementation [10].
 First, these libraries tend to be quite large
 and are oriented more towards stand-alone tools such as debuggers,
 linkers, and compilers.  Second, due to the unusual nature of the runtime
 environment and the restrictions on memory utilization (no heap, no
 stack), the behavior of these libraries is somewhat unclear and
 would require further study.
 Finally, given the small size of the prototype implementation, it didn't seem necessary to rely on a
 large general purpose library.

 <h3>8. Discussion</h3>

 The primary focus of this work is to provide a more useful error
 reporting mechanism to extension developers.
 However, this does not imply that
 WAD is appropriate as a general purpose exception
 handling mechanism.  First, let's focus
 on the recovery mechanism:

 <ul>
 <li> When WAD unwinds the call stack, objects allocated on the stack
 are lost.  This may interact poorly with C++ extensions since the
 unwinding process does not invoke C++ destructors.  It may be possible to fix
 this problem, but doing so would require coordination with the C++ runtime library.

 <p>
 <li> Similarly, if a procedure allocates objects on the heap, stack unwinding
 may cause those objects to never be reclaimed.

 <p>
 <li> Closely related to heap management, stack unwinding may leave
 files, sockets, and other system resources open.  Furthermore, in a multithreaded
 environment, deadlock may occur if a procedure is holding a lock when an error occurs.

 <p>
 <li> An application may fail by overwriting the process heap and corrupting
 memory.  Although WAD can produce internal diagnostics even when the heap has been
 destroyed, Python may fail immediately upon return from the
 WAD signal handler or shortly thereafter.

 <p>
 <li> If an application destroys the call stack (via buffer overflow), WAD will
 be unable to complete a stack trace and will be unable to return to
 Python.

 <p>
 <li> Memory management problems such as double-freeing of memory are particularly
 difficult to identify.  If an extension module corrupts the memory allocator
 in some manner, this may cause Python to fail in a completely unexpected location.
 WAD is usually able to produce a traceback in this situation, but
 it may not correspond to the real source of the problem.

 </ul>

 In addition, there are a number of issues that pertain to WAD's interaction with the
 Python interpreter:

 <ul>
 <li> The recovery mechanism is entirely based on symbolic information stored
 in the Python executable.  Therefore, the return points are simply specified
 as strings such as ``call_builtin'' as opposed to real memory addresses.
 Because of this, WAD is compatible with essentially any version of Python (provided
 it supports class-based exceptions).

 <P>
 <li> WAD is unable to manage multiple return values for the same procedure.
 For example, Python's <tt>eval_code2()</tt> procedure contains a huge
 case statement for executing byte codes.  Within this procedure, certain
 function calls return NULL to indicate an error and others return -1.  Since WAD
 is unable to determine which value to return, this particular procedure does not make a very
 good return point for error recovery.

 <P>
 <li> An alternative approach to the symbolic recovery scheme would be to
 instrument Python with a collection of safe return points using setjmp()/longjmp().
 This approach is not used because it would require a significant number of changes to
 the interpreter and it would introduce an unacceptable amount of performance overhead.

 <p>
 <li> WAD is generally safe to use with Python threads.  However, if a
 compiled extension function manually releases the Python interpreter
 lock and subsequently faults, the return behavior is unspecified.  In
 the future, it may be possible to use the interpreter lock to provide coordination
 between the interpreter and the error recovery mechanism.

 <p>
 <li> Compiled extension code may perform an eval operation in which Python code is executed
 in the interpreter.  This results in a situation where the complete call-stack of an
 application crosses the boundary between Python and C several times.  WAD can
 still handle faults in this setting as long as an application is doing a reasonable amount of
 error checking.  For example, a fatal error that occurs inside an eval operation could
 be caught by the extension code and propagated further up the call stack.

 <p>
 <li> In certain cases, Python may be configured to handle the SIGFPE signal for floating point
 exceptions. The default Python handling of this error is to abort and dump core. However,
 with WAD, a complete stack traceback will be obtained when a SIGFPE occurs.

 <p>
 <li> WAD is extremely inefficient.  Due to restrictions on the heap and stack,
 WAD relies heavily on mmap() and a variety of other file
 operations as it handles errors.  It also performs linear searches of symbol and
 debugging tables. As a result, WAD's generation of a
 Python exception is several orders of magnitude slower than an ordinary
 exception.
 </ul>
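 <p>
 The symbolic return points discussed in the first two items of this list can
 be pictured as a table keyed by function name.  The following Python sketch is
 an illustration only.  The function <tt>find_return_point</tt> is hypothetical,
 <tt>None</tt> stands in for a C <tt>NULL</tt>, and the sketch assumes that the
 innermost matching frame is the one chosen.

```python
# A few rows of the return-point table from Section 5.
# None stands in for a C NULL return value.
RETURN_POINTS = {
    "call_builtin": None,
    "PyObject_GetAttrString": None,
    "PyObject_SetItem": -1,
    "PyObject_Length": -1,
    "PyEval_EvalCode": None,
}


def find_return_point(frames, table=RETURN_POINTS):
    """Scan frame names from innermost to outermost for a return point.

    Returns (index, symbol, return_value) for the first match, or None
    when no frame matches and the default handler would have to run.
    """
    for index, name in enumerate(frames):
        if name in table:
            return index, name, table[name]
    return None
```

 <p>
 For the stack trace shown in Section 2,
 <tt>find_return_point(["doh", "_wrap_doh", "call_builtin"])</tt> selects
 <tt>call_builtin</tt> with a return value of <tt>None</tt>.  Because each name
 maps to exactly one value, a procedure such as <tt>eval_code2()</tt>, whose
 callers expect NULL in some places and -1 in others, cannot be represented in
 a table of this shape.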

 Finally, there are a number of application specific issues to note:

 <ul>
 <li> Aggressive compiler optimization techniques may prevent WAD from
 accurately reporting locations within the original source code.
 This is particularly problematic with numerical applications where
 techniques such as procedure inlining can make it impossible to obtain accurate
 debugging information.  Since these types of problems also arise in
 full-featured debuggers, it is unlikely that they can be easily fixed in WAD (at least not
 without a considerable amount of work).

 <p>
 <li> If an application implements its own exception handling,
 it may provide Python with less information than what would be obtained with WAD.
 For example, a programmer might implement a function like this:

 <blockquote><pre>
 void *Malloc(int size) {
    void *ptr;
    ptr = malloc(size);
    if (!ptr) throw("Out of memory");
    return ptr;
 }
 </pre></blockquote>

 In this case, the ``throw'' function may initiate an internal
 exception handling mechanism that relies upon setjmp/longjmp or C++ exceptions.
 When the error eventually makes it back to the interpreter, the user will get an ``out
 of memory'' exception, but no additional information will be
 provided.  In contrast, if the programmer simply used an <tt>assert()</tt> statement, WAD would produce a full stack trace leading to
 the error.
 </ul>


 Despite its various limitations, WAD is applicable to a wide range of
 extension-related errors.  Furthermore, most of the errors that are
 likely to occur are of a more benign variety.  For example, a
 segmentation fault may simply be the result of an uninitialized
 pointer (perhaps the user forgot to call an initialization procedure).
 Likewise, bus errors, failed assertions, and floating point exceptions
 rarely result in a situation where the WAD recovery mechanism would be
 unable to produce a meaningful Python traceback.

 <h3>9. Related Work</h3>

 There is a huge body of literature concerning the implementation of
 exception handling in various programming languages and environments.
 A detailed discussion of this work is clearly not possible here, but
 a general overview of various exception handling issues can be found in [11].
 In general, there are a few themes that seem to prevail.
 First, considerable attention has been given to exception handling mechanisms
 in specific languages, for example efficient exception handling for C++.
 Second, a great deal of work has been devoted to the semantic aspects of
 exception handling such as exception hierarchies, finalization, and
 whether or not code is restartable after an exception has occurred.
 Finally, a fair amount of exception work has been done in the context
 of component frameworks and distributed systems.  Most of this work
 tends to concentrate on explicit exception handling mechanisms.  Very little
 work appears to have been done in the area of converting hardware-generated errors
 into exceptions.

 <p>
 With respect to debuggers, quite a lot of work has been done in
 creating advanced debugging support for specific languages and
 integrated development environments.  However, very little of this work
 has concentrated on the problem of extensible systems and
 compiled-interpreted language integration.  For instance, debuggers
 for Python are currently unable to cross over into C extensions whereas C
 debuggers aren't able to easily extract useful information from the
 internals of the Python interpreter.

 <p>
 One system of possible interest is Rn, which was developed in the
 mid-1980s at Rice University [12]. This system, primarily
 designed for working with large scientific applications written in
 Fortran, provided an execution monitor that consisted of a special
 debugging process with an embedded interpreter. When attached to
 compiled Fortran code, this monitor could dynamically patch
 the executable in a manner that allowed parts of the code to be executed in the
 interpreter. This was used to provide a debugging environment in which
 essentially any part of the compiled application could be modified at
 run-time by simply compiling the modified code (Fortran) to an
 interpreted form and inserting a breakpoint in the original executable
 that transferred control to the interpreter.  Although this
 particular scheme is not directly related to the functionality
 of WAD, it is one of the few systems in which
 interpreted and compiled code have been tightly coupled within
 a debugging framework.  Several aspects of the interpreted/compiled
 interface are closely related to the way in which WAD operates.  In addition,
 various aspects of this work may be useful should WAD be extended with
 new capabilities.

 <h3>10. Future Directions</h3>

 WAD is currently an experimental prototype.  Although this paper has
 described its use with Python, the core of the system is generic and
 is easily extended to other programming environments.  For example, when
 linked to C/C++ code, WAD will automatically produce stack
 traces for fatal errors.  A module for generating Tcl exceptions has
 also been developed.  Plans are underway to provide support for other
 extensible systems including Perl, Ruby, and Guile.

 <p>
 Finally, a number of extensions to the WAD approach may be possible.
 For example, even though the current implementation only returns a
 traceback string to the Python interpreter, the WAD signal handler
 actually generates a full traceback of the C call stack including all
 of the CPU registers and a copy of the stack data.  Therefore, with a
 little work, it may be possible to implement a diagnostic tool that
 allows the state of the C stack to be inspected from the Python
 interpreter after a crash has occurred.  Similarly, it may be possible
 to integrate the capabilities of WAD with those provided by the Python
 debugger.
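
 <p>
 As a rough illustration of what such a diagnostic tool might expose to Python, the
 sketch below models a recorded C stack frame as a plain data object.  Everything in it is
 hypothetical: the <tt>CFrame</tt> class, its fields, the <tt>format_trace()</tt> helper,
 and the sample data are assumptions made for this example and are not part of the
 current WAD implementation.

<blockquote><pre>
```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CFrame:
    """One C stack frame as a post-mortem tool might expose it (hypothetical)."""
    function: str
    source: str                      # "file.c:line" taken from debugging tables
    registers: Dict[str, int] = field(default_factory=dict)
    stack_data: bytes = b""          # raw copy of the frame's stack memory


def format_trace(frames: List[CFrame]) -> str:
    """Render the frames in the order given, one line per frame."""
    lines = []
    for depth, frame in enumerate(frames):
        lines.append(f"#{depth} {frame.function} at {frame.source}")
    return "\n".join(lines)


# Made-up data standing in for what a signal handler could have recorded.
frames = [
    CFrame("wrap_doh", "wrap.c:10", {"pc": 0x1010, "sp": 0xFFBEF100}),
    CFrame("doh", "spam.c:42", {"pc": 0x1050, "sp": 0xFFBEF000}, b"\x00" * 16),
]

report = format_trace(frames)
```
</pre></blockquote>

 With frames held as ordinary objects, a user could examine registers or stack contents
 interactively after an exception instead of reading them out of a traceback string.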

 <h3>11. Conclusions and Availability</h3>

 WAD provides a simple mechanism for converting fatal errors into
 Python exceptions that provide useful information to extension
 writers.  In doing so, it solves one of the most frustrating aspects
 of working with compiled Python extensions--that of identifying program errors.
 Furthermore, the system requires no code modifications to Python and introduces
 no performance overhead.
 Although the system is
 necessarily platform specific, it does not involve a
 significant amount of code.  As a result, it may be relatively
 straightforward to port to other Unix systems.

 <p>
 As of this writing, WAD is still undergoing active development.   However,
 the software is available for experimentation and download
 at <tt>http://systems.cs.uchicago.edu/wad</tt>.

 <h3>References</h3>

 [1] D.M. Beazley, <em>Using SWIG to Control, Prototype, and Debug C Programs with Python</em>,
 4th International Python Conference, Livermore, CA. (1996).

 <p>
 [2] P.F. Dubois, <em>Climate Data Analysis Software</em>, 8th International Python Conference,
 Arlington, VA. (2000).

 <p>
 [3] P.F. Dubois, <em>A Facility for Creating Python Extensions in C++</em>, 7th International Python
 Conference, Houston, TX. (1998).

 <p>
 [4] SIP. <tt>http://www.thekompany.com/projects/pykde/</tt>.

 <p>
 [5] FPIG. <tt>http://cens.ioc.ee/projects/f2py2e/</tt>.

 <p>
 [6] R. Faulkner and R. Gomes, <em>The Process File System and Process Model in UNIX System V</em>, USENIX Conference Proceedings,
 January 1991.

 <p>
 [7] J.R. Levine, <em>Linkers &amp; Loaders.</em> Morgan Kaufmann Publishers, 2000.

 <p>
 [8] Free Software Foundation, <em>The "stabs" debugging format</em>. GNU info document.

 <p>
 [9] W. Richard Stevens, <em>UNIX Network Programming, Volume 2: Interprocess Communications</em>.
 Prentice Hall PTR, 1998.

 <p>
 [10] S. Chamberlain. <em>libbfd: The Binary File Descriptor Library</em>. Cygnus Support, bfd version 3.0 edition, April 1991.

 <p>
 [11] M.L. Scott. <em>Programming Language Pragmatics</em>. Morgan Kaufmann Publishers, 2000.

 <p>
 [12] A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, S. Warren, <em>A Practical Environment for Scientific Programming.</em>
 IEEE Computer, Vol 20, No. 11, (1987). p. 75-89.


 </body>
 </html>