| <html> |
| <head> |
| <title>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</title> |
| </head> |
| <body bgcolor="#ffffff"> |
| <center> |
| |
| <h2>WAD: A Module for Converting Fatal Extension Errors into Python Exceptions</h2> |
| <h6>David M. Beazley <br> |
| Department of Computer Science<br> |
| University of Chicago<br> |
| Chicago, IL 60637<br> |
| beazley@cs.uchicago.edu<br> |
| </h6> |
| </center> |
| |
| <h3>Abstract</h3> |
| <em> |
| One of the more popular uses of Python is as an extension language for |
| applications written in compiled languages such as C, C++, and |
| Fortran. Unfortunately, one of the biggest drawbacks of this approach |
| is the lack of a useful debugging and error handling facility for |
| identifying problems in extension code. In part, this limitation is |
| due to the fact that Python does not know anything about the internal |
| implementation of an extension module. A more difficult problem is |
| that compiled extensions sometimes fail with catastrophic errors such |
| as memory access violations, failed assertions, and floating point |
| exceptions. These types of errors fall outside the realm of normal |
| Python exception handling and are particularly difficult to identify |
| and debug. Although traditional debuggers can find the location of a |
| fatal error, they are unable to report the context in which such an |
| error has occurred with respect to a Python script. This paper describes |
| an experimental system that converts fatal extension errors |
| into Python exceptions. In particular, a dynamically |
| loadable module, WAD (Wrapped Application Debugger), has been developed which catches |
| fatal errors, unwinds the call stack, and generates Python exceptions |
| with debugging information. WAD requires no modifications to Python, |
| works with all extension modules, and introduces no performance |
| overhead. An initial implementation of the system is currently |
| available for Sun SPARC Solaris and i386-Linux. |
| |
| </em> |
| |
| <h3>1. Introduction</h3> |
| |
| One of the primary reasons C, C++, and Fortran programmers are |
| attracted to Python is its ability to serve as an extension language |
| for compiled programs. Furthermore, tools such as SIP, CXX, Pyfort, FPIG, |
| and SWIG make it extremely easy for a programmer to ``wrap'' existing |
| software into an extension module [1,2,3,4,5]. Although this approach is |
| extremely attractive in terms of providing a highly usable and |
| flexible environment for users, extension modules suffer from |
| problems not normally associated with Python |
| scripts---especially when they don't work. |
| |
| <p> |
| Normally, Python programming errors result in an exception like this: |
| |
| <blockquote><pre> |
| % python foo.py |
| Traceback (innermost last): |
| File "foo.py", line 11, in ? |
| foo() |
| File "foo.py", line 8, in foo |
| bar() |
| File "foo.py", line 5, in bar |
| spam() |
| File "foo.py", line 2, in spam |
| doh() |
| NameError: doh |
| % |
| </pre></blockquote> |
| |
| Unfortunately for compiled extensions, the following situation sometimes occurs: |
| |
| <blockquote><pre> |
| % python foo.py |
| Segmentation Fault (core dumped) |
| % |
| </pre></blockquote> |
| |
| Needless to say, this isn't very informative--well, |
| other than indicating that something ``very bad'' happened. |
| |
| <p> |
| In order to identify the source of a fatal error, a programmer can run a |
| debugger on the Python executable or on a core file like this: |
| |
| <blockquote><pre> |
| % gdb /usr/local/bin/python |
| (gdb) run foo.py |
| Starting program: /usr/local/bin/python foo.py |
| |
| Program received signal SIGSEGV, Segmentation fault. |
| 0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so |
| (gdb) where |
| #0 0xff060568 in doh () from /u0/beazley/Projects/WAD/Python/./libfoo.so |
| #1 0xff082f34 in _wrap_doh () |
| from /u0/beazley/Projects/WAD/Python/./dohmodule.so |
| #2 0x2777c in call_builtin (func=0x1984b8, arg=0x1a1ccc, kw=0x0) |
| at ceval.c:2650 |
| #3 0x27648 in PyEval_CallObjectWithKeywords (func=0x1984b8, arg=0x1a1ccc, |
| kw=0x0) at ceval.c:2618 |
| #4 0x25d18 in eval_code2 (co=0x19acf8, globals=0x0, locals=0x1c7844, |
| args=0x1984b8, argcount=1625472, kws=0x0, kwcount=0, defs=0x0, defcount=0, |
| owner=0x0) at ceval.c:1951 |
| #5 0x25954 in eval_code2 (co=0x199620, globals=0x0, locals=0x1984b8, |
| args=0x196654, argcount=1862720, kws=0x197788, kwcount=0, defs=0x0, |
| #6 0x25954 in eval_code2 (co=0x19ad38, globals=0x0, locals=0x196654, |
| args=0x1962fc, argcount=1862800, kws=0x198e90, kwcount=0, defs=0x0, |
| defcount=0, owner=0x0) at ceval.c:1850 |
| #7 0x25954 in eval_code2 (co=0x1b6c60, globals=0x0, locals=0x1962fc, |
| args=0x1a1eb4, argcount=1862920, kws=0x0, kwcount=0, defs=0x0, defcount=0, |
| owner=0x0) at ceval.c:1850 |
| #8 0x22da4 in PyEval_EvalCode (co=0x1b6c60, globals=0x1962c4, locals=0x1962c4) |
| at ceval.c:319 |
| #9 0x3adb4 in run_node (n=0x18abf8, filename=0x1b6c60 "", globals=0x1962c4, |
| locals=0x1962c4) at pythonrun.c:886 |
| #10 0x3ad64 in run_err_node (n=0x18abf8, filename=0x1b6c60 "", |
| globals=0x1962c4, locals=0x1962c4) at pythonrun.c:874 |
| #11 0x3ad38 in PyRun_FileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py", |
| start=1616888, globals=0x1962c4, locals=0x1962c4, closeit=1) |
| at pythonrun.c:866 |
| #12 0x3a1d8 in PyRun_SimpleFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py", |
| closeit=1) at pythonrun.c:579 |
| #13 0x39d84 in PyRun_AnyFileEx (fp=0x18a5a8, filename=0xffbefd7a "foo.py", |
| closeit=1) at pythonrun.c:459 |
| #14 0x1f498 in Py_Main (argc=2, argv=0xffbefc84) at main.c:289 |
| #15 0x1eec0 in main (argc=2, argv=0xffbefc84) at python.c:10 |
| </pre></blockquote> |
| |
| Unfortunately, even though the debugger identifies the location where the fault occurred, it |
| mostly provides information about the internals of the |
| interpreter. The debugger certainly doesn't reveal anything about the Python |
| program that led to the error (i.e., it doesn't reveal the |
| same information that would be contained in a Python traceback). As a result, |
| the debugger is of limited use when it comes to debugging an application that |
| consists of both compiled and Python code. |
| |
| <p> |
| Normally, extension developers try to avoid catastrophic errors by |
| adding error handling. If |
| an application is small or customized for use with Python, it can be |
| modified to raise Python exceptions. |
| Automated tools such as SWIG can also convert C++ |
| exceptions and C-related error handling mechanisms into Python |
| exceptions. However, no matter how much error checking is added, |
| there is always a chance that an extension will fail in an unexpected |
| manner. This is especially true for large applications that have been wrapped |
| into an extension module. In addition, certain types of errors such as floating |
| point exceptions (e.g., division by zero) are especially difficult to find |
| and eliminate. Finally, rigorous error checking may be omitted to improve |
| performance. |
| |
| <p> |
| To address these problems, an experimental module known as WAD (Wrapped |
| Application Debugger) has been developed. |
| WAD is able to |
| convert fatal errors into Python exceptions that include information |
| from the call stack as well as debugging |
| information. By turning such errors into Python exceptions, fatal |
| errors now result in a traceback that crosses the boundary between |
| Python code and compiled extension code. This makes it much |
| easier to identify and correct extension-related programming errors. |
| WAD requires no modifications to Python and is compatible with all |
| extension modules. However, it is also highly platform specific |
| and currently runs only on Sun SPARC |
| Solaris and i386-Linux. The primary goal of this paper is to motivate the problem |
| and to describe one possible solution. In addition, many of the |
| implementation issues |
| associated with providing an integrated error reporting mechanism are described. |
| |
| <h3>2. An Example</h3> |
| |
| WAD can either be imported as a Python extension module or linked to an |
| extension module. To illustrate, consider the earlier example: |
| |
| <blockquote><pre> |
| % python foo.py |
| Segmentation Fault (core dumped) |
| % |
| </pre></blockquote> |
| |
| To identify the problem, a programmer can run Python interactively and import WAD as follows: |
| |
| <blockquote><pre> |
| % python |
| Python 2.0 (#1, Oct 27 2000, 14:34:45) |
| [GCC 2.95.2 19991024 (release)] on sunos5 |
| Type "copyright", "credits" or "license" for more information. |
| >>> import libwadpy |
| WAD Enabled |
| >>> execfile("foo.py") |
| Traceback (most recent call last): |
| File "&lt;stdin&gt;", line 1, in ? |
| File "foo.py", line 16, in ? |
| foo() |
| File "foo.py", line 13, in foo |
| bar() |
| File "foo.py", line 10, in bar |
| spam() |
| File "foo.py", line 7, in spam |
| doh.doh(a,b,c) |
| SegFault: [ C stack trace ] |
| |
| #2 0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0) |
| #1 0xff022f7c in _wrap_doh(0x0,0x1a1ccc,0x160ef4,0x9c,0x56b44,0x1aa3d8) |
| #0 0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28 |
| |
| /u0/beazley/Projects/WAD/Python/foo.c, line 28 |
| |
| int doh(int a, int b, int *c) { |
| => *c = a + b; |
| return *c; |
| } |
| |
| >>> |
| </pre></blockquote> |
| |
| In this case, we can |
| see that the program has tried to assign a value to a |
| NULL pointer (indicated by the value "c=0x0" in the last function call). Furthermore, we obtain a Python traceback that shows the |
| entire sequence of functions leading to the problem. Finally, since |
| control returned to the interpreter, it is possible to interactively |
| inspect various aspects of the application or to continue with the computation |
| (although this clearly depends on the severity of the error and the nature of the application). |
| |
| <p> |
| In certain applications, it may be difficult to run Python |
| interactively or to modify the code to explicitly import a special |
| debugging module. In these cases, WAD can be attached to an extension module with the |
| linker. For example: |
| |
| <blockquote><pre> |
| % ld -G $(OBJS) -o dohmodule.so -lwadpy |
| </pre></blockquote> |
| |
| This requires no recompilation of any source code--only a relinking of the |
| extension module. When Python loads the relinked extension module, WAD is automatically |
| initialized before Python invokes the module initialization function. |
| |
| <h3>3. Design Considerations for Embedded Error Recovery</h3> |
| |
| The primary design goal of WAD is to provide an error reporting mechanism |
| for extension modules that is a natural extension of normal Python |
| exception handling. There are two primary motivations for |
| handling fatal errors in this manner: first, in the context of Python |
| programming, it is simply unnatural to run a separate debugging |
| application to identify a problem in an extension module when no such |
| requirement exists for scripts. Thus, an embedded error reporting |
| mechanism is simply more convenient. Second, the target users |
| of an extension module may not know how to use a debugger or even have |
| a development environment installed on their machine. Therefore, |
| the ability to produce an informative traceback within the |
| confines of the Python interpreter can be of tremendous value to an |
| extension developer. This is because users who report a problem will |
| be able to include an informative traceback as opposed to simply |
| saying ``the code crashed.'' |
| |
| <p> |
| A secondary design goal is to provide a system that is as non-invasive |
| as possible. The system should not require modifications to Python or |
| any extension modules and it should be easy to integrate |
| into the runtime environment of an application. In addition, it shouldn't |
| introduce any performance overhead. |
| |
| <p> |
| Finally, since WAD co-exists with the Python interpreter (i.e., in the same |
| process), there are a number of technical issues that have to be |
| addressed. First, fatal errors can theoretically occur anywhere in |
| the interpreter as well as in extension modules. Therefore, WAD needs |
| to know about Python's internal organization if it is going to provide |
| a graceful recovery back to the interpreter. Second, in order to |
| implement this recovery scheme, the system has to perform direct |
| manipulation of the CPU context and call stack. Last, but not least, |
| since the recovery code lives in the same address space as the |
| interpreter and extension modules it should not depend on the process |
| stack and heap (since both could have been corrupted by the faulting |
| application). |
| |
| <h3>4. Catching Fatal Errors</h3> |
| |
| WAD catches catastrophic errors by installing a |
| reliable signal handler for SIGSEGV, SIGBUS, SIGABRT, SIGILL, and SIGFPE [9]. Unlike the |
| more familiar BSD-style signal interface (as provided by the Python |
| signal module), reliable signal handlers are installed using the <tt>sigaction()</tt> system call and have a few notable properties: |
| |
| <ul> |
| <li> The signal handler can be configured to run on its own dedicated stack. |
| |
| <p> |
| <li> Handler functions can receive a structure containing the CPU context |
| including the CPU registers, program counter, and stack pointer. |
| |
| <p> |
| <li> Changes to the CPU context take effect immediately after the signal handler returns. |
| </ul> |
| |
| Therefore, the high level implementation of WAD is relatively straightforward: when a fatal signal occurs, |
| a handler function runs on an isolated signal handling stack. |
| The CPU context is then used to unwind the call stack and to inspect the process state. Finally, |
| if possible, the CPU context is modified in a manner that allows the signal handler to |
| return to Python with a raised exception. |
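| |
| <p> |
| As a rough illustration of these properties (a minimal sketch, not WAD's actual code), the following C fragment installs a <tt>sigaction()</tt> handler that runs on an alternate stack registered with <tt>sigaltstack()</tt>. The names <tt>install_fatal_handlers</tt>, <tt>fatal_handler</tt>, <tt>last_signal</tt>, and <tt>ran_on_alt_stack</tt>, as well as the 64 KB stack size, are hypothetical choices made for this example. The handler only records what happened; it does not inspect or modify the CPU context. |
| |

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* request sigaction(), sigaltstack(), siginfo_t */
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Dedicated stack for the handler (the size is an assumption of this sketch). */
static char handler_stack[64 * 1024];

/* Written by the handler so that the outcome can be inspected afterwards. */
static volatile sig_atomic_t last_signal = 0;
static volatile sig_atomic_t ran_on_alt_stack = 0;

/* Three-argument handler form selected by SA_SIGINFO. The third argument
   points to the saved CPU context (a ucontext_t); it is unused here. */
static void fatal_handler(int sig, siginfo_t *info, void *context) {
    char marker;   /* a local variable lives on whichever stack is in use */
    uintptr_t here = (uintptr_t) &marker;
    uintptr_t base = (uintptr_t) handler_stack;
    (void) info;
    (void) context;
    last_signal = sig;
    ran_on_alt_stack = (here >= base && here < base + sizeof(handler_stack));
}

/* Installs the handler for the five fatal signals named in the text.
   Returns 0 on success and -1 on failure. */
int install_fatal_handlers(void) {
    static const int signals[] = { SIGSEGV, SIGBUS, SIGABRT, SIGILL, SIGFPE };
    struct sigaction sa;
    stack_t ss;
    size_t i;

    /* Register the alternate stack for the calling thread. */
    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = handler_stack;
    ss.ss_size = sizeof(handler_stack);
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) return -1;

    /* SA_ONSTACK: run on the alternate stack.
       SA_SIGINFO: pass siginfo_t and the CPU context to the handler. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = fatal_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof(signals) / sizeof(signals[0]); i++) {
        if (sigaction(signals[i], &sa, NULL) != 0) return -1;
    }
    return 0;
}
```

| <p> |
| After <tt>install_fatal_handlers()</tt> succeeds, a signal delivered with <tt>raise(SIGFPE)</tt> runs the handler on <tt>handler_stack</tt> and then resumes after the call, since no real fault occurred. A genuine fault would re-execute the faulting instruction unless the handler changed the saved context, which is the part WAD adds. |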
| |
| <h3>5. A Detailed Description of the Recovery Mechanism</h3> |
| |
| In this section, a more detailed description of the error recovery |
| scheme is presented. The precise implementation details of this are |
| highly platform specific and involve a number of advanced topics including |
| the Unix process file system (/proc), the ELF object file format, and the |
| Stabs compiler debugging format [6,7,8]. The details of these topics are |
| beyond the scope of this paper. However, this section hopes to |
| give the reader a small taste of the steps involved in implementing the recovery mechanism. |
| |
| <P> |
| The services of WAD are only invoked upon the reception of a fatal |
| signal. This triggers a signal handling function that results in a return to Python |
| as illustrated in the following figure: |
| |
| <center> |
| <img src="fig1.png"> |
| <h6>Control flow of the error recovery mechanism</h6> |
| </center> |
| |
| <p> |
| The steps required to implement this recovery are as follows: |
| |
| <ol> |
| <li> The values of the program counter and stack pointer are obtained from the CPU |
| context structure passed to the WAD signal handler. |
| |
| <p> |
| <li> The virtual memory map of the process is inspected to identify all of |
| the shared libraries, dynamically loaded modules, and valid memory regions. |
| This information is obtained by reading from the Unix /proc filesystem. |
| The following table illustrates the nature of this data: |
| |
| |
| <blockquote><pre> |
| Address Size Permissions File |
| ---------- ----- ----------------- --------------------------------- |
| 00010000 1264K read/exec /usr/local/bin/python |
| 0015A000 184K read/write/exec /usr/local/bin/python |
| 00188000 296K read/write/exec [ heap ] |
| FE7C0000 32K read/exec /u0/beazley/Projects/dohmodule.so |
| FE7D6000 8K read/write/exec /u0/beazley/Projects/dohmodule.so |
| ... |
| FF100000 664K read/exec /usr/lib/libc.so.1 |
| FF1B6000 24K read/write/exec /usr/lib/libc.so.1 |
| FF1BC000 8K read/write/exec /usr/lib/libc.so.1 |
| FF2C0000 120K read/exec /usr/lib/libthread.so.1 |
| FF2EE000 8K read/write/exec /usr/lib/libthread.so.1 |
| FF2F0000 48K read/write/exec /usr/lib/libthread.so.1 |
| FF310000 40K read/exec /usr/lib/libsocket.so.1 |
| FF32A000 8K read/write/exec /usr/lib/libsocket.so.1 |
| FF330000 24K read/exec /usr/lib/libpthread.so.1 |
| FF346000 8K read/write/exec /usr/lib/libpthread.so.1 |
| FF350000 8K read/write/exec [ anon ] |
| FF3B0000 8K read/exec /usr/lib/libdl.so.1 |
| FF3C0000 128K read/exec /usr/lib/ld.so.1 |
| FF3E0000 8K read/write/exec /usr/lib/ld.so.1 |
| FFBEA000 24K read/write/exec [ stack ] |
| </pre></blockquote> |
| |
| <p> |
| <li> The call stack is unwound to produce a traceback of the |
| calling sequence that led to the error. The unwinding process is just a simple |
| loop that is similar to the following: |
| |
| <blockquote><pre> |
| long *pc = get_pc(context); |
| long *sp = get_sp(context); |
| while (sp) { |
| /* Move to previous stack frame */ |
| pc = (long *) sp[15]; /* %i7 register on SPARC */ |
| sp = (long *) sp[14]; /* %i6 register on SPARC */ |
| } |
| </pre></blockquote> |
| |
| <li> For each stack frame, symbol table and debugging information |
| is gathered and stored in a WAD exception frame object. |
| Obtaining this information is the most complicated part of WAD and involves |
| the following steps: first, the current program counter is mapped to an object file |
| using the virtual memory map obtained in step 2. Next, the object file is loaded |
| using mmap(). Once loaded, the ELF symbol table |
| is searched for an address match. The symbol table contains a collection of records |
| containing memory offsets, sizes, and names such as this: |
| |
| <blockquote><pre> |
| Offset Size Name |
| -------- ------ --------- |
| 0x1280 324 wrap_foo |
| 0x1600 128 foo |
| 0x2408 192 bar |
| ... |
| </pre></blockquote> |
| |
| To find a match for a virtual memory address <em>addr</em>, WAD simply |
| searches for a symbol <em>s</em> such that <em>base</em> + |
| <em>s</em>.offset &lt;= <em>addr</em> &lt; <em>base</em> + |
| <em>s</em>.offset + <em>s</em>.size, where <em>base</em> is the base |
| virtual address of the object file in the virtual memory map. |
| |
| <p> |
| Debugging information, if available, is scanned to identify a source |
| file, function name, and line number. This involves scanning object files for a |
| table of debugging information stored in a format |
| known as ``stabs.'' Stabs is a relatively simple but highly extensible format that |
| is language independent and capable of encoding almost every aspect of the |
| original source code. For the purposes of WAD, only a small subset of this |
| data is actually used. |
| |
| <p> |
| The following table shows a small fragment of relevant stabs data: |
| <blockquote><pre> |
| type desc value string description |
| ------ ----- --------- --------------------------- ----------- |
| 0x64 0 0 /u0/beazley/Projects/foo/ Pathname |
| 0x64 0 0 foo.c Filename |
| ... |
| 0x24 0 0 foo:F(0,3);(0,3) Function |
| 0xa0 4 68 n:p(0,3) Parameter |
| ... |
| 0x44 6 8 Line number |
| 0x44 7 12 Line number |
| 0x44 8 44 Line number |
| 0x44 9 56 Line number |
| ... |
| </pre></blockquote> |
| |
| In the table, the type field indicates the type of debugging information. For |
| example, 0x64 specifies the source file, 0x24 is a function |
| definition, 0xa0 is a function parameter, and 0x44 is line number |
| information. Associated with each stab is a collection of parameters |
| and an optional string. The string usually contains symbol names and |
| other information. The <tt>desc</tt> and <tt>value</tt> fields are numbers |
| that usually contain byte offsets and line number data. |
| Therefore, to collect debugging information, WAD simply walks through the debugging |
| tables until it finds the function of interest. Once found, parameter and line |
| number specifiers are inspected to determine the location and values of the function |
| arguments as well as the source line at which the error occurred. |
| |
| <p> |
| <li> After the complete traceback has been obtained, it is examined to see if |
| there are any ``safe'' return points to which control can be returned. |
| This is accomplished by maintaining an internal table of predefined symbolic return |
| points as shown in the following table: |
| |
| <blockquote><pre> |
| Python symbol Return value |
| ----------------------------- ------------------ |
| call_builtin NULL |
| _PyImport_LoadDynamicModule NULL |
| PyObject_Repr NULL |
| PyObject_Print -1 |
| PyObject_CallFunction NULL |
| PyObject_CallMethod NULL |
| PyObject_CallObject NULL |
| PyObject_Cmp -1 |
| PyObject_Compare -1 |
| PyObject_DelAttrString -1 |
| PyObject_DelItem -1 |
| PyObject_GetAttrString NULL |
| PyObject_GetItem NULL |
| PyObject_HasAttrString -1 |
| PyObject_Hash -1 |
| PyObject_Length -1 |
| PyObject_SetAttrString -1 |
| PyObject_SetItem -1 |
| PyObject_Str NULL |
| PyObject_Type NULL |
| ... |
| PyEval_EvalCode NULL |
| </pre></blockquote> |
| |
| The symbols in this table correspond to functions within the Python interpreter that |
| might execute extension code and include the parts of the interpreter that invoke builtin functions |
| as well as the functions from the abstract object interface. |
| If any of these symbols appear on the call stack, |
| a handler function is invoked to raise a Python exception. |
| This handler function |
| is given a WAD-specific traceback object that contains a copy of the |
| call stack and CPU registers as well as any symbolic and debugging |
| information that was obtained. If none of the symbolic return points |
| are encountered, WAD invokes a default handler that simply prints the |
| full C stack trace and generates a core file. |
| |
| <P> |
| <li> If a return point is found, the CPU context is modified in a manner that allows the signal handler to return |
| with a suitable Python error. |
| This last step is the trickiest part of the recovery process, but the general |
| idea is that the CPU context is modified in a way that makes Python think that |
| an extension function simply raised an exception and returned an error. Currently, this |
| is implemented by having the signal handler return to a small |
| handler function written in assembly language which arranges to return the |
| desired value back to the specified return point. |
| |
| <p> |
| The most complicated part of modifying the CPU context is that of restoring |
| previously saved CPU registers. By manually unwinding the call stack, the |
| WAD exception handler effectively performs the same operation as a longjmp() call in C. |
| However, unlike longjmp(), no previously saved set of CPU registers is available from which to resume |
| execution in the Python interpreter. The solution to this problem depends entirely on the |
| underlying architecture. On the SPARC, register values are saved in register windows |
| which WAD manually unwinds to restore the proper state. On Intel processors, the solution is much |
| more interesting. To restore the register values, WAD must manually inspect the |
| machine instructions of each function on the call stack in order to find out where the |
| registers might have been saved. This information is then used to restore the registers from their |
| saved locations before returning to the Python interpreter. |
| |
| <p> |
| <li> Python receives the exception and produces a traceback. |
| </ol> |
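| |
| <p> |
| The address match in step 4 and the search for a return point in step 5 can be sketched as two linear searches. This is only an illustration under simplifying assumptions: the types <tt>Symbol</tt> and <tt>ReturnPoint</tt>, the functions <tt>find_symbol</tt> and <tt>find_return_point</tt>, and the small example tables are hypothetical, and the real data would come from the ELF symbol table and the unwound call stack rather than from static arrays. |
| |

```c
#include <stddef.h>
#include <string.h>

/* One symbol table record: offset and size relative to the object file. */
typedef struct {
    unsigned long offset;
    unsigned long size;
    const char   *name;
} Symbol;

/* One return point: an interpreter function and the value that signals
   an error to its caller (0 stands in for NULL in this sketch). */
typedef struct {
    const char *name;
    long        retval;
} ReturnPoint;

/* Example data modeled on the tables in the text. */
const Symbol example_symbols[] = {
    { 0x1280, 324, "wrap_foo" },
    { 0x1600, 128, "foo" },
    { 0x2408, 192, "bar" },
};

const ReturnPoint example_points[] = {
    { "call_builtin",     0 },
    { "PyObject_Print",  -1 },
    { "PyEval_EvalCode",  0 },
};

/* Finds s such that base + s.offset <= addr < base + s.offset + s.size,
   or returns NULL when no symbol covers addr. */
const Symbol *find_symbol(const Symbol *syms, size_t nsyms,
                          unsigned long base, unsigned long addr) {
    size_t i;
    for (i = 0; i < nsyms; i++) {
        unsigned long start = base + syms[i].offset;
        if (addr >= start && addr < start + syms[i].size) {
            return &syms[i];
        }
    }
    return NULL;
}

/* Walks the frames from innermost to outermost and returns the entry for
   the first frame whose name appears in the return point table. */
const ReturnPoint *find_return_point(const char *const *frames, size_t nframes,
                                     const ReturnPoint *points, size_t npoints) {
    size_t i, j;
    for (i = 0; i < nframes; i++) {
        for (j = 0; j < npoints; j++) {
            if (strcmp(frames[i], points[j].name) == 0) {
                return &points[j];
            }
        }
    }
    return NULL;
}
```

| <p> |
| With an object file mapped at a base address of 0xFE7C0000, an address 0x1610 bytes past the base falls inside <tt>foo</tt>, while an address 0x1680 bytes past the base lies just beyond its end and matches nothing. For the frames <tt>doh</tt>, <tt>_wrap_doh</tt>, <tt>call_builtin</tt>, the search stops at <tt>call_builtin</tt>, whose error return value is NULL. |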
| |
| <h3>6. Initialization and Loading</h3> |
| |
| In the earlier example, it was shown that WAD could be either |
| loaded as an extension module or simply attached to an existing module |
| with the linker. This latter case is implemented by |
| wrapping the WAD initialization function inside the constructor of a |
| statically allocated C++ object like this: |
| |
| <blockquote> |
| <pre> |
| class WadInit { |
| public: |
| WadInit() { |
| wad_init(); /* Call the real initialization function */ |
| } |
| }; |
| static WadInit wad_initializer; |
| </pre></blockquote> |
| |
| When the dynamic loader brings WAD into memory, it automatically |
| executes the constructors of all statically allocated C++ objects. |
| Therefore, this initialization code executes immediately after |
| loading, but before Python actually calls the module initialization |
| function. As a result, when an extension module is linked with WAD, |
| the debugging capability is enabled before any other operations occur---this |
| allows WAD to respond to fatal errors that might occur during module |
| initialization. |
| |
| The rest of the initialization process consists of the following: |
| <ul> |
| <li> The WAD signal handler is installed. |
| <li> A collection of return symbols is registered with the signal handler (see the previous section). |
| <li> Four new Python exception objects <tt>SegFault</tt>, <tt>BusError</tt>, <tt>AbortError</tt>, |
| and <tt>IllegalInstruction</tt> are added |
| to the <tt>__builtin__</tt> module. |
| </ul> |
| |
| Although the use of a C++ static constructor has the potential to |
| conflict with C++ extension code that also uses static constructors, |
| it is always possible to enable WAD prior to loading a C++ extension |
| (e.g., WAD could be loaded separately). |
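| |
| <p> |
| For comparison, GCC and Clang provide a similar load-time hook for plain C through the <tt>constructor</tt> function attribute. The sketch below is not how WAD is implemented (WAD uses the C++ static object shown above); it assumes a compiler that supports this attribute, and <tt>wad_init</tt>, <tt>wad_auto_init</tt>, and <tt>wad_is_initialized</tt> are stand-in names. |
| |

```c
/* Hypothetical flag recording that initialization has run. */
static int wad_initialized = 0;

/* Stand-in for the real initialization function. */
static void wad_init(void) {
    wad_initialized = 1;
}

/* GCC/Clang extension: the function runs when the object containing it is
   loaded, i.e., before main() for an executable or before dlopen() returns
   for a shared library. */
__attribute__((constructor))
static void wad_auto_init(void) {
    wad_init();
}

/* Lets other code check whether the load-time hook has run. */
int wad_is_initialized(void) {
    return wad_initialized;
}
```

| <p> |
| As with the C++ version, no call to <tt>wad_auto_init()</tt> appears anywhere in the program; the loader runs it before any other code in the module can fault. |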
| |
| <h3>7. Implementation Details</h3> |
| |
| Currently, WAD is written in ANSI C with a small amount of C++, |
| and a small amount of assembly code (to assist in the return to the interpreter). |
| The entire implementation contains approximately 2000 semicolons and most of the code |
| relates to the gathering of source code information (symbol tables, |
| debugging information, etc.). |
| |
| <p> |
| Although there are libraries such as GNU bfd that can assist with the |
| reading of object files, none of these are used in the implementation [10]. |
| First, these libraries tend to be quite large |
| and are oriented more towards stand-alone tools such as debuggers, |
| linkers, and compilers. Second, due to the unusual nature of the runtime |
| environment and the restrictions on memory utilization (no heap, no |
| stack), the behavior of these libraries is somewhat unclear and |
| would require further study. |
| Finally, given the small size of the prototype implementation, it didn't seem necessary to rely on a |
| large general purpose library. |
| |
| <h3>8. Discussion</h3> |
| |
| The primary focus of this work is to provide a more useful error |
| reporting mechanism to extension developers. |
| However, this does not imply that |
| WAD is appropriate as a general purpose exception |
| handling mechanism. First, let's focus |
| on the recovery mechanism: |
| |
| <ul> |
| <li> When WAD unwinds the call stack, objects allocated on the stack |
| are lost. This may interact poorly with C++ extensions since the |
| unwinding process does not invoke C++ destructors. It may be possible to fix |
| this problem, but doing so would require coordination with the C++ runtime library. |
| |
| <p> |
| <li> Similarly, if a procedure allocates objects on the heap, stack unwinding |
| may cause those objects to never be reclaimed. |
| |
| <p> |
| <li> Closely related to heap management, stack unwinding may leave |
| files, sockets, and other system resources open. Furthermore, in a multithreaded |
| environment, deadlock may occur if a procedure is holding a lock when an error occurs. |
| |
| <p> |
| <li> An application may fail by overwriting the process heap and corrupting |
| memory. Although WAD can produce internal diagnostics even when the heap has been |
| destroyed, Python may fail immediately upon return from the |
| WAD signal handler or shortly thereafter. |
| |
| <p> |
| <li> If an application destroys the call stack (via buffer overflow), WAD will |
| be unable to complete a stack trace and will be unable to return to |
| Python. |
| |
| <p> |
| <li> Memory management problems such as double-freeing of memory are particularly |
| difficult to identify. If an extension module corrupts the memory allocator |
| in some manner, this may cause Python to fail in a completely unexpected location. |
| WAD is usually able to produce a traceback in this situation, but |
| it may not correspond to the real source of the problem. |
| |
| </ul> |
| |
| In addition, there are a number of issues that pertain to WAD's interaction with the |
| Python interpreter: |
| |
| <ul> |
| <li> The recovery mechanism is entirely based on symbolic information stored |
| in the Python executable. Therefore, the return points are simply specified |
| as strings such as ``call_builtin'' as opposed to real memory addresses. |
| Because of this, WAD is compatible with essentially any version of Python (provided |
| it supports class-based exceptions). |
| |
| <P> |
| <li> WAD is unable to manage multiple return values to the same procedure. |
| For example, Python's <tt>eval_code2()</tt> procedure contains a huge |
| case statement for executing byte codes. Within this procedure, certain |
| function calls return NULL to indicate an error and others return -1. Since WAD |
| is unable to determine which value to return, this particular procedure does not make a very |
| good return point for error recovery. |
| |
| <P> |
| <li> An alternative approach to the symbolic recovery scheme would be to |
| instrument Python with a collection of safe return points using setjmp()/longjmp(). |
| This approach is not used because it would require a significant number of changes to |
| the interpreter and it would introduce an unacceptable amount of performance overhead. |
| |
| <p> |
| <li> WAD is generally safe to use with Python threads. However, if a |
| compiled extension function manually releases the Python interpreter |
| lock and subsequently faults, the return behavior is unspecified. In |
| the future, it may be possible to use the interpreter lock to provide coordination |
| between the interpreter and the error recovery mechanism. |
| |
| <p> |
| <li> Compiled extension code may perform an eval operation in which Python code is executed |
| in the interpreter. This results in a situation where the complete call-stack of an |
| application crosses the boundary between Python and C several times. WAD can |
| still handle faults in this setting as long as an application is doing a reasonable amount of |
| error checking. For example, a fatal error that occurs inside an eval operation could |
| be caught by the extension code and propagated further up the call stack. |
| |
| <p> |
| <li> In certain cases, Python may be configured to handle the SIGFPE signal for floating point |
| exceptions. The default Python handling of this error is to abort and dump core. However, |
| with WAD, a complete stack traceback will be obtained when a SIGFPE occurs. |
| |
| <p> |
| <li> WAD's error handling is extremely inefficient. Due to restrictions on the heap and stack, |
| WAD relies heavily on mmap() and a variety of other file |
| operations as it handles errors. It also performs linear searches of symbol and |
| debugging tables. As a result, WAD's generation of a |
| Python exception is several orders of magnitude slower than an ordinary |
| exception. |
| </ul> |
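| |
| <p> |
| For comparison, the setjmp()/longjmp() alternative mentioned in the list above, which WAD does not use, might look like the following when applied to a single call site. This is a minimal sketch: <tt>call_guarded</tt>, <tt>jump_back</tt>, <tt>well_behaved</tt>, and <tt>simulated_fault</tt> are hypothetical names, only SIGSEGV and SIGFPE are handled, and the fault is simulated with <tt>raise()</tt> rather than a real invalid memory access. |
| |

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* request sigaction() and sigsetjmp() */
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf return_point;                  /* saved safe return point */
static volatile sig_atomic_t armed = 0;          /* nonzero while it is valid */
static volatile sig_atomic_t caught_signal = 0;  /* signal that caused the jump */

/* Signal handler: jump back to the saved return point if there is one. */
static void jump_back(int sig) {
    if (armed) {
        caught_signal = sig;
        siglongjmp(return_point, 1);
    }
    /* No return point is active: restore the default action and re-raise,
       so the process terminates once the handler returns. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Calls fn(). Returns 0 if fn() completed normally, or the number of the
   signal that interrupted it. */
int call_guarded(void (*fn)(void)) {
    struct sigaction sa, old_segv, old_fpe;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = jump_back;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGFPE, &sa, &old_fpe);

    caught_signal = 0;
    /* The second argument (1) also saves the signal mask, so the signal is
       unblocked again after siglongjmp() leaves the handler. */
    if (sigsetjmp(return_point, 1) == 0) {
        armed = 1;
        fn();
    }
    armed = 0;

    /* Restore whatever handlers were installed before. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGFPE, &old_fpe, NULL);
    return caught_signal;
}

/* Two example callees for trying out call_guarded(). */
void well_behaved(void) { }
void simulated_fault(void) { raise(SIGSEGV); }
```

| <p> |
| Every guarded call pays for a <tt>sigsetjmp()</tt> and several <tt>sigaction()</tt> calls even when nothing goes wrong, and each call site has to be modified by hand. Both costs illustrate the changes and the overhead that this approach would bring to the interpreter. |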
| |
| Finally, there are a number of application specific issues to note: |
| |
| <ul> |
| <li> Aggressive compiler optimization techniques may prevent WAD from |
| accurately reporting locations within the original source code. |
| This is particularly problematic with numerical applications where |
| techniques such as procedure inlining can make it impossible to obtain accurate |
| debugging information. Since these types of problems also arise in |
| full-featured debuggers, it is unlikely that they can be easily fixed in WAD (at least not |
| without a considerable amount of work). |
| |
| <p> |
| <li> If an application implements its own exception handling, |
| it may provide Python with less information than what would be obtained with WAD. |
| For example, a programmer might implement a function like this: |
| |
| <blockquote><pre> |
| void *Malloc(int size) { |
| void *ptr; |
| ptr = malloc(size); |
| if (!ptr) throw("Out of memory"); |
| return ptr; |
| } |
| </pre></blockquote> |
| |
| In this case, the ``throw'' function may initiate an internal |
| exception handling mechanism that relies upon setjmp/longjmp or C++ exceptions. |
| When the error eventually makes it back to the interpreter, the user will get an ``out |
| of memory'' exception, but no additional information will be |
| provided. In contrast, if the programmer simply used an <tt>assert()</tt> statement, WAD would produce a full stack trace leading to |
| the error. |
| </ul> |
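<p>
The loss of context described in the last item can be imitated in pure Python. The sketch
below is only an analogy and does not involve WAD or any C code: <tt>AppError</tt>,
<tt>allocate</tt>, <tt>build_grid</tt>, and the other names are hypothetical. One wrapper
forwards only the error message, as an application-specific handler might, while the other
lets the failure propagate so that every frame leading to it is still reported.

```python
import traceback

LIMIT = 1024  # arbitrary allocation limit, chosen for illustration


class AppError(Exception):
    """Hypothetical application-level error that carries only a message."""


def allocate(size):
    # Stand-in for the Malloc() wrapper: fail when the request is too large.
    if size > LIMIT:
        raise AppError("Out of memory")
    return bytearray(size)


def build_grid(n):
    return allocate(n * n)


def message_only(n):
    # Forwards just the message; the frames where the failure occurred
    # are discarded ("from None" also suppresses the chained context).
    try:
        return build_grid(n)
    except AppError as exc:
        raise MemoryError(str(exc)) from None


def full_trace(n):
    # Nothing intercepts the error, so all frames stay in the traceback.
    return build_grid(n)


def reported_frames(func, n):
    """Return the function names visible in the traceback, innermost last."""
    try:
        func(n)
    except Exception as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        return [f.name for f in frames[1:]]  # skip this helper's own frame
    return []
```

Calling <tt>reported_frames(message_only, 100)</tt> reports only the wrapper itself, whereas
<tt>reported_frames(full_trace, 100)</tt> reports <tt>full_trace</tt>, <tt>build_grid</tt>,
and <tt>allocate</tt>.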
| |
| |
| Despite its various limitations, WAD is applicable to a wide range of |
| extension-related errors. Furthermore, most of the errors that are |
| likely to occur are of a more benign variety. For example, a |
| segmentation fault may simply be the result of an uninitialized |
| pointer (perhaps the user forgot to call an initialization procedure). |
| Likewise, bus errors, failed assertions, and floating point exceptions |
| rarely result in a situation where the WAD recovery mechanism would be |
| unable to produce a meaningful Python traceback. |
| |
| <h3>9. Related Work</h3> |
| |
| There is a huge body of literature concerning the implementation of |
| exception handling in various programming languages and environments. |
| A detailed discussion of this work is clearly not possible here, but |
| a general overview of various exception handling issues can be found in [11]. |
| In general, there are a few themes that seem to prevail. |
| First, |
| considerable attention has been given to exception handling mechanisms |
| in specific languages, such as efficient exception handling for C++. |
| Second, a great deal of work has been devoted to the semantic aspects of |
| exception handling such as exception hierarchies, finalization, and |
| whether or not code is restartable after an exception has occurred. |
| Finally, a fair amount of exception work has been done in the context |
| of component frameworks and distributed systems. Most of this work |
| tends to concentrate on explicit exception handling mechanisms. Very little |
| work appears to have been done in the area of converting hardware generated errors |
| into exceptions. |
| |
| <p> |
| With respect to debuggers, quite a lot of work has been done in |
| creating advanced debugging support for specific languages and |
| integrated development environments. However, very little of this work |
| has concentrated on the problem of extensible systems and |
| compiled-interpreted language integration. For instance, debuggers |
| for Python are currently unable to cross over into C extensions whereas C |
| debuggers aren't able to easily extract useful information from the |
| internals of the Python interpreter. |
| |
| <p> |
| One system of possible interest is Rn which was developed in the |
| mid-1980s at Rice University [12]. This system, primarily |
| designed for working with large scientific applications written in |
| Fortran, provided an execution monitor that consisted of a special |
| debugging process with an embedded interpreter. When attached to |
| compiled Fortran code, this monitor could dynamically patch |
| the executable in a manner that allowed parts of the code to be executed in the |
| interpreter. This was used to provide a debugging environment in which |
| essentially any part of the compiled application could be modified at |
| run-time by simply compiling the modified code (Fortran) to an |
| interpreted form and inserting a breakpoint in the original executable |
| that transferred control to the interpreter. Although this |
| particular scheme is not directly related to the functionality |
| of WAD, it is one of the few systems in which |
| interpreted and compiled code have been tightly coupled within |
| a debugging framework. Several aspects of the interpreted/compiled |
| interface are closely related to the way in which WAD operates. In addition, |
| various aspects of this work may be useful should WAD be extended with |
| new capabilities. |
| |
| <h3>10. Future Directions</h3> |
| |
| WAD is currently an experimental prototype. Although this paper has |
| described its use with Python, the core of the system is generic and |
| is easily extended to other programming environments. For example, when |
| linked to C/C++ code, WAD will automatically produce stack |
| traces for fatal errors. A module for generating Tcl exceptions has |
| also been developed. Plans are underway to provide support for other |
| extensible systems including Perl, Ruby, and Guile. |
| |
| <p> |
| Finally, a number of extensions to the WAD approach may be possible. |
| For example, even though the current implementation only returns a |
| traceback string to the Python interpreter, the WAD signal handler |
| actually generates a full traceback of the C call stack including all |
| of the CPU registers and a copy of the stack data. Therefore, with a |
| little work, it may be possible to implement a diagnostic tool that |
| allows the state of the C stack to be inspected from the Python |
| interpreter after a crash has occurred. Similarly, it may be possible |
| to integrate the capabilities of WAD with those provided by the Python |
| debugger. |
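<p>
As a rough picture of what such post-mortem inspection might look like from the interpreter
side, the following sketch walks an ordinary Python traceback and records the local variables
of each frame. It covers Python frames only; exposing the C stack data captured by WAD in a
similar form is the speculative part, and every name here (<tt>snapshot_frames</tt>,
<tt>run_and_capture</tt>, <tt>outer</tt>, <tt>inner</tt>) is hypothetical.

```python
def inner(values, index):
    return values[index]          # fails when index is out of range


def outer():
    data = [1, 2, 3]
    return inner(data, 10)


def snapshot_frames(tb):
    """Collect (function name, copy of locals) for each traceback frame."""
    frames = []
    while tb is not None:
        frame = tb.tb_frame
        frames.append((frame.f_code.co_name, dict(frame.f_locals)))
        tb = tb.tb_next
    return frames


def run_and_capture():
    # Run the failing call and return the captured frame state.
    try:
        outer()
    except IndexError as exc:
        return snapshot_frames(exc.__traceback__)
    return []
```

The list returned by <tt>run_and_capture()</tt> runs from the outermost frame to the innermost
one, so its last entry holds the arguments that <tt>inner</tt> received when it failed.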
| |
| <h3>11. Conclusions and Availability</h3> |
| |
| WAD provides a simple mechanism for converting fatal errors into |
| Python exceptions that provide useful information to extension |
| writers. In doing so, it solves one of the most frustrating aspects |
| of working with compiled Python extensions--that of identifying program errors. |
| Furthermore, the system requires no code modifications to Python and introduces |
| no performance overhead. |
| Although the system is |
| necessarily platform specific, it does not involve a |
| significant amount of code. As a result, it may be relatively |
| straightforward to port to other Unix systems. |
| |
| <p> |
| As of this writing, WAD is still undergoing active development. However, |
| the software is available for experimentation and download |
| at <tt>http://systems.cs.uchicago.edu/wad</tt>. |
| |
| <h3>References</h3> |
| |
| [1] D.M. Beazley, <em>Using SWIG to Control, Prototype, and Debug C Programs with Python</em>, |
| 4th International Python Conference, Livermore, CA. (1996). |
| |
| <p> |
| [2] P.F. Dubois, <em>Climate Data Analysis Software</em>, 8th International Python Conference, |
| Arlington, VA. (2000). |
| |
| <p> |
| [3] P.F. Dubois, <em>A Facility for Creating Python Extensions in C++</em>, 7th International Python |
| Conference, Houston, TX. (1998). |
| |
| <p> |
| [4] SIP. <tt>http://www.thekompany.com/projects/pykde/</tt>. |
| |
| <p> |
| [5] FPIG. <tt>http://cens.ioc.ee/projects/f2py2e/</tt>. |
| |
| <p> |
| [6] R. Faulkner and R. Gomes, <em>The Process File System and Process Model in UNIX System V</em>, USENIX Conference Proceedings, |
| January 1991. |
| |
| <p> |
| [7] J.R. Levine, <em>Linkers &amp; Loaders.</em> Morgan Kaufmann Publishers, 2000. |
| |
| <p> |
| [8] Free Software Foundation, <em>The "stabs" debugging format</em>. GNU info document. |
| |
| <p> |
| [9] W. Richard Stevens, <em>UNIX Network Programming: Interprocess Communication, Volume 2</em>. PTR |
| Prentice-Hall, 1998. |
| |
| <p> |
| [10] S. Chamberlain. <em>libbfd: The Binary File Descriptor Library</em>. Cygnus Support, bfd version 3.0 edition, April 1991. |
| |
| <p> |
| [11] M.L. Scott. <em>Programming Language Pragmatics</em>. Morgan Kaufmann Publishers, 2000. |
| |
| <p> |
| [12] A. Carle, D. Cooper, R. Hood, K. Kennedy, L. Torczon, S. Warren, <em>A Practical Environment for Scientific Programming.</em> |
| IEEE Computer, Vol 20, No. 11, (1987). p. 75-89. |
| |
| |
| </body> |
| </html> |