<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>Upgrading libxml client code from 1.x to 2.x</title>
<meta name="GENERATOR" content="amaya V3.1">
<meta http-equiv="Content-Type" content="text/html">
</head>
<body bgcolor="#ffffff">
<h1 align="center">Upgrading libxml client code from 1.x to 2.x</h1>
<h2>Incompatible changes:</h2>
<p>Version 2 of libxml is the first version to introduce serious
backward-incompatible changes. The main goals were:</p>
<ul>
<li>A general cleanup. A number of mistakes inherited from the very early
versions couldn't be fixed earlier due to compatibility constraints, for
example the "childs" field in the nodes.</li>
<li>Uniformization of the various nodes, at least for their header and link
parts (doc, parent, children, prev, next). The goal is a simpler
programming model and an easier task for DOM implementors.</li>
<li>Better conformance to the XML specification. For example, version 1.x
had a heuristic to try to detect ignorable white space. As a result the
SAX events generated were ignorableWhitespace() while the spec requires
characters() in that case. This also means that a number of DOM nodes
containing blank text may now populate the DOM tree where none were present
before.</li>
</ul>
<h2>How to fix libxml-1.x code:</h2>
<p>Client code designed to run with libxml version 1.x may have to be
changed to compile against version 2.x. Here is the list of changes
I have collected. It may not be complete, so if you find other
changes which are required, <a href="mailto:Daniel.Veillard@w3.org">drop me a
mail</a>:</p>
<ol>
<li>The node <strong>childs</strong> field has been renamed
<strong>children</strong>, so s/childs/children/g should be applied
(the probability of having "childs" anywhere else is close to 0).</li>
<li>The document no longer has a <strong>root</strong> element. It has
been replaced by <strong>children</strong>, and usually you will get a list
of nodes there. For example, a Dtd node for the internal subset and
its declarations may be found in that list, as well as processing
instructions or comments found before or after the document root element.
Use <strong>xmlDocGetRootElement(doc)</strong> to get the root element of
a document. Alternatively, if you are sure your documents never reference
Dtds nor have PIs or comments before or after the root element,
s/->root/->children/g will probably do it.
<p><strong>Note</strong>: the final version of libxml2 now exports a version
number as the LIBXML_VERSION preprocessor token. In most cases the changes
required for 1/ and 2/ can be dealt with using the following construct (if
you don't use the root identifier for other purposes):</p>
<pre>#if defined(LIBXML_VERSION) &amp;&amp; LIBXML_VERSION >= 20000
#define root children
#define childs children
#endif
</pre>
</li>
<li>The white space issue. This one is more complex: except in the special
case of validating parsing, the line breaks and spaces usually used for
indenting and formatting the document content become significant. So they
are reported by SAX, and if you're using the DOM tree, corresponding nodes
are generated. Two approaches can be taken:
<ol>
<li>The lazy one: use the compatibility call
<strong>xmlKeepBlanksDefault(0)</strong>, but be aware that you are
relying on a special (and possibly broken) set of heuristics of libxml
to detect ignorable blanks. Don't complain if it breaks or makes your
application not 100% clean w.r.t. its input.</li>
<li>The Right Way: change your code to accept possibly insignificant
blank characters, or to cope with a tree populated with weird blank
text nodes. You can spot them using the convenience function
<strong>xmlIsBlankNode(node)</strong>, which returns 1 for such blank
nodes.</li>
</ol>
<p>Note also that with the new default the output functions don't add any
extra indentation when saving a tree in order to be able to round trip
(read and save) without inflating the document with extra formatting
chars.</p>
</li>
<li>The include path has changed to $prefix/libxml/ and the includes
themselves use this new prefix in their include instructions. If you are
using (as expected) the
<pre>xml-config --cflags</pre>
<p>output to generate your compile commands, this will probably work out of
the box.</p>
</li>
</ol>
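<p>As a rough illustration of points 2 and 3 above, the following sketch
reaches the root element with xmlDocGetRootElement() and skips blank text
nodes with xmlIsBlankNode(). It assumes libxml 2.x headers and the compile
flags from xml-config. The function name count_root_child_elements is made
up for this example, and xmlParseMemory() is used only as a convenient way
to obtain a document:</p>
<pre>#include &lt;string.h>
#include &lt;libxml/parser.h>
#include &lt;libxml/tree.h>

/*
 * Hypothetical helper, not part of libxml: count the element children of
 * the document root, ignoring the blank text nodes that libxml 2.x keeps
 * by default. Returns -1 if the buffer cannot be parsed.
 */
int count_root_child_elements(const char *xml)
{
    xmlDocPtr doc = xmlParseMemory(xml, (int) strlen(xml));
    xmlNodePtr root, cur;
    int count = 0;

    if (doc == NULL)
        return -1;
    /* doc->children may start with a comment, a PI or the Dtd node */
    root = xmlDocGetRootElement(doc);
    if (root == NULL) {
        xmlFreeDoc(doc);
        return -1;
    }
    for (cur = root->children; cur != NULL; cur = cur->next) {
        if (xmlIsBlankNode(cur))
            continue;               /* formatting white space only */
        if (cur->type == XML_ELEMENT_NODE)
            count++;
    }
    xmlFreeDoc(doc);
    return count;
}
</pre>
<p>With an indented input, the loop walks over the blank text nodes sitting
between the child elements but counts only the elements, so the result does
not depend on how the document was formatted.</p>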
<h2>Keeping both libxml-1.x and libxml-2.x compatibility:</h2>
<p>Here are the steps I applied successfully to a couple of GNOME projects
dependent on libxml to allow compilation under both environments:</p>
<ol>
<li>Make sure your configure adds the output of "xml-config --cflags" to
the compiler command line.</li>
<li>In your C files that include libxml headers, do the following:
<pre>#include &lt;xmlmemory.h>
#if defined(LIBXML_VERSION) &amp;&amp; LIBXML_VERSION >= 20000
#include &lt;libxml/parser.h>
#include &lt;libxml/tree.h>
#define root children
#define childs children
#else
#include &lt;gnome-xml/parser.h>
#include &lt;gnome-xml/tree.h>
#endif</pre>
<p>The first include name is really specific to libxml and won't clash
with the includes of other installed software. Once it is included we can
tell which version is used, and use the prefixed paths to safely
include headers like tree.h.</p>
<p>Second, the two #defines handle the changes made to the names of
public structure fields. Just make sure that you don't use the "root" name
for other structures in your module. Using xmlDocGetRootElement(doc) is the
proper way to access the root node now, but it is not available in old
libxml versions (it is present in 1.8.7).</p>
</li>
<li>libxml-2 generates "empty" text nodes for "formatting spaces" found in
the XML input. The proper way to handle this change is to check for them
(and ignore them) when scanning an XML tree produced by the libxml parser.
The quick and dirty solution is to force libxml back to the old behaviour
of ignoring those formatting spaces by adding the following code before
any call to the XML parser:
<pre>#if defined(LIBXML_VERSION) &amp;&amp; LIBXML_VERSION >= 20000
xmlKeepBlanksDefault(0);
#endif</pre>
</li>
</ol>
<p>Following those 3 steps should work. It worked for some of my own code and
for the gnome-print module. Other modules (including bonobo/gconf/nautilus)
will have to be patched in the same way. </p>
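<p>To see what step 3 changes in practice, here is a small sketch that
parses the same buffer with and without blanks kept and counts the direct
children of the root. It assumes libxml 2.x only (no 1.x fallback), the
helper name count_root_children is invented for the example, and it relies
on xmlKeepBlanksDefault() returning the previous setting so that the global
default can be restored afterwards:</p>
<pre>#include &lt;string.h>
#include &lt;libxml/parser.h>
#include &lt;libxml/tree.h>

/*
 * Hypothetical helper, not part of libxml: parse xml with the given
 * keep-blanks setting and return the number of direct children of the
 * root element (text nodes included), or -1 on error.
 */
int count_root_children(const char *xml, int keep_blanks)
{
    /* global default: affects parsers created after this call */
    int previous = xmlKeepBlanksDefault(keep_blanks);
    xmlDocPtr doc = xmlParseMemory(xml, (int) strlen(xml));
    xmlNodePtr root, cur;
    int count = -1;

    if (doc != NULL) {
        root = xmlDocGetRootElement(doc);
        if (root != NULL) {
            count = 0;
            for (cur = root->children; cur != NULL; cur = cur->next)
                count++;
        }
        xmlFreeDoc(doc);
    }
    xmlKeepBlanksDefault(previous);     /* restore the old setting */
    return count;
}
</pre>
<p>For an indented document, calling the helper with keep_blanks set to 1
also counts the blank text nodes around the child elements, while 0 asks
libxml to apply its old heuristics and drop them. Because the setting is
global, code that only wants the old behaviour for one parse should restore
it as done here.</p>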
<p>Let me put some emphasis on the fact that there are far more changes from
libxml 1.x to 2.x than the ones you may have to patch for. The overall code
has been considerably improved and the conformance to the XML specification
has been drastically improved. Don't take those changes as an excuse not to
upgrade, it may cost a lot in the long term ...</p>
<p><a href="mailto:Daniel.Veillard@w3.org">Daniel Veillard</a></p>
<p>$Id: upgrade.html,v 1.4 2000/04/12 13:27:38 veillard Exp $</p>
</body>
</html>