| <?xml version="1.0" encoding="ISO-8859-1"?> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css"> |
| TD {font-family: Verdana,Arial,Helvetica} |
| BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em} |
| H1 {font-family: Verdana,Arial,Helvetica} |
| H2 {font-family: Verdana,Arial,Helvetica} |
| H3 {font-family: Verdana,Arial,Helvetica} |
| A:link, A:visited, A:active { text-decoration: underline } |
| </style><title>Catalog support</title></head><body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000"><table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr><td width="120"><a href="http://swpat.ffii.org/"><img src="epatents.png" alt="Action against software patents" /></a></td><td width="180"><a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo" /></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo" /></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo" /></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo" /></a></div></td><td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center"><h1>The XML C parser and toolkit of Gnome</h1><h2>Catalog support</h2></td></tr></table></td></tr></table></td></tr></table><table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr><tr><td bgcolor="#fffacd"><form action="search.php" enctype="application/x-www-form-urlencoded" method="get"><input name="query" type="text" size="20" value="" /><input name="submit" type="submit" value="Search ..." 
/></form><ul><li><a href="index.html">Home</a></li><li><a href="intro.html">Introduction</a></li><li><a href="FAQ.html">FAQ</a></li><li><a href="docs.html" style="font-weight:bold">Developer Menu</a></li><li><a href="bugs.html">Reporting bugs and getting help</a></li><li><a href="help.html">How to help</a></li><li><a href="downloads.html">Downloads</a></li><li><a href="news.html">News</a></li><li><a href="XMLinfo.html">XML</a></li><li><a href="XSLT.html">XSLT</a></li><li><a href="xmldtd.html">Validation & DTDs</a></li><li><a href="encoding.html">Encodings support</a></li><li><a href="catalog.html">Catalog support</a></li><li><a href="namespaces.html">Namespaces</a></li><li><a href="contribs.html">Contributions</a></li><li><a href="examples/index.html" style="font-weight:bold">Code Examples</a></li><li><a href="html/index.html" style="font-weight:bold">API Menu</a></li><li><a href="guidelines.html">XML Guidelines</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li><li><a href="ftp://xmlsoft.org/">FTP</a></li><li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li><li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li><li><a href="http://www.zveno.com/open_source/libxml2xslt.html">MacOsX binaries</a></li><li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li><li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml2">Bug Tracker</a></li></ul></td></tr></table></td></tr></table></td><td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" 
cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"><p>Table of Contents:</p><ol><li><a href="#General2">General overview</a></li> |
| <li><a href="#definition">The definitions</a></li> |
| <li><a href="#Simple">Using catalogs</a></li> |
| <li><a href="#Some">Some examples</a></li> |
| <li><a href="#reference">How to tune catalog usage</a></li> |
| <li><a href="#validate">How to debug catalog processing</a></li> |
| <li><a href="#Declaring">How to create and maintain catalogs</a></li> |
| <li><a href="#implemento">The implementor corner quick review of the |
| API</a></li> |
| <li><a href="#Other">Other resources</a></li> |
| </ol><h3><a name="General2" id="General2">General overview</a></h3><p>What is a catalog? Basically it's a lookup mechanism used when an entity |
| (a file or a remote resource) references another entity. The catalog lookup |
| is inserted between the moment the reference is recognized by the software |
| (XML parser, stylesheet processing, or even images referenced for inclusion |
| in a rendering) and the time when loading of that resource actually |
| starts.</p><p>It is basically used for 3 things:</p><ul><li>mapping from "logical" names (the public identifiers) to a more |
| concrete name usable for download (a URI). For example it can associate |
| the logical name |
| <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p> |
| <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be |
| downloaded</p> |
| <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p> |
| </li> |
| <li>remapping from a given URL to another one, like an HTTP indirection |
| saying that |
| <p>"http://www.oasis-open.org/committes/tr.xsl"</p> |
| <p>should really be looked up at</p> |
| <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p> |
| </li> |
| <li>providing a local cache mechanism that allows loading the entities |
| associated with public identifiers or remote resources. This is a really |
| important feature for any significant deployment of XML or SGML since it |
| avoids the hazards and delays associated with fetching remote |
| resources.</li> |
| </ul><h3><a name="definition" id="definition">The definitions</a></h3><p>Libxml, as of 2.4.3, implements 2 kinds of catalogs:</p><ul><li>the older SGML catalogs: the official spec is SGML Open Technical |
| Resolution TR9401:1997, but it is better understood by reading <a href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from |
| James Clark. This format is relatively old and not the preferred mode of |
| operation of libxml.</li> |
| <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML |
| Catalogs</a> are far more flexible, more recent, use an XML syntax and |
| should scale considerably better. This is the default option of libxml.</li> |
| </ul><p></p><h3><a name="Simple" id="Simple">Using catalogs</a></h3><p>In a normal environment libxml2 will by default check for the presence of a |
| catalog in /etc/xml/catalog, and assuming it has been correctly populated, |
| the processing is completely transparent to the document user. To take a |
| concrete example, suppose you are authoring a DocBook document that |
| starts with the following DOCTYPE declaration:</p><pre><?xml version='1.0'?> |
| <!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN" |
| "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"></pre><p>When validating the document with libxml, the catalog will be |
| automatically consulted to look up the public identifier "-//Norman Walsh//DTD |
| DocBk XML V3.1.4//EN" and the system identifier |
| "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have |
| been installed on your system and the catalogs actually point to them, libxml |
| will fetch them from the local disk.</p><p style="font-size: 10pt"><strong>Note</strong>: Do not actually use this |
| DOCTYPE in new documents: it refers to a really old version, but it is fine as an example.</p><p>Libxml2 will check the catalog each time it is asked to load an |
| entity; this includes DTDs, external parsed entities, stylesheets, etc. If |
| your system is correctly configured, all of the authoring and processing |
| should use only local files, while your document stays portable because it |
| uses the canonical public and system IDs, which reference the remote document.</p><h3><a name="Some" id="Some">Some examples:</a></h3><p>Here are a couple of fragments from XML Catalogs used in libxml2's early |
| regression tests in <code>test/catalogs</code>:</p><pre><?xml version="1.0"?> |
| <!DOCTYPE catalog PUBLIC |
| "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" |
| "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> |
| <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> |
| <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN" |
| uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/> |
| ...</pre><p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are |
| written in XML, and there is a specific namespace for catalog elements: |
| "urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this |
| catalog is a <code>public</code> mapping, which associates a Public |
| Identifier with a URI.</p><pre>... |
| <rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/" |
| rewritePrefix="file:///usr/share/xml/docbook/"/> |
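| <!-- Illustration only, not part of the test catalog: with the entry above, |
|      http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd |
|      would be resolved as |
|      file:///usr/share/xml/docbook/xml/4.1.2/docbookx.dtd --> |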
| ...</pre><p>A <code>rewriteSystem</code> is a very powerful instruction: it says that |
| any system identifier starting with a given prefix should be looked up at another URI |
| constructed by replacing the prefix with a new one. In effect this acts like |
| a cache system for a full area of the Web. In practice it is extremely useful |
| with a file prefix if you have installed a copy of those resources on your |
| local system.</p><pre>... |
| <delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //" |
| catalog="file:///usr/share/xml/docbook.xml"/> |
| <delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML" |
| catalog="file:///usr/share/xml/docbook.xml"/> |
| <delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML" |
| catalog="file:///usr/share/xml/docbook.xml"/> |
| <delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/" |
| catalog="file:///usr/share/xml/docbook.xml"/> |
| <delegateURI uriStartString="http://www.oasis-open.org/docbook/" |
| catalog="file:///usr/share/xml/docbook.xml"/> |
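| <!-- Illustration only, not part of the test catalog: a lookup of the public |
|      identifier "-//OASIS//DTD DocBook XML V4.1.2//EN" matches the prefix |
|      "-//OASIS//DTD DocBook XML", so resolution continues in |
|      file:///usr/share/xml/docbook.xml --> |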
| ...</pre><p>Delegation is the core feature that allows building a tree of catalogs, |
| which is easier to maintain than a single catalog. Based on Public Identifier, System |
| Identifier or URI prefixes, it instructs the catalog software to look up |
| entries in another resource. The set of entries presented here should be |
| sufficient to redirect the |
| resolution of all DocBook references to the specific catalog in |
| <code>/usr/share/xml/docbook.xml</code>. This one in turn could delegate all |
| references for DocBook 4.1.2 to a specific catalog installed at the same time |
| as the DocBook resources on the local machine.</p><h3><a name="reference" id="reference">How to tune catalog usage:</a></h3><p>Users can change the default catalog behaviour by redirecting queries |
| to their own set of catalogs. This is done by setting the |
| <code>XML_CATALOG_FILES</code> environment variable to a list of catalogs; an |
| empty value should deactivate loading of the default <code>/etc/xml/catalog</code> |
| catalog.</p><h3><a name="validate" id="validate">How to debug catalog processing:</a></h3><p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will |
| make libxml2 output debugging information for each catalog operation, for |
| example:</p><pre>orchis:~/XML -> xmllint --memory --noout test/ent2 |
| warning: failed to load external entity "title.xml" |
| orchis:~/XML -> export XML_DEBUG_CATALOG= |
| orchis:~/XML -> xmllint --memory --noout test/ent2 |
| Failed to parse catalog /etc/xml/catalog |
| Failed to parse catalog /etc/xml/catalog |
| warning: failed to load external entity "title.xml" |
| Catalogs cleanup |
| orchis:~/XML -> </pre><p>The test/ent2 file references an entity; running the parser from memory makes |
| the base URI unavailable, so the "title.xml" entity cannot be loaded. |
| Setting the debug environment variable reveals that an attempt is |
| made to load <code>/etc/xml/catalog</code>, but since it's not present the |
| resolution fails.</p><p>The most advanced way to debug XML catalog processing is to use the |
| <strong>xmlcatalog</strong> command shipped with libxml2, which can load |
| catalogs and make resolution queries to see what is going on. This is also |
| used for the regression tests:</p><pre>orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml \ |
| "-//OASIS//DTD DocBook XML V4.1.2//EN" |
| http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd |
| orchis:~/XML -> </pre><p>To debug what is going on, adding one -v flag increases the verbosity |
| level to show the processing done (adding a second flag also shows |
| which elements are recognized at parsing):</p><pre>orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml \ |
| "-//OASIS//DTD DocBook XML V4.1.2//EN" |
| Parsing catalog test/catalogs/docbook.xml's content |
| Found public match -//OASIS//DTD DocBook XML V4.1.2//EN |
| http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd |
| Catalogs cleanup |
| orchis:~/XML -> </pre><p>A shell interface is also available to debug and process multiple queries |
| (and for regression tests):</p><pre>orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml \ |
| "-//OASIS//DTD DocBook XML V4.1.2//EN" |
| > help |
| Commands available: |
| public PublicID: make a PUBLIC identifier lookup |
| system SystemID: make a SYSTEM identifier lookup |
| resolve PublicID SystemID: do a full resolver lookup |
| add 'type' 'orig' 'replace' : add an entry |
| del 'values' : remove values |
| dump: print the current catalog state |
| debug: increase the verbosity level |
| quiet: decrease the verbosity level |
| exit: quit the shell |
| > public "-//OASIS//DTD DocBook XML V4.1.2//EN" |
| http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd |
| > quit |
| orchis:~/XML -> </pre><p>This should be sufficient for most debugging purposes; it was actually |
| used heavily to debug the XML Catalog implementation itself.</p><h3><a name="Declaring" id="Declaring">How to create and maintain</a> catalogs:</h3><p>XML Catalogs are XML files, so you can either use XML tools to |
| manage them or use <strong>xmlcatalog</strong>. The basic step is |
| to create a catalog; the -create option provides this facility:</p><pre>orchis:~/XML -> ./xmlcatalog --create tst.xml |
| <?xml version="1.0"?> |
| <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" |
| "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> |
| <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/> |
| orchis:~/XML -> </pre><p>By default xmlcatalog does not overwrite the original catalog and writes the |
| result to the standard output; this can be overridden using the -noout |
| option. The <code>-add</code> command adds entries to the |
| catalog:</p><pre>orchis:~/XML -> ./xmlcatalog --noout --create --add "public" \ |
| "-//OASIS//DTD DocBook XML V4.1.2//EN" \ |
| http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml |
| orchis:~/XML -> cat tst.xml |
| <?xml version="1.0"?> |
| <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" |
| "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> |
| <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> |
| <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN" |
| uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/> |
| </catalog> |
| orchis:~/XML -> </pre><p>The <code>-add</code> option always takes 3 parameters, even though some of |
| the XML Catalog constructs (like nextCatalog) have only a single |
| argument; in that case just pass an empty string as the third parameter, it will be ignored.</p><p>Similarly the <code>-del</code> option removes matching entries from the |
| catalog:</p><pre>orchis:~/XML -> ./xmlcatalog --del \ |
| "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml |
| <?xml version="1.0"?> |
| <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" |
| "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> |
| <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/> |
| orchis:~/XML -> </pre><p>The catalog is now empty. Note that the matching of <code>-del</code> is |
| exact and would have worked in a similar fashion with the Public ID |
| string.</p><p>This is rudimentary but should be sufficient to manage a not too complex |
| catalog tree of resources.</p><h3><a name="implemento" id="implemento">The implementor corner quick review of the |
| API:</a></h3><p>First, and like for every other module of libxml, there is an |
| automatically generated <a href="html/libxml-catalog.html">API page for |
| catalog support</a>.</p><p>The header for the catalog interfaces should be included as:</p><pre>#include <libxml/catalog.h></pre><p>The API is deliberately kept very simple. First, it is not obvious that |
| applications really need access to it, since catalog lookup is the default behaviour of |
| libxml2 (Note: it is possible to completely override libxml2's default catalog handling |
| by using <a href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to |
| plug in an application-specific resolver).</p><p>libxml2 supports 2 catalog lists:</p><ul><li>the default one, global and shared by the whole application</li> |
| <li>a per-document catalog, which is built if the document uses the |
| <code>oasis-xml-catalog</code> PIs to specify its own catalog list. It is |
| associated with the parser context and destroyed when the parsing context |
| is destroyed.</li> |
| </ul><p>The document catalog will be used first if it exists.</p><h4>Initialization routines:</h4><p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be |
| used at startup to initialize the catalog. If the catalog should be |
| initialized with specific values, xmlLoadCatalog() or xmlLoadCatalogs() |
| should be called before xmlInitializeCatalog(), which would otherwise do a |
| default initialization first.</p><p>The xmlCatalogAddLocal() call is used by the parser to grow the document's |
| own catalog list if needed.</p><h4>Preferences setup:</h4><p>The XML Catalog spec requires the possibility to select default |
| preferences between public and system delegation; |
| xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and |
| xmlCatalogGetDefaults() control whether XML Catalog resolution is |
| forbidden, allowed for the global catalog, for the document catalog, or for both; the |
| default is to allow both.</p><p>And of course xmlCatalogSetDebug() enables debug messages |
| (through the xmlGenericError() mechanism).</p><h4>Querying routines:</h4><p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic() |
| and xmlCatalogResolveURI() are relatively self-explanatory if you have read the XML |
| Catalog specification: they correspond to the algorithms of section 7. They should |
| also work, with simplified semantics, if you have loaded an SGML catalog.</p><p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but |
| operate on the document catalog list.</p><h4>Cleanup and Miscellaneous:</h4><p>xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is |
| the per-document equivalent.</p><p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the |
| first catalog in the global list, and xmlCatalogDump() dumps a |
| catalog's state. Those routines are primarily designed for xmlcatalog; I'm not |
| sure that exposing more complex interfaces (like navigation ones) would be |
| really useful.</p><p>xmlParseCatalogFile() is a function used to load XML Catalog files. |
| It is similar to xmlParseFile() except that it bypasses all catalog lookups; it is |
| provided because this functionality may be useful for client tools.</p><h4>Threaded environments:</h4><p>Since the catalog tree is built progressively, some care has been taken to |
| try to avoid trouble in multithreaded environments. The code is now thread |
| safe, assuming that the libxml2 library has been compiled with thread |
| support.</p><p></p><h3><a name="Other" id="Other">Other resources</a></h3><p>The XML Catalog specification is relatively recent so there isn't much |
| literature to point at:</p><ul><li>You can find a good rant from Norm Walsh about <a href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the |
| need for catalogs</a>; it provides a lot of context information even if |
| I don't agree with everything presented. Norm also wrote a more recent |
| article <a href="http://wwws.sun.com/software/xml/developers/resolver/article/">XML |
| entities and URI resolvers</a> describing them.</li> |
| <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML |
| catalog proposal</a> from John Cowan</li> |
| <li>The <a href="http://www.rddl.org/">Resource Directory Description |
| Language</a> (RDDL), another catalog system but more oriented toward |
| providing metadata for XML namespaces.</li> |
| <li>The page of the OASIS Technical <a href="http://www.oasis-open.org/committees/entity/">Committee on Entity |
| Resolution</a>, which maintains XML Catalog; there you will find pointers to the |
| specification updates, some background and pointers to other tools |
| providing XML Catalog support</li> |
| <li>There is a <a href="buildDocBookCatalog">shell script</a> to generate |
| XML Catalogs for DocBook 4.1.2. If it can write to the /etc/xml/ |
| directory, it will set up /etc/xml/catalog and /etc/xml/docbook based on |
| the resources found on the system. Otherwise it will just create |
| ~/xmlcatalog and ~/dbkxmlcatalog, and doing: |
| <p><code>export XML_CATALOG_FILES=$HOME/xmlcatalog</code></p> |
| <p>should allow processing of DocBook documents without requiring |
| network access for the DTD or stylesheets</p> |
| </li> |
| <li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a |
| small tarball</a> containing XML Catalogs for DocBook 4.1.2 which seems |
| to work fine for me too</li> |
| <li>The <a href="http://www.xmlsoft.org/xmlcatalog_man.html">xmlcatalog |
| manual page</a></li> |
| </ul><p>If you have suggestions for corrections or additions, simply contact |
| me:</p><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html> |