Fix more overflow checks, off-by-ones and missing NUL terminators in xmlBuf and xmlBuffer

In broad strokes, this does the following:
- Do not include the NUL terminator byte for lengths returned
  from functions.  This lets functions be more defensive.
- Set error messages when returning early due to out-of-memory
  or buffer-too-large errors.
- Set NUL terminator consistently on buffer boundaries before
  returning.
- Add a few more integer overflow checks.

* buf.c:
(xmlBufGrowInternal):
- Do not include NUL terminator byte when returning length.
- Always set NUL terminator at the end of the new buffer length
  before returning.
- Call xmlBufMemoryError() when the buffer size would overflow.
- Account for NUL terminator byte when using XML_MAX_TEXT_LENGTH.
- Always set NUL terminator at the end of the current buffer
  after resizing the buffer.
(xmlBufAddLen):
- Return an error if the buffer does not have free space for the
  NUL terminator byte.
(xmlBufAvail):
- Do not include the NUL terminator byte in the length returned.
  (See changes to encoding.c and xmlIO.c.)
(xmlBufResize):
- Move setting of NUL terminator to common code.  More than one
  path through the function failed to set it.
(xmlBufAdd):
- Call xmlBufMemoryError() when the buffer size would overflow.
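
The overflow checks described for buf.c can be illustrated with a small sketch. This is not the code in buf.c: the helper name sketch_grow_size() and the max_size parameter are hypothetical, and the way the limit is applied is an assumption based on the notes above (the limit is taken to cover the NUL terminator byte as well). The sketch avoids C99 headers, since the project requires C89.

```c
#include <assert.h>
#include <stddef.h>

/* Largest value a size_t can hold, written in a C89-compatible way. */
#define SKETCH_SIZE_MAX ((size_t) -1)

/*
 * Hypothetical helper, not part of libxml2.
 * Computes the allocation size needed for `use` bytes of existing content,
 * `len` bytes of new content and one NUL terminator byte.
 * Returns 0 and stores the size in *out on success.
 * Returns -1 if the sum would wrap around or exceed `max_size`.
 */
static int
sketch_grow_size(size_t use, size_t len, size_t max_size, size_t *out)
{
    assert(out != NULL);

    /* Test each addition before performing it, so nothing can wrap. */
    if (len > SKETCH_SIZE_MAX - use)
        return -1;
    if (use + len > SKETCH_SIZE_MAX - 1)    /* room for the NUL byte */
        return -1;

    /* Assumption: the configured limit includes the terminator. */
    if (use + len + 1 > max_size)
        return -1;

    *out = use + len + 1;
    return 0;
}
```

A caller in this style would report a memory error and return early when the helper fails, instead of continuing with a wrapped size.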

* encoding.c:
(xmlCharEncFirstLineInput):
(xmlCharEncInput):
(xmlCharEncOutput):
- No longer need to subtract one from the return value of
  xmlBufAvail() since the function does this now.
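
The length convention behind this change can be sketched as follows. The struct and the functions sketch_avail() and sketch_add_len() are hypothetical stand-ins, not the real xmlBuf implementation. They only illustrate the idea that the "available" count already holds back one byte for the NUL terminator, so callers can use the value directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a growable buffer; not libxml2's xmlBuf. */
typedef struct {
    unsigned char *content; /* storage of `size` bytes */
    size_t use;             /* bytes of content, NUL terminator not counted */
    size_t size;            /* total allocated bytes */
} sketch_buf;

/* Free bytes usable for content, with one byte reserved for the NUL. */
static size_t
sketch_avail(const sketch_buf *buf)
{
    if (buf == NULL || buf->use >= buf->size)
        return 0;
    return buf->size - buf->use - 1;
}

/*
 * Record that `len` bytes were written directly after the current content.
 * Returns -1 if that would leave no room for the terminator.
 */
static int
sketch_add_len(sketch_buf *buf, size_t len)
{
    if (buf == NULL || buf->use >= buf->size)
        return -1;
    if (len > buf->size - buf->use - 1)
        return -1;

    buf->use += len;
    assert(buf->use < buf->size);   /* the NUL slot is still in bounds */
    buf->content[buf->use] = 0;
    return 0;
}
```

With this convention a caller can write up to sketch_avail() bytes and then call sketch_add_len() with the number written, without subtracting one itself.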

* testchar.c:
(testCharRanges):
- Pass the string length without the NUL terminator.

* tree.c:
(xmlBufferGrow):
- Do not include NUL terminator byte when returning length.
- Always set NUL terminator at the end of the new buffer length
  before returning.
- Call xmlTreeErrMemory() when the buffer size would overflow.
- Always set NUL terminator at the end of the current buffer
  after resizing the buffer.
(xmlBufferDump):
- Change type of the return variable to match fwrite().
- Clamp return value to INT_MAX to prevent overflow.
(xmlBufferResize):
- Update error message in xmlTreeErrMemory() to be consistent
  with other similar messages.
- Move setting of NUL terminator to common code.  More than one
  path through the function failed to set it.
(xmlBufferAdd):
- Call xmlTreeErrMemory() when the buffer size would overflow.
(xmlBufferAddHead):
- Set NUL terminator before returning early when shifting
  contents.
- Add overflow checks similar to those in xmlBufferAdd().
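
The xmlBufferDump() change (matching the type returned by fwrite() and clamping to INT_MAX) can be sketched like this. The function names sketch_clamp_to_int() and sketch_dump() are hypothetical, and the exact arguments the real function passes to fwrite() are an assumption. The sketch only shows why a size_t result has to be clamped before it is returned as an int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Convert a size_t count to int, saturating at INT_MAX instead of wrapping. */
static int
sketch_clamp_to_int(size_t n)
{
    return (n > (size_t) INT_MAX) ? INT_MAX : (int) n;
}

/*
 * Hypothetical dump helper, not the libxml2 function.
 * Writes `use` bytes to `file` and returns the number written as an int.
 */
static int
sketch_dump(FILE *file, const unsigned char *content, size_t use)
{
    size_t written;     /* same type as fwrite()'s return value */

    if (file == NULL || content == NULL)
        return 0;

    /* With an element size of 1, the return value is a byte count. */
    written = fwrite(content, 1, use, file);
    assert(written <= use);

    return sketch_clamp_to_int(written);
}
```

Storing the result in an int directly would truncate counts above INT_MAX on platforms where size_t is wider than int, which is the overflow the clamp avoids.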

* xmlIO.c:
(xmlOutputBufferWriteEscape):
- No longer need to subtract one from the return value of
  xmlBufAvail() since the function does this now.
5 files changed
tree: 9b9b80bcdf198b4d500f594da6a017bfe5c12958
README.md

libxml2

libxml2 is an XML toolkit implemented in C, originally developed for the GNOME Project.

Full documentation is available at https://gitlab.gnome.org/GNOME/libxml2/-/wikis.

Bugs should be reported at https://gitlab.gnome.org/GNOME/libxml2/-/issues.

A mailing list xml@gnome.org is available. You can subscribe at https://mail.gnome.org/mailman/listinfo/xml. The list archive is at https://mail.gnome.org/archives/xml/.

License

This code is released under the MIT License, see the Copyright file.

Build instructions

libxml2 can be built with GNU Autotools, CMake, or several other build systems in platform-specific subdirectories.

Autotools (for POSIX systems like Linux, BSD, macOS)

If you build from a Git tree, you have to install Autotools and start by generating the configuration files with:

./autogen.sh

If you build from a source tarball, extract the archive with:

tar xf libxml2-xxx.tar.gz
cd libxml2-xxx

To see a list of build options:

./configure --help

Also see the INSTALL file for additional instructions. Then you can configure and build the library:

./configure [possible options]
make

Note that by default, no optimization options are used. You have to enable them manually, for example with:

CFLAGS='-O2 -fno-semantic-interposition' ./configure

Now you can run the test suite with:

make check

Please report test failures to the mailing list or bug tracker.

Then you can install the library:

make install

At that point you may have to rerun ldconfig or a similar utility to update your list of installed shared libs.

CMake (mainly for Windows)

Another option for compiling libxml is using CMake:

cmake -E tar xf libxml2-xxx.tar.gz
cmake -S libxml2-xxx -B libxml2-xxx-build [possible options]
cmake --build libxml2-xxx-build
cmake --install libxml2-xxx-build

Common CMake options include:

-D BUILD_SHARED_LIBS=OFF            # build static libraries
-D CMAKE_BUILD_TYPE=Release         # specify build type
-D CMAKE_INSTALL_PREFIX=/usr/local  # specify the install path
-D LIBXML2_WITH_ICONV=OFF           # disable iconv
-D LIBXML2_WITH_LZMA=OFF            # disable liblzma
-D LIBXML2_WITH_PYTHON=OFF          # disable Python
-D LIBXML2_WITH_ZLIB=OFF            # disable libz

You can also open the libxml source directory with its CMakeLists.txt directly in various IDEs such as CLion, QtCreator, or Visual Studio.

Dependencies

Libxml does not require any other libraries. A platform with somewhat recent POSIX support should be sufficient (please report any violation of this rule you may find).

However, if found at configuration time, libxml will detect and use the following libraries:

  • libz, a highly portable and widely available compression library.
  • liblzma, another compression library.
  • libiconv, a character encoding conversion library. The iconv function is part of POSIX.1-2001, so libiconv isn't required on modern UNIX-like systems like Linux, BSD or macOS.
  • ICU, a Unicode library. Mainly useful as an alternative to iconv on Windows. Unnecessary on most other systems.

Contributing

The current version of the code can be found in GNOME's GitLab at https://gitlab.gnome.org/GNOME/libxml2. The best way to get involved is by creating issues and merge requests on GitLab. Alternatively, you can start discussions and send patches to the mailing list. If you want to work with patches, please format them with git-format-patch and use plain text attachments.

All code must conform to C89 and pass the GitLab CI tests. Add regression tests if possible.

Authors

  • Daniel Veillard
  • Bjorn Reese
  • William Brack
  • Igor Zlatkovic for the Windows port
  • Aleksey Sanin
  • Nick Wellnhofer