Fix several quadratic runtime issues in HTML push parser

Fix a few remaining cases where the HTML push parser would scan more
content during lookahead than being parsed later.

Make sure that htmlParseDocTypeDecl consumes all content up to the
final '>' in case of errors. The old comment said "We shouldn't try to
resynchronize", but ignoring invalid content is also what the HTML5
spec mandates.

Likewise, make htmlParseEndTag skip to the final '>' in invalid end
tags even if not in recovery mode. This is probably the most visible
change in practice and leads to different output for some tests but is
also more in line with HTML5.

Make sure that htmlParsePI and htmlParseComment don't abort if invalid
characters are encountered but log an error and ignore the character.

Change some other end-of-buffer checks to test for a zero byte instead
of relying on IS_CHAR.

Fix usage of IS_CHAR macro in htmlParseScript.
14 files changed
tree: 7e01987e99d946f12a6d0725d8b3a07c220d1820
  1. bakefile/
  2. doc/
  3. example/
  4. fuzz/
  5. include/
  6. macos/
  7. optim/
  8. os400/
  9. python/
  10. result/
  11. test/
  12. vms/
  13. VxWorks/
  14. win32/
  15. xstc/
  16. .gitattributes
  17. .gitignore
  18. .gitlab-ci.yml
  19. .travis.yml
  20. acinclude.m4
  21. AUTHORS
  22. autogen.sh
  23. buf.c
  24. buf.h
  25. build_glob.py
  26. c14n.c
  27. catalog.c
  28. ChangeLog
  29. check-relaxng-test-suite.py
  30. check-relaxng-test-suite2.py
  31. check-xinclude-test-suite.py
  32. check-xml-test-suite.py
  33. check-xsddata-test-suite.py
  34. chvalid.c
  35. chvalid.def
  36. CMakeLists.txt
  37. config.h.cmake.in
  38. configure.ac
  39. CONTRIBUTING
  40. Copyright
  41. dbgen.pl
  42. dbgenattr.pl
  43. debugXML.c
  44. dict.c
  45. DOCBparser.c
  46. elfgcchack.h
  47. enc.h
  48. encoding.c
  49. entities.c
  50. error.c
  51. genChRanges.py
  52. gentest.py
  53. genUnicode.py
  54. global.data
  55. globals.c
  56. hash.c
  57. HTMLparser.c
  58. HTMLtree.c
  59. INSTALL.libxml2
  60. legacy.c
  61. libxml-2.0-uninstalled.pc.in
  62. libxml-2.0.pc.in
  63. libxml.3
  64. libxml.h
  65. libxml.m4
  66. libxml.spec.in
  67. libxml2-config.cmake.cmake.in
  68. libxml2-config.cmake.in
  69. libxml2.doap
  70. libxml2.syms
  71. list.c
  72. MAINTAINERS
  73. Makefile.am
  74. Makefile.tests
  75. Makefile.win
  76. nanoftp.c
  77. nanohttp.c
  78. NEWS
  79. parser.c
  80. parserInternals.c
  81. pattern.c
  82. README
  83. README.cvs-commits
  84. README.tests
  85. README.zOS
  86. regressions.py
  87. regressions.xml
  88. relaxng.c
  89. rngparser.c
  90. runsuite.c
  91. runtest.c
  92. runxmlconf.c
  93. save.h
  94. SAX.c
  95. SAX2.c
  96. schematron.c
  97. testapi.c
  98. testAutomata.c
  99. testC14N.c
  100. testchar.c
  101. testdict.c
  102. testdso.c
  103. testHTML.c
  104. testlimits.c
  105. testModule.c
  106. testOOM.c
  107. testOOMlib.c
  108. testOOMlib.h
  109. testReader.c
  110. testrecurse.c
  111. testRegexp.c
  112. testRelax.c
  113. testSAX.c
  114. testSchemas.c
  115. testThreads.c
  116. testURI.c
  117. testXPath.c
  118. threads.c
  119. timsort.h
  120. TODO
  121. TODO_SCHEMAS
  122. tree.c
  123. trio.c
  124. trio.h
  125. triodef.h
  126. trionan.c
  127. trionan.h
  128. triop.h
  129. triostr.c
  130. triostr.h
  131. uri.c
  132. valid.c
  133. xinclude.c
  134. xlink.c
  135. xml2-config.1
  136. xml2-config.in
  137. xml2Conf.sh.in
  138. xmlcatalog.c
  139. xmlIO.c
  140. xmllint.c
  141. xmlmemory.c
  142. xmlmodule.c
  143. xmlreader.c
  144. xmlregexp.c
  145. xmlsave.c
  146. xmlschemas.c
  147. xmlschemastypes.c
  148. xmlstring.c
  149. xmlunicode.c
  150. xmlwriter.c
  151. xpath.c
  152. xpointer.c
  153. xzlib.c
  154. xzlib.h