e050062ca9f921c6da8e62ff7a42925ba8f08268 - third_party/github.com/GNOME/libxml2

commit	e050062ca9f921c6da8e62ff7a42925ba8f08268
author	Nick Wellnhofer <wellnhofer@aevum.de>	Wed Jul 15 14:38:55 2020 +0200
committer	Nick Wellnhofer <wellnhofer@aevum.de>	Wed Jul 15 16:10:13 2020 +0200
tree	f746054947ab6af7365eda14712099d5845f7022
parent	dfd4e330489c383c0ae58d5fb1393558d6567bc6

Make htmlCurrentChar always translate U+0000

The general assumption is that htmlCurrentChar only returns 0 if the
end of the input buffer is reached. The UTF-8 path already logged an
error if a zero byte U+0000 was found and returned a space character
instead. Make the ASCII code path do the same.

htmlParseTryOrFinish skips zero bytes at the beginning of a buffer, so
even if 0 was returned from htmlCurrentChar, the push parser would make
progress. But rescanning the input could cause performance problems.

The pull parser would abort parsing and now handles zero bytes in ASCII
mode the same way as the push parser or as in UTF-8 mode.

It would be better to return the replacement character U+FFFD instead,
but some of the client code assumes that the UTF-8 length of input and
output matches.
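
The behavior described above can be illustrated with a minimal, standalone
sketch. This is not the libxml2 implementation: the type sketch_input and
the functions sketch_current_char and sketch_translate are hypothetical
names invented for illustration. It only shows the idea that a
"current character" routine returns 0 solely at end of input, while an
embedded zero byte is recorded as an error and reported as a space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over an input buffer; not libxml2's parser input. */
typedef struct {
    const unsigned char *cur;  /* next byte to read */
    const unsigned char *end;  /* one past the last byte */
    int errors;                /* number of zero bytes seen */
} sketch_input;

/*
 * Return the current byte, or 0 only when the end of input is reached.
 * An embedded zero byte is counted as an error and reported as a space,
 * so callers can treat a return value of 0 as "no more input".
 */
static int sketch_current_char(sketch_input *in, int *len) {
    if (in->cur >= in->end) {
        *len = 0;
        return 0;
    }
    *len = 1;
    if (*in->cur == 0) {
        in->errors++;
        return ' ';
    }
    return *in->cur;
}

/*
 * Copy n bytes from buf to out, translating zero bytes on the way.
 * out must have room for n + 1 bytes. Returns the number of bytes
 * written (excluding the terminator) and stores the error count.
 */
static size_t sketch_translate(const unsigned char *buf, size_t n,
                               char *out, int *errors) {
    sketch_input in = { buf, buf + n, 0 };
    size_t written = 0;
    int len, c;

    assert(buf != NULL && out != NULL && errors != NULL);

    while ((c = sketch_current_char(&in, &len)) != 0) {
        out[written++] = (char)c;
        in.cur += len;  /* advance by the length that was consumed */
    }
    out[written] = '\0';
    *errors = in.errors;
    return written;
}
```

For the three input bytes 'a', 0, 'b', this sketch writes the three bytes
"a b" and reports one error. If the zero byte were passed through as 0,
the loop would stop after 'a', which mirrors the pull parser aborting.
The space keeps the output the same length as the input. U+FFFD encodes
to three bytes in UTF-8 (EF BF BD), so substituting it would change the
length.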