LibWeb: Fix numeric character reference at EOF leaking its last digit

Previously, if the NumericCharacterReferenceEnd state was reached when
current_input_character was None, then the
DONT_CONSUME_NEXT_INPUT_CHARACTER macro would restore back before the
EOF, and allow the next state (after the SWITCH_TO_RETURN_STATE) to
proceed with the last digit of the numeric character reference.

For example, with something like `&#1111`, before this commit the
output would incorrectly be `<code point with the value 1111>1` instead
of just `<code point with the value 1111>`.
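
The leak can be reproduced with a small toy model. This is only a sketch, not the LibWeb tokenizer: `ToyCursor` and `leftover_after_reference` are hypothetical names, and the sketch assumes that the saved previous position is updated only when a character is actually consumed, so that at EOF it still points before the last digit.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for the tokenizer's input position handling.
struct ToyCursor {
    std::string input;
    std::size_t pos = 0;
    std::size_t prev_pos = 0;

    // Returns the next character, or nullopt at EOF.
    // Assumption: prev_pos is left untouched when EOF is hit.
    std::optional<char> next()
    {
        if (pos >= input.size())
            return std::nullopt;
        prev_pos = pos;
        return input[pos++];
    }
};

// Consumes "&#<digits>" and returns the text that the next state would see.
// guard_restore == true models the has_value() check added by this commit.
std::string leftover_after_reference(std::string const& text, bool guard_restore)
{
    ToyCursor cursor { text };
    cursor.next(); // '&'
    cursor.next(); // '#'

    // Consume digits until a non-digit or EOF is reached.
    std::optional<char> current;
    while ((current = cursor.next()).has_value()
        && std::isdigit(static_cast<unsigned char>(*current))) {
    }

    // "Don't consume the next input character": rewind by one step.
    // Without the guard, this rewinds even when nothing was consumed.
    if (!guard_restore || current.has_value())
        cursor.pos = cursor.prev_pos;

    return cursor.input.substr(cursor.pos);
}
```

In this model, `leftover_after_reference("&#1111", false)` leaves the final `1` for the next state, while passing `true` leaves nothing. Input that does not end at EOF, such as `&#1111 x`, behaves the same either way.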

Rather than putting the `if (current_input_character.has_value())` check
inside NumericCharacterReferenceEnd directly, it was added to
DONT_CONSUME_NEXT_INPUT_CHARACTER, because all usages of the macro
benefit from this check, even if the other existing usage sites don't
exhibit any bugs without it:

- In MarkupDeclarationOpen, if the current_input_character is EOF, then
  the previous character is always `!`, so restoring and then checking
  forward for strings like `--`, `DOCTYPE`, etc. won't match, and the
  BogusComment state will run one extra time (once for `!` and once
  for EOF) with no practical consequences. With the `has_value()` check,
  BogusComment will only run once with EOF.

- In AfterDOCTYPEName, ConsumeNextResult::RanOutOfCharacters can only
  occur when stopping at the insertion point, and because of how
  the code is structured, it is guaranteed that current_input_character
  is either `P` or `S`, so the `has_value()` check is irrelevant.
Ryan Liptak 2024-12-20 06:05:37 -08:00 committed by Jelle Raaijmakers
parent 752deaf6ef
commit df87a9689c
Notes: github-actions[bot] 2025-01-06 23:44:49 +00:00
3 changed files with 15 additions and 6 deletions

@@ -94,9 +94,10 @@ namespace Web::HTML {
 } \
     } while (0)
-#define DONT_CONSUME_NEXT_INPUT_CHARACTER \
-    do { \
-        restore_to(m_prev_utf8_iterator); \
+#define DONT_CONSUME_NEXT_INPUT_CHARACTER \
+    do { \
+        if (current_input_character.has_value()) \
+            restore_to(m_prev_utf8_iterator); \
     } while (0)
 #define ON(code_point) \