LibJS: Port the JS lexer and parser to UTF-16

This ports the lexer to UTF-16 and deals with the immediate fallout up
to the AST. The AST will be dealt with in upcoming commits.

The lexer will still accept UTF-8 strings as input, and will transcode
them to UTF-16 for lexing. This doesn't actually incur a new allocation,
as we were already converting the input StringView to a ByteString for
each lexer.

One immediate logical benefit here is that we do not need to know off-
hand how many UTF-8 bytes some special code points occupy. They all
happen to be a single UTF-16 code unit. So instead of advancing the
lexer by 3 positions in some cases, we can just always advance by 1.
This commit is contained in:
Timothy Flynn 2025-08-06 07:18:45 -04:00 committed by Tim Flynn
commit 00182a2405
Notes: github-actions[bot] 2025-08-13 13:57:27 +00:00
14 changed files with 467 additions and 474 deletions

View file

@ -26,13 +26,15 @@ ByteString ParserError::to_byte_string() const
return ByteString::formatted("{} (line: {}, column: {})", message, position.value().line, position.value().column);
}
ByteString ParserError::source_location_hint(StringView source, char const spacer, char const indicator) const
ByteString ParserError::source_location_hint(Utf16View const& source, char spacer, char indicator) const
{
if (!position.has_value())
return {};
// We need to modify the source to match what the lexer considers one line - normalizing
// line terminators to \n is easier than splitting using all different LT characters.
ByteString source_string = source.replace("\r\n"sv, "\n"sv, ReplaceMode::All).replace("\r"sv, "\n"sv, ReplaceMode::All).replace(LINE_SEPARATOR_STRING, "\n"sv, ReplaceMode::All).replace(PARAGRAPH_SEPARATOR_STRING, "\n"sv, ReplaceMode::All);
auto source_string = source.replace("\r\n"sv, "\n"sv, ReplaceMode::All).replace("\r"sv, "\n"sv, ReplaceMode::All).replace(LINE_SEPARATOR, "\n"sv, ReplaceMode::All).replace(PARAGRAPH_SEPARATOR, "\n"sv, ReplaceMode::All);
StringBuilder builder;
builder.append(source_string.split_view('\n', SplitBehavior::KeepEmpty)[position.value().line - 1]);
builder.append('\n');