This ports the lexer to UTF-16 and deals with the immediate fallout up
to the AST. The AST will be dealt with in upcoming commits.
The lexer will still accept UTF-8 strings as input, and will transcode
them to UTF-16 for lexing. This doesn't actually incur a new allocation,
as we were already converting the input StringView to a ByteString for
each lexer.
One immediate logical benefit here is that we do not need to know off-
hand how many UTF-8 bytes some special code points occupy. They all
happen to be a single UTF-16 code unit. So instead of advancing the
lexer by 3 positions in some cases, we can just always advance by 1.
Currently, the lexer holds a ByteString, which is always heap-allocated.
When we create a copy of the lexer for the lookahead token, that token
will outlive the lexer copy. The token holds a couple of string views
into the lexer's source string. This is fine for now, because the source
string will be kept alive by the original lexer.
But if the lexer were to hold a String or Utf16String, short strings
will be stored on the stack due to SSO. Thus the token will hold views
into released stack data. We need to keep the lookahead lexer alive to
prevent UAF on views into its source string.
For the web, we allow a wobbly UTF-16 encoding (i.e. lonely surrogates
are permitted). Only in a few exceptional cases do we strictly require
valid UTF-16. As such, our `validate(AllowLonelySurrogates::Yes)` calls
will always succeed. It's a wasted effort to ever make such a check.
This patch eliminates such invocations. The validation methods will now
only check for strict UTF-16, and are only invoked when needed.
When we build a UTF-16 string, we currently always switch to the UTF-16
storage mode inside StringBuilder. Then when it comes time to create the
string, we switch the storage to ASCII if possible (by shifting the
underlying bytes up).
Instead, let's start out with ASCII storage and then switch to UTF-16
storage once we see a non-ASCII code point. For most strings, this will
avoid allocating 2x the memory, and avoids many ASCII validation calls.
We now define GenericLexer as a template to allow using it with UTF-16
strings. To keep existing users happy, the template is defined in the
Detail namespace. Then AK::GenericLexer is an alias for a char-based
view, and AK::Utf16GenericLexer is an alias for a char16-based view.
* Remove completely unused methods.
* Deduplicate methods that were overloaded with both StringView and
char const* parameters.
A future commit will templatize GenericLexer by char type. This patch
serves to make that a tiny bit easier.
This was a misinterpretation of the spec; we should only indicate focus
if the form associated element supports keyboard input, for which
FormAssociatedTextControlElement is a much better match.
This is everything except some failing ref-tests, and
`css/css-typed-om/the-stylepropertymap/properties/*` because importing
a test for every property feels excessive.
The upcoming `:heading()` pseudo-class takes multiple comma-separated
An+Bs. Also rename this field as the `:nth-[last-]child()`
pseudo-classes are only a subset of the users.
We now clamp the values returned from calc into the allowed range (where
we know it) and censor any `NaN`s to `0` both when we resolve and when
we serialize.
Gains us 76 WPT passes.
Returning this struct will allow us to modify the underlying value of
the `CalculationResult` without requiring us to go through the process
of constructing a whole new `CalculationResult` to return.
This currently only applies to property-level calculation contexts, more
work to be done to generate accepted ranges for other calculation
contexts (e.g. within transformation functions, color functions, etc)
We were handling the special cases of NaN and Infinity in basically the
same way across both functions - we can reduce code duplication by
moving this to before we branch.
This is also required as we will be moving the logic to encode in
scientific notation above the branch in a later commit and the
`convert_floating_point_to_decimal_exponential_form` method doesn't work
with non-finite values.