This ports the lexer to UTF-16 and deals with the immediate fallout up
to the AST. The AST will be dealt with in upcoming commits.
The lexer will still accept UTF-8 strings as input, and will transcode
them to UTF-16 for lexing. This doesn't actually incur a new allocation,
as we were already converting the input StringView to a ByteString for
each lexer.
One immediate logical benefit here is that we do not need to know off-
hand how many UTF-8 bytes some special code points occupy. They all
happen to be a single UTF-16 code unit. So instead of advancing the
lexer by 3 positions in some cases, we can just always advance by 1.
Currently, the lexer holds a ByteString, which is always heap-allocated.
When we create a copy of the lexer for the lookahead token, that token
will outlive the lexer copy. The token holds a couple of string views
into the lexer's source string. This is fine for now, because the source
string will be kept alive by the original lexer.
But if the lexer were to hold a String or Utf16String, short strings
will be stored on the stack due to SSO. Thus the token will hold views
into released stack data. We need to keep the lookahead lexer alive to
prevent UAF on views into its source string.
This reverts commit c14173f651. We
should only annotate the minimum number of symbols that external
consumers actually use, so I am starting from scratch to do that
Instead, we can just use the scope type to determine if a scope is a
function scope.
This fixes using `this` for parameter default values in arrow functions
crashing. This happened by `uses_this_from_environment` was not set in
`set_uses_this`, as it didn't think it was in a function scope whilst
parsing parameters.
Fixes closing modal dialogs causing a crash on https://www.ikea.com/
No test262 diff.
Reverts the functional part of 08cfd5f, because it was a workaround for
this issue.
In the following example:
```js
const f = (i) => ({
obj: { a: { x: i }, b: { x: i } },
g: () => {},
});
```
The body of function `f` is initially parsed as an arrow function. As a
result, what is actually an object expression is interpreted as a formal
parameter with a binding pattern. Since duplicate identifiers are not
allowed in this context (`i` in the example), the parser generates an
error, causing the entire script to fail parsing.
This change ignores the "Duplicate parameter names in bindings" error
during arrow function parameter parsing, allowing the parser to continue
and recognize the object expression of the outer arrow function with an
implicit return.
Fixes error on https://chat.openai.com/
`var` bindings are never in the temporal dead zone (TDZ), and so we
know accessing them will not throw.
We now take advantage of this by having a specialized environment
binding value getter that doesn't check for exceptional cases.
1.08x speedup on JetStream.
This allows us to get rid of instructions that move arguments to locals
and allocate smaller JS::Value vector in ExecutionContext by reusing
slots that were already allocated for arguments.
With this change for following function:
```js
function f(x, y) {
return x + y;
}
```
we now produce following bytecode:
```
[ 0] 0: Add dst:reg6, lhs:arg0, rhs:arg1
[ 10] Return value:reg6
```
instead of:
```
[ 0] 0: GetArgument 0, dst:x~1
[ 10] GetArgument 1, dst:y~0
[ 20] Add dst:reg6, lhs:x~1, rhs:y~0
[ 30] Return value:reg6
```
Previously we blocked using locals for function arguments whenever
`arguments` was mentioned in function body, however, this is not
necessary in strict mode, where mutations to the arguments object are
not reflected in the function arguments and vice versa.
By doing that we consistently use Identifier node for identifiers and
also enable mechanism that registers identifiers in a corresponding
ScopePusher for catch parameters, which is necessary for work in the
upcoming changes.
This prevents the variables declared inside a class static initializer
to escape to the nearest containing function causing all sorts of memory
corruptions.
This commit removes the -Wno-unusued-private-field flag, thus
reenabling the warning. Unused field were either removed or marked
[[maybe_unused]] when unsure.
Instead of making a copy of the Vector<FunctionParameter> from the AST
every time we instantiate an ECMAScriptFunctionObject, we now keep the
parameters in a ref-counted FunctionParameters object.
This reduces memory usage, and also allows us to cache the bytecode
executables for default parameter expressions without recompiling them
for every instantiation. :^)
The "with" statement is its own token (TokenType::With), and thus would
fail to parse as an identifier. We've already asserted that the token
we are parsing is "with" or "assert", so just consume it.
Previously it only deoptimized the parent scope if the current scope
contains direct eval, which is incorrect because code ran in direct
eval mode has access to the entire scope chain it was executed in.
The fix is to also propagate direct eval's presence if the current
scope is marked as being screwed by direct eval.
This fixes Google's botguard failing to complete on Google sign in, as
it tried to access local variables outside of a direct parent function
with eval, causing it throw "unhandled" exceptions. Unhandled is in
quotes because their bytecode VM _technically_ caught it, but it was
considered an unhandled exception. This was determined by removing get
optimizations and then adding debug output for every get operation.
Using this, I noticed that for these errors, it would access the
'message' and 'stack' properties. This is because their error handler
function noticed this was not a synthesised error, which is never
expected to happen. That was determined by using Chrome Devtools 'pause
on handled exception' feature, and noticing it never threw a '[var] is
not defined' exception, but only synthesized error objects which
contained a sentinel value to let it know it was synthesized.
I added debug output to eval to print out what was being eval'd because
it makes heavy use of eval. This revealed that the exceptions only came
from eval.
I then dumped every generated executable and noticed the variables it
was trying to access were generated as local variables in the top
scope. This led to checking what makes a variable considered local or
not, which then lead to this block of code in ~ScopePusher that
propagates eval presence only to the immediate parent scope. This
variable directly controls whether to create all variables properly
with variable environments and bindings or allow them to be stored as
local registers tied to that function's executable.
Since this now lets botguard run to completion, it no longer considers
us to be an insecure/potential bot browser when signing in, now
allowing us to be able to sign in to Google.
For example, https://locals.com/site/discover has a script with an
object of the form:
var f = {
parser: {
sync() {},
async async() {},
}
};
We were previously throwing a syntax error on the async function, as we
specifically did not allow using "async" as a function name here.
The following syntax is valid:
```js
e?.example / 1.2
```
Previously, the `/` would be treated as a unterminated regex literal,
because it was calling the regular `consume` instead of
`consume_and_allow_division`.
This is what is done when parsing IdentifierNames in
parse_secondary_expression when a period is encountered.
Allows us to parse clients-main-[hash].js on https://ubereats.com/
This changes the remaining uses of the following functions across LibJS:
- String::format() => String::formatted()
- dbg() => dbgln()
- printf() => out(), outln()
- fprintf() => warnln()
I also removed the relevant 'LogStream& operator<<' overloads as they're
not needed anymore.
This adds a new MetaProperty AST node which will be used for
'new.target' and 'import.meta' meta properties. The parser now
distinguishes between "in function context" and "in arrow function
context" (which is required for this).
When encountering TokenType::New we will attempt to parse it as meta
property and resort to regular new expression parsing if that fails,
much like the parsing of labelled statements.
This is a bit nicer for two reasons:
- The absence of line number/column information isn't based on 'values
are zero' anymore but on Optional's value
- When reporting syntax errors with position information other than the
current token's position we had to store line and column ourselves,
like this:
auto foo_start_line = m_parser_state.m_current_token.line_number();
auto foo_start_column = m_parser_state.m_current_token.line_column();
...
syntax_error("...", foo_start_line, foo_start_column);
Which now becomes:
auto foo_start= position();
...
syntax_error("...", foo_start);
This makes it easier to report correct positions for syntax errors
that only emerge a few tokens later :^)
By having the "is this a use strict directive?" logic in
parse_string_literal() we would apply it to *any* string literal, which
is incorrect and would lead to false positives - e.g.:
"use strict" + 1
`"use strict"`
"\123"; ({"use strict": ...})
Relevant part from the spec which is now implemented properly:
[...] and where each ExpressionStatement in the sequence consists
entirely of a StringLiteral token [...]
I also got rid of UseStrictDirectiveState which is not needed anymore.
Fixes#3903.
https://tc39.es/ecma262/#sec-functiondeclarations-in-ifstatement-statement-clauses
B.3.4 FunctionDeclarations in IfStatement Statement Clauses
The following augments the IfStatement production in 13.6:
IfStatement[Yield, Await, Return] :
if ( Expression[+In, ?Yield, ?Await] ) FunctionDeclaration[?Yield, ?Await, ~Default] else Statement[?Yield, ?Await, ?Return]
if ( Expression[+In, ?Yield, ?Await] ) Statement[?Yield, ?Await, ?Return] else FunctionDeclaration[?Yield, ?Await, ~Default]
if ( Expression[+In, ?Yield, ?Await] ) FunctionDeclaration[?Yield, ?Await, ~Default] else FunctionDeclaration[?Yield, ?Await, ~Default]
if ( Expression[+In, ?Yield, ?Await] ) FunctionDeclaration[?Yield, ?Await, ~Default]
This production only applies when parsing non-strict code. Code matching
this production is processed as if each matching occurrence of
FunctionDeclaration[?Yield, ?Await, ~Default] was the sole
StatementListItem of a BlockStatement occupying that position in the
source code. The semantics of such a synthetic BlockStatement includes
the web legacy compatibility semantics specified in B.3.3.