ListFormat was the first formatter I ported to ICU. This patch makes it
match the style of subsequently ported formatters, where we create the
formatter once per Intl object, rather than once per prototype
invocation.
We were previously treating undefined and null as the same (an empty
Optional). However, there are edge cases in ECMA-402 where we must treat
them differently. Namely, the hour cycle (hc) keyword. An undefined hc
value has no effect on the resolved locale, whereas a null hc value can
actively override any hc specified in the locale string. For example:
new Intl.DateTimeFormat("en-u-hc-h11", { hour12: false });
In that object, the hour12 option does not match the u-hc-h11 value. So
the spec dictates we remove the hc value by setting it to null.
This uses ICU for all of the Intl.PluralRules prototypes, which lets us
remove all data from our plural rules generator.
Plural rules depend directly on internal data from the number formatter,
so rather than creating a separate Locale::PluralRules class (which will
make accessing that data awkward), this adds plural rules APIs to the
existing Locale::NumberFormat.
The proposal has undergone quite a few normative changes since we last
synced with it. There was a time when it could not be implemented as it
was written, which is no longer the case. The resulting proposal has had
so many changes compared to our implementation, that it wouldn't make
sense to implement them commit-by-commit as we normally do. So instead,
this just implements the HEAD revision of the spec in one pass.
This uses ICU for the Intl.DateTimeFormat `format` `formatToParts`,
`formatRange`, and `formatRangeToParts`.
This lets us remove most data from our date-time format generator. All
that remains are time zone data and locale week info, which are relied
upon still for other interfaces. So they will be removed in a future
patch.
Note: All of the changes to the test files in this patch are now aligned
with other browsers. This includes:
* Some very incorrect formatting of Japanese symbols. (Looking at the
old results now, it's very obvious they were wrong.)
* Old FIXMEs regarding range formatting not including the start/end date
when only time fields were requested, but the dates differ.
* Day period inconsistencies.
Properties marked with the [[Unimplemented]] attribute behave as normal
but invoke the `VM::on_unimplemented_property_access callback` when
they are accessed.
The IntlMV is meant to be arbitrarily precise. If the user provides a
string value to be formatted, we lose precision by converting extremely
large values to a double. We were never able to address this, as support
for arbitrary precision was a big FIXME. But ICU can handle it by just
passing the raw string on through.
This uses ICU for the Intl.NumberFormat `formatRange` and
`formatRangeToParts` prototypes.
Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
This uses ICU for the Intl.NumberFormat `format` and `formatToParts`
prototypes. It does not yet port the range formatter prototypes.
Most of the new code in LibLocale/NumberFormat is simply mapping from
ECMA-402 types to ICU types. Beyond that, the only algorithmic change is
that we have to mutate the output from ICU for `formatToParts` to match
what is expected by ECMA-402. This is explained in NumberFormat.cpp in
`flatten_partitions`.
This lets us remove most data from our number format generator. All that
remains are numbering system digits and symbols, which are relied upon
still for other interfaces (e.g. Intl.DateTimeFormat). So they will be
removed in a future patch.
Note: All of the changes to the test files in this patch are now aligned
with both Chrome and Safari.
Note: We keep locale parsing and syntactic validation as-is. ECMA-402
places additional restrictions on locales above what is required by the
Unicode spec. ICU doesn't provide methods that let us easily check those
restrictions, whereas LibLocale does. Other browsers also implement
their own validators here.
This introduces a locale cache to re-use parsed locale data and various
related structures (not doing so has a non-negligible performance impact
on Intl tests).
The existing APIs for canonicalization and display names are pretty
intertwined, so they must both be adapted at once here. The results of
canonicalization are slightly different on some edge cases. But the
changed results are actually now aligned with Chrome and Safari.
...that maintains a list of allocated execution contexts so malloc()
does not have to be called every time we need to get a new one.
20% improvement in Octane/typescript.js
16% improvement in Kraken/imaging-darkroom.js
By using separate struct we can avoid updating AST node and
ECMAScriptFunctionObject constructors every time there is a need to
add or remove some additional information colllected during parsing.
Allows to skip function environment allocation for non-arrow functions
if the only reason it is needed is to hold `this` binding.
The parser is changed to do following:
- If a function is an arrow function and uses `this` then all functions
in a scope chain are marked to allocate function environment for
`this` binding.
- If a function uses `new.target` then all functions in a scope chain
are marked to allocate function environment.
`ordinary_call_bind_this()` is changed to put `this` value in execution
context when function environment allocation is skipped.
35% improvement in Octane/typescript.js
50% improvement in Octane/deltablue.js
19% improvement in Octane/raytrace.js
This allows us to skip allocating a function environment in cases where
it was previously impossible because the arguments object needed a
binding.
This change does not bring visible improvement in Kraken or Octane
benchmarks but seems useful to have anyway.
This means that SetVariable instructions will now remember which
(relative) environment contains the targeted binding, letting it bypass
the full binding resolution machinery on subsequent accesses.
Merging registers, constants and locals into single vector means:
- Better data locality
- No need to check type in Interpreter::get() and Interpreter::set()
which are very hot functions
Performance improvement is visible in almost all Octane and Kraken
tests.
Filling a typed array with an integer shouldn't have to go through the
generic Set for every element.
This knocks a 7% item down to <1% in profiles of Another World JS at
https://cyxx.github.io/another_js/
Instead, generate bytecode to execute their AST nodes and save the
resulting operands inside the NewClass instruction.
Moving property expression evaluation to happen before NewClass
execution also moves along creation of new private environment and
its population with private members (private members should be visible
during property evaluation).
Before:
- NewClass
After:
- CreatePrivateEnvironment
- AddPrivateName
- ...
- AddPrivateName
- NewClass
- LeavePrivateEnvironment
Instead of relying on native stack overflows to kick us out of circular
proxy chains, we now keep track of the recursion depth and kick
ourselves out if it exceeds 10'000.
This fixes an issue where compiler tail/sibling call optimizations would
turn infinite recursion into infinite loops, and thus never hit a stack
overflow to kick itself out.
For whatever reason, we've only seen the issue on SerenityOS with UBSAN,
but it could theoretically happen on any platform.
By doing that all instructions required for instantiation are emitted
once in compilation and then reused for subsequent calls, instead of
running generic instantiation process for each call.
Instead of storing a full JS::Completion for the "throw completion"
case, we now store a simple JS::Value (wrapped in ErrorValue for the
type system).
This shrinks TCO<void> and TCO<Value> (the most common TCOs) by 8 bytes,
which has a non-trivial impact on performance.