LibTextCodec: Fix ISO-8859-1 vs. windows-1252 handling in web contexts

The Encoding specification maps ISO-8859-1 to windows-1252 and expects
the windows-1252 translation table to be used, which differs from
ISO-8859-1 for 0x80-0x9F.

Other contexts expect to get the actual ISO-8859-1 encoding, with 1-to-1
mapping to U+0000-U+00FF, when requesting it.
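
The difference between the two tables can be illustrated with a small standalone sketch. It is not LibTextCodec code: the names `windows_1252_high`, `decode_iso_8859_1` and `decode_windows_1252` are hypothetical, and the table holds the code points that the Encoding specification's windows-1252 index assigns to bytes 0x80-0x9F.

```cpp
#include <array>
#include <cstdint>

// Code points for bytes 0x80-0x9F in the windows-1252 index of the
// Encoding specification. Bytes left undefined by the original Windows code
// page (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the matching C1 control.
constexpr std::array<char32_t, 32> windows_1252_high = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Actual ISO-8859-1: every byte maps 1-to-1 to U+0000-U+00FF.
constexpr char32_t decode_iso_8859_1(std::uint8_t byte)
{
    return byte;
}

// windows-1252: identical to ISO-8859-1 except for 0x80-0x9F.
constexpr char32_t decode_windows_1252(std::uint8_t byte)
{
    if (byte >= 0x80 && byte <= 0x9F)
        return windows_1252_high[byte - 0x80];
    return byte;
}
```

For example, byte 0x80 decodes to U+0080 (a control character) under ISO-8859-1 but to U+20AC (the euro sign) under windows-1252, while bytes outside 0x80-0x9F decode identically in both.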

This commit introduces `decoder_for_exact_name`, which skips the
alias-to-encoding-name mapping performed by `get_standardized_encoding`.
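
The split between the two lookup paths might look roughly like the following standalone sketch. Everything in it is an assumption made for illustration: `Decoder`, `standardize`, `lookup_exact` and `lookup_with_aliases` are hypothetical stand-ins, not the LibTextCodec API, and only a few labels are handled.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

enum class Decoder { Latin1, Windows1252 };

// Hypothetical stand-in for the alias step: maps a label to an encoding
// name the way the Encoding specification does for these few labels.
std::optional<std::string_view> standardize(std::string_view label)
{
    std::string lower(label);
    std::transform(lower.begin(), lower.end(), lower.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (lower == "iso-8859-1" || lower == "latin1" || lower == "windows-1252")
        return "windows-1252";
    return std::nullopt;
}

// Exact lookup: no alias mapping, so "iso-8859-1" yields the real
// ISO-8859-1 decoder. This sketch only accepts lowercase names.
std::optional<Decoder> lookup_exact(std::string_view name)
{
    if (name == "iso-8859-1")
        return Decoder::Latin1;
    if (name == "windows-1252")
        return Decoder::Windows1252;
    return std::nullopt;
}

// Alias-aware lookup: standardize the label first, then look up exactly.
std::optional<Decoder> lookup_with_aliases(std::string_view label)
{
    if (auto name = standardize(label))
        return lookup_exact(*name);
    return std::nullopt;
}
```

In this sketch, `lookup_with_aliases("ISO-8859-1")` yields the windows-1252 decoder, as web contexts expect, while `lookup_exact("iso-8859-1")` yields the 1-to-1 decoder. A caller that has already standardized the label can use the exact lookup to avoid a second alias pass.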
Simon Wanner 2024-06-02 15:56:36 +02:00 committed by Andreas Kling
commit 6b2c459901
Notes: sideshowbarker 2024-07-17 00:49:59 +09:00
7 changed files with 107 additions and 95 deletions

@@ -41,7 +41,7 @@ WebIDL::ExceptionOr<JS::NonnullGCPtr<TextDecoder>> TextDecoder::construct_impl(J
     auto ignore_bom = options.value_or({}).ignore_bom;
     // NOTE: This should happen in decode(), but we don't support streaming yet and share decoders across calls.
-    auto decoder = TextCodec::decoder_for(encoding.value());
+    auto decoder = TextCodec::decoder_for_exact_name(encoding.value());
     VERIFY(decoder.has_value());
     return realm.heap().allocate<TextDecoder>(realm, realm, *decoder, lowercase_encoding_name, fatal, ignore_bom);