LibTextCodec: Fix ISO-8859-1 vs. windows-1252 handling in web contexts

The Encoding specification maps the label ISO-8859-1 to windows-1252 and expects the windows-1252 translation table to be used, which differs from ISO-8859-1 in the byte range 0x80-0x9F. Other contexts expect the actual ISO-8859-1 encoding, with its 1-to-1 mapping of bytes to U+0000-U+00FF, when they request it. This commit introduces `decoder_for_exact_name`, which skips the alias-to-encoding-name mapping done by `get_standardized_encoding`.
Author: https://github.com/skyrising
Commit: 6b2c459901
Pull-request: https://github.com/LadybirdBrowser/ladybird/pull/32
Reviewed-by: https://github.com/jamierocks
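
The 0x80-0x9F difference described in the commit message can be illustrated with a minimal, standalone sketch. It is not LibTextCodec code: the function names `decode_iso_8859_1` and `decode_windows_1252` and the table name are hypothetical, and the table follows the windows-1252 index of the WHATWG Encoding Standard, where bytes without an assigned character keep their C1 control code point.

```cpp
#include <cstdint>

// ISO-8859-1: every byte maps to the code point with the same numeric value.
constexpr char32_t decode_iso_8859_1(std::uint8_t byte)
{
    return static_cast<char32_t>(byte);
}

// Code points for bytes 0x80-0x9F in windows-1252 (WHATWG index).
// Hypothetical table name; entries such as 0x81 stay as C1 controls.
constexpr char32_t windows_1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// windows-1252: identical to ISO-8859-1 except for bytes 0x80-0x9F.
constexpr char32_t decode_windows_1252(std::uint8_t byte)
{
    return (byte >= 0x80 && byte <= 0x9F)
        ? windows_1252_high[byte - 0x80]
        : static_cast<char32_t>(byte);
}

// Byte 0x80 is a C1 control in ISO-8859-1 but the euro sign in windows-1252.
static_assert(decode_iso_8859_1(0x80) == 0x0080, "1-to-1 mapping");
static_assert(decode_windows_1252(0x80) == 0x20AC, "euro sign");
// Outside 0x80-0x9F the two encodings agree.
static_assert(decode_windows_1252(0xE9) == decode_iso_8859_1(0xE9), "same mapping");
```

Under these assumptions, a web page labelled ISO-8859-1 that contains byte 0x80 would render a euro sign, while a caller that asks for real ISO-8859-1 would get U+0080.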
2025-07-28 11:49:44 +00:00 · 2024-06-02 15:56:36 +02:00 · 2024-06-02 15:56:36 +02:00 · 6b2c459901 · 2024-07-17 00:49:59 +09:00
commit 6b2c459901
parent 46d5cf0443
7 changed files with 107 additions and 95 deletions
--- a/Userland/Libraries/LibWeb/Encoding/TextDecoder.cpp
+++ b/Userland/Libraries/LibWeb/Encoding/TextDecoder.cpp
@@ -41,7 +41,7 @@ WebIDL::ExceptionOr<JS::NonnullGCPtr<TextDecoder>> TextDecoder::construct_impl(J
    auto ignore_bom = options.value_or({}).ignore_bom;

    // NOTE: This should happen in decode(), but we don't support streaming yet and share decoders across calls.
-    auto decoder = TextCodec::decoder_for(encoding.value());
+    auto decoder = TextCodec::decoder_for_exact_name(encoding.value());
    VERIFY(decoder.has_value());

    return realm.heap().allocate<TextDecoder>(realm, realm, *decoder, lowercase_encoding_name, fatal, ignore_bom);
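
The distinction between an alias-resolving lookup and an exact-name lookup can be sketched as follows. This is a simplified illustration only, not the LibTextCodec implementation: `Table`, `standardized_name`, `table_for_exact_name`, and `table_for_label` are hypothetical names, the alias list is reduced to three labels, and exact-name matching is assumed to be case-sensitive for brevity.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the available translation tables.
enum class Table { Iso8859_1, Windows1252 };

// Hypothetical alias resolution in the spirit of the Encoding Standard:
// several labels, including "iso-8859-1", resolve to the name "windows-1252".
inline std::optional<std::string> standardized_name(std::string_view label)
{
    std::string lowered(label);
    std::transform(lowered.begin(), lowered.end(), lowered.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (lowered == "iso-8859-1" || lowered == "latin1" || lowered == "windows-1252")
        return std::string("windows-1252");
    return std::nullopt;
}

// Exact-name lookup: no alias mapping, so "iso-8859-1" means real ISO-8859-1.
inline std::optional<Table> table_for_exact_name(std::string_view name)
{
    if (name == "iso-8859-1")
        return Table::Iso8859_1;
    if (name == "windows-1252")
        return Table::Windows1252;
    return std::nullopt;
}

// Label lookup: resolve the alias first, then look up the resulting name.
inline std::optional<Table> table_for_label(std::string_view label)
{
    auto name = standardized_name(label);
    if (!name)
        return std::nullopt;
    return table_for_exact_name(*name);
}
```

In this sketch, `table_for_label("ISO-8859-1")` yields the windows-1252 table, as web content expects, while `table_for_exact_name("iso-8859-1")` yields the 1-to-1 table. A caller that already holds a standardized name, as `TextDecoder` does in the hunk above, can use the exact-name path without resolving aliases a second time.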