LibWeb: Check charset UTF-16LE/BE separately for UTF-8 conversion

Previously, the prescan compared the charset against the literal name
"UTF-16BE/LE" when following the standard's step that converts such
charsets to UTF-8. That name is shorthand for either encoding, so the
charsets "UTF-16BE" and "UTF-16LE" must be checked separately.

Co-authored-by: Jelle Raaijmakers <jelle@ladybird.org>
Jaycadox 2025-02-24 10:26:28 +01:00 committed by Jelle Raaijmakers
parent 436f3f99a1
commit f672c57ca7
Notes: github-actions[bot] 2025-02-24 13:52:43 +00:00
3 changed files with 33 additions and 15 deletions

@@ -311,7 +311,8 @@ Optional<ByteString> run_prescan_byte_stream_algorithm(DOM::Document& document,
 if (!need_pragma.has_value() || (need_pragma.value() && !got_pragma) || !charset.has_value())
     continue;
-if (charset.value() == "UTF-16BE/LE")
+// https://encoding.spec.whatwg.org/#common-infrastructure-for-utf-16be-and-utf-16le
+if (charset.value() == "UTF-16BE" || charset.value() == "UTF-16LE")
     return "UTF-8";
 else if (charset.value() == "x-user-defined")
     return "windows-1252";
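
The effect of the changed check can be sketched outside LibWeb. The snippet below is a minimal illustration only: `adjust_prescanned_charset` is a hypothetical helper, not a function in the codebase, and it uses the standard library rather than AK types. It assumes the charset string has already been resolved to a canonical encoding name.

```cpp
#include <string>
#include <string_view>

// Hypothetical standalone helper (not part of LibWeb): maps the charset name
// found during the prescan to the encoding name the prescan should report.
// Assumes `charset` is already a canonical encoding name.
std::string adjust_prescanned_charset(std::string_view charset)
{
    // "UTF-16BE/LE" means "UTF-16BE or UTF-16LE", so each name is compared
    // on its own; the combined string never matches a real charset.
    if (charset == "UTF-16BE" || charset == "UTF-16LE")
        return "UTF-8";
    if (charset == "x-user-defined")
        return "windows-1252";
    // Any other charset is passed through unchanged.
    return std::string(charset);
}
```

With the old comparison against "UTF-16BE/LE", a document declaring either UTF-16 variant would have fallen through this branch and kept its UTF-16 charset instead of being treated as UTF-8.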