ladybird/Userland/Libraries/LibTextCodec
Andreas Kling 1a46d8df5f LibTextCodec: Use String::from_utf8() when decoding UTF-8 to UTF-8
This way, we still perform UTF-8 validation, but don't go through the
slow generic code path that rebuilds the decoded string one code point
at a time.

This was a bottleneck when loading a canned copy of reddit.com, which
weighs in at ~120 MiB.

- Time spent decoding UTF-8 before this change: 1192 ms
- Time spent decoding UTF-8 after this change:  154 ms

That's still a long time, but 7.7x faster is nothing to sneeze at! :^)

Note that if the input fails UTF-8 validation, we still fall back to
the slow path and insert replacement characters per the WHATWG Encoding
spec: https://encoding.spec.whatwg.org/#utf-8-decode
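The fast-path/slow-path split described above can be sketched in standalone C++ as follows. This is an illustration under stated assumptions, not LibTextCodec's code. It uses `std::string` instead of AK's `String`. `append_utf8`, `run_utf8_decoder` and `decode_utf8` are hypothetical names. The validator here is a plain byte-at-a-time loop, whereas a production validator such as the one behind `String::from_utf8()` is presumably more optimized. The slow path follows the WHATWG "UTF-8 decoder" state machine, so invalid sequences become U+FFFD. The BOM stripping done by the spec's "UTF-8 decode" wrapper is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Appends a Unicode scalar value to `out`, encoded as UTF-8.
static void append_utf8(std::string& out, uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: runs the WHATWG UTF-8 decoder state machine over `input`.
// With out == nullptr it only validates and stops at the first error; otherwise
// each error appends U+FFFD to *out. Returns true if no error occurred.
static bool run_utf8_decoder(std::string_view input, std::string* out)
{
    uint32_t code_point = 0;
    int bytes_seen = 0, bytes_needed = 0;
    uint8_t lower = 0x80, upper = 0xBF;
    bool ok = true;
    auto error = [&]() -> bool {
        ok = false;
        if (!out)
            return false; // validation mode: give up immediately
        append_utf8(*out, 0xFFFD);
        return true;
    };
    auto emit = [&](uint32_t cp) { if (out) append_utf8(*out, cp); };

    size_t i = 0;
    while (i < input.size()) {
        uint8_t byte = static_cast<uint8_t>(input[i]);
        if (bytes_needed == 0) {
            ++i;
            if (byte <= 0x7F) {
                emit(byte);
            } else if (byte >= 0xC2 && byte <= 0xDF) {
                bytes_needed = 1;
                code_point = byte & 0x1F;
            } else if (byte >= 0xE0 && byte <= 0xEF) {
                if (byte == 0xE0) lower = 0xA0; // reject overlong forms
                if (byte == 0xED) upper = 0x9F; // reject surrogates
                bytes_needed = 2;
                code_point = byte & 0x0F;
            } else if (byte >= 0xF0 && byte <= 0xF4) {
                if (byte == 0xF0) lower = 0x90; // reject overlong forms
                if (byte == 0xF4) upper = 0x8F; // reject values above U+10FFFF
                bytes_needed = 3;
                code_point = byte & 0x07;
            } else if (!error()) {
                return false;
            }
            continue;
        }
        if (byte < lower || byte > upper) {
            // Reset and report an error; `i` is not advanced, so this byte is reprocessed.
            code_point = 0; bytes_needed = 0; bytes_seen = 0;
            lower = 0x80; upper = 0xBF;
            if (!error())
                return false;
            continue;
        }
        ++i;
        lower = 0x80; upper = 0xBF;
        code_point = (code_point << 6) | (byte & 0x3F);
        if (++bytes_seen == bytes_needed) {
            emit(code_point);
            code_point = 0; bytes_needed = 0; bytes_seen = 0;
        }
    }
    if (bytes_needed != 0 && !error()) // input ended in the middle of a sequence
        return false;
    return ok;
}

std::string decode_utf8(std::string_view input)
{
    // Fast path: validate only, then copy all bytes at once.
    if (run_utf8_decoder(input, nullptr))
        return std::string(input);
    // Slow path: rebuild the string one code point at a time, inserting U+FFFD.
    std::string out;
    out.reserve(input.size());
    run_utf8_decoder(input, &out);
    return out;
}
```

The design choice mirrors the commit. Valid input never pays for per-code-point re-encoding, and only invalid input takes the slower path that inserts replacement characters.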
2024-07-20 14:29:37 +02:00
CMakeLists.txt LibTextCodec: Add GBK/GB18030 decoder 2024-05-31 07:56:26 +02:00
Decoder.cpp LibTextCodec: Use String::from_utf8() when decoding UTF-8 to UTF-8 2024-07-20 14:29:37 +02:00
Decoder.h LibTextCodec: Use generated lookup tables for all single byte decoders 2024-06-04 10:21:07 +02:00
indexes.json LibTextCodec: Add GBK/GB18030 decoder 2024-05-31 07:56:26 +02:00