AK+LibWeb: Replace our home-grown base64 encoder/decoders with simdutf

We currently have 2 base64 coders: one in AK, another in LibWeb for a
"forgiving" implementation. ECMA-262 has an upcoming proposal which will
require a third implementation.

Instead, let's use the base64 implementation that is used by Node.js and
recommended by the upcoming proposal. It handles forgiving decoding as
well.

Our users of AK's implementation should be fine with the forgiving
implementation. The AK impl originally had naive forgiving behavior, but
that was removed solely for performance reasons.

Using http://mattmahoney.net/dc/enwik8.zip (100MB unzipped) as a test,
performance of our old home-grown implementations vs. the simdutf
implementation (on Linux x64):

                Encode    Decode
AK base64       0.226s    0.169s
LibWeb base64   N/A       1.244s
simdutf         0.161s    0.047s
This commit is contained in:
Timothy Flynn 2024-07-15 15:25:08 -04:00 committed by Andreas Kling
commit bfc9dc447f
Notes: sideshowbarker 2024-07-16 23:34:49 +09:00
11 changed files with 60 additions and 310 deletions

View file

@ -32,7 +32,6 @@
#include <LibWeb/HighResolutionTime/Performance.h>
#include <LibWeb/HighResolutionTime/SupportedPerformanceTypes.h>
#include <LibWeb/IndexedDB/IDBFactory.h>
#include <LibWeb/Infra/Base64.h>
#include <LibWeb/PerformanceTimeline/EntryTypes.h>
#include <LibWeb/PerformanceTimeline/PerformanceObserver.h>
#include <LibWeb/PerformanceTimeline/PerformanceObserverEntryList.h>
@ -130,14 +129,14 @@ WebIDL::ExceptionOr<String> WindowOrWorkerGlobalScopeMixin::atob(String const& d
auto& realm = *vm.current_realm();
// 1. Let decodedData be the result of running forgiving-base64 decode on data.
auto decoded_data = Infra::decode_forgiving_base64(data.bytes_as_string_view());
auto decoded_data = decode_base64(data);
// 2. If decodedData is failure, then throw an "InvalidCharacterError" DOMException.
if (decoded_data.is_error())
return WebIDL::InvalidCharacterError::create(realm, "Input string is not valid base64 data"_fly_string);
// 3. Return decodedData.
// decode_base64() returns a byte string. LibJS uses UTF-8 for strings. Use Latin1Decoder to convert bytes 128-255 to UTF-8.
// decode_base64() returns a byte buffer. LibJS uses UTF-8 for strings. Use Latin1Decoder to convert bytes 128-255 to UTF-8.
auto decoder = TextCodec::decoder_for_exact_name("ISO-8859-1"sv);
VERIFY(decoder.has_value());
return TRY_OR_THROW_OOM(vm, decoder->to_utf8(decoded_data.value()));