Commit graph

9 commits

Author SHA1 Message Date
Timothy Flynn
2803d66d87 AK: Support UTF-16 string formatting
The underlying storage used during string formatting is StringBuilder.
To support UTF-16 strings, this patch allows callers to specify a mode
during StringBuilder construction. The default mode is UTF-8, for which
StringBuilder remains unchanged.

In UTF-16 mode, we treat the StringBuilder's internal ByteBuffer as a
series of u16 code units. Appending a single character will append 2
bytes for that character (cast to a char16_t). Appending a StringView
will transcode the string to UTF-16.

Utf16String also gains the same memory optimization that we added for
String, where we hand-off the underlying buffer to Utf16String to avoid
having to re-allocate.

In the future, we may want to further optimize for ASCII strings. For
example, we could defer committing to the u16-esque storage until we
see a non-ASCII code point.
2025-07-18 12:45:38 -04:00
Timothy Flynn
d403f02988 AK: Remove unused capacity field from StringData
This was added to be used with `kfree_sized` when we construct a String
from a StringBuilder. As of 53cac71cec, it
is unused, causing some compilers to raise a warning.

This reduces the size of StringData from 24 to 16 bytes.
2025-04-03 23:44:40 +02:00
Ali Mohammad Pur
c39eddaef8 Revert "AK: Don't try to free(UINTPTR_MAX) in StringData::delete()"
This reverts commit 693fe76d1c.
2025-03-27 15:58:57 +00:00
Andreas Kling
693fe76d1c AK: Don't try to free(UINTPTR_MAX) in StringData::operator delete()
Some checks are pending
CI / Lagom (arm64, Sanitizer_CI, false, macos-15, macOS, Clang) (push) Waiting to run
CI / Lagom (x86_64, Fuzzers_CI, false, ubuntu-24.04, Linux, Clang) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, false, ubuntu-24.04, Linux, GNU) (push) Waiting to run
CI / Lagom (x86_64, Sanitizer_CI, true, ubuntu-24.04, Linux, Clang) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (macos-14, macOS, macOS-universal2) (push) Waiting to run
Package the js repl as a binary artifact / build-and-package (ubuntu-24.04, Linux, Linux-x86_64) (push) Waiting to run
Run test262 and test-wasm / run_and_update_results (push) Waiting to run
Lint Code / lint (push) Waiting to run
Label PRs with merge conflicts / auto-labeler (push) Waiting to run
Push notes / build (push) Waiting to run
...in constant-evaluated contexts. This was messing up GCC when building
the test262 runner. UINTPTR_MAX is the StringBase "invalid" tag.
2025-03-26 10:47:55 -04:00
Andreas Kling
53cac71cec AK: Inline most StringBase member functions
More work on recovering the performance regression from
DeprecatedFlyString removal.

Local measurements on my MBP:
- 2.5% speedup on Octane/zlib.js
- 2% speedup on Octane/typescript.js
2025-03-26 12:04:00 +00:00
Timothy Flynn
29879a69a4 AK: Construct Strings from StringBuilder without re-allocating the data
Currently, invoking StringBuilder::to_string will re-allocate the string
data to construct the String. This is wasteful both in terms of memory
and speed.

The goal here is to simply hand the string buffer over to String, and
let String take ownership of that buffer. To do this, StringBuilder must
have the same memory layout as Detail::StringData. This layout is just
the members of the StringData class followed by the string itself.

So when a StringBuilder is created, we reserve sizeof(StringData) bytes
at the front of the buffer. StringData can then construct itself into
the buffer with placement new.

Things to note:
* StringData must now be aware of the actual capacity of its buffer, as
  that can be larger than the string size.
* We must take care not to pass ownership of inlined string buffers, as
  these live on the stack.
2024-07-20 06:45:49 +02:00
Timothy Flynn
77eef8a8f6 AK: Add missing includes to StringData.h
Opening StringData.h in any clangd-enabled editor previously resulted in
a sea of clangd errors.
2024-07-20 06:45:49 +02:00
Andreas Kling
a88799c032 AK: Remove excessive hashing caused by FlyString table
Before this change, the global FlyString table looked like this:

    HashMap<StringView, Detail::StringBase>

After this change, we have:

    HashTable<Detail::StringData const*, FlyStringTableHashTraits>

The custom hash traits are used to extract the stored hash from
StringData which avoids having to rehash the StringView repeatedly like
we did before.

This necessitated a handful of smaller changes to make it work.
2024-03-24 13:28:24 +01:00
Andreas Kling
8bfad24708 AK: Move AK::Detail::StringData to its own header file
This will allow us to access it from FlyString.cpp
2024-03-24 13:28:24 +01:00