Commit graph

2 commits

Author SHA1 Message Date
Timothy Flynn
97548f48de LibWeb: Port rendered text to UTF-16
This migrates TextNode::text_for_rendering() to Utf16String and deals
with the fallout. As a consequence, this effectively reverts commit
3df83dade8.

The layout test expecation file updates are because we now express text
lengths and offsets in UTF-16 code units.

The selection-over-multiple-code-units expectation file update actually
represents a correctness fix. Our result now matches Firefox.
2025-07-25 18:16:22 +02:00
Jelle Raaijmakers
3df83dade8 LibWeb: Treat DOM::Range offsets as UTF-16 code unit offsets
We generated `PaintableFragment`s with a start and length represented in
UTF-8 byte offsets, but failed to consider that the offsets in a
`DOM::Range` are actually expressed in UTF-16 code units.

This is a bit of a mess: almost all web specs use UTF-16 code units as
the unit for indexing into text nodes, but we almost exclusively use
UTF-8 in our code base. Arguably the best thing would for us to use
UTF-16 everywhere as well: it prevents these mismatches in our
implementations for the price of a bit more memory usage - and even that
could potentially be optimized for.

But for now, try to do the correct thing and lazily allocate UTF-16 data
in a `PaintableFragment` whenever we need to index into it or if we're
asked to determine the code unit offset of a pixel position.
2025-06-13 15:08:26 +02:00