We currently implement the official cookie RFC, which was last updated
in 2011. Unfortunately, web reality conflicts with the RFC. For example,
all of the major browsers allow nameless cookies, which the RFC forbids.
There has since been draft versions of the RFC published to address such
issues. This patch implements the latest draft.
Major differences include:
* Allowing nameless or valueless (but not both) cookies
* Formal cookie length limits
* Formal same-site rules (not fully implemented here)
* More rules around cookie domains
Cookies are typically deleted by setting their expiry time to an ancient
time stamp (i.e. this is how WebDriver is required to delete cookies).
Previously, we would update the cookie in the cookie jar, which would
mark the cookie as dirty. We would then purge expired cookies from the
jar's transient storage, which removed the cookie from the dirty list.
If the cookie was also in the persisted storage, it would never become
expired there as it was no longer in the dirty list when the timer for
synchronization fired.
Now, we don't remove any cookies from the transient dirty list when we
purge expired cookies. We hold onto the dirty cookie until sync time,
where we now update the cookie in the persisted storage *before* we
delete expired cookies.
Web specs do not return through javascript percent decoded URL path
components - but we were doing this in a number of places due to the
default behaviour of URL::serialize_path.
Since percent encoded URL paths may not contain valid UTF-8 - this was
resulting in us crashing in these places.
For example - on an HTMLAnchorElement when retrieving the pathname for
the URL of:
http://ladybird.org/foo%C2%91%91
To fix this make the URL class only return the percent encoded
serialized path, matching the URL spec. When the decoded path is
required instead explicitly call URL::percent_decode.
This fixes a crash running WPT URL tests for the anchor element on:
https://wpt.live/url/a-element.html
This makes WebView::Database wrap around sqlite3 instead of LibSQL. The
effect on outside callers is pretty minimal. The main consequences are:
1. We must ensure the Cookie table exists before preparing any SQL
statements involving that table.
2. We can use an INSERT OR REPLACE statement instead of separate INSERT
and UPDATE statements.
Now that the chrome process is a singleton on all platforms, we can
safely add a cache to the CookieJar to greatly speed up access. The way
this works is we read all cookies upfront from the database. As cookies
are updated by the web, we store a list of "dirty" cookies that need to
be flushed to the database. We do that synchronization every 30 seconds
and at shutdown.
There's plenty of room for improvement here, some of which is marked
with FIXMEs in the CookieJar.
Before these changes, in a SQL database populated with 300 cookies,
browsing to https://twinings.co.uk/ WebContent spent:
19,806ms waiting for a get-cookie response
505ms waiting for a set-cookie response
With these changes, it spends:
24ms waiting for a get-cookie response
15ms waiting for a set-cookie response
When WebDriver accesses cookies, it specifically says to run:
the first step of the algorithm in RFC6265 to compute cookie-string
So we should skip subsequent steps. We already skip step 2, which sorts
the cookies, but neglected to skip step 3 to update their last access
time.
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punnycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
Note no test here, because this early return involves HTTP-only cookies,
which we don't have the infrastructure to test (we would need to support
custom HTTP headers in tests).
Getting a document's cookie value currently involves:
1. Doing a large SELECT statement and filtering the results to match
the document and some query parameters based on the cookie RFC.
2. For every cookie selected this way, doing an UPDATE to set its last
access time.
3. For every UPDATE, do a DELETE to remove all expired cookies.
There's no need to perform cookie expiration for every UPDATE. Instead,
we can do the expiration once after all the UPDATEs are complete.
This reduces time spent waiting for cookies on https://twinings.co.uk
from ~1.9s to ~1.3s on my machine.
LibSQL supports serializing time stamps as of commit
effcd080ca.
However, that commit serializes the timestamps as milliseconds, whereas
the CookieJar was serializing them as seconds. In retrospect, these
should have been updated in unison, along with the SQL heap version (as
this is a serialization change that affects the file format). So this
patch also updates the version, as this is not a backwards compatible
change.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
These classes are used as-is in all chromes. Move them to LibWebView so
that non-Serenity chromes don't have to awkwardly reach into its headers
and sources.
2023-08-31 19:19:45 +02:00
Renamed from Userland/Applications/Browser/CookieJar.cpp (Browse further)