This produces more granural information on the actual pixel errors
present between two bitmaps. It now also asserts that both bitmaps are
the same size, since we should never compare two differently sized
bitmaps anyway.
Of the available 71 screenshot tests that we have, 42 fail on macOS
arm64. Let's make it possible to skip those and run the remaining
succeeding screenshot tests on arm64 anyway.
When a test has a `reftest-wait` class, we now wait for fonts to load
and then for 2 `requestAnimationFrame` callbacks. This matches the
behavior of the WPT runner and allows more imported WPT tests to work
correctly.
Now that headless mode is built into the main Ladybird executable, the
headless-browser's only purpose is to run tests. So let's move it to the
testing directory and rename it to test-web (a la test-js / test-wasm).