LibWeb: Do not require multipart form data to end with CRLF

According to RFC 2046, the BNF of the form data body is:

    multipart-body := [preamble CRLF]
                      dash-boundary transport-padding CRLF
                      body-part *encapsulation
                      close-delimiter transport-padding
                      [CRLF epilogue]

Where "epilogue" is any text that "may be ignored or discarded". So we
should stop parsing the body once we encounter the "--" that ends the
close-delimiter, regardless of what follows it.
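
As an illustration of the grammar above, here is a minimal standalone sketch
of splitting a body into its raw parts. It is not the LibWeb implementation:
split_multipart_body is a hypothetical helper, it uses the C++ standard
library rather than AK, and it assumes there is no transport padding.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not LibWeb code): returns the raw body parts of a
// multipart body, or std::nullopt if the body is malformed.
// Assumption: transport-padding is always empty.
std::optional<std::vector<std::string>> split_multipart_body(std::string_view body, std::string_view boundary)
{
    std::string const dash_boundary = "--" + std::string(boundary);
    std::string const delimiter = "\r\n" + dash_boundary;

    // [preamble CRLF] dash-boundary: skip anything before the first boundary.
    std::size_t position = 0;
    if (body.substr(0, dash_boundary.size()) == dash_boundary) {
        position = dash_boundary.size();
    } else {
        std::size_t const found = body.find(delimiter);
        if (found == std::string_view::npos)
            return std::nullopt;
        position = found + delimiter.size();
    }

    std::vector<std::string> parts;
    while (true) {
        std::string_view const rest = body.substr(position);

        // close-delimiter: stop here. Anything after the `--` is the
        // epilogue and is discarded.
        if (rest.substr(0, 2) == "--")
            return parts;

        // Otherwise the boundary line must end with CRLF, followed by a part.
        if (rest.substr(0, 2) != "\r\n")
            return std::nullopt;
        position += 2;

        std::size_t const next = body.find(delimiter, position);
        if (next == std::string_view::npos)
            return std::nullopt;
        parts.emplace_back(body.substr(position, next - position));
        position = next + delimiter.size();
    }
}
```

With this sketch, a body whose close-delimiter is followed by nothing, by
CRLF, or by arbitrary junk yields the same list of parts.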

Note that our parsing function follows an attempt to standardize the
grammar in the spec: https://andreubotella.github.io/multipart-form-data
That proposal hasn't been updated in ~4 years, and the fetch spec still
does not have a formal definition of the body string.
Timothy Flynn 2025-09-15 09:23:21 -04:00 committed by Jelle Raaijmakers
commit 7b3465ab55
Notes: github-actions[bot] 2025-09-15 16:33:58 +00:00
3 changed files with 45 additions and 1 deletions

@@ -394,7 +394,9 @@ MultipartParsingErrorOr<Vector<XHR::FormDataEntry>> parse_multipart_form_data(JS
             return MultipartParsingError { MUST(String::formatted("Expected `--` followed by boundary at position {}", lexer.tell())) };
         // 2. If position points to the sequence of bytes 0x2D 0x2D 0x0D 0x0A (`--` followed by CR LF) followed by the end of input, return entry list.
-        if (lexer.next_is("--\r\n"sv))
+        // NOTE: We do not require the input to end with CRLF to match the behavior of other browsers. According to RFC 2046, we are to discard any
+        // text after the terminating `--`. See: https://datatracker.ietf.org/doc/html/rfc2046#page-22
+        if (lexer.next_is("--"sv))
             return entry_list;
         // 3. If position does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return failure.
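
To make the difference between the two checks concrete, the following sketch
models them outside of LibWeb. Both function names are hypothetical, and the
sketch assumes that next_is() is a prefix match on the remaining input; `rest`
stands for the input immediately after the boundary has been consumed.

```cpp
#include <string_view>

// Hypothetical stand-in for the old check: the remaining input must start
// with `--` followed by CR LF.
bool terminates_body_old(std::string_view rest)
{
    return rest.substr(0, 4) == "--\r\n";
}

// Hypothetical stand-in for the new check: `--` alone ends the body, and
// whatever follows is treated as the epilogue and ignored.
bool terminates_body_new(std::string_view rest)
{
    return rest.substr(0, 2) == "--";
}
```

Under these assumptions, a remainder of `--` at the end of input, or `--`
followed by text other than CR LF, is rejected by the old check and accepted
by the new one.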

@@ -0,0 +1,6 @@
+Data: value0
+Data: value1
+Data: value2
+Data: value3
+Data: value4
+Data: value5

@@ -0,0 +1,36 @@
+<!DOCTYPE html>
+<script src="../include.js"></script>
+<script>
+    let testID = 0;
+    const runTest = async terminator => {
+        const boundary = "AaB03x";
+        const body =
+            `--${boundary}\r\n` +
+            'Content-Disposition: form-data; name="field"\r\n' +
+            "\r\n" +
+            `value${testID++}\r\n` +
+            `--${boundary}--\r\n` +
+            terminator;
+        const response = new Response(
+            new Blob([body], { type: `multipart/form-data; boundary=${boundary}` }),
+            {
+                headers: { "Content-Type": `multipart/form-data; boundary=${boundary}` },
+            }
+        );
+        const data = await response.formData();
+        println(`Data: ${data.get("field")}`);
+    };
+    asyncTest(async done => {
+        await runTest("");
+        await runTest("\r");
+        await runTest("\n");
+        await runTest("\r\n");
+        await runTest("junk");
+        await runTest("\r\njunk");
+        done();
+    });
+</script>