ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-10-19 06:29:43 +00:00

Author	SHA1	Message	Date
Luke	19d6884529	LibWeb: Implement quirks mode detection This allows us to determine which mode to render the page in. Exposes "doctype" and "compatMode" on Document. Exposes "name", "publicId" and "systemId" on DocumentType.	2020-07-21 01:08:32 +02:00
stelar7	5eb39a5f61	LibWeb: Update parser with more insertion modes :^) Implements handling of InHeadNoScript, InSelectInTable, InTemplate, InFrameset, AfterFrameset, and AfterAfterFrameset.	2020-06-21 10:13:31 +02:00
Andreas Kling	b6288163f1	LibWeb: Make the new HTML parser parse input as UTF-8 We already convert the input to UTF-8 before starting the tokenizer, so all this patch had to do was switch the tokenizer to use an Utf8View for its input (and to emit 32-bit codepoints.)	2020-06-04 21:12:17 +02:00
Kyle McLean	1ad81e4833	LibWeb: Parse "br" end tags during "in body"	2020-06-04 09:09:33 +02:00
Andreas Kling	4788bcd6f8	LibWeb: Add HTMLToken::make_character() It's tedious to make character tokens manually all the time.	2020-05-28 18:43:52 +02:00
Andreas Kling	772b51038e	LibWeb: Parse "input" tags during the "in body" insertion mode	2020-05-28 12:19:18 +02:00
Andreas Kling	f62a8d3b19	LibWeb: Handle some more parser inputs in the "in head" insertion mode	2020-05-25 20:16:48 +02:00
Andreas Kling	20911efd4d	LibWeb: More work on the HTML parser and tokenizer The parser can now switch the state of the tokenizer! Very webby. :^)	2020-05-24 23:54:22 +02:00
Andreas Kling	31db3f21ae	LibWeb: Start implementing character token parsing Now that we've gotten rid of the misguided character buffering in the tokenizer, it actually spits out character tokens that we have to deal with in the parser. This patch implements enough to bring us back to speed with simple.html	2020-05-24 23:54:22 +02:00
Andreas Kling	fd1b31d0ff	LibWeb: Start building the tree building part of the new HTML parser This patch adds a new HTMLDocumentParser class. It keeps a tokenizer object internally and feeds itself with one token at a time from it. The names and idioms in this class are expressed as closely to the actual HTML parsing spec as possible, to make development as easy and bug free as possible. :^) This is going to become pretty large, but it's pretty cool!	2020-05-24 00:14:23 +02:00
Andreas Kling	6caa5661f3	LibWeb: Teach HTMLTokenizer how to tokenize attributes Properly tokenize single-quoted, double-quoted and unquoted attributes!	2020-05-23 01:22:15 +02:00
Andreas Kling	004ef9a86b	LibWeb: Minor tweaks to HTMLToken declaration	2020-05-22 23:45:02 +02:00
Andreas Kling	272b35d2e1	LibWeb: Begin work on a spec-compliant HTML parser In order to actually view the web as it is, we're gonna need a proper HTML parser. So let's build one! This patch introduces the Web::HTMLTokenizer class, which currently operates on a StringView input stream where it fetches (ASCII only atm) codepoints and tokenizes according to the HTML spec tokenization algo. The tokenizer state machine looks a bit weird but is written in a way that tries to mimic the spec as closely as possible, in order to make development easier and bugs less likely. This initial version is far from finished, but it can parse a trivial document with a DOCTYPE and open/close tags. :^)	2020-05-22 21:46:13 +02:00

13 commits