  # The Parser Model
  
  The parser model here follows the model in section
  [8.2.1](http://www.w3.org/TR/2012/CR-html5-20121217/syntax.html#parsing)
  of the HTML5 specification, though we do not assume a networking layer.
  
       [ InputStream ]    // Generic support for reading input.
             ||
        [ Scanner ]       // Breaks down the stream into characters.
             ||
       [ Tokenizer ]      // Groups characters into syntactic units.
             ||
      [ Tree Builder ]    // Organizes units into a tree of objects.
             ||
       [ DOM Document ]   // The final state of the parsed document.
  
  
  ## InputStream
  
  This is an interface with at least two concrete implementations:
  
  - StringInputStream: Reads an HTML5 string.
  - FileInputStream: Reads an HTML5 file.
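
As a rough illustration of the "one interface, two implementations" idea, here is a minimal sketch. The names `SketchInputStream`, `SketchStringStream`, `SketchFileStream`, and `drain()` are hypothetical and are not the library's actual classes or methods. The sketch also reads byte by byte, which is a simplification.

```php
<?php
// Hypothetical names throughout: this only illustrates the pattern of one
// stream interface with a string-backed and a file-backed implementation.

interface SketchInputStream
{
    /** Returns the next byte, or false when the input is exhausted. */
    public function next();
}

final class SketchStringStream implements SketchInputStream
{
    private $data;
    private $pos = 0;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    public function next()
    {
        if ($this->pos >= strlen($this->data)) {
            return false;
        }
        return $this->data[$this->pos++];
    }
}

final class SketchFileStream implements SketchInputStream
{
    private $inner;

    public function __construct(string $path)
    {
        // Load the whole file, then reuse the string implementation.
        $contents = file_get_contents($path);
        if ($contents === false) {
            throw new RuntimeException("Cannot read $path");
        }
        $this->inner = new SketchStringStream($contents);
    }

    public function next()
    {
        return $this->inner->next();
    }
}

/** Reads a stream to the end; works with either implementation. */
function drain(SketchInputStream $in): string
{
    $out = '';
    while (($c = $in->next()) !== false) {
        $out .= $c;
    }
    return $out;
}
```

Because the consumer depends only on the interface, later stages do not need to know whether the markup came from a string or from a file.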
  
  ## Scanner
  
  This is a mechanical piece of the parser: it breaks the input stream
  down into characters for the tokenizer to consume.
  
  ## Tokenizer
  
  This follows section 8.2.4 of the HTML5 spec. It is (roughly) a recursive
  descent parser, though it includes plenty of optimizations that are less
  than purely functional.
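
To make "grouping characters into syntactic units" concrete, here is a deliberately tiny sketch. It is a flat loop, not the recursive descent approach described above. The function `sketchTokenize()` is hypothetical and not part of the library. It recognizes only text, `<name>` start tags, and `</name>` end tags, with no attributes, comments, entities, or error recovery.

```php
<?php
// Hypothetical helper for illustration only; the token shapes
// (['startTag', name], ['endTag', name], ['text', data], ['eof', null])
// are assumptions made for this sketch.

function sketchTokenize(string $html): array
{
    $tokens = [];
    $len = strlen($html);
    $i = 0;

    while ($i < $len) {
        if ($html[$i] === '<') {
            $close = strpos($html, '>', $i);
            if ($close === false) {
                // Unterminated tag: treat the remainder as text.
                $tokens[] = ['text', substr($html, $i)];
                break;
            }
            $body = substr($html, $i + 1, $close - $i - 1);
            if ($body !== '' && $body[0] === '/') {
                $tokens[] = ['endTag', strtolower(substr($body, 1))];
            } else {
                $tokens[] = ['startTag', strtolower($body)];
            }
            $i = $close + 1;
        } else {
            // Collect everything up to the next '<' as one text token.
            $next = strpos($html, '<', $i);
            if ($next === false) {
                $next = $len;
            }
            $tokens[] = ['text', substr($html, $i, $next - $i)];
            $i = $next;
        }
    }

    $tokens[] = ['eof', null];
    return $tokens;
}
```

For the input `<p>Hi</p>` this yields a start tag, a text token, an end tag, and an end-of-file marker, in that order.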
  
  ## EventHandler and DOMTree
  
  EventHandler is the interface for tree builders. Since not all
  implementations will necessarily build trees, we've chosen a more
  generic name.
  
  The tokenizer emits tokens to the event handler during tokenization.
  
  The DOMTree is an event handler that builds a DOM tree. The output of
  the DOMTree builder is a DOMDocument.
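
The relationship between a generic event handler and a tree-building handler can be sketched as follows. `SketchEventHandler`, `SketchDomBuilder`, and their method names are illustrative assumptions, not the library's interface. Only PHP's built-in DOM classes are real here. The sketch assumes well-nested input with a single root element.

```php
<?php
// Hypothetical names: a handler interface plus one implementation that
// happens to build a DOM tree. Other handlers could log or count instead.

interface SketchEventHandler
{
    public function startTag(string $name): void;
    public function endTag(string $name): void;
    public function text(string $data): void;
}

final class SketchDomBuilder implements SketchEventHandler
{
    private $doc;
    private $current;

    public function __construct()
    {
        $this->doc = new DOMDocument('1.0', 'UTF-8');
        $this->current = $this->doc;
    }

    public function startTag(string $name): void
    {
        // Attach the new element and descend into it.
        $el = $this->doc->createElement($name);
        $this->current->appendChild($el);
        $this->current = $el;
    }

    public function endTag(string $name): void
    {
        // Naive: assumes well-nested input, so just step back to the parent.
        if ($this->current->parentNode !== null) {
            $this->current = $this->current->parentNode;
        }
    }

    public function text(string $data): void
    {
        // Text outside the root element is ignored in this sketch.
        if ($this->current instanceof DOMDocument) {
            return;
        }
        $this->current->appendChild($this->doc->createTextNode($data));
    }

    public function document(): DOMDocument
    {
        return $this->doc;
    }
}

// Drive the handler by hand, as a tokenizer would.
$builder = new SketchDomBuilder();
$builder->startTag('p');
$builder->text('Hello');
$builder->endTag('p');
echo $builder->document()->saveHTML();
```

Keeping the interface free of DOM types is what allows handlers that do not build trees at all.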
  
  ## DOMDocument
  
  PHP has a built-in DOMDocument class (technically, it is part of the DOM
  extension, which is built on libxml). We use that, which makes the output
  of this process compatible with SimpleXML, QueryPath, and many other
  XML/HTML processing tools.
  
  For cases where the HTML5 is a fragment of an HTML5 document, a
  DOMDocumentFragment is returned instead. This is another built-in class.
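
The interoperability mentioned above can be illustrated with PHP's built-in classes alone. This sketch does not call the parser. It builds a small document by hand, on the assumption that a parsed document behaves like any other DOMDocument, and it needs the DOM and SimpleXML extensions.

```php
<?php
// Build a small document by hand, standing in for parser output.
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('ul'));
$root->appendChild($doc->createElement('li', 'one'));

// A DOMDocument can be handed to SimpleXML without re-parsing.
$xml = simplexml_import_dom($doc);
$first = (string) $xml->li[0];

// A DOMDocumentFragment holds sibling nodes with no single root element.
$fragment = $doc->createDocumentFragment();
$fragment->appendChild($doc->createElement('li', 'two'));
$fragment->appendChild($doc->createElement('li', 'three'));

// Appending a fragment moves its children into the target node.
$root->appendChild($fragment);
$count = $root->getElementsByTagName('li')->length;
```

A fragment is convenient for markup snippets because it can carry several top-level nodes and be spliced into an existing tree in one call.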