Here's my first try at explaining this:
Basically, you start with your input sentence at the first rule, moving down the list until all the non-terminals' are finally replaced with 'terminals'. Not all rules apply to all parses, and sometimes a parse will arrive at a 'dead-end', where the parser will 'walk' back up the tree & try the next possible rule. Note that there can be duplicate rules strings for any one rule number (I've separated them by rule number in the table below).
Important notes: This grammar is ambiguous! This means that you could parse an input sentence multiple ways (I won't show it here, but trust me). There are several things that overcome this problem. First of all, it is a left-handed parse (meaning we always proceed left-to-right through our input string matching tokens (words). Second, order of rules in the grammar matter, resulting in the first complete parse is what is accepted - a rule higher up in the list takes precedence in the parse.
First, I've translated the hex 'operators' into abbreviated actual values. Here is the table for that:
Abbr | Type | Explanation |
---- | ------------ | ----------- |
Pred | Non-terminal | Predicate part: This identifies the first part of a sentence |
Subj | Non-terminal | Subject part: This identifies the second part of a sentence |
Suff | Non-terminal | Suffix part: This identifies the third and last part of a sentence |
Refe | Non-terminal | Reference part: This identifies words that reference another word in the same sentence part |
mCla | Terminal | Match on class mask |
mWor | Terminal | Match on word group ---Note, does not appear in this grammar |
Forc | Terminal | Force storage: Apparently, this was only used for debugging. ---nor does this |
Let's start with an example 'open door' to walk through the table. For reference, open is an imperative verb while door is a noun. For now, we will ignore the meanings & focus on the 'values'. Note that the rules that actually apply to the successful parse are indicated with a
red step number.
Step | Grammar Rule | Notes |
01. | 013f -> 013c | Really just a placeholder. It instructs us to proceed to rule 013c. |
02. | 013c -> 013b | The first rule in the 013c 'set' instructs us to proceed to rule 13b. |
03. | 013b -> 0136, 013b | The first rule in the 013b 'set' instructs us to proceed to rule 0136. |
04. | 0136 is a terminal rule, the 'indicative verb' | Not a match to our first word in our input string (looking for an imperative verb). |
05. | 013b -> 0136, 0130, 013b | 0136 is no match (same as #4 above) |
06. | 013b -> 0136 | 0136 is no match, skip |
07. | 013b -> 0136 0133 012f | 0136 is no match, skip |
08. | 013b -> 0136 0133 0133 013f | 0136 is no match, skip |
09. | 013b -> 012f | |
10. | 012f is a terminal rule - the 'imperative verb' | A match for our 1st token in our input string! |
11. | Now the 2nd token in our input string | Start back at the beginning of the parse rules... |
12. | 013f -> 013c | |
13. | 013c -> 013b | skip all (9) 013b rules dead-end with no match on our noun terminal (0130). Details removed for brevity... |
14. | 013c -> 013d | proceed to rule 013d |
15. | 013d -> 0131, 0139 | skip - we don't have enough input tokens left to fulfill this rule |
16. | 013d -> 0131, 013a, 0139 | skip, not enough input tokens |
17. | 013d -> 013a | skip - ultimately dead-ends with no 0130 to match our noun... |
18. | 013d -> 0139 | proceed to this rule... |
19. | 0139 -> 0130, 0135, 013d | skip, not enough tokens |
20. | 0139 -> 0130, 0139 | skip, not enough tokens |
21. | 0139 -> 0130 | proceed... |
22. | 0130 is a terminal rule - the 'noun' | a match for our 2nd token in our input string! |
23. | Finished. No more tokens. |
Here's a recap of which rules actually contributed to our parse:
Step | Input string | Next rule / notes |
0. | open door | Before parse... |
1. | 013f door | (next rule: 013c) |
9. | 013c door | (next rule: 012f) |
10. | 012f door | (terminal, imp,verb - done with token 1) |
12. | 012f 013f | (starting at beginning of parse tree with 2nd token) |
14. | 012f 013c | (next rule: 013d) |
18. | 012f 013d | (next rule: 0139) |
21. | 012f 0139 | (next rule: 0130) |
22. | 012f 0130 | (terminal, noun - done with token 2) |
23. | Done | |
Note: In many of the rules above, I cheated for brevity. Technically, we should take the first rule in the list & navigate all the way down the 'tree' (or parse rules) to a terminal (if possible). If all the tokens in a parse are terminated, but there are 'incomplete' rules (rules left in the parse, but no tokens to apply them to), then it's treated as 'no match'.
Here all the parse rules in the 'black box', represented in order. Note that M is the 'meaning' of V, the value.
Rule (M) (V) (M) (V) (M) (V) (M) (V)
---- ---- ---- ---- ---- ---- ---- ---- ----
013f Pred 013c
013c Pred 013b
013c Pred 013b Subj 013d Refe 0133
013c Pred 013b Subj 013d Refe 013a
013c Pred 013b Subj 013d Refe 0137
013c Pred 013b Subj 013d
013c Pred 013b Subj 013d Refe 0133 Suff 013d
013c Pred 013b Suff 013d Subj 013d
013c Pred 013b Subj 013d Suff 013e
013c Pred 013b Suff 013e
013c Subj 013d
013c Subj 013d Suff 013e
013c Refe 013b Refe 013d Refe 0133 Pred 013c
013c Refe 013d Pred 013c
013d 0145 0131 Pred 0139 // I have *no* idea what operator 0145 corresponds to
013d 0145 0131 Refe 013a Pred 0139
013d Refe 013a Pred 0139
013d Pred 013a
013d Pred 0139
013e Refe 0138 Pred 013d
013e Refe 0136 Pred 013d
013e Refe 0138 Pred 0136 Refe 013d
013e Refe 0138 Pred 0136 Refe 0132
013e Pred 013b
013e Refe 013b Pred 013e
013e Refe 0138 Pred 013d Refe 0133 Refe 013d
0139 Pred 0130 0145 0135 Pred 013d
0139 Refe 0130 Pred 0139
0139 Pred 0130
013b Refe 0136 Pred 013b
013b Refe 0136 Refe 0130 Pred 013b
013b Pred 0136
013b Refe 0136 Refe 0133 Pred 012f
013b Refe 0136 Refe 0133 Refe 0133 Pred 012f
013b Pred 012f
013b Pred 012f Refe 0133
013b Pred 012f Refe 0133 Refe 0133
013b Pred 012f Refe 013a
013b Pred 012f Refe 0137
013b Refe 0137 Refe 0137 Pred 013b
013b Refe 0137 Pred 013b
013b Pred 0137
013b Refe 0137 Refe 0130 Pred 013b
013b Pred 012f 0145 0135 Pred 012f
013b Pred 012f Pred 012f 0145 0135 Pred 012f
013a Pred 0132
013a Refe 0132 Pred 013a
(Terminals)
0137 mCla 0400 // Adverb
012f mCla 0800 // Imperative Verb
012f mCla 0200 // Indicative verb
0136 mCla 0200 // Indicative verb
0130 mCla 0100 // Noun
0134 mCla 0080 // Pronoun
0132 mCla 0040 // Adjective
0131 mCla 0020 // Article
0133 mCla 0010 // Preposition
0138 mCla 0008 // Special
0135 mCla 0004 // Special
In my next post, I'll explain the other part of the parse - the semantic part... (using the other tokens in the grammar we did not touch on here, the subject, predicate, suffix & reference). From this exercise we have all the 'words' for the Said() string; the semantic part will determine the way the words are 'joined' together in the said string ('/', '<', etc).