Automata & Complexity

Normalising (contd)

Removal of unit production rules

Unit production rule: A → B, where B is a variable.

Steps (a code sketch follows the list):

  1. Remove all λ-productions
  2. Determine all pairs of distinct variables A, B with A →⁺ B (A derives B using only unit production rules)
  3. Whenever A →⁺ B and B → y is not a unit production rule, add the new rule A → y.
  4. Remove all unit production rules
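
Here is a small sketch of steps 2 to 4 in Python. The representation is my own toy one, not from the course: a grammar is a dict from a variable to a list of right-hand-side strings, uppercase letters are variables, and λ-productions are assumed to be gone already.

```python
def remove_unit_rules(rules):
    """Remove unit productions A -> B, assuming λ-productions were removed first.

    Toy representation: dict variable -> list of RHS strings, uppercase = variable."""
    variables = set(rules)

    # Step 2: for every variable A, find all B with A ->+ B via chains of unit rules.
    unit_reach = {A: {A} for A in variables}
    changed = True
    while changed:
        changed = False
        for A in variables:
            for B in list(unit_reach[A]):
                for rhs in rules.get(B, []):
                    if len(rhs) == 1 and rhs in variables and rhs not in unit_reach[A]:
                        unit_reach[A].add(rhs)
                        changed = True

    # Steps 3 + 4: whenever A ->+ B and B -> y is not a unit rule, add A -> y;
    # the unit rules themselves are simply not copied over.
    result = {A: [] for A in variables}
    for A in variables:
        for B in unit_reach[A]:
            for rhs in rules.get(B, []):
                if not (len(rhs) == 1 and rhs in variables) and rhs not in result[A]:
                    result[A].append(rhs)
    return result


# S -> A -> B is a unit chain; B's real rules get copied up to S and A.
print(remove_unit_rules({"S": ["A", "ab"], "A": ["B"], "B": ["b", "bb"]}))
```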

Chomsky normal form

A grammar is in Chomsky normal form when all rules have the form A → BC or A → a, i.e. the RHS is either two variables or a single terminal.

Steps:

  1. Remove all λ-productions
  2. Remove all unit production rules
  3. For every terminal a:
    1. add a new variable Ca and the rule Ca → a.
    2. in any rule whose RHS has length ≥ 2, replace the terminal a with Ca
  4. Split all rules so that they have a maximum of 2 variables on the RHS, by adding new rules and variables. Example:
    • start with one rule, {A → BCDE}.
    • split, introduce variable X1: {A → BX1, X1 → CDE}
    • split, introduce variable X2: {A → BX1, X1 → CX2, X2 → DE}
    • no more splits needed, as every rule has a maximum of 2 variables on the RHS (steps 3 and 4 are sketched in code below).
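
A minimal sketch of steps 3 and 4, using the same toy dict representation as above. The fresh names Ca and X1, X2, ... are my own convention and are assumed not to clash with existing variables; λ-productions and unit rules are assumed to be gone already.

```python
def to_cnf(rules):
    """Steps 3 and 4 only: assumes λ-productions and unit rules are already removed.

    Input RHSs are strings (uppercase = variable, lowercase = terminal).
    Output RHSs are lists of symbol names, since fresh names like 'Ca' are
    longer than one character."""
    new_rules = {A: [] for A in rules}
    fresh_count = 0

    def terminal_var(t):
        # Step 3: one variable Ca with the single rule Ca -> a, per terminal a.
        name = f"C{t}"
        if name not in new_rules:
            new_rules[name] = [[t]]
        return name

    for A, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) == 1:                      # A -> a stays as it is
                new_rules[A].append([rhs])
                continue
            # Step 3: replace terminals by their Ca variable in every RHS of length >= 2.
            symbols = [s if s.isupper() else terminal_var(s) for s in rhs]
            # Step 4: split into a chain of rules with exactly two variables on the RHS.
            head = A
            while len(symbols) > 2:
                fresh_count += 1
                fresh = f"X{fresh_count}"
                new_rules[head].append([symbols[0], fresh])
                new_rules[fresh] = []
                head, symbols = fresh, symbols[1:]
            new_rules[head].append(symbols)
    return new_rules


# A -> BCDE becomes A -> B X1, X1 -> C X2, X2 -> D E (the example above);
# A -> aB becomes A -> Ca B together with Ca -> a.
print(to_cnf({"A": ["BCDE", "aB"], "B": ["b"], "C": ["c"], "D": ["d"], "E": ["e"]}))
```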

Removing useless variables

Why? It simplifies the grammar, sometimes by a ton.

A variable is useful if it is both productive (you can derive some string of terminals from it) and reachable (it occurs in some derivation starting from the start symbol). Otherwise it is useless.

Steps:

  1. Determine the productive variables: if A → y is a rule, and all variables in y are productive, then A is productive.
  2. Remove rules containing a non-productive variable.
  3. Determine reachable variables:
    • start symbol is always reachable
    • if A → y, and A is reachable, then so are all variables in y.
  4. Remove rules containing an unreachable variable.
  5. Any variable from the original grammar that doesn’t show up in the remaining rules is useless.
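
A sketch of the whole procedure, same toy representation as before ('' stands for λ); the start symbol is passed in explicitly.

```python
def remove_useless(rules, start):
    """Drop rules that use non-productive or unreachable variables.

    Toy representation: dict variable -> list of RHS strings, uppercase = variable, '' = λ."""
    # Step 1: productive variables, computed as a fixed point.
    productive = set()
    changed = True
    while changed:
        changed = False
        for A, rhss in rules.items():
            if A in productive:
                continue
            for rhs in rhss:
                if all(s in productive for s in rhs if s.isupper()):
                    productive.add(A)
                    changed = True
                    break

    # Step 2: drop rules that mention a non-productive variable.
    rules = {A: [r for r in rhss if all(s in productive for s in r if s.isupper())]
             for A, rhss in rules.items() if A in productive}

    # Step 3: reachable variables, starting from the start symbol.
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for A, rhss in rules.items():
            if A not in reachable:
                continue
            for rhs in rhss:
                for s in rhs:
                    if s.isupper() and s not in reachable:
                        reachable.add(s)
                        changed = True

    # Steps 4 + 5: drop unreachable variables; whatever disappeared was useless.
    return {A: rhss for A, rhss in rules.items() if A in reachable}


# B is unreachable and C is non-productive, so both disappear.
print(remove_useless({"S": ["aS", "a", "aC"], "B": ["b"], "C": ["cC"]}, start="S"))  # {'S': ['aS', 'a']}
```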

So basically, evict the unproductive and useless. Don’t quote me on that.

Erasable variables

A is erasable if you can somehow derive λ from it.
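
Determining the erasable variables is a small fixed-point computation; here is a sketch, using the same toy dict representation as the earlier snippets ('' stands for λ).

```python
def erasable(rules):
    """Return the set of erasable (nullable) variables.

    Toy representation: dict variable -> list of RHS strings, '' stands for λ."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for A, rhss in rules.items():
            # A is erasable if some RHS is λ or consists only of erasable variables.
            if A not in nullable and any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(A)
                changed = True
    return nullable


# A directly, B via its rule B -> A, S via S -> AB.
print(sorted(erasable({"S": ["AB", "a"], "A": [""], "B": ["A", "b"]})))  # ['A', 'B', 'S']
```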

Parsing

Parsing: the search for a derivation tree for a given word.

For CFGs, parsing is possible in O(|w|³) time, where |w| is the length of the input word.

Bottom-up parsing (right-to-left)

Start from input word, try to construct starting variable S. Applies rules backwards.

The CYK (Cocke-Younger-Kasami) algorithm does bottom-up parsing for grammars in Chomsky normal form. It determines whether a non-empty word w is in L(G) (i.e., is accepted by the grammar).

Steps:

  1. Take grammar G in Chomsky normal form. Hopefully someone will be nice enough to give you that; if not, you’ll have to normalize it yourself.
  2. Compute sets Vu of variables from which you can derive u, where u is a contiguous subword of w.
    • If u is a single letter, then Vu is the set of variables that derive to u (i.e. there is a rule A → u).
    • If u is multiple letters, then Vu is set of all variables such that:
      • u = u1u2 with u1 and u2 being some nonempty words (potentially multiple letters)
      • A → BC is a production in the grammar, with B deriving to u1 and C deriving to u2
  3. If the starting variable is in the set of variables that derive to word w, then the grammar generates that word.
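
A small CYK sketch under the same toy representation, where every RHS is either a single terminal or a string of exactly two variables (Chomsky normal form).

```python
def cyk(rules, w, start):
    """CYK membership test for a non-empty word w, grammar in Chomsky normal form.

    Toy representation: dict variable -> list of RHS strings,
    each RHS either a single terminal 'a' or two variables 'BC'."""
    n = len(w)
    # V[(i, length)] = set of variables deriving the subword of w starting at i with that length.
    V = {}

    # Subwords of length 1: variables with a rule A -> a.
    for i, a in enumerate(w):
        V[(i, 1)] = {A for A, rhss in rules.items() if a in rhss}

    # Longer subwords u: try every split u = u1 u2 and every rule A -> BC.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            V[(i, length)] = set()
            for split in range(1, length):
                left, right = V[(i, split)], V[(i + split, length - split)]
                for A, rhss in rules.items():
                    for rhs in rhss:
                        if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                            V[(i, length)].add(A)

    return start in V[(0, n)]


# CNF grammar for { a^n b^n | n >= 1 }: S -> AT | AB, T -> SB, A -> a, B -> b.
g = {"S": ["AT", "AB"], "T": ["SB"], "A": ["a"], "B": ["b"]}
print(cyk(g, "aabb", start="S"))  # True
print(cyk(g, "aab", start="S"))   # False
```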

Top-down parsing (left-to-right)

Start from starting variable S, try to derive the input word.

Simple leftmost: repeatedly rewrite the leftmost variable, trying rules one by one and backtracking when the derivation stops matching the input word.

LL parsing:

LL: left-to-right (top-down), leftmost derivation. Backtracking not allowed.

CFG prerequisite: the grammar must have no useless variables (λ-productions and unit productions are allowed, though).

I’ll try to explain this in a more understandable way than the abstract notation we get.

First set

The set of terminals that begin strings derivable from variable A.

To find First(A), you want to look at the RHS of every rule with A on the LHS:

  • if the RHS starts with a terminal, that terminal is in First(A)
  • if the RHS starts with a variable X, then everything in First(X) is in First(A)
  • if that X is erasable, look at the next symbol of the RHS in the same way, and so on

Example

Take the grammar with rules:

  1. A → DbCbz
  2. A → dzzzA
  3. A → λ
  4. B → kkdb
  5. C → kzeA
  6. D → AneCB

I start with B, because it doesn’t depend on other first sets.

First(B): {k} (rule 4 starts with the terminal k)

First(C): {k} (rule 5 starts with the terminal k)

First(A): {d, n} (rule 2 starts with the terminal d; rule 1 starts with the variable D, so everything in First(D) ends up in First(A); rule 3 contributes no terminals, it only makes A erasable)

First(D): {d, n} (rule 6 starts with the variable A, so everything in First(A) ends up in First(D); because A is erasable, the terminal n after it is added as well)

First(A) and First(D) depend on each other, so you keep applying the rules until the sets stop changing.

Remember, duplicates are excluded in sets.
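
Here is a fixed-point sketch that computes the First sets (and the erasable variables it needs along the way); running it on the example grammar reproduces the sets above. The representation is the same toy one as in the earlier snippets, with '' standing for λ.

```python
def first_sets(rules):
    """Compute First(A) for every variable A.

    Toy representation: dict variable -> list of RHS strings, uppercase = variable, '' = λ."""
    # Erasable variables, needed to know when to look past a variable in a RHS.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for A, rhss in rules.items():
            if A not in nullable and any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(A)
                changed = True

    first = {A: set() for A in rules}
    changed = True
    while changed:
        changed = False
        for A, rhss in rules.items():
            for rhs in rhss:
                for s in rhs:
                    before = len(first[A])
                    if s.isupper():
                        first[A] |= first[s]      # a variable: copy its First set
                        changed |= len(first[A]) != before
                        if s not in nullable:     # can't skip it, so stop scanning this RHS
                            break
                    else:
                        first[A].add(s)           # a terminal: it begins the string, stop
                        changed |= len(first[A]) != before
                        break
    return first


g = {"A": ["DbCbz", "dzzzA", ""], "B": ["kkdb"], "C": ["kzeA"], "D": ["AneCB"]}
print(first_sets(g))  # First(A) = First(D) = {d, n}; First(B) = First(C) = {k}
```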

Follow set

The set of possible terminals immediately following a variable A.

To find Follow(A), you want to look at rules that have A on the RHS:

  • if A is followed by a terminal, that terminal is in Follow(A)
  • if A is followed by a variable X, then everything in First(X) is in Follow(A); if X is erasable, also look at what comes after X
  • if A is at the end of the RHS (or everything after it is erasable), then everything in Follow(LHS) is in Follow(A)
  • the start symbol also gets the end-of-input marker $ in its Follow set

Example

Take the grammar with rules:

  1. A → DbCbz
  2. A → dzzzA
  3. A → λ
  4. B → kkdb
  5. C → kzeA
  6. D → AneCB

I start with D, as it does not depend much on other rules.

Follow(D): {b} (D only appears in rule 1, where it is followed by the terminal b)

Follow(B): {b} (B only appears at the end of rule 6, so everything in Follow(D) is in Follow(B))

Follow(C): {b, k} (in rule 1, C is followed by the terminal b; in rule 6, C is followed by B, so First(B) = {k} is added)

Follow(A): {b, k, n, $} (A is the start symbol, so $ is in its Follow set; in rule 6, A is followed by the terminal n; A ends rule 5, so Follow(C) = {b, k} is copied in; A also ends rule 2, which only copies Follow(A) into itself)
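
A matching sketch for the Follow sets. The First sets and the erasable set are passed in as plain literals (taken from the First example above) so the snippet stands on its own; '$' is the end-of-input marker.

```python
def follow_sets(rules, first, nullable, start):
    """Compute Follow(X) for every variable X, given the First sets and the erasable variables.

    The start symbol gets the end-of-input marker '$'."""
    follow = {X: set() for X in rules}
    follow[start].add("$")

    changed = True
    while changed:
        changed = False
        for lhs, rhss in rules.items():
            for rhs in rhss:
                for i, s in enumerate(rhs):
                    if not s.isupper():
                        continue                   # only variables have Follow sets
                    before = len(follow[s])
                    tail_erasable = True
                    for t in rhs[i + 1:]:          # walk over what comes after s in this RHS
                        if t.isupper():
                            follow[s] |= first[t]  # a variable: its First set can follow s
                            if t not in nullable:
                                tail_erasable = False
                                break
                        else:
                            follow[s].add(t)       # a terminal directly follows s
                            tail_erasable = False
                            break
                    if tail_erasable:              # s can end the RHS, so Follow(lhs) follows s too
                        follow[s] |= follow[lhs]
                    changed |= len(follow[s]) != before
    return follow


g = {"A": ["DbCbz", "dzzzA", ""], "B": ["kkdb"], "C": ["kzeA"], "D": ["AneCB"]}
first = {"A": {"d", "n"}, "B": {"k"}, "C": {"k"}, "D": {"d", "n"}}   # from the First example
print(follow_sets(g, first, nullable={"A"}, start="A"))
# Follow(A) = {b, k, n, $}; Follow(B) = Follow(D) = {b}; Follow(C) = {b, k}
```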

Parse table construction

Once you have first and follow sets, you can construct a parse table. The rows are indexed by variables, the columns are indexed by terminals.

A cell at row A and column u contains a rule (LHS → RHS) if:

  • LHS is A and u is in the First set of the RHS (computed symbol by symbol, just like for a variable), or
  • LHS is A, the RHS is erasable (it can derive λ), and u is in Follow(A).

Example

Take the grammar with rules:

  1. A → DbCbz
  2. A → dzzzA
  3. A → λ
  4. B → kkdb
  5. C → kzeA
  6. D → AneCB

Rule 1 (A → DbCbz): the RHS starts with D, so its First set is First(D) = {d, n}. The rule goes in cells (A, d) and (A, n).

Rule 2 (A → dzzzA): the First set of the RHS is {d}. The rule goes in cell (A, d).

Rule 3 (A → λ): the RHS is erasable, so the rule goes under every terminal in Follow(A) = {b, k, n, $}: cells (A, b), (A, k), (A, n) and (A, $).

Rule 4 (B → kkdb): the First set of the RHS is {k}. Cell (B, k).

Rule 5 (C → kzeA): the First set of the RHS is {k}. Cell (C, k).

Rule 6 (D → AneCB): the First set of the RHS is {d, n} (First(A), plus the n after the erasable A). Cells (D, d) and (D, n).

The resulting parse table:

|   | b     | d                    | n                | k        | z | $     |
|---|-------|----------------------|------------------|----------|---|-------|
| A | A → λ | A → DbCbz, A → dzzzA | A → DbCbz, A → λ | A → λ    |   | A → λ |
| B |       |                      |                  | B → kkdb |   |       |
| C |       |                      |                  | C → kzeA |   |       |
| D |       | D → AneCB            | D → AneCB        |          |   |       |

This table could not yet be used for LL(1) parsing, as there are cells containing more than one rule.
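
Finally, a sketch of the table construction itself, again with the example's First and Follow sets passed in as literals. Cells are keyed by (variable, terminal); any cell that ends up holding more than one rule is an LL(1) conflict, which is exactly what happens in the cells (A, d) and (A, n) above.

```python
def ll1_table(rules, first, follow, nullable):
    """Build the LL(1) parse table: table[(A, u)] is the list of candidate rules for that cell.

    A rule A -> rhs goes under terminal u if u is in First(rhs),
    or if rhs is erasable and u is in Follow(A)."""
    def first_of(rhs):
        # First set of a whole RHS, plus a flag saying whether the RHS can derive λ.
        result, is_erasable = set(), True
        for s in rhs:
            if s.isupper():
                result |= first[s]
                if s not in nullable:
                    is_erasable = False
                    break
            else:
                result.add(s)
                is_erasable = False
                break
        return result, is_erasable

    table = {}
    for A, rhss in rules.items():
        for rhs in rhss:
            terminals, is_erasable = first_of(rhs)
            if is_erasable:
                terminals |= follow[A]
            for u in terminals:
                table.setdefault((A, u), []).append(rhs or "λ")
    return table


g = {"A": ["DbCbz", "dzzzA", ""], "B": ["kkdb"], "C": ["kzeA"], "D": ["AneCB"]}
first = {"A": {"d", "n"}, "B": {"k"}, "C": {"k"}, "D": {"d", "n"}}
follow = {"A": {"b", "k", "n", "$"}, "B": {"b"}, "C": {"b", "k"}, "D": {"b"}}

table = ll1_table(g, first, follow, nullable={"A"})
conflicts = {cell: rs for cell, rs in table.items() if len(rs) > 1}
print(conflicts)  # cells (A, d) and (A, n) each hold two rules, so this grammar is not LL(1)
```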