Formal grammars - A formal grammar is a system for defining the syntax of a language by specifying sequences of symbols or sentences that are considered grammatical.
- Grammatical sentences of a language may be very large or infinite, therefore they are usually derived by a recursive definition.
Definition of the formal grammar G - G = < V, Σ, P, σ >
- V – set of terminal symbols
- Σ – set of nonterminal symbols with the restriction
- that V and Σ are disjoint
- σ – start symbol
- P – set of production rules in a form:
- where:
- A – is a sequence of symbols having at least one
- nonterminal,
- B – is the result of replacing some nonterminal symbol
- A with a sequence of symbols (possibly empty)
- from V and Σ
Small subset of English grammar - V = {“the”, ”a”, ”cat”, ”dog”, ”saw”, “chased“}
- = {S, NP, VP, D, N, V}
- S – sentence D – determiner
- NP – noun phrase N – noun
- VP – verb phrase V – verb
- = S
- P = {
- S –> NP VP,
- NP –> D N,
- VP –> V NP,
- D –> ”the”, D –> “a”,
- N –> ”cat”, N –> ”dog”,
- V –> “saw”, V –> “chased”
- }
Derivation - Example of a leftmost derivation:
- S –> NP VP
- –> D N VP
- –> “the” N VP
- –> “the” “cat” VP
- –> “the” “cat” V NP
- –> “the” “cat” “chased” NP
- –> “the” “cat” “chased” D N
- –> “the” “cat” “chased” “a” N
- –> “the” “cat” “chased” “a” “dog”
Parse trees Backus notation for production rules - ::= – is defined as
- | – separates alternatives
- <> – denotes nonterminal symbols
- Production rules for the small subset of English grammar
- P = {
-
::= , - ::= ,
- ::= ,
- ::= ”the” | “a”,
- ::= ”cat” | ”dog”,
- ::= “saw” | “chased”
- }
Classification of formal grammars | | | - Recognizing automaton /
- Storage required /
- Parsing complexity
| | - Regular grammars,
- Finite state grammars
| - A –> xB
- C –> y
- A, B, C – non-terminal symbols
- x, y – terminal symbols
| - Finite state automaton /
- Finite storage /
- O (n)
| | | - A –> BC…D
- A – non-terminal symbols
- BC…D – any sequence of terminal or non-terminal symbols
| - Pushdown automaton /
- Pushdown stack /
- O (n3)
| | - Context sensitive grammars
| - aAz –> aBC…Dz
- A – non-terminal symbols
- a, z – sequences of zero or more terminal or non-terminal symbols
- BC…D – any sequence of terminal or non-terminal symbols
| - Linear bounded automaton
- (non-deterministic Turing machine) /
- Tape being a linear multiple of input length /
- NP Complete
| | - Unrestricted grammars,
- General rewrite grammars
| - Allows the production rules to transform any sequence of symbols into any other sequence of symbols.
- To convert context-sensitive grammar into unrestricted grammar, replacement of any non-terminal symbol A with an empty sequence needs to be allowed.
| - Turing machine /
- Infinite tape /
- Undecidable
|
Do'stlaringiz bilan baham: |