Parsing in traditional English
Parsing is the process of analyzing a sentence for its structure, content and meaning, i.e. of uncovering the structure of the input sentence, identifying its constituents and the relations between those constituents. This paper briefly describes the parsing techniques used in natural language processing. Parsing is a prime task in the processing of natural language, as it forms the basis for natural language applications such as machine translation, question answering and information retrieval. We discuss top-down, bottom-up and basic top-down parsing along with their issues, and give a brief review of statistical and dependency parsing.
The basic connection between a sentence and the grammar it derives from is the parse tree, which describes how the grammar was used to produce the sentence. To reconstruct this connection we need a parsing technique. Natural language processing provides two basic parsing techniques, viz. top-down and bottom-up; their names describe the direction in which the parsing process advances. There is also basic top-down parsing, which is a fusion of top-down and bottom-up parsing.
2.1 Top-Down Parsing
“Using the top-down technique, the parser searches for a parse tree by trying to build it from the root node S down to the leaves. The algorithm starts by assuming that the input can be derived from the designated start symbol S. The next step is to find the tops of all trees which can start with S: by looking at the grammar rules with S on the left-hand side, all the possible trees are generated. Top-down parsing is a goal-directed search.”[3]
It tries to imitate the original production process by rederiving the sentence from the start symbol, and the production tree is reconstructed from the top downwards. Top-down parsing can be viewed as an expansion process which begins with the start symbol S and then advances by replacing each nonterminal with the right-hand side of one of its productions. The common search strategy implemented in this approach is top-down, left-to-right, with backtracking. The search starts from the root node labeled S, i.e. the start symbol; constructs the child nodes by applying the rules whose left-hand side equals S; further expands each internal node, if it is a nonterminal, using productions whose left-hand side equals that node; and continues until the leaves are parts of speech (terminals). If the leaf nodes, i.e. the parts of speech, do not match the input string, we need to backtrack to the most recently processed node and apply another production. Top-down parsing can be viewed as generation of the parse tree in preorder.
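The expand-and-backtrack procedure described above can be sketched as a small recursive-descent parser. The toy grammar and example sentence below are illustrative assumptions, not taken from the paper; note that, as the next paragraph discusses, this style of parser cannot handle left-recursive rules.

```python
# A minimal top-down parser with backtracking over a toy grammar.
# Nonterminals map to lists of right-hand sides; terminals are words.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N":   [["bear"], ["fish"]],
    "V":   [["eats"]],
}

def parse(symbols, words):
    """Return True if the symbol sequence can derive exactly the word list."""
    if not symbols:
        return not words          # success only if all input is consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:          # nonterminal: try each production in turn (backtracking)
        return any(parse(rhs + rest, words) for rhs in GRAMMAR[first])
    # terminal: must match the next input word
    return bool(words) and words[0] == first and parse(rest, words[1:])

print(parse(["S"], "the bear eats the fish".split()))  # True
print(parse(["S"], "the bear the".split()))            # False
```

When a production fails, the `any(...)` call simply moves on to the next alternative, which is exactly the backtracking step described in the text.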
The advantage of the top-down strategy is that it never wastes time exploring trees that cannot result in S; that is, it never explores subtrees that cannot find a place in some S-rooted tree. On the other side, this approach has its own demerits: it leads to backtracking. The top-down approach spends considerable effort and time on S-trees that are not consistent with the input. This weakness of top-down parsers arises from the fact that they can generate trees before examining the input [8][6]. While expanding the nonterminals it is difficult to decide which right-hand side should be selected, i.e. to choose the appropriate starting production and further productions so as to avoid backtracking. Predictive parsing is the solution to the backtracking problem faced by the top-down strategy. Predictive parsing is characterized by its ability to use at most the next k tokens, referred to as lookahead, to select which production to apply, making the right decision without backtracking. The basic idea is that given A → α and A → β, the parser should be able to choose between α and β. To make the correct choice it needs the First(α) and Follow(A) sets. First(α) is the set of tokens that can appear as the first symbol in some string derived from α. Follow(A) is the set of tokens that can appear immediately to the right of A in some sentential form. Predictive parsing imposes a restriction on the grammar to be used: the grammar must possess the LL(1) property, in which the first ‘L’ states that we scan the input from left to right, the second ‘L’ says we create the leftmost derivation, and ‘1’ means one input symbol of lookahead [1]. The grammar should not be left-recursive. The LL(1) property is stated as follows:
If A → α and A → β both appear in the grammar, then First(α) ∩ First(β) = ∅. This allows the parser to make a correct choice with a lookahead of exactly one symbol.
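The LL(1) disjointness condition can be checked mechanically. The sketch below computes First sets for a toy grammar and tests whether the alternatives of each nonterminal have disjoint First sets; the grammar is an illustrative assumption, and for simplicity it is restricted to epsilon-free rules (so First of a right-hand side is just First of its first symbol, and Follow sets are not needed).

```python
# Sketch: First sets for an epsilon-free toy grammar, plus a check of the
# LL(1) condition First(alpha) ∩ First(beta) = Ø for each pair of alternatives.
TERMINALS = {"Det", "N", "V", "Aux"}  # preterminals treated as terminals here

def first(symbol, grammar):
    if symbol in TERMINALS:
        return {symbol}
    result = set()
    for rhs in grammar[symbol]:
        result |= first(rhs[0], grammar)  # epsilon-free: First(rhs) = First(rhs[0])
    return result

def is_ll1(grammar):
    for lhs, alternatives in grammar.items():
        firsts = [first(rhs[0], grammar) for rhs in alternatives]
        for i in range(len(firsts)):
            for j in range(i + 1, len(firsts)):
                if firsts[i] & firsts[j]:
                    return False          # overlapping First sets: not LL(1)
    return True

GOOD = {"S": [["NP", "VP"]], "NP": [["Det", "N"]],
        "VP": [["V", "NP"], ["Aux", "V"]]}
print(is_ll1(GOOD))  # True: "V" and "Aux" keep the VP alternatives apart
```

A grammar whose VP alternatives both began with V would fail this check, forcing either backtracking or a grammar transformation such as left factoring.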
2.2 Bottom-Up Parsing
Bottom-up parsing starts with the words of the input and tries to build trees from the words up, again by applying rules from the grammar one at a time. The parse is successful if it builds a tree rooted in the start symbol S that covers all of the input. Bottom-up parsing is a data-directed search. It tries to roll back the production process and reduce the sentence back to the start symbol S. It reduces the string of tokens to the start symbol by inverting the productions; the string is recognized by constructing the rightmost derivation in reverse. The objective of reaching the start symbol S is achieved by a series of reductions: when the right-hand side of some rule matches a substring of the input string, the substring is replaced with the left-hand side of the matched production, and the process continues until the start symbol is reached. Hence bottom-up parsing can be defined as a reduction process. Bottom-up parsing can be viewed as generation of the parse tree in postorder.
To obtain an operational shift-reduce parser and to determine the reducing production to be used, it implements LR parsing, which uses an LR(k) grammar, where (1) ‘L’ signifies left-to-right scanning of the input, (2) ‘R’ indicates a rightmost derivation done in reverse, and (3) k is the number of lookahead symbols used to make parsing decisions. The efficiency of bottom-up parsing lies in the fact that it never explores trees inconsistent with the input: it never suggests trees that are not locally grounded in the actual input. However, trees that have no hope of leading to an S, or of fitting in with any of their neighbors, are generated in abandon [1], which adds to the inefficiency of the bottom-up strategy.
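The shift-reduce loop sketched below illustrates the reduction process described above on a toy grammar and lexicon (both illustrative assumptions). It reduces greedily with no lookahead, so unlike a real table-driven LR(k) parser it can make wrong shift/reduce decisions on ambiguous grammars; it is only meant to show the mechanics of shifting and reducing.

```python
# Sketch of the shift-reduce loop: shift word tags onto a stack, and reduce
# whenever the top of the stack matches some rule's right-hand side.
RULES = [  # (lhs, rhs) pairs; rhs is matched against the top of the stack
    ("NP", ["Det", "N"]),
    ("VP", ["V", "NP"]),
    ("S",  ["NP", "VP"]),
]
LEXICON = {"the": "Det", "bear": "N", "fish": "N", "eats": "V"}

def shift_reduce(words):
    stack = []
    buffer = [LEXICON[w] for w in words]   # preterminal tag for each word
    while buffer or len(stack) > 1:
        for lhs, rhs in RULES:             # greedy reduce: first matching rule
            if stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]  # replace rhs with lhs (a reduction)
                break
        else:
            if not buffer:
                return None                # stuck: nothing to reduce or shift
            stack.append(buffer.pop(0))    # shift the next input symbol
    return stack[0] if stack == ["S"] else None

print(shift_reduce("the bear eats the fish".split()))  # S
print(shift_reduce("the bear".split()))                # None
```

Each reduction replaces a matched right-hand side with its left-hand side, so the sequence of reductions, read backwards, is exactly the rightmost derivation in reverse.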
2.3 Basic Top-Down Parsing
Neither of the approaches discussed above completely exploits the constraints presented by the grammar and the input words; therefore another technique was designed by combining the best features of top-down and bottom-up parsing, termed basic top-down parsing. The primary control strategy of top-down parsing is adopted to generate trees, and then constraints from bottom-up parsing are grafted on to filter out the inconsistent parses. The parsing algorithm begins with a top-down, depth-first, left-to-right strategy and maintains an agenda of search states, each consisting of a partial tree along with a pointer to the next input word in the sentence. The parser takes the front state of the agenda and applies the grammar rules to the left-most unexpanded node of the tree associated with that state to produce a new set of states; it then adds these new states to the front of the agenda, according to the textual order of the grammar rules that were applied to generate them, continuing the process until either a successful parse tree is found or the agenda is exhausted. The next step is to add the bottom-up filter using the left-corner rule, stated as follows: the parser should not consider any grammar rule if the current input word cannot serve as the first word along the left edge of a derivation from that rule [7]. Even though the basic top-down parser merges the best features of the top-down and bottom-up strategies, it still provides an insufficient solution to general-purpose parsing problems, viz. left recursion, ambiguity and inefficient reparsing of subtrees.
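The left-corner filter can be sketched as a precomputed table: for each nonterminal, the set of preterminals that can begin one of its derivations. A rule is then considered only if the tag of the current input word is in the left-corner set of the rule's first symbol. The grammar below is an illustrative assumption.

```python
# Sketch of the left-corner filter used as the bottom-up check in basic
# top-down parsing: refuse to expand a rule whose left corner cannot
# produce the tag of the current input word.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["PropN"]],
    "VP": [["V", "NP"]],
}
PRETERMINALS = {"Det", "N", "V", "PropN"}

def left_corners(symbol, grammar, seen=None):
    """Preterminals that can appear as the first word of a derivation of symbol."""
    seen = seen or set()
    if symbol in PRETERMINALS:
        return {symbol}
    corners = set()
    for rhs in grammar[symbol]:
        if rhs[0] not in seen:              # guard against left recursion
            corners |= left_corners(rhs[0], grammar, seen | {symbol})
    return corners

def viable_rules(nonterminal, next_tag, grammar):
    """Keep only expansions whose left corner can yield the next input tag."""
    return [rhs for rhs in grammar[nonterminal]
            if next_tag in left_corners(rhs[0], grammar)]

print(viable_rules("NP", "PropN", GRAMMAR))  # [['PropN']]
```

With a proper noun as the next word, the Det-N expansion of NP is filtered out before it is ever placed on the agenda, which is exactly the pruning the left-corner rule provides.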
2.4 Dependency Parsing
The fundamental notion of dependency is based on the idea that the syntactic structure of a sentence consists of binary asymmetrical relations between the words of the sentence, and dependency parsing provides a syntactic representation that encodes functional relationships between words. The dependency relation holds between a head and a dependent. Dependency parsing uses a dependency structure representing head-dependent relations (directed arcs), functional categories (arc labels) and possibly some structural categories (parts of speech).
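A dependency structure of the kind described above can be represented directly as a set of labeled head-to-dependent arcs. The sentence, indices and relation labels below are illustrative assumptions (the label names loosely follow common dependency-annotation conventions).

```python
# Sketch: a dependency parse as labeled head -> dependent arcs.
# Indices are word positions; position 0 is a conventional artificial ROOT.
sentence = ["ROOT", "the", "bear", "eats", "fish"]
arcs = [
    (3, 2, "nsubj"),   # eats -> bear   (subject of the verb)
    (2, 1, "det"),     # bear -> the    (determiner)
    (3, 4, "obj"),     # eats -> fish   (object of the verb)
    (0, 3, "root"),    # ROOT -> eats   (main predicate)
]

def dependents(head, arcs):
    """All (dependent, label) pairs governed by the given head position."""
    return [(dep, label) for h, dep, label in arcs if h == head]

for head, dep, label in arcs:
    print(f"{sentence[head]} --{label}--> {sentence[dep]}")
```

Each arc is asymmetrical (the head governs the dependent), and the arc labels carry the functional categories the text mentions.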
Sentence analysis
In analyzing a simple sentence, we first divide it into the complete subject and the complete predicate. Then we point out the simple subject with its modifiers and the simple predicate with its modifiers and complement (if there is one). If either the subject or the predicate is compound, we mention the simple subjects or predicates that are joined.
The polar bear lives in the Arctic regions.
This is a simple sentence. The complete subject is the polar bear. The complete predicate is lives in the Arctic regions. The simple subject is the noun bear. The simple predicate is the verb lives. Bear is modified by the adjectives the and polar. Lives is modified by the adverbial phrase in the Arctic regions. This phrase consists of the preposition in; its object, the noun regions; and the adjectives the and Arctic, modifying regions.
The polar bear and the walrus live and thrive in the Arctic regions.
The complete subject is the polar bear and the walrus. Two simple subjects (bear and walrus) are joined by the conjunction and to make a compound subject, and two simple predicates (live and thrive) are joined by and to make a compound predicate. Live and thrive are both modified by the adverbial phrase in the Arctic regions.
Analysis of Compound Sentences
In analyzing a compound sentence we first divide it into its coördinate clauses, and then analyze each clause by itself. Thus:
The polar bear lives in the Arctic regions, but it sometimes reaches temperate latitudes.
This is a compound sentence consisting of two coördinate clauses joined by the conjunction but: (1) the polar bear lives in the Arctic regions and (2) it sometimes reaches temperate latitudes. The complete subject of the first clause is the polar bear. The subject of the second clause is it; the complete predicate is sometimes reaches temperate latitudes. The simple predicate is reaches, which is modified by the adverb sometimes and is completed by the direct object latitudes. The complement latitudes is modified by the adjective temperate.
Analysis of Complex Sentences
In analyzing a complex sentence, we first divide it into the main clause and the subordinate clause.
1. The polar bear, which lives in the Arctic regions, sometimes reaches temperate latitudes.
“This is a complex sentence. The main clause is the polar bear sometimes reaches temperate latitudes; the subordinate clause is which lives in the Arctic regions. The complete subject of the sentence is the polar bear, which lives in the Arctic regions; the complete predicate is sometimes reaches temperate latitudes. The simple subject is bear, which is modified by the adjectives the and polar and by the adjective clause which lives in the Arctic regions. The simple predicate is reaches, which is modified by the adverb sometimes and completed by the direct object latitudes. This complement, latitudes, is modified by the adjective temperate. The subordinate clause is introduced by the relative pronoun which. [Then analyze the subordinate clause.]”[4]
2. The polar bear reaches temperate latitudes when the ice drifts southward.
This is a complex sentence. The main clause is the polar bear reaches temperate latitudes; the subordinate clause is when the ice drifts southward. The complete subject of the sentence is the polar bear; the complete predicate is reaches temperate latitudes when the ice drifts southward. The simple subject is bear, which is modified by the adjectives the and polar. The simple predicate is reaches, which is modified by the adverbial clause when the ice drifts southward, and completed by the noun latitudes (the direct object of reaches). The complement latitudes is modified by the adjective temperate. The subordinate clause is introduced by the relative adverb when. [Then analyze the subordinate clause.]
3. The polar bear, which lives in the Arctic regions when it is at home, sometimes reaches temperate latitudes.
This is a complex sentence. The main clause is the polar bear sometimes reaches temperate latitudes; the subordinate clause is which lives in the Arctic regions when it is at home, which is complex, since it contains the adverbial clause when it is at home, modifying the verb lives.
4. He says that the polar bear lives in the Arctic regions.
This is a complex sentence. The main clause is he says; the subordinate clause is that the polar bear lives in the Arctic regions. The subject of the sentence is he; the complete predicate is says that the polar bear lives in the Arctic regions. The simple predicate is says, which is completed by its direct object, the noun clause that ... regions, introduced by the conjunction that. [Then analyze the subordinate clause.]
5. That the polar bear sometimes reaches temperate latitudes is a familiar fact.
This is a complex sentence. The main clause (is a familiar fact) appears as a predicate only, since the subordinate clause (that the polar bear sometimes reaches temperate latitudes) is a noun clause used as the complete subject of the sentence. The simple predicate is is, which is completed by the predicate nominative fact. This complement is modified by the adjectives a and familiar. The subordinate clause, which is used as the complete subject, is introduced by the conjunction that. [Then analyze this clause.]
Analysis of Compound Complex Sentences
In analyzing a compound complex sentence, we first divide it into the independent clauses (simple or complex) of which it consists and then analyze each of these as if it were a sentence by itself.
A sentence is typically associated with a clause and a clause can be either a clause simplex or a clause complex. A clause is a clause simplex if it represents a single process going on through time, and it is a clause complex if it represents a logical relation between two or more processes and is thus composed of two or more clause simplexes.
A clause (simplex) typically contains a predication structure with a subject noun phrase and a finite verb. Although the subject is usually a noun phrase, other kinds of phrases (such as gerund phrases) work as well, and some languages allow subjects to be omitted. In the examples below, the subject of the outermost clause simplex is in italics and the subject of boiling is in square brackets. Notice that there is clause embedding in the second and third examples.
[Water] boils at 100 degrees Celsius.
It is quite interesting that [water] boils at 100 degrees Celsius.
The fact that [water] boils at 100 degrees Celsius is quite interesting.
There are two types of clauses: independent and non-independent/interdependent. An independent clause realises a speech act such as a statement, a question, a command or an offer. A non-independent clause does not realise any act. A non-independent clause (simplex or complex) is usually logically related to other non-independent clauses. Together they usually constitute a single independent clause (complex). For that reason, non-independent clauses are also called interdependent.
For instance, the non-independent clause because I have no friends is related to the non-independent clause I don't go out in I don't go out, because I have no friends. The whole clause complex is independent because it realises a statement. What is stated is the causal nexus between having no friends and not going out. When such a statement is acted out, the fact that the speaker doesn't go out is already established; therefore it cannot be stated. What is still open and under negotiation is the reason for that fact. The causal nexus is represented by the independent clause complex and not by the two interdependent clause simplexes.
See also copula for the consequences of the verb to be on the theory of sentence structure.
Sentences can also be classified based on the speech act which they perform. For instance, English sentence types can be described as follows:
A declarative sentence typically makes an assertion or statement: "You are my friend."
An interrogative sentence typically raises a question: "Are you my friend?"
An imperative sentence typically makes a command: "Be my friend!"
An exclamative sentence, sometimes called an exclamatory sentence, typically expresses an exclamation: "What a good friend you are!"
The form (declarative, interrogative, imperative, or exclamative) and meaning (statement, question, command, or exclamation) of a sentence usually match, but not always. For instance, the interrogative sentence "Can you pass me the salt?" is not intended to express a question but rather to express a command. Likewise, the interrogative sentence "Can't you do anything right?" is not intended to express a question on the listener's ability, but rather to express a statement on the listener's lack of ability; see rhetorical question.
A major sentence is a regular sentence; it has a subject and a predicate, e.g. "I have a ball." In this sentence, one can change the persons, e.g. "We have a ball." However, a minor sentence is an irregular type of sentence that does not contain a main clause, e.g. "Mary!", "Precisely so.", "Next Tuesday evening after it gets dark." Other examples of minor sentences are headings, stereotyped expressions ("Hello!"), emotional expressions ("Wow!"), proverbs, etc. These can also include nominal sentences like "The more, the merrier." These mostly omit a main verb for the sake of conciseness but may also do so in order to intensify the meaning around the nouns.
Sentences that comprise a single word are called word sentences, and the words themselves sentence words. The 1980s saw a renewed surge in interest in sentence length, primarily in relation to "other syntactic phenomena".
“One definition of the average sentence length of a prose passage is the ratio of the number of words to the number of sentences. The textbook Mathematical Linguistics, by András Kornai, suggests that in "journalistic prose the median sentence length is above 15 words". The average length of a sentence generally serves as a measure of sentence difficulty or complexity. In general, as the average sentence length increases, the complexity of the sentences also increases.”[5]
Another definition of "sentence length" is the number of clauses in the sentence, whereas the "clause length" is the number of phones in the clause.
Research by Erik Schils and Pieter de Haan by sampling five texts showed that two adjacent sentences are more likely to have similar lengths than two non-adjacent sentences, and almost certainly have a similar length when in a work of fiction. This countered the theory that "authors may aim at an alternation of long and short sentences". Sentence length, as well as word difficulty, are both factors in the readability of a sentence; however, other factors, such as the presence of conjunctions, have been said to "facilitate comprehension considerably".