Syntactic processing

Given a set of linguistic rules that describe how elements of a sentence can be put together, a program called a parser will try to find the best grammatical analysis of a sentence. if it’s ambiguous it will produce all analyses.

“I can fish”

Do you know how to fish or do you habitually put fish in cans?

Context-free grammars

We can describe the structure of sentences like I can fish as a system of rules, called “toy” grammar. They are kind of awkward and only represent a tiny fragment of language. 

To the left of the arrow in the right hand column are lexical categories or parts of speech. They map to particular words. All the categories to the left of the arrow (in either column) are called non-terminals since they rewrite to other categories or words. Words don’t rewrite to anything and are called terminals. 

This only covers a little bit but it can describe quite a variety of sentences. Grammars written like this are context-free. 

A real grammar would need to be extended to add adjectives and adverbs and to handle syntactic structures like questions, imperatives and relative clauses, as well as subject-verb agreement as in He cans tuna


A program which analyzes the syntactic structure of a sentence is a parser, it takes an input sentence and produces one or more syntactic representations of it.

Here’s how it works for the sentence I can fish. 

It starts with one rule which is basically, assume sentence S. When you apply the next rule, that a sentence gives you a noun phrase and a verb phrase, you get the next parse tree. 

Here’s what that tree looks like. But the parser tree has to expand the left-most nonterminal node on the right hand side of the rule first. So, it will chose NP, which is why we have it selected with a circle. 

So, our grammar says that we can expand or specify what NP contains.

Expanding the first tree would not be possible, because according to NP → D N rule, the first nonterminal would be a determiner which expands to the which doesn’t match the start of the input. 

The second tree has a similar problem in that we have NP → N rule predicts a noun, but the only nouns in the grammar (fish and dance) dont match the first word of the input sentence, which is I. This leads the parser to check the third NP expansion rule which is NP → Pronoun, which checks out to the word I.

Now, I is a terminal category since it’s word so we have to move onto the VP node and use our grammar as to expand it as far as we can. We’re not going to do it because that would take a long time for me to explain, but a computer can run through this in a fraction of a second. 

Statistical processing

So far we discussed linguistic rules that were designed by hand, but we can train a machine to discover these rules from examples, kind of what humans do when they are first learning languages as babies. 

We take a collection of texts called a corpus and analyze or mark up the sentences with parse trees. Once a large corpus has been annotated, which makes a treebank, we can train computers to induce grammar from it. 


First we count how often each type of syntactic configuration is found in the treebank, then we weight it according to the frequency of its occurrence. These weighted rules would be used in statistical parsing. 

If you’re curious, you should learn more about the Chomsky hierarchy: it turns out that languages are composed of structures of varying complexity, like computers.

Anything modeled by regular expressions can be modeled by context-free grammars. However, center-embedded sentences like The little puppy the boy the principal hated loved ran away can’t be modeled by regular expressions but by context-free grammars. 

In turn, we have phenomena that can’t be accounted for in context-free grammar but by context-sensitive grammar. They can model everything a context-free grammar can express and more. 

But even those can’t represent formal patterns which another class of grammars, named unrestricted, can express. It turns out that for every level of grammatical expressiveness in the Chomsky hierarchy there is an analogous computing device, as shown in the figure. 


If you liked this post and want to learn more, we highly recommend you read Contemporary Linguistics: An Introduction by William O’Grady. It’s a great textbook on linguistics in general and would make a great addition to any scientists library.


Leave a comment

%d bloggers like this: