Sub Problems of NLP

September 4th, 2010
  • Speech segmentation

In most spoken languages, the sounds representing successive letters blend into each other, so converting the analog signal into discrete characters can be very difficult. Natural speech also has hardly any pauses between successive words, so locating word boundaries usually requires grammatical and semantic constraints as well as context.

  • Text segmentation

Some written languages, such as Chinese, Japanese and Thai, do not mark word boundaries either, so any significant text parsing first requires identifying those boundaries, which is often a non-trivial task.
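To make the difficulty concrete, here is a minimal sketch of greedy "maximum matching", one simple dictionary-based way to segment unspaced text. The function name `max_match` and the toy vocabulary are assumptions made for illustration, and English with the spaces removed stands in for a language written without them.

```python
def max_match(text, vocab):
    """Greedy left-to-right segmentation: always take the longest vocabulary match."""
    max_len = max(len(w) for w in vocab)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until we hit a vocabulary entry.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                words.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary starts here: emit a single character.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy vocabulary, chosen to show how the greedy choice can go wrong.
VOCAB = {"the", "theta", "table", "bled", "own", "down", "there"}

print(max_match("thetabledownthere", VOCAB))
# -> ['theta', 'bled', 'own', 'there'], not the intended "the table down there"
```

Every piece of the output is a real word, yet the segmentation is wrong, which is why practical segmenters usually need more than a dictionary lookup.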

  • Part-of-speech tagging
  • Word sense disambiguation

Many words have more than one meaning; we have to select the meaning which makes the most sense in context.
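One classic way to pick a sense from context is to compare the surrounding words with a dictionary gloss for each sense (a simplified Lesk-style overlap). The sketch below assumes a tiny hand-written sense inventory; `SENSES`, `STOPWORDS` and `pick_sense` are hypothetical names for illustration, not part of any real library.

```python
# Hypothetical sense inventory: each sense of a word maps to a short gloss.
SENSES = {
    "bank": {
        "financial": "an institution that accepts deposits and lends money",
        "river": "the sloping land beside a body of water such as a river",
    }
}

# Small illustrative stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "that", "and", "as", "such", "at", "on", "we", "she"}


def pick_sense(word, context, senses=SENSES, stopwords=STOPWORDS):
    """Return the sense whose gloss shares the most content words with the context."""
    context_words = {w.lower().strip(".,;:!?") for w in context.split()} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        gloss_words = set(gloss.lower().split()) - stopwords
        overlap = len(context_words & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(pick_sense("bank", "She deposits money at the bank"))
# -> financial
print(pick_sense("bank", "We sat on the bank of the river and watched the water"))
# -> river
```

Word overlap is a crude signal, and it fails whenever the context shares no words with the right gloss, but it shows what "the most sense in context" can mean operationally.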

  • Syntactic ambiguity

The grammars of natural languages are ambiguous: a given sentence often has several possible parse trees. Choosing the most appropriate one usually requires semantic and contextual information. One specific component of this problem is sentence boundary disambiguation.
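The claim about multiple parse trees can be checked on a small scale. The sketch below counts the parses of a sentence with a CYK-style chart over a toy grammar in Chomsky normal form. The grammar, the lexicon and the function name `count_parses` are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, lexicon=LEXICON, rules=RULES, start="S"):
    """Count the distinct parse trees rooted at `start` that cover all the words."""
    n = len(words)
    # chart[i][j][A] = number of trees rooted at category A spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in lexicon[word]:
            chart[i][i + 1][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in rules:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


print(count_parses("I saw the man".split()))
# -> 1
print(count_parses("I saw the man with the telescope".split()))
# -> 2
```

The two parses of the second sentence correspond to the two readings: either the seeing was done with the telescope, or the man has the telescope. The grammar alone cannot say which is meant.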

  • Imperfect or irregular input

In speech, this includes foreign or regional accents and vocal impediments; in text, it includes typing errors, grammatical errors and OCR errors.

  • Speech acts and plans

A sentence can often be considered an action by the speaker. The sentence structure alone may not contain enough information to define this action. For instance, a question is sometimes the speaker requesting some sort of response from the listener. The desired response may be verbal, physical, or some combination. For example, “Can you pass the class?” is a request for a simple yes-or-no answer, while “Can you pass the salt?” is requesting a physical action to be performed. It is not appropriate to respond with “Yes, I can pass the salt,” without the accompanying action (although “No” or “I can’t reach the salt” would explain a lack of action).