Putting Linguistics Back Into Computational Linguistics
Speaker:
Martin Kay
Abstract:
The belief has recently become widespread that
the properties of language needed to process it for useful
purposes will emerge if sufficiently large quantities of raw
text and speech are analyzed automatically using sufficiently
sophisticated techniques. On this view, the kind of understanding that a
linguist attempts to achieve by examining individual
specimens at close range has little value, at least for
practical purposes. But information can be caused to
emerge from the raw data only if it is in there in the first
place, and it has long been known that this is not the case. A
language is a code, that is, a system of arbitrary relations
between symbols and things in worlds, real and imaginary.
No time or effort invested in examining the symbols will
reveal these relations to one who does not know the code. If
this is true, then we must ask why statistically based
machine translation, for example, has come as far as it has,
and how much further it can expect to go.