
Joseph H. Greenberg (May 28, 1915-May 7, 2001) was a prominent but controversial linguist, known for his work in both language classification and typology. He was born in Brooklyn, New York and served for many years on the faculty of Stanford University.

Contributions to linguistics

Language typology

Greenberg's fame rests in part on his seminal contributions to synchronic linguistics and the quest to identify linguistic universals. In the late 1950s, Greenberg began to examine corpora of languages covering a wide geographic and genetic distribution. He identified a number of potential universals, as well as many strong cross-linguistic tendencies.

In particular, Greenberg invented the notion of "implicational universal", which takes the form "if a language has structure X, then it must also have structure Y." For example, X might be "mid front rounded vowels" and Y "high front rounded vowels" (for terminology see phonetics). This kind of research was picked up by many other scholars following Greenberg's example and has continued to be an important kind of data-gathering in synchronic linguistics.
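An implicational universal of this form can be tested mechanically against a language sample by looking for counterexamples, that is, languages that have X but lack Y. The following is a minimal sketch only: the function name `violations` and the three-language sample are illustrative assumptions, and the feature sets are heavily simplified rather than drawn from any survey Greenberg used.

```python
def violations(languages, x, y):
    """Return the names of languages that have feature x but lack feature y.

    An empty result means the sample contains no counterexample to
    "if a language has x, then it also has y".
    """
    return [name for name, feats in languages.items()
            if x in feats and y not in feats]


X = "mid front rounded vowel"
Y = "high front rounded vowel"

# Toy sample (assumption: simplified vowel inventories for illustration only).
sample = {
    "French": {X, Y},    # has both kinds of vowel
    "Mandarin": {Y},     # has Y without X, which the universal permits
    "Spanish": set(),    # has neither
}

print(violations(sample, X, Y))
```

A sample with no violations does not prove the universal; it only fails to refute it, which is why such claims are tested on samples that are as large and as diverse as possible.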

African languages

Greenberg is also widely known and respected for his development of a new classification system for African languages, which he published in 1963. The classification was for a time considered very bold and speculative, especially in its proposal of a Nilo-Saharan language family, but is now generally accepted among specialists in African historical linguistics. In the course of this work, Greenberg coined the term Afroasiatic to replace the former "Hamito-Semitic".

Languages of the Americas

Later, Greenberg studied the native languages of the Americas, which until then had been classified into hundreds of separate language families. He proposed a broader classification into three major groups: Eskimo-Aleut, Na-Dene, and Amerind. This proposal has generated mixed reactions from other linguists.

Languages of the World

Greenberg's best known and most controversial work is an ambitious classification system covering most languages of the world. In this grand scheme, he proposed to group many language families of Europe and Asia into a single group called Eurasiatic.

Neither the American Indian classification nor the Eurasiatic proposal has been accepted by the research community of historical linguists, and many of them have strongly rejected both. The main reason for the lack of acceptance of Greenberg's proposals is his methodology. However, his work, particularly his proposed Amerind language family, has also been severely criticized for errors in the data. Several language experts who have examined the data in Language in the Americas (Adelaar 1989, Berman 1992, Chafe 1987, Goddard 1987, Kimball 1992, Poser 1992, Rankin 1992) have reported large numbers of errors. Many of them report errors in the majority of the items that they examined, and some report that every form examined is erroneous. The errors include:

erroneous forms
spurious (non-existent) forms
erroneous glosses (translations)
words attributed to the wrong language
unjustifiable or demonstrably erroneous morphological analyses (identification of prefixes, suffixes, and roots)

Some errors may only have a randomizing effect, but many of these errors, especially the morphological misanalyses, create spurious similarities and thus bias the data.

Greenberg's method of mass comparison

Traditional language comparison

Since the development of comparative linguistics in the 19th century, a linguist who claims that two languages are related, in the absence of historical evidence, is expected to back up that claim by presenting general rules that describe the differences between their lexicons, morphologies, and grammars. The procedure is described in detail in the Wikipedia article comparative method.

For instance, one could show that Spanish is related to Italian by demonstrating that many words of the former can be mapped to corresponding words of the latter by a relatively small set of replacement rules, such as replacing initial es- with s- or final -os with -i. Many similar correspondences exist between the grammars of the two languages. Since such systematic correspondences are extremely unlikely to be random coincidences, the most plausible explanation is that the two languages evolved from a single ancestral tongue (Latin, in this case). All pre-historical language groupings that are widely accepted today, such as the Indo-European, Sino-Tibetan, and Bantu families, have been established in this way.
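The idea of a small set of replacement rules can be made concrete with a toy rule engine. This is a sketch under stated assumptions: it works on spelling rather than sound, the two rules are the ones mentioned above, and the names `RULES`, `apply_rules` and `regular_share` are hypothetical. Real sound correspondences are stated over reconstructed phonemes and are conditioned by context.

```python
import re

# Two naive Spanish-to-Italian spelling rules, taken from the text above.
RULES = [
    (re.compile(r"^es"), "s"),   # initial es-  ->  s-
    (re.compile(r"os$"), "i"),   # final   -os  ->  -i
]


def apply_rules(word):
    """Apply every rule in order to a Spanish word and return the result."""
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word


def regular_share(pairs):
    """Fraction of (Spanish, Italian) pairs that the rules predict exactly."""
    return sum(apply_rules(es) == it for es, it in pairs) / len(pairs)


pairs = [("estilos", "stili"), ("muros", "muri"), ("estadio", "stadio")]
for es, it in pairs:
    print(es, "->", apply_rules(es), "| attested:", it)
print(regular_share(pairs))
```

The three pairs were hand-picked to fit the rules, so the score here says nothing about the languages as a whole; the point is only that the same few rules account for many unrelated words, which is what makes coincidence implausible.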

Limitations of the comparative method

However, besides systematic changes, languages are also subject to random mutations (such as borrowings from other languages, irregular inflections, compounding, and abbreviation) that affect one word at a time, or small subsets of words. For example, Spanish perro, which does not come from Latin, cannot be rule-mapped to its Italian equivalent cane.

As those sporadic changes accumulate, they will increasingly obscure the systematic ones, just as enough dirt and scratches on a photograph will eventually make the face unrecognizable. Given the rate at which those random mutations occur, they are expected to obliterate any systematic similarities between languages that split off more than 10,000 years ago. Considering that humans have probably been speaking fully developed languages for at least 60,000 years (since Australia was first populated), it is hardly surprising that many languages and language families still have no known relationship with other groups.

Mass lexicon comparisons

In an effort to extend comparative linguistics beyond this limit, and arrive at his broad super-family groupings, Greenberg invented a new statistical method, mass lexical comparison. Instead of trying to uncover systematic rules, Greenberg simply compared a large sample of words from one language with their equivalents in the other language, looking for similar sound patterns. Thus, for example, Spanish cabeza and Italian capo are similar to the extent that both contain the same consonant sound [k], the same vowel sound [a], and similar consonants [b] and [p], in sequence.

Departing from the traditional criterion, Greenberg did not look for any systematic trend in these similarities, trusting that a sufficiently large percentage $S$ of sufficiently similar pairs among the samples would be enough to prove a common origin for the two languages. This assumption is valid in principle, because $S$ is expected to be higher for languages that have split off more recently, and decrease as the split recedes into the past. The difficult part is deciding what constitutes "sufficient" similarity.
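To make the quantity $S$ concrete, here is one possible way to compute it. This is not Greenberg's own procedure, which relied on judgment by inspection; it is a swapped-in simplification that groups consonant letters into coarse classes and calls two words "similar" when their first two consonants fall into the same classes. The class table, the function names, and the use of spelling as a stand-in for sound are all assumptions made for illustration.

```python
# Coarse consonant classes (assumption: a crude, spelling-based grouping).
CLASSES = {
    "p": "P", "b": "P", "f": "P", "v": "P",
    "t": "T", "d": "T",
    "k": "K", "g": "K", "c": "K", "q": "K",
    "s": "S", "z": "S",
    "m": "M", "n": "N",
    "l": "R", "r": "R",
}


def skeleton(word, n=2):
    """Classes of the first n consonant letters; all other letters are skipped."""
    classes = [CLASSES[ch] for ch in word.lower() if ch in CLASSES]
    return "".join(classes[:n])


def similarity(pairs):
    """S: the fraction of word pairs whose consonant skeletons coincide."""
    return sum(skeleton(a) == skeleton(b) for a, b in pairs) / len(pairs)


pairs = [("cabeza", "capo"), ("mano", "mano"),
         ("fuego", "fuoco"), ("perro", "cane")]
print(skeleton("cabeza"), skeleton("capo"))  # KP KP
print(similarity(pairs))                     # 0.75
```

Every choice in this sketch (which letters share a class, how many consonants to compare, what counts as a match) changes the resulting $S$, which illustrates why deciding what counts as "sufficient" similarity is the difficult part.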

Choosing the sample lexicon

Ideally, the sample lexicons should contain only words that are likely to have survived in both languages since the time of their hypothetical common origin, and are unlikely to be replaced by borrowed or reinvented words. For studies that extend more than 5000 years into the past, that criterion leaves only a few hundred concepts, such as body parts, close family relations, common animals and plants, water, fire, sky, stone, spear, etc.

Words for "modern" concepts — such as "wine", "horse", and "steel" — may show spurious similarities between unrelated languages, due to the name being imported by a culture together with the thing; e.g. Spanish pan and Japanese pan ("bread"). Alternatively, the names of recently imported concepts may get invented separately in related languages, such as computadora ("computer") in Spanish and ordinateur in French. Either way, such words would only add noise and bias to the comparison.

Weaknesses of the method

In theory, the reliability of Greenberg's method could be settled by statistical analysis; namely, by computing the probability that a given similarity level $S$ could have arisen by chance coincidences between totally unrelated languages. Unfortunately, this computation is not easily done, because it requires a fairly precise stochastic model of what would be a "random" language.

In particular, the similarity level $S$ is expected to depend on the phonetic repertoires of the two languages. For example, one expects more chance resemblances between two languages that have few vowels and many consonants than between a vowel-rich and a vowel-poor language. The model for a "random lexicon" must therefore be parametrized by sound frequencies and other similar statistics.
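One way to estimate the chance level without writing down an explicit model of a "random" language is a permutation test: keep both word lists as they are, but shuffle which word is paired with which meaning, and see how often the shuffled pairing scores at least as high as the real one. Because the words themselves are unchanged, each language's sound frequencies are preserved automatically. The sketch below is an illustration of that general technique, not a method attributed to Greenberg; the function names, the first-letter similarity test, and the six-word sample are assumptions.

```python
import random


def score(pairs, similar):
    """Fraction of pairs judged similar by the supplied predicate."""
    return sum(similar(a, b) for a, b in pairs) / len(pairs)


def chance_p_value(pairs, similar, trials=2000, seed=0):
    """Return (observed S, estimated probability of S this high by chance)."""
    rng = random.Random(seed)          # fixed seed keeps the run repeatable
    left = [a for a, _ in pairs]
    right = [b for _, b in pairs]
    observed = score(pairs, similar)
    hits = 0
    for _ in range(trials):
        rng.shuffle(right)             # break the meaning-to-word alignment
        if score(list(zip(left, right)), similar) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


def same_initial(a, b):
    """Crude stand-in for 'similar': the words start with the same letter."""
    return a[0] == b[0]


pairs = [("mano", "mano"), ("fuego", "fuoco"), ("cabeza", "capo"),
         ("perro", "cane"), ("luna", "luna"), ("sol", "sole")]
observed, p = chance_p_value(pairs, same_initial)
print(observed, p)
```

A small estimated probability only rules out chance. It says nothing about whether the resemblances come from common descent, borrowing, or sound symbolism, so a test of this kind addresses just one of the weaknesses discussed in this section.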

Also, the "ancient" concepts that are most suitable for inclusion in the sample lexicons often have onomatopoeic names that imitate a natural sound associated with the concept. (English is especially rich in such words, e.g. crack, slap, bang, crow, gurgle, cough, etc.) Such words may create similarities between corresponding words that are not due to common origin but still contribute to the similarity measure $S$.

These difficulties are compounded by the fact that most historical linguists are unfamiliar with statistical analysis, and are therefore at a disadvantage when it comes to evaluating or criticizing these comparisons. For all these reasons, most historical linguists flatly reject Greenberg's method (and the classifications implied by it), and still view the comparative method as the only legitimate way to establish pre-historical common ancestry for languages.

References

Adelaar, Willem F. H. (1989) Review of Language in the Americas. Lingua 78.249-255.
Berman, Howard (1992) A Comment on the Yurok and Kalapuya Data in Greenberg's Language in the Americas, International Journal of American Linguistics 58.2.230-233.
Chafe, Wallace (1987) Review of Language in the Americas. Current Anthropology 28.652-653.
Goddard, Ives (1987) Review of Language in the Americas. Current Anthropology 28.656-657.
Greenberg, Joseph H. Linguistics, anthropological theory, cultural anthropology; Africa.
Greenberg, Joseph H. (1963) Some universals of grammar with particular reference to the order of meaningful elements. In Universals of Language. Cambridge: MIT Press. pp. 73–113.
Greenberg, Joseph H. (1987) Language in the Americas. Stanford: Stanford University Press.
Greenberg, Joseph H. (2000) Indo-European and its Closest Relatives: the Eurasiatic Language Family – Volume I, Grammar. Stanford: Stanford University Press.
Greenberg, Joseph H. (2002) Indo-European and its Closest Relatives: the Eurasiatic Language Family – Volume II, Lexicon. Stanford: Stanford University Press.
Kimball, Geoffrey (1992) A critique of Muskogean, 'Gulf,' and Yukian materials in Language in the Americas. International Journal of American Linguistics 58: 447-501.
Poser, William J. (1992) The Salinan and Yurumanguí Data in Language in the Americas. International Journal of American Linguistics 58.2.202-229.
Rankin, Robert (1992) Review of Language in the Americas, International Journal of American Linguistics 58.3.324-351.
