Language and Region Codes for the Standard Cross-Cultural Sample

New language and region codes are developed for the Standard Cross-Cultural Sample. The region codes were developed previously with a different sample and tested against social structure data. The language codes incorporate information from recent publications on language history and are presented at multiple levels, providing information about the taxonomic relationships among languages.

The most widely used cross-cultural codes for regions and language groups are those developed by Murdock (1967) for the Ethnographic Atlas. These are now more than 30 years old. Burton, Moore, Whiting, & Romney (1996) recently proposed an improved set of regional categories, and several new publications in historical linguistics can be used to formulate language codes (Campbell, 1997; Greenberg, 1987; Kaufman, 1994; Moseley & Asher, 1994; Ruhlen, 1991). Here, I have coded the 186 societies of the Standard Cross-Cultural Sample (Murdock & White, 1969) using the new regions and the various new studies of language families.
The revised regional classification was developed based on concepts of precapitalist world systems. There are two world systems: the Middle Old World in North Africa and Southern Eurasia, and the system of agrarian states in the New World that extended south from Mesoamerica to the Andes. These and the other regions are described briefly below and in more detail in Burton et al. (1996). The classification was tested using Murdock's (1967, 1970) social structure data and found to fit the data much better than Murdock's classification.
Language classifications seem to evoke strong feelings among comparative linguists. In fact, progress on this article was delayed by the process of asking comparative linguists to comment on the language classifications. In short, they do not agree. The problem often is described as the difference between lumpers and splitters, but the difference is better described in terms of methodology because both groups have the same goal: to classify languages into larger groupings. One group (the so-called "lumpers") favors Greenberg's (1963) method of mass comparisons, a statistical approach which, however, does not use particularly powerful methods of statistical analysis.
The second group (the "splitters") requires strong evidence for reconstruction of relationships between languages and rejects higher order groupings based on mass comparisons.
The greatest differences between the two approaches occur with classification of American Indian languages. Greenberg's (1987) classification of these languages into three major groups is not widely accepted by specialists on American Indian languages, whereas his earlier (Greenberg, 1963) classification of African languages still is closely followed.
Although I am sympathetic to statistical approaches and a great admirer of Greenberg's work, the lumper approach is less well suited to the purpose of this article, which is to provide information about language groupings upon which most historical linguists would agree. Information about some of the proposals for more macroscopic groupings appears at the end of the article.
A note is required here about the use of language codes to test Galton's problem. A shared language history is a plausible index of shared history, and in many (but not all) cases, language families provide useful information of a taxonomic nature. In working on Galton's problem, we used taxonomic relationships among languages as the basis for computing proximity measures among the various languages (Dow et al., 1984; White et al., 1981). We used these measures in analyses derived from spatial autocorrelation methods that we called language autocorrelation analysis. In studies of the gender division of labor, we found an autocorrelation effect that could be located within the group of societies with Bantu languages, an effect that we could detect but that was too small to affect the validity of the larger research project. As we shall see below, the Bantu languages are very closely related within a larger language family, so the main generalization from this work is that language autocorrelation is likely to occur only among very closely related languages. That would imply that information about very high-level groupings, such as Greenberg's Amerind language phylum (Greenberg, 1987), would have little relevance to testing nomothetic cross-cultural hypotheses even if they were to be more widely accepted by historical linguists.
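As an illustration only, the general idea of an autocorrelation statistic computed over language proximities can be sketched with the standard Moran's I formula. This is not necessarily the procedure used in Dow et al. (1984) or White et al. (1981); the function name `morans_i`, the proximity matrix, and the trait values below are all hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for `values`, using a square matrix of pairwise weights.

    weights[i][j] is an assumed language proximity between societies i and j
    (larger means more closely related). Diagonal entries are ignored.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    total_weight = 0.0
    cross_product = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            total_weight += weights[i][j]
            cross_product += weights[i][j] * deviations[i] * deviations[j]

    sum_of_squares = sum(d * d for d in deviations)
    if total_weight == 0 or sum_of_squares == 0:
        raise ValueError("Moran's I is undefined for zero weights or a constant trait")
    return (n / total_weight) * (cross_product / sum_of_squares)


# Hypothetical data: four societies forming two closely related language pairs.
example_weights = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
clustered_trait = [1, 1, 0, 0]    # related societies share the trait value
alternating_trait = [1, 0, 1, 0]  # related societies differ on the trait
```

In this toy case the statistic is positive when closely related societies share a trait value and negative when they differ, which is the kind of pattern an autocorrelation analysis looks for.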
Previous language codes for cross-cultural research have focused on a single level of multilevel taxonomies. Here, I present several levels of language codes. Doing so provides more information about known relationships among languages and would allow other researchers to replicate the kind of autocorrelation analyses described above, which require the full taxonomic information. Table 1 provides a brief description of each region and a tabulation of the number of standard sample societies that fall within the region. Table 2 presents the language codebook, with frequency counts for each language group. Following Campbell (1997), I do not use the term phylum. The codes are organized by families (the highest level groupings) and two levels of subfamilies. 1 The actual codes are in Table 3.
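To make concrete how multilevel codes carry taxonomic information, the following minimal sketch derives a simple proximity score from a three-level code (family and two subfamily levels). The scoring rule (the fraction of leading levels two codes share) is an assumption chosen for illustration and is not the measure defined in the cited autocorrelation studies. The names `LanguageCode`, `shared_levels`, and `proximity` are hypothetical, and the third-level labels are placeholders rather than entries from Table 2.

```python
from typing import NamedTuple


class LanguageCode(NamedTuple):
    """A three-level language code: family, then two subfamily levels."""
    family: str
    subfamily1: str
    subfamily2: str


def shared_levels(a: LanguageCode, b: LanguageCode) -> int:
    """Count how many levels match, reading from the family level downward."""
    count = 0
    for level_a, level_b in zip(a, b):
        if level_a != level_b:
            break
        count += 1
    return count


def proximity(a: LanguageCode, b: LanguageCode) -> float:
    """Assumed proximity: the share of levels the two codes have in common."""
    return shared_levels(a, b) / len(a)


# Family and first-level labels come from the text; third-level labels are placeholders.
lang_a = LanguageCode("Niger-Congo", "Central Niger-Congo", "placeholder-A")
lang_b = LanguageCode("Niger-Congo", "Central Niger-Congo", "placeholder-B")
lang_c = LanguageCode("Niger-Congo", "Mande", "placeholder-C")
lang_d = LanguageCode("Austronesian", "placeholder-D", "placeholder-E")
```

Under this rule, `lang_a` and `lang_b` share two of three levels, `lang_a` and `lang_c` share one, and `lang_a` and `lang_d` share none. A single-level code could only separate the last pair from the others.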
The three levels coded allow most language families to be divided into small groups of standard sample societies, among which further differentiation should not be necessary. However, there remain two large groupings at the second subfamily level. These fall within two language families: Niger-Congo and Austronesian. Here, I discuss each of these in more detail.

NIGER-CONGO
In some classifications (e.g., Ruhlen, 1991), Niger-Congo and Kordofanian are combined into a larger group called Niger-Kordofanian. However, following Wald (1994), I have included Kordofanian as one of the subgroups of Niger-Congo, at the same level as Mande. This produces three main subfamilies that include societies from the Standard Cross-Cultural Sample: Mande, West Atlantic, and Central Niger-Congo. The last of these includes 18 societies from the Standard Cross-Cultural Sample. Figure 1 presents a simplified taxonomy of the Central Niger-Congo family, 2 listing only those groups that are necessary to distinguish among these 18 languages. Of the 18 languages, 12 are Bantu languages and 13 are in the Bantoid group. It takes seven levels of the taxonomy of Central Niger-Congo languages to get to Bantu and two more to get to Central Bantu (see Figure 1), with 10 members (more than 5% of the standard sample). The depth of this group within the taxonomy shows its relatively shallow time depth. The Bantu languages are the largest group in the sample with such a close historical relationship, so it is not surprising that they were the basis for our finding of language autocorrelation, described above.
66 Cross-Cultural Research / February 1999

AUSTRONESIAN 3
Austronesian (see Figure 2) includes 25 members in the Standard Cross-Cultural Sample. Four levels down, 13 of the languages are in the Oceanic branch of Eastern Malayo-Polynesian, which contains many Pacific Island languages. Here, the situation is similar to that with the Bantu languages, where a relatively cohesive group of people settled a large geographic area. It takes three more levels of the taxonomy to get to the four Polynesian languages.
[Figure 1 (fragment), taxonomy of Central Niger-Congo: 1. North Central (2): Azande, Tallensi. 2. South Central (16): a. Western (2): Ashanti, Fon; b. Eastern (14): i. Lower Niger (1 [remainder of figure missing]]

CREOLE LANGUAGES
Taxonomic trees cannot capture the entire complexity of relationships among languages. For example, a taxonomic tree cannot accurately depict the known relationships of English, a Germanic language, with the Celtic and Romance languages; the relationships of Swahili both to Arabic and to the East African Bantu languages; or the language mixing that has occurred between some Austronesian and non-Austronesian languages in Melanesia. If language is to be used as a proxy for long-term historical relationships, one must be mindful of other kinds of processes that may affect the accuracy of taxonomic representations. For example, the Mbuti speak a Bantu language, but this does not index long-term historical linkages with other Bantu-speaking societies.
Two societies in the Standard Cross-Cultural Sample have Creole languages: the Haitians and the Saramacca. Both have strong influences from Indo-European and African languages. I have added an extra category under Indo-European to include these two.
[Figure 2 (fragment): (2) Maori, Marquesan]

RELATIONSHIPS BETWEEN LANGUAGE GROUPS AND REGIONS
In the Old World (Africa, Eurasia, and the Pacific), there is a strong relationship between regions and language families (see Table 4). This is represented visually in Figure 3 with a correspondence analysis. Here, we can see three lines of language families: (a) African language families, (b) language families of North Eurasia and Circumpolar, and (c) language families of Southeast Asia and the Pacific. The three lines are connected through the Middle Old World, the center of the Old World system. Language families of the Americas show a different pattern of correspondence between regions and language families. Table 5 tabulates the many American language families against four American regions. 4 The correspondence analysis of this table appears in Figure 4. Here, there are two main groupings, the first focused in the west and northwest, the second including the Eastern Americas and Mesoamerica to the Andes.
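For readers unfamiliar with correspondence analysis, the quantity it decomposes is the total inertia of a contingency table, that is, the chi-square statistic divided by the table total. The sketch below computes only that quantity, not the full analysis behind Figures 3 and 4. The function name `total_inertia` and both small tables are hypothetical and are not taken from Table 4 or Table 5.

```python
def total_inertia(table):
    """Total inertia (chi-square / n) of a contingency table.

    `table` is a list of rows of counts, e.g. language families by regions.
    A value of 0 means rows and columns are independent; larger values mean
    a stronger association between the two classifications.
    """
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table has no observations")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if family and region were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi_square += (observed - expected) ** 2 / expected
    return chi_square / n


# Hypothetical 2 x 2 tables (rows: language families, columns: regions).
one_family_per_region = [[10, 0], [0, 10]]
families_spread_evenly = [[5, 5], [5, 5]]
```

A table in which each family sits in exactly one region yields high inertia, while a table in which families are spread evenly across regions yields zero. A strong relationship between regions and language families corresponds to the first kind of table.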

PROPOSALS FOR MACROSCOPIC GROUPINGS
In addition to proposals discussed above, some scholars lump language families 13 to 16 into a single Papuan phylum. Korean and Japanese sometimes are included in a single family, which is often included in the Altaic language family. Although those proposals are open to debate, the earlier proposal for linking Uralic and Altaic has fallen into disuse.
I have included Haida in Na-Dene, following Campbell (1997), but this is not universally accepted. My usages of Penutian and Hokan follow Campbell's more conservative approach, and these groupings are much smaller than those originally proposed by Sapir (1921).