Multilingual parallel corpora and linguistic theory: How to compare constraints cross-linguistically

Natalia Levshina
Parallel corpora have been used very successfully in applied and contrastive linguistics. In this talk, I demonstrate how they can also help linguists answer important theoretical questions, presenting two case studies in morphosyntax and intercultural pragmatics. These studies are based on multilingual parallel corpora, i.e. corpora that contain translations in a large number of languages. The main advantage of parallel corpora is that they provide comparative micro-concepts (Haspelmath 2010), i.e. the contents of aligned translations in different languages. This enables us to compare cross-linguistically not only the verbalizations of these contents, as is often done, but also the semantic, pragmatic and other factors that constrain the formal variation. For this purpose, I employ relatively novel multivariate methods, namely conditional inference trees and random forests. The data come from ParTy (Parallel corpus for Typology), my own parallel corpus of film subtitles and TED talks.

Causative constructions: iconicity or economy?

The first case study is based on film subtitles in ten diverse languages from different language families. I test the well-known universal correlation between the form and function of causative constructions (e.g. Comrie 1981; Shibatani & Pardeshi 2002) and investigate the semantic and syntactic factors that determine the choice between lexical, morphological and analytic causatives across the languages, such as the involvement of the Causer in the caused event or the willingness of the Causee (cf. Dixon 2000). The results are interpreted in terms of the universal principles of iconicity and economy (e.g. Haiman 1983). I will also present additional evidence from an artificial language learning experiment and from comparable corpora of spoken language, which corroborates the conclusions of the corpus analyses.

T/V forms in European languages

The second case study investigates cross-linguistic differences in the constraints on the use of so-called T/V forms (e.g. French tu and vous, German du and Sie, Russian ty and vy) in ten European languages from different language families and genera. These constraints are an elusive object of investigation because they depend on a large number of subtle contextual features and social distinctions, which have to be matched cross-linguistically. I select more than two hundred contexts that contain the pronouns you and yourself in the original English versions of film subtitles. These contexts are then coded for fifteen contextual variables that describe the Speaker and the Hearer, their relationship and various situational properties, operationalizing parameters mentioned in the literature, such as the dimensions of power and solidarity (Brown & Gilman 1960). On the basis of the translations of these situations in the film subtitles in ten languages, I identify the most relevant contextual variables that constrain the T/V variation in each language and compare these constraints across the languages.
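
The idea of ranking coded contextual variables by their relevance in each language can be illustrated with a deliberately simplified sketch. It is not the conditional inference tree or random forest analysis used in the talk. As a much cruder stand-in, it scores each variable by a plain Pearson chi-square statistic against the T/V choice. The variable names (`hearer_is_superior`, `same_family`), the language labels (`lang_A`, `lang_B`) and all data points are invented for illustration and do not come from ParTy.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    joint = Counter(zip(xs, ys))
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_variables(contexts, choices):
    """Rank contextual variables by association with the T/V choice."""
    variables = list(contexts[0].keys())
    scores = {
        v: chi_square([c[v] for c in contexts], choices) for v in variables
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical coded contexts: one dict per aligned subtitle context.
contexts = [
    {"hearer_is_superior": "yes", "same_family": "no"},
    {"hearer_is_superior": "yes", "same_family": "no"},
    {"hearer_is_superior": "no", "same_family": "yes"},
    {"hearer_is_superior": "no", "same_family": "no"},
    {"hearer_is_superior": "no", "same_family": "yes"},
    {"hearer_is_superior": "yes", "same_family": "yes"},
]

# Hypothetical T/V choices in the translations of the same six contexts.
translations = {
    "lang_A": ["V", "V", "T", "T", "T", "V"],
    "lang_B": ["V", "V", "T", "V", "T", "T"],
}

# One ranking per language, so the constraints can be compared side by side.
ranking = {
    lang: rank_variables(contexts, choices)
    for lang, choices in translations.items()
}

for lang, ranked in ranking.items():
    print(lang, ranked)
```

Because the contexts are aligned across translations, the same coded predictors can be reused for every language and only the outcome column changes. In this toy data the top-ranked variable differs between the two invented languages, which is the kind of contrast the cross-linguistic comparison looks for.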
