Qualitative-Quantitative Analyses of Dutch and Afrikaans Grammar and Lexicon

This article tries to answer the fundamental question whether a three-way distinction dialect-tussentaal-standard is grounded in an empirical reality.

Even though this answer may depend on the context in which the investigation is carried out and may as such also show variation in Flanders, we use data from only one location, Ieper (but see Ghyselen, b, for a more comprehensive study). Ieper is located in the West-Flemish dialect area. This is a result of the fact that the linguistic repertoire in West Flanders is, in comparison to other regions, relatively rich, since the area is known to be fairly resistant to processes of dialect shift and dialect leveling (Willemyns; Ghyselen and Van Keymeulen), making it a particularly interesting methodological test case.

Since no corpora are available providing a comprehensive overview of situational variation in the speech of individual speakers of Belgian Dutch, a corpus was compiled comprising the speech of 10 highly educated women from Ieper (cf. Ghyselen, a, b), of whom five belong to an older and five to a younger age group.

They were recorded in five speech settings: (1) a dialect test, (2) a standard language test, (3) a conversation with a friend from the same city, (4) a conversation with a friend from a different dialect area, and (5) a sociolinguistic interview with an unacquainted interviewer from a different dialect area.

During the sociolinguistic interviews, data were gathered about the linguistic background of the informants and their perceptions of their own language use and language in Flanders in general. The dialect test and the standard language test were used to determine the informants' proficiency in the most acrolectal vs. the most basilectal part of their repertoire. The corpus is analyzed using both quantitative and qualitative methods. Quantitatively, a correlative sociolinguistic approach is adopted: the distributions of 29 phonological and morphosyntactic features are studied in the five types of data, using both correspondence analyses and cluster analyses.

These quantitative analyses are complemented with qualitative analyses of the interview data: the interview transcriptions were coded for 23 themes. With these criteria in mind, a set of linguistic variables was selected for study. In the Supplementary Material section, an overview can be found of the attested variants, along with information on their status in standard Dutch and the dialect of Ieper, and the codes used in the graphs of this paper.

We refer to Ghyselen (b) for an in-depth discussion of the variants and the variable selection procedure. To study how the attested variants correlate with each other and with the independent variables age, speech setting and speaker, a profile-based Correspondence Analysis (CA) was performed (cf. Plevoets; De Sutter et al.). The technique allows for the detection of potential clusters of linguistic features which behave alike, for instance clusters of dialect features or clusters of Standard Dutch features, and makes it possible to visualize the structural distance (or the lack of a structural gap) between those clusters.

A second step in the correspondence analysis is to plot the calculated distances in a two-dimensional space. For this purpose, the originally multidimensional matrices are reduced to two-dimensional ones using singular value decomposition, a dimension reduction technique which aims at preserving as much relevant information as possible. The distances from these two low-dimensional matrices are subsequently plotted in a biplot, in which the relative positions of the data points are indicative of their associations: variants plotted far away from each other are marked by low degrees of association; variants plotted close to each other show high associations.
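The dimension reduction described above can be illustrated with a minimal sketch. It implements classical correspondence analysis, not the profile-based variant used in the study, and it approximates the singular value decomposition by power iteration so that it runs on the standard library alone. The variant names, the three settings and all counts are hypothetical.

```python
import math

# Hypothetical token counts: rows are variants, columns are speech settings
# (dialect test, regional conversation, standard language test).
counts = {
    "dialect_a":  [40, 35, 5],
    "dialect_b":  [38, 30, 8],
    "standard_a": [4, 10, 45],
    "standard_b": [6, 12, 40],
}

def standardized_residuals(table):
    """Return S with S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), plus the row masses r."""
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    row_mass = [sum(row) / total for row in table]
    col_mass = [sum(row[j] for row in table) / total for j in range(n_cols)]
    resid = [
        [(row[j] / total - r * col_mass[j]) / math.sqrt(r * col_mass[j]) for j in range(n_cols)]
        for row, r in zip(table, row_mass)
    ]
    return resid, row_mass

def leading_axes(resid, k=2, iters=200):
    """Top-k eigenpairs of S^T S (squared singular values and right singular
    vectors of S), found by power iteration with deflation."""
    n_cols = len(resid[0])
    gram = [[sum(row[a] * row[b] for row in resid) for b in range(n_cols)] for a in range(n_cols)]
    axes = []
    for _ in range(k):
        vec = [1.0 / (j + 1) for j in range(n_cols)]  # arbitrary start vector
        for _ in range(iters):
            prod = [sum(g * v for g, v in zip(gram_row, vec)) for gram_row in gram]
            norm = math.sqrt(sum(x * x for x in prod))
            vec = [x / norm for x in prod]
        eigval = sum(v * sum(g * w for g, w in zip(gram_row, vec)) for v, gram_row in zip(vec, gram))
        axes.append((eigval, vec))
        # Deflate: remove the axis just found so the next pass finds the following one.
        gram = [[gram[a][b] - eigval * vec[a] * vec[b] for b in range(n_cols)] for a in range(n_cols)]
    return axes

labels = list(counts)
resid, row_mass = standardized_residuals([counts[name] for name in labels])
axes = leading_axes(resid, k=2)
total_inertia = sum(x * x for row in resid for x in row)

# Principal row coordinates: project each residual row on an axis, rescale by row mass.
coords = {
    name: tuple(sum(s * v for s, v in zip(row, vec)) / math.sqrt(r) for _, vec in axes)
    for name, row, r in zip(labels, resid, row_mass)
}

for name, (x, y) in coords.items():
    print(f"{name:10s} x={x:+.3f} y={y:+.3f}")
```

With only three settings, two dimensions capture all of the inertia in this toy table, so the plotted distances are exact. With more settings, the first two dimensions give an approximation of the full distances.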

Important in the interpretation of correspondence plots are therefore the distances between data points and the way in which these cluster; the x- and y-axes do not have predetermined interpretations. In this study, a profile-based variant of CA was used. For more information on the advantages of this profile-based approach, see Speelman et al. Another aspect in which the correspondence analyses performed in this article differ from traditional CA is that hypothesis-testing statistics were added; the technique was hence not purely descriptive.

More specifically, confidence ellipses were drawn using bootstrap confidence interval construction (Reiczigel; Plevoets). These ellipses are interpreted in the same way as traditional confidence intervals (cf. Plevoets): only if the ellipses of two categories do not overlap can the categories be considered significantly different. While the strategy of cluster analysis (grouping of similar categories by measuring co-variation) differs from that of correspondence analysis (projection onto a principal subspace), the results can be fairly similar: both methods are descriptive techniques which group variables based on their degree of correspondence (Lebart and Mirkin). In this paper, correspondence analysis is used as main analysis technique, because, unlike cluster analysis, it not only shows correlations between linguistic variables, but also with main effects such as age and situation.

Since cluster analysis can be more convenient for some purposes (Lebart and Mirkin), it is used here as a complement to the correspondence analysis. This method, which has proven relevant in several linguistic studies (cf. Deumert; Gries), aims at minimizing the variance within each cluster (Janssens et al.). We moreover use bootstrap clustering (Suzuki and Shimodaira; Nerbonne et al.), in which dendrograms are built for multiple resampled versions of the data. These dendrograms are subsequently compared; clusters which occur in many versions and hence have a high bootstrap probability value can be considered more reliable than clusters which were only distinguished in a limited number of dendrograms.
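As a rough sketch of the bootstrap idea, the code below clusters a handful of points repeatedly and counts how often a given cluster reappears. It assumes that the variance-minimizing method is Ward's criterion and that resampling is done over the coordinate dimensions; both are assumptions, as are the variant names and coordinates. It returns a plain bootstrap proportion only, not the AU p-values obtained by multiscale bootstrap resampling.

```python
import random

def ward_clusters(points):
    """Agglomerative clustering with Ward's criterion.
    `points` maps a label to a coordinate list; returns every cluster
    (as a frozenset of labels) that appears in the resulting dendrogram."""
    active = [(frozenset([name]), list(vec)) for name, vec in points.items()]
    found = set()
    while len(active) > 1:
        best = None
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                (mem_a, cen_a), (mem_b, cen_b) = active[i], active[j]
                n_a, n_b = len(mem_a), len(mem_b)
                sq_dist = sum((a - b) ** 2 for a, b in zip(cen_a, cen_b))
                # Increase in the within-cluster sum of squares if a and b merge.
                cost = n_a * n_b / (n_a + n_b) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mem_a, cen_a), (mem_b, cen_b) = active[i], active[j]
        n_a, n_b = len(mem_a), len(mem_b)
        merged = (mem_a | mem_b,
                  [(n_a * a + n_b * b) / (n_a + n_b) for a, b in zip(cen_a, cen_b)])
        active = [c for k, c in enumerate(active) if k not in (i, j)] + [merged]
        found.add(merged[0])
    return found

def bootstrap_support(points, target, n_boot=200, seed=0):
    """Share of bootstrap dendrograms that contain `target` as a cluster.
    Each replicate resamples the coordinate dimensions with replacement."""
    rng = random.Random(seed)
    n_dims = len(next(iter(points.values())))
    hits = 0
    for _ in range(n_boot):
        dims = [rng.randrange(n_dims) for _ in range(n_dims)]
        resampled = {name: [vec[d] for d in dims] for name, vec in points.items()}
        hits += frozenset(target) in ward_clusters(resampled)
    return hits / n_boot

# Hypothetical four-dimensional coordinates for six variants.
variants = {
    "dial_1": [0.0, 0.2, 0.1, 0.0],
    "dial_2": [0.3, 0.0, 0.2, 0.1],
    "dial_3": [0.1, 0.3, 0.0, 0.2],
    "std_1":  [5.0, 5.2, 5.1, 5.0],
    "std_2":  [5.3, 5.0, 5.2, 5.1],
    "std_3":  [5.1, 5.3, 5.0, 5.2],
}

support = bootstrap_support(variants, {"dial_1", "dial_2", "dial_3"})
print(f"bootstrap support for the dialect cluster: {support:.2f}")
```

A cluster that survives in most replicates is well supported by the data. A cluster that appears in only a few replicates depends on a small part of the data and should be interpreted with caution.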

These probability values are of paramount importance in determining the number of relevant clusters, one of the biggest challenges in interpreting dendrograms (cf. Everitt; Di Franco and Marradi). While an analysis with four dimensions would be more suitable for our data from a statistical point of view (eigenvalues drop from the fifth dimension onwards; cf. Baayen), only two dimensions are plotted here. Four dimensions will however serve as input for the cluster analysis.

[Figure: Correspondence plot Ieper with main effects for situation.]

The biplot translates different linguistic behavior into distances between data points.

Thus, variants plotted close to each other display similar behavior; large distances between variants imply weak correlations. As was mentioned in the Methods section, the axes of the biplot do not have a predetermined interpretation. Meaning can, however, be uncovered by searching for patterns in the data points (Geeraerts). The x-axis hence seems to be linked to locality (left: local, right: non-local). The y-axis is more difficult to interpret. In the bottom right corner, features are plotted which are not endogenous in the dialect of Ieper according to existing dialect descriptions.

The y-axis hence seems to be related to the type of non-dialectal language speakers target in non-local settings: from exogenously colored tussentaal in the bottom of the plot to VRT-Dutch on top, with in the center features occurring in both or in neither. Yet such an interpretation does not explain the distances between certain variants seen in the left of the biplot. The language use in the regional informal conversations hence differs only slightly from that in the dialect test and is fairly dialectal. This is consistent with the fact that Flemish speakers are known to feel uncomfortable about the VRT-Dutch norm.

This overall diaglossic pattern should not, however, be taken as an indication that all individual speakers have diaglossic repertoires at their disposal; the overall continuous pattern might result from overlapping individual diglossic repertoires. It is beyond the scope of this paper to discuss all individual repertoires separately, but analyses reported in Ghyselen (a) indicate idiolectal variation in repertoire structure: whereas some speakers seem to have diaglossic repertoires, others have diglossic ones.

Research with more speech settings and speech partners might reveal more clusters. In general, however, variation between repertoire types in Ieper hints at an ongoing transition from diglossia to diaglossia (Ghyselen, a). In the following paragraphs, we examine whether varieties can be distinguished in Ieper's overall diaglossic repertoire by scrutinizing the variety criteria introduced earlier. Can bundles of features be distinguished which strongly correlate in their socio-situative behavior?

To analyze these clustering tendencies more deeply, the first four dimensions of the correspondence analysis were used as input for a cluster analysis. The AU p-values reported for every cluster can serve as a guideline for the interpretation. Since cluster analysis is in essence a descriptive technique, however, the interpretation of a dendrogram also depends on theoretical concerns.

Interestingly, the distinguished clusters map nicely onto the two-dimensional correspondence plot.

[Figure: Correspondence plot Ieper with main effects for situation; the color codes indicate how the variants are categorized in a cluster analysis based on four-dimensional correspondence coordinates.]

It is important to highlight that the features within one cluster show strong correlations, but that they can also be combined with features from other clusters (be it with a different probability). As was already stressed above, correlation or covariance does not imply strict co-occurrence (Weinreich et al.). To investigate potentially routinized stylistic functions associated with variants in each of the distinguished clusters, we study when speakers use which cluster and complement these production data with qualitative interview data, as these yield more insight into the motives underlying the speech behavior.

The yellow cluster displays strong associations with both the dialect test and the regional conversations with friends for every speaker (Ghyselen, a). This type of language is moreover associated with coziness and familiarity (cf. Extract 1), which indicates that the dialect cluster functions as a regional informality marker.

Ja uit dezelfde streek komen. (Yes, coming from the same region.)

The brown cluster also shows strong associations with both the dialect test and the regional informal conversations but contains eastern West Flemish non-standard, non-endogenous variants. The difference with the variants in the yellow cluster is not only that these features are not found in traditional descriptions of the Ieper dialect, but also that they are infrequent and not used by all speakers.

It seems likely that these features compare stylistically to the traditional Ieper dialect.

[Table: Absolute frequencies of the eastern West-Flemish non-standard, non-endogenous variants (brown cluster).]

The cluster does not occur in the personal repertoire of all speakers (cf. Extract 2). From the interviews, it appears that speakers of cleaned-up dialect have no intention to use the standard in supraregional informal conversations, and merely adapt their language for reasons of comprehensibility.

This points toward a diaglossic language repertoire in the mind of the named speakers, who consciously realize something in between standard language and dialect. Extract 3. This language use has a double function: there is on the one hand a group of speakers who indicate using this intended standard Dutch as a lingua franca in all non-regional situations, whereas other speakers only rely on this type of language when a certain degree of formality is involved.

There is also interpersonal variation in the degree to which non-standard, non-endogenous features are integrated in substandard language use (Ghyselen, a). Those non-standard, non-endogenous features constitute a separate subcluster within the green main cluster (cf. Ghyselen, a). This cluster consists of standard features only, which are primarily realized in the standard language test, and to a much lesser degree in the interview setting.

On the basis of quotes such as the ones in Extract 4 and the strong association with the non-spontaneous standard test, we could label these variants as characterizing a mainly virtual standard language norm, which is associated with the media and rarely realized in everyday life. We can however not exclude that there are other settings, not studied here, in which the speakers do exploit their standard language competence to the full.

The gray cluster hence potentially functions as a professionalism marker.

Der zijn mensen die perfect Algemeen Nederlands kunnen ma [maar] da's [dat is] nie [niet] de normale […] iedereen probeert dan Algemeen Nederlands te spreken ma [maar] vo [voor] mij is da [dat] altijd een tussentaal en… […] ok misschien Martine Tanghe die op't journaal presenteert dat die Algemeen Nederlands spreekt daar kan ik mee z….
(There are people who can speak perfect Standard Dutch, but that is not the normal […] everyone then tries to speak Standard Dutch, but for me that is always a tussentaal and… […] ok, maybe Martine Tanghe, who presents the news, that she speaks Standard Dutch, that I can….)

In sum, we can say that the five clusters are not in a one-to-one relation with the five situations under study. The gray cluster, which we can dub VRT-Dutch, displays clear associations with the standard language test, but in the case of the other clusters, there is a functional overlap.

Whether these clusters also function more generally as regionality markers, and hence are also used in formal regional settings, is a question for further research. In supraregional informal conversations some speakers realize cleaned-up dialect (red cluster), whereas others switch to a form of substandard (green cluster), which is used by all speakers in the interviews and functions as formal supraregional language; for some speakers as supraregional language tout court. Informants report comprehensibility as the main reason to use both cleaned-up dialect and substandard.

A third matter to be discussed here is whether the distinguished clusters are marked by idiovarietary elements, i.e. features which are specific to one cluster. These are not essential for variety status. In the present study, it was investigated quantitatively whether specific language variants occur frequently in one type of setting only. In addition, metadata were checked for features which the informants named as typical of a certain variety, despite the fact that this issue was not brought up explicitly in the interviews. The gray cluster, VRT-Dutch, is marked by several idiovarietary elements.
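The quantitative check for variants that occur frequently in one type of setting only could look roughly like the sketch below. The variant names, the relative frequencies and the 0.9 threshold are all hypothetical; they merely illustrate the logic of flagging a variant whose use is concentrated in a single setting.

```python
# Hypothetical relative frequencies (share of tokens realised with the variant)
# per speech setting; names and numbers are invented for illustration.
rates = {
    "variant_A": {"dialect test": 0.02, "regional conversation": 0.01,
                  "interview": 0.03, "standard test": 0.85},
    "variant_B": {"dialect test": 0.40, "regional conversation": 0.35,
                  "interview": 0.30, "standard test": 0.05},
}

def idiovarietary_candidates(rates, threshold=0.9):
    """Return {variant: setting} for variants whose summed relative frequency
    is concentrated in one setting for at least `threshold` of the total."""
    candidates = {}
    for variant, by_setting in rates.items():
        total = sum(by_setting.values())
        if total == 0:
            continue  # variant never attested, nothing to flag
        setting, top = max(by_setting.items(), key=lambda item: item[1])
        if top / total >= threshold:
            candidates[variant] = setting
    return candidates

print(idiovarietary_candidates(rates))
```

Working with relative rather than absolute frequencies keeps settings with many tokens from dominating the comparison. The threshold itself remains a judgment call.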

Speaker Wvla3 reports that standard language should be spoken as it is written, which implies that final consonants should also be pronounced, as should initial [h]. Both the Ieper dialect (yellow cluster) and the horizontally leveled dialect (brown cluster) seem to contain several idiovarietary elements. Typical for the dialect of Ieper is for instance the suffix -en in the 1sg. These variants disappear in the supraregional conversations and interviews.

In the metadata, no statements are found concerning any of the named variants. The speakers indicate more generally that they consider dialectal vocabulary and to a lesser degree also accent as typical of the dialect, but they seldom give specific examples. Morpho-syntactic features were never mentioned cf.


The variants in the cleaned-up dialect cluster occur in various speech settings and hence do not seem to be of an idiovarietary nature. Variants such as expletive dat and t-deletion are also heard in the regional informal conversations and the interviews. When speakers mention the cleaned-up dialect in the interviews, no idiovarietary elements are mentioned either. The green cluster, the substandard, does seem to be marked by idiovarietary features, such as the ke-diminutive, the ne-article, the uninflected 1pl.

These features set the substandard apart from dialect, VRT-Dutch and cleaned-up dialect alike. These idiovarietary features, however, do not occur in the substandard of all speakers, and are not mentioned in the interviews. Their occurrence does provide evidence for the existence of a distinct supraregional, informal variety. This question is closely related to the salience problem: why are speakers more aware of certain features than of others? These questions have however not been convincingly answered up till now (cf. Kerswill and Williams; Ghyselen, b).

Finally, we also address the question whether the dataset offers evidence for emic category status of the observed clusters. Do the participants perceive the observed clusters as separate systems?

To answer that question, we study the metadata in the sociolinguistic interviews, in which all speakers mentioned dialect and standard as two extremes of the Flemish language repertoire. These extremes are perceived to be separate systems, which is for instance clear from Extract 5.

West-Vlaams is eigenlijk een aparte taal. (West-Flemish is actually a separate language.)

Concerning the clusters between dialect and standard Dutch, there are only two speakers who sketch a purely diglossic dialect-standard language image.

    Extracts 2, 3, and 4 above. Thus, both cleaned-up dialect and substandard seem to have emic category status, although the perceptions are less uniform than those of dialect and standard language. This can be explained in linguistic terms—the intermediate clusters are less clearly defined and subject to more idiolectal variation—but also in social terms, with informants lacking the necessary metalanguage to describe the intermediate variations.

    The analyses above indicate that the dialect and VRT-Dutch can be considered varieties. Within the dialect variety, two speech layers can be distinguished: the traditional dialect and a form of horizontally leveled dialect. These cannot be considered separate varieties as they do not have separate emic category status and their stylistic functions seem identical. In addition to the dialect and VRT-Dutch, the substandard can be considered a separate variety as well: a bundle of features was observed which showed strong associations with relatively formal speech settings such as a sociolinguistic interview and for some speakers also with supraregional informal speech.

    Interestingly, the cluster is marked by a number of non-standard, non-endogenous features, which function as idiovarietary elements. It has to be stressed, however, that not all speakers realize these features to the same extent. It hence seems logical to distinguish two speech layers or formal types within the substandard: a type with non-standard, non-endogenous features, and a type without. This not only holds on the population level e. Cleaned-up dialect therefore represents a less prototypical instance of a variety, and illustrates our theoretical point that the variety concept is not a black-and-white notion.

    Interestingly, at some points our criteria converged as to how the variation landscape is structured, in that many speakers have a variety at their disposal that was not observed across the board. At the beginning of this paper, we raised the question whether systems or varieties can be distinguished in the heterogeneity of everyday language. This question was shown to be relevant for many concepts in contemporary linguistics, such as code-switching. Taking stock of criteria traditionally used for variety status (such as homogeneity, stylistic functions, emic category status, and idiovarietary features), we argued that these form a catalog of criteria which can be tested against empirical data.

    We especially emphasized the importance of factoring in the ontological status of production patterns, as this criterion allows distinguishing categories which are not only statistically, but also cognitively real. Ensuing from the proposed multidimensional perspective on varieties is a flexibilization of the variety concept: in line with a cognitive view on categorization, whether a type of language can be considered a variety is a matter of degree, depending on the number of variety characteristics displayed by that language use.

    The variety question does not have a hard and fast, universal answer; insight into variety structure can only be achieved through close empirical scrutiny of production and perception patterns in both individual language users and a given language community. Combining quantitative and qualitative approaches, the case-study presented here focused on stylistic variation in Dutch as spoken in Ieper, in the Belgian province of West Flanders, by a relatively homogeneous and small group of test persons.

    Some tentative diachronic conclusions can be drawn from the data, too. In all likelihood, much of the variation in our data can be understood as indicative of ongoing change. At the beginning of this paper, processes of dialect leveling, dialect shift, destandardization, and demotization were discussed, and shown to yield proper predictions on the level of both production and perception.

    It is beyond the scope of this paper to discuss the age effects in the production data elaborately, but one striking result is that a clear overall age effect could only be found for the interview setting, showing stronger associations with the VRT-Dutch cluster for the older than for the younger speakers in this setting. This age effect implies standard language change, but is ambiguous as to its interpretation as an instance of destandardization or demotization. The perception data indicate that a scenario of destandardization is not very likely, however, given that all informants reproduced many aspects associated with a Standard Language Ideology in the interviews.

    All informants also indicated aiming at Standard Dutch in many settings. Thus, it seems likely that the variants produced more frequently by younger informants during the interview, such as t-deletion, ne-articles and expletive dat, are increasingly accepted within the Standard Dutch norm. Interestingly, no age effects were found in the dialect test setting, nor in the informal regional conversations, indicating that the Ieper region is fairly resistant to processes of dialect leveling and dialect shift (at least with respect to the studied variables).

    On a theoretical level, the case of Ieper shows that even in situations in which a linguistic repertoire presents itself as a sociolinguistic continuum, it remains worthwhile to try identifying focal points, thus acknowledging that linguistic variants are organized in structures. Our conclusion does not imply that the very concept of diaglossia is superfluous, however, if only because clear differences can be seen with diglossic repertoires.


    The Ieper case also provides no principled argument against the possibility that other linguistic repertoires can indeed display a more fluid, continuum-like structure. The empirical approach adopted here has the advantage that it avoids projecting preconceived structure or even uniformity on the sociolinguistic landscape, and links linguistic variants to social categories in a bottom-up way.

    This makes the methods suitable to tap into the social meaning of variants and even map out how social categories are structured in, but also by, language. The methods also take into account the behavior of individual language users. Rather than zooming in on the particular, the methods adopted here allow us to study the behavior of linguistic individuals while still enabling us to derive a generalization on the level of the speech community.

    Indeed, a current convergence is observed between social and cognitive sciences, which manifests itself in the rise of new fields such as social neuroscience, and, within linguistics, in new frameworks in both psycholinguistics and sociolinguistics, such as sociolinguistic cognition and cognitive sociolinguistics (see De Vogelaer et al.). Finally, on a methodological level, analyses like the ones in this article require data incorporating not only more social and regional variation, but also stylistic variation along other parameters than regionality and formality.

    We hope that the presented research can form an impetus toward larger-scale investigations into variety structure. This study was carried out in accordance with the recommendations of the ethical commission of the UGent Faculty of Arts and Humanities with written informed consent from all subjects.

    All subjects gave written informed consent in accordance with the Declaration of Helsinki. This article is based on an empirical investigation designed and carried out by A-SG under the supervision of GD.

    The theoretical framework adopted and the interpretations described in the paper were intensely discussed, and are shared by both authors. A-SG did the bulk of the writing and GD critically revised the manuscript. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

    The reviewer MD and handling Editor declared their shared affiliation.

    See Matras.

    Considering that hyperforms are to be expected in this test context more than in spontaneous speech settings (or among native speakers of dialect), this is a low number.

    In line with the study's focus on recent developments relating to standardization, we opted for highly educated women, as (1) women have been reported to be the leaders of change (Labov) and (2) a higher education is typically associated with more mobility and more intense supraregional contacts (cf. Britain).

    Six of the 20 conversations with friends of the same or of a different dialect area were mixed-sex conversations; the majority were same-sex conversations.