About the ECCE project

The Evolution of Consonant Clusters in English (ECCE) database is the largest diachronically layered collection of English word-final consonant clusters, such as /nd/ in hand. Phonologically speaking, these are highly marked structures, yet they are abundant in English. The data are derived from two corpora, the Penn-Helsinki Parsed Corpus of Middle English and the Penn-Helsinki Parsed Corpus of Early Modern English, and range from the 12th century to the beginning of the 17th century, thus spanning a period of more than 500 years.

The database is enriched with a host of morphological, phonological, and lexical features. Every single cluster token in the database is annotated with the following information:

Morphonotactic status: does the cluster token span a morpheme boundary, or is it morpheme-internal (lexical)?
Potential suffixes involved in the establishment of the cluster token
IPA phonological transcription
Information about respective places of articulation, manners of articulation, sonority, and voicing of the segments involved
Net Auditory Distance (NAD) between the segments involved
Information about the subsequent phonological context
Written word form of the word token the cluster occurs in
Graphemic classification of the cluster token
Part-of-speech of the word token the cluster occurs in
Lemma of the word token the cluster occurs in
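As a rough illustration of how one annotated cluster token might be represented when working with an export of the database, here is a minimal Python sketch. The class name, field names, and example values are assumptions made for illustration only; they are not the actual column names or tag sets used in ECCE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClusterToken:
    """One word-final cluster token (hypothetical field names, not the ECCE schema)."""
    word_form: str                 # written form of the word token
    lemma: str                     # lemma of the word token
    pos: str                       # part-of-speech tag (tag set is an assumption)
    ipa: str                       # IPA transcription of the cluster
    spans_boundary: bool           # True: morphonotactic, False: lexical
    suffix: Optional[str]          # suffix that may create the cluster, if any
    grapheme_class: str            # graphemic classification of the cluster
    following_context: str         # subsequent phonological context
    nad: Optional[float] = None    # Net Auditory Distance, if available


def is_lexical(token: ClusterToken) -> bool:
    """A cluster is lexical when it does not span a morpheme boundary."""
    return not token.spans_boundary


# Illustrative record for /nd/ in "hand"; all values besides the word and
# the cluster are placeholders, not data taken from the database.
example = ClusterToken(
    word_form="hand",
    lemma="hand",
    pos="N",
    ipa="nd",
    spans_boundary=False,
    suffix=None,
    grapheme_class="nd",
    following_context="pause",
)
```

With a structure like this, separating lexical from morphonotactic tokens reduces to a filter on a single field, for example `[t for t in tokens if is_lexical(t)]`.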

In addition to information about the source texts derived from the Penn-Helsinki corpora (such as approximate date, genre, manuscript, and region), the database provides probabilistically derived weights for each individual cluster token. These are necessary because the correct phonological information is difficult to infer from the written source data available to the researcher. The weights were computed by means of models of linguistic spread and help to provide more reliable estimates of the token frequencies of consonant clusters.
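One way such weights could enter a frequency estimate is sketched below in Python. It assumes that each weight is a number between 0 and 1 that can be read as the probability that the written form really contained the cluster, so that a weighted token frequency is simply the sum of the weights. That interpretation, the function name, and the sample values are assumptions for illustration, not a description of how the ECCE weights are defined or meant to be applied.

```python
from collections import defaultdict


def weighted_token_frequencies(tokens):
    """Sum the weights per cluster.

    `tokens` is an iterable of (cluster, weight) pairs. Under the assumption
    stated above, the sum of weights is the expected number of genuine tokens.
    """
    totals = defaultdict(float)
    for cluster, weight in tokens:
        totals[cluster] += weight
    return dict(totals)


# Made-up sample: three written tokens, two of them uncertain.
sample = [("nd", 1.0), ("nd", 0.5), ("st", 0.25)]
freqs = weighted_token_frequencies(sample)
```

A raw count would report two tokens of /nd/ and one of /st/ for this sample, whereas the weighted estimate discounts the uncertain tokens.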

Thus, the database makes it possible to address a variety of questions about English coda phonotactics and their interaction with morphology, morphosyntax, graphemics, and the lexicon, as well as about cluster-specific developments in type and token frequencies.

For more information about the ECCE project including a list of publications based on the database, please visit: ecce.univie.ac.at

To cite the ECCE database, please use:

Ritt, Nikolaus; Prömer, Christina; Baumann, Andreas. 2017. Evolution of Consonant Clusters in English (ECCE): a diachronic phonotactic database. Department of English and American Studies, University of Vienna. Online publication, version X.X (ecce.acdh.oeaw.ac.at).