The Project

Understanding the Problem

The field of corpus linguistics is evolving rapidly, with a growing focus on under-resourced languages and nonstandard annotation tasks. Traditional tools, while valuable, often fall short of the requirements of linguistic researchers who work with spoken data and its transcription and annotation. Annotating large datasets is labor-intensive and error-prone, which poses a significant challenge, particularly in sociolinguistics and variationist linguistics, where the accuracy of annotations is fundamental.

Introducing CorpusCompass

CorpusCompass is an open-source tool designed to facilitate the extraction and structuring of pre-annotated linguistic datasets. It lets researchers customize variables and annotation rules to suit specific research needs, and it safeguards data integrity through automated error and consistency checks. By streamlining data analysis workflows, CorpusCompass helps linguistic researchers uncover patterns of language use efficiently.
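
To make the idea of extracting and checking pre-annotated data more concrete, the following is a minimal Python sketch. It is not CorpusCompass code and does not reflect its actual annotation format or interface. The inline tag shape token[variable:value], the RULES table, the sample utterances, and the extract function are all hypothetical and were chosen only for illustration.

```python
import re

# Hypothetical inline annotation format: token[variable:value]
ANNOTATION = re.compile(r"(?P<token>\w+)\[(?P<variable>\w+):(?P<value>\w+)\]")

# Hypothetical researcher-defined rules: each variable maps to its allowed values.
RULES = {
    "q": {"glottal", "uvular"},
    "neg": {"ma", "mish"},
}

# Hypothetical sample data: (speaker, annotated utterance) pairs.
SAMPLE = [
    ("A", "qalb[q:glottal] kbir"),
    ("B", "ma[neg:ma] baqul[q:velar]"),
    ("A", "qal[qq:uvular]"),
]


def extract(utterances, rules):
    """Collect valid annotations as records and report rule violations."""
    records, errors = [], []
    for index, (speaker, text) in enumerate(utterances, start=1):
        for match in ANNOTATION.finditer(text):
            record = {"utterance": index, "speaker": speaker, **match.groupdict()}
            variable, value = record["variable"], record["value"]
            if variable not in rules:
                errors.append(f"utterance {index}: unknown variable '{variable}'")
            elif value not in rules[variable]:
                errors.append(
                    f"utterance {index}: value '{value}' not allowed for '{variable}'"
                )
            else:
                records.append(record)
    return records, errors


if __name__ == "__main__":
    records, errors = extract(SAMPLE, RULES)
    for record in records:
        print(record)
    for error in errors:
        print("ERROR:", error)
```

In this sketch, valid annotations become structured records (one dictionary per annotated token), while unknown variables and disallowed values are reported separately instead of being silently dropped, which is the kind of consistency check described above.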

Collaboration & Funding

CorpusCompass was developed within the project "Modernity, Migration, and Minorities: Three Case Studies of Arabic in Contact" at the University of Bayreuth (Arabistik) and funded by the Deutsche Forschungsgemeinschaft (DFG).
