Find Orientation in the Ocean of Text

CorpusCompass

A Tool for Data Extraction and Dataset Generation

Download on GitHub

CorpusCompass is a tool for processing linguistic data. Whether you're new to programming or simply looking for an efficient way to work with linguistic data, it is designed to help. Linguistic research often involves overwhelming amounts of data, whether you collected it yourself or are working with existing texts. CorpusCompass is your compass in the ocean of text, helping you keep a clear overview and find your way.

CorpusCompass is designed with your linguistic research needs in mind:

Tailored for Specific Needs: Define your own linguistic features and annotation schemes so that the tool fits your unique research requirements.
Accuracy & Consistency: Beyond creating structured datasets from your annotated text, CorpusCompass prioritizes error-checking to maintain data integrity.
Make More of Your Data: CorpusCompass helps you explore correlations, identify dependencies, and uncover the effects of social dynamics on language use.
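
To make the idea of turning annotated text into a structured dataset more concrete, here is a minimal Python sketch. It does not use CorpusCompass itself and is not its actual implementation: the annotation format `[$FEATURE.token]`, the feature names, the sample transcript, and the helper functions `extract_rows` and `count_by_speaker` are all assumptions made for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: a hypothetical annotation format, [$FEATURE.token],
# chosen for this sketch only. Your own scheme may look different.
ANNOTATION = re.compile(r"\[\$(?P<feature>[A-Za-z0-9_-]+)\.(?P<token>[^\]]+)\]")


def extract_rows(speaker, text, known_features):
    """Return (rows, unknown): one row per valid annotation, plus a list
    of feature names that are not part of the declared scheme."""
    rows, unknown = [], []
    for match in ANNOTATION.finditer(text):
        feature, token = match.group("feature"), match.group("token")
        if feature not in known_features:
            # Simple error check: flag likely typos instead of counting them.
            unknown.append(feature)
            continue
        rows.append({"speaker": speaker, "feature": feature, "token": token})
    return rows, unknown


def count_by_speaker(rows):
    """Count how often each speaker uses each feature."""
    return Counter((row["speaker"], row["feature"]) for row in rows)


# Hypothetical sample data: two speakers, one mistyped feature name.
transcript = [
    ("A", "I was [$ING-n.walkin] home and [$ING-g.talking] to her"),
    ("B", "She kept [$ING-n.runnin] and [$INGG.singing]"),
]
known = {"ING-n", "ING-g"}

all_rows, all_unknown = [], []
for speaker, line in transcript:
    rows, unknown = extract_rows(speaker, line, known)
    all_rows.extend(rows)
    all_unknown.extend(unknown)

counts = count_by_speaker(all_rows)
print(counts)
print("Unknown features:", all_unknown)
```

In this sketch, the mistyped feature `INGG` is reported rather than silently counted, which illustrates the kind of consistency check described above, and the per-speaker counts are the sort of table you could then relate to social variables.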

CorpusCompass Needs Your Help!

As both Muhadj and Nicolo' have completed their PhDs, we no longer have enough time to continue developing CorpusCompass at the pace it deserves. To ensure its future, we are looking for researchers, institutions, or academic groups interested in taking over or contributing to its development.

If you believe in the importance of open-source tools for corpus linguistics and want to help maintain and expand CorpusCompass, please get in touch. We’d love to discuss how you can get involved!

Get in touch