TEKER – Annotating Meaning at Scale: Developing an Integrated Research Environment for Textual Data
The project TEKER addresses a central methodological gap in the humanities and social sciences: the lack of integrated, user-friendly tools that enable large-scale, semantically informed text annotation while retaining interpretive depth. Existing platforms provide either fine-grained qualitative annotation or scalable machine-learning functionality, but rarely both. TEKER will bridge this divide with an open-source, fully dockerized Python application that integrates transformer and Retrieval-Augmented Generation (RAG) pipelines with human-in-the-loop workflows. Researchers will be able to upload corpora, define custom tags, perform batch annotation with automatic Wikidata Q-ID suggestions, and refine the results through an intuitive interface.
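To illustrate the kind of lookup underlying automatic Q-ID suggestions, the sketch below queries Wikidata's public `wbsearchentities` endpoint for candidate entities matching an annotated surface form. This is a minimal, hypothetical example: the function names are illustrative and do not reflect TEKER's actual implementation; only the Wikidata API endpoint and its parameters are real.

```python
# Hypothetical sketch: suggesting Wikidata Q-IDs for a tagged term
# via the public wbsearchentities action of the MediaWiki API.
# Function names are illustrative, not part of TEKER.
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term: str, language: str = "en", limit: int = 3) -> str:
    """Build a wbsearchentities query URL for one surface form."""
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    })
    return f"{WIKIDATA_API}?{params}"


def extract_qids(response: dict) -> list[str]:
    """Pull the ranked Q-IDs out of a wbsearchentities response."""
    return [hit["id"] for hit in response.get("search", [])]


if __name__ == "__main__":
    # Live call (requires network access): suggest Q-IDs for "Vienna".
    with urllib.request.urlopen(build_search_url("Vienna")) as resp:
        print(extract_qids(json.load(resp)))
```

In a batch-annotation setting, a loop over all tagged spans would collect such candidate lists for human reviewers to confirm or reject, which is where the human-in-the-loop refinement described above would take over.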
The project combines scalability with contextual awareness, aligning with FAIR data principles and supporting multi-layered analysis across temporal, spatial, and textual dimensions. Its direct users—historians, philologists, sociologists, anthropologists, and related scholars—will gain a powerful environment for transforming textual sources into structured, interoperable data.
TEKER will be developed over twelve months by a specialized developer and a student assistant under the PI’s supervision. After public release via GitHub, the tool will be hosted within the University of Vienna’s digital research infrastructure, contributing to its emerging core science facilities and national networks such as CLARIAH-AT and DHInfra.at. Dissemination will leverage the PI’s international collaborations in digital humanities and interpretive social science, ensuring rapid uptake and continued co-development. Released under an open-source license and adhering to FAIR and Open Science standards, TEKER will offer a sustainable, interdisciplinary platform for annotating meaning at scale.