Linguistics and Literature Studies Vol. 5(2), pp. 99 - 104
DOI: 10.13189/lls.2017.050205


Integrating Canonical Text Services into CLARIN's Search Infrastructure


Jochen Tiepmar*, Thomas Eckart, Dirk Goldhahn, Christoph Kuras
Department of Natural Language Processing, Faculty of Mathematics and Computer Science, University of Leipzig, Saxony, Germany

ABSTRACT

Today's digital research infrastructures target a variety of user groups. Achieving acceptance and active participation among them requires interfaces to digital resources that are both user-friendly and machine-readable. This is especially the case for highly integrated infrastructures like the CLARIN project. The Canonical Text Service protocol (CTS) is an established system in document-based Digital Humanities that covers many of the associated problems, such as dealing with varying levels of text granularity, persistent identification, address resolution, and simple interfaces for integration into various automatic workflows. This paper shows the advantages of integrating a CTS instance into CLARIN and also demonstrates additional benefits of this CTS implementation in the form of built-in text mining techniques.

KEYWORDS
Canonical Text Service, CLARIN, Linguistic Infrastructures, Webservice, Text Data, Text Mining

Cite this paper
Jochen Tiepmar, Thomas Eckart, Dirk Goldhahn, Christoph Kuras (2017). Integrating Canonical Text Services into CLARIN's Search Infrastructure. Linguistics and Literature Studies, 5(2), 99-104. doi: 10.13189/lls.2017.050205.