An Overview of Canonical Text Services

Jochen Tiepmar *, Gerhard Heyer
Natural Language Processing Group, University of Leipzig, Leipzig, 04009, Saxony, Germany


This paper provides a comprehensive overview of Canonical Text Services (CTS) and the surrounding tools that were developed on the basis of a MySQL-based implementation. As such, it covers a broad set of topics, including a general explanation of CTS, various software tools, and a wide array of text mining techniques. The goal is to compile the widely scattered and potentially confusing information into one document that focuses on the practical aspects and implications for researchers who work with text data. More technically focused aspects are discussed in the two papers that accompany this implementation ([20] and [21]) and in the official CTS specifications. Additionally, this paper introduces a licensing mechanism, a CTS-based citation analysis workflow, a real-time text alignment method, and a set of management tools, including a central namespace resolver for CTS URNs.

