CENTER FOR
SCALABLE DATA ANALYTICS
AND ARTIFICIAL INTELLIGENCE

 
Canonical Text Service 

  
Background
 
The Canonical Text Services protocol defines interaction between a client and server providing identification of texts and retrieval of canonically cited passages of texts. The official specifications by David Neel Smith and Christopher Blackwell can be found here.
Put simply, CTS serves text passages that are identified by URN-like references. The specification makes it possible to construct a CTS URN for any text passage in a document.
Data is requested with HTTP GET requests whose parameters are encoded in the URL. Each request must contain a parameter named request, which specifies the CTS function to use. Function-specific parameters, such as the URN, are added as further GET parameters.
For example, the following CTS request returns the text content of chapter 3 of the book Genesis of the English King James Bible.
http://cts.informatik.uni-leipzig.de/pbc/cts/?request=GetPassage&urn=urn:cts:pbc:bible.parallel.eng.kingjames:1.3
The value of the request parameter must be the name of one of the seven requests: GetCapabilities, GetValidReff, GetFirstUrn, GetPrevNextUrn, GetLabel, GetPassage or GetPassagePlus. All requests other than GetCapabilities also require a parameter named urn, whose value must be a valid CTS URN as defined in the CTS URN specification.
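The request rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of any official CTS client: the helper name build_cts_url and its arguments are assumptions made for this example, and only the endpoint, the request names and the urn rule are taken from the text. The function builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# The seven request names listed in the text above.
CTS_REQUESTS = {
    "GetCapabilities", "GetValidReff", "GetFirstUrn", "GetPrevNextUrn",
    "GetLabel", "GetPassage", "GetPassagePlus",
}


def build_cts_url(endpoint, request, urn=None):
    """Build a CTS GET URL (hypothetical helper for illustration only)."""
    if request not in CTS_REQUESTS:
        raise ValueError(f"unknown CTS request: {request!r}")
    params = {"request": request}
    if urn:
        params["urn"] = urn
    elif request != "GetCapabilities":
        # Every request except GetCapabilities needs a urn parameter.
        raise ValueError(f"{request} requires a urn parameter")
    # Keep ':' unescaped so the URN stays readable in the query string.
    return endpoint + "?" + urlencode(params, safe=":")


url = build_cts_url(
    "http://cts.informatik.uni-leipzig.de/pbc/cts/",
    "GetPassage",
    "urn:cts:pbc:bible.parallel.eng.kingjames:1.3",
)
print(url)
```

With the endpoint and URN from the example above, the helper reproduces the GetPassage URL shown earlier. Characters such as @ or [ in a sub passage URN would be percent-encoded by urlencode; whether a given server expects that is not stated in the text and should be checked against the instance in use.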
 
Static URNs
Document
urn:cts:pbc:bible.parallel.eng:
urn:cts:pbc:bible.parallel.eng.kingjames:
Text part
urn:cts:pbc:bible.parallel.eng:1
urn:cts:pbc:bible.parallel.eng.kingjames:1.3.2
 
Dynamic URNs
Text span (From one text part to another)
urn:cts:pbc:bible.parallel.eng:1.2-1.5.6
Sub passage notation
urn:cts:pbc:bible.parallel.eng:1.2@the[2]-1.5.6@five
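To make the structure of these URNs concrete, the following Python sketch splits a CTS URN into its colon-separated components and decodes the span and sub passage notation. The names PassageRef, CtsUrn and parse_cts_urn are hypothetical and chosen for this example. The sketch assumes that a sub passage reference contains no "-" and that an index such as [2] selects the n-th occurrence of the quoted string; it does not validate a URN against the full CTS URN specification.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PassageRef:
    citation: List[str]               # e.g. ["1", "5", "6"]
    subref: Optional[str] = None      # e.g. "the"
    occurrence: Optional[int] = None  # e.g. 2 for "the[2]"


@dataclass
class CtsUrn:
    namespace: str
    work: List[str]                   # dot-separated work component
    start: Optional[PassageRef] = None
    end: Optional[PassageRef] = None  # set only for text spans


def _parse_ref(text):
    # "1.2@the[2]" -> citation "1.2", sub passage "the", occurrence 2
    citation, _, sub = text.partition("@")
    subref = occurrence = None
    if sub:
        match = re.fullmatch(r"(.+?)(?:\[(\d+)\])?", sub)
        subref = match.group(1)
        occurrence = int(match.group(2)) if match.group(2) else None
    return PassageRef(citation.split("."), subref, occurrence)


def parse_cts_urn(urn):
    parts = urn.split(":", 4)
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work, passage = parts[2], parts[3], parts[4]
    result = CtsUrn(namespace, work.split("."))
    if passage:
        # Simplification: assumes the sub passage text itself contains no "-".
        first, dash, last = passage.partition("-")
        result.start = _parse_ref(first)
        result.end = _parse_ref(last) if dash else None
    return result


span = parse_cts_urn("urn:cts:pbc:bible.parallel.eng:1.2@the[2]-1.5.6@five")
print(span)
```

A document-level URN such as urn:cts:pbc:bible.parallel.eng.kingjames: has an empty passage component, so both start and end stay unset in this sketch.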
The project website (www.cts.informatik.uni-leipzig.de) presents a number of web services and selected data instances that illustrate the current state of Leipzig's Canonical Text Infrastructure.
 
Preliminary Results
 
The growing list of CTS instances provides open public access to a wide variety of texts, including ancient literature, Bible translations in more than 800 languages, more modern literature such as Shakespeare or the Deutsches Textarchiv, an Arabic newspaper, multilingual video transcripts and many others.
The protocol is also being extended to support more use cases. Extensions are added conservatively, so that they do not contradict the original protocol. Two major extensions are support for text licenses and passage post-processing, which offers different views on any requested text passage, for instance a generic structure notation based on div types.
In addition to providing the data sets and extending the protocol, the project develops tools that use this novel approach to machine-interpretable text communication for text analysis techniques such as structure-based real-time text alignment.

Project Members
Prof. Dr. Gerhard Heyer
Jochen Tiepmar
Sascha Ludwig (former student member)