OLiA ontologies

This page enumerates the ontologies that are currently available. Officially, none of them has been released. They will be released under a Creative Commons Attribution Sharealike licence as soon as a reference publication has appeared. Until then, feel free to make use of them; a short notification would be appreciated if you do. Besides the ontologies listed below, there are a number of experimental ontologies, e.g., concerning further annotation schemes, the linking with GOLD and the ISO TC37/SC4 Data Category Registry, and additional phenomena (discourse, coreference).

The OLiA architecture is a set of modular OWL/DL ontologies with ontological models of annotation schemes (Annotation Models) on the one hand, an ontology of reference terms (Reference Model) on the other hand, and ontologies (Linking Models) that implement subClassOf relationships between them.

For viewing and browsing the OLiA ontologies, I recommend:

OwlSight, a light-weight online browser for ontologies (recommended only for taking a first look at the ontologies), or
Protégé, a Java-based ontology browser and editor (recommended for browsing, requires installation)

Both ontology browsers accept the URLs given below (insert by copy and paste).

Overview

Background

Integration of linguistic terminologies
An ontology-based approach
Ontology-based corpus querying

OLiA Reference Model
OLiA Annotation Models

Multilingual
English
German
other Germanic languages (Danish, Dutch, Norwegian, Swedish; Old Norse, Old High German)
Russian
other Slavic languages (Bulgarian, Czech, Macedonian, Polish, Resian, Slovak, Slovene, Ukrainian)
French
other Romance languages (Catalan, Italian, Portuguese, Romanian, Spanish)
Uralic and Altaic languages (Estonian, Finnish, Hungarian, Turkish)
other European languages (Basque, Georgian, Greek, Irish)
Indoiranian languages (Bangla, Farsi, Hindi, Konkani, Marathi, Sanskrit, Urdu)
Dravidian languages (Kannada, Malayalam, Tamil, Telugu)
Tibeto-Burman languages (Old Tibetan, Classical Tibetan, Balti, Ladakh; Dzongkha, Prinmi)
Eastern Asian languages (Chinese, Japanese)
Afroasiatic languages (Arabic, Guruntum, Hausa, Tangale)
Subsaharic African languages (Aja, Buli, Byali, Dagbani, Ditammari, Fon, Foodo, Guruntum, Hausa, Konni, Nateni, Tangale, Waamma, Yom)
Indigenous languages of the Americas, Australia and the Pacific (Teribe, Yucatec Maya, Mawng, Niue)
Annotation Models for discourse phenomena

External Reference Models

Background

Concentrating on the more elementary levels of linguistic analysis, such as parts of speech and morphology, a generalization was developed over the different terminologies applied in the annotation of the corpora hosted by three collaborative research centers, SFB 441 (Tübingen), SFB 538 (Hamburg) and SFB 632 (Potsdam/Berlin), and later extended to NLP tools and corpora beyond these resources. The result is an ontology which specifies reference terminology, with the tags of the original annotated data linked to this reference terminology. Besides its function in annotation documentation, the ontology can be applied to the formulation of tag-set neutral corpus queries. For this purpose, I developed the OntoClient, a Java-based query pre-processor which translates formal ontology-based specifications into disjunctions of concrete tags. The OntoClient serves as a pre-processor for corpus query languages such as ANNIS-QL and CQP; furthermore, it has been applied in the specification of tag-set independent corpus processing scripts.

The OLiA ontologies were initially developed in the context of the project "Sustainability of Linguistic Resources", a collaborative project between three German Collaborative Research Centres (SFBs): the SFB 538 'Multilingualism' at the University of Hamburg, the SFB 632 'Information Structure' at the University of Potsdam and the Humboldt University Berlin, and the SFB 441 'Linguistic Data Structures' at the Eberhard Karls University Tübingen.

The project aimed at preparing language resources to ensure accessible dissemination and sustainable storage of linguistic corpora. One of the main goals of the project was a practical one: resources acquired in long-term projects situated in the three Collaborative Research Centres have to be converted into one or more formats so that they remain usable by researchers and applications. Furthermore, the project developed unified methods of access for the heterogeneous data acquired in the projects.

The linguistic resources dealt with by the project are highly heterogeneous:

the primary data itself is heterogeneous with respect to size (e.g., single sentences vs. entire articles), text types / data types (e.g. newspaper texts, diachronic texts, dialogues, treebanks, ...), modality (monologue vs. dialogue), categories of information covered by the annotation / annotation levels (e.g. layout, textual structure, morpho-syntax, syntax, ...), underlying linguistic theories, and language
the annotations require data structures of various types (attribute-value pairs, trees, pointers, etc.)
data is annotated by means of different, task-specific annotation tools

Integration of linguistic terminologies

One of the tasks addressed by the sustainability project was the integration of heterogeneous terminologies, especially those applied in the annotation of existing corpora. The differences range from minor variation in the choice of tag names (which often goes unnoticed and thus affects the reliability of broad-scale corpus studies) to fundamental conceptual differences.

Different abbreviations for the same annotations

E.g. pronominal adverbs in the German de-facto standard tag set STTS, annotated PROAV (Stuttgart variant of STTS), PAV (Tiger variant of STTS), or PROP (Tübingen variant of STTS) without any change in meaning.

Same abbreviation for different annotations

E.g. the indefinite article in STTS. In Tiger-STTS, PIAT is applied to the indefinite article in attributive use throughout. In Stuttgart-STTS, PIAT is restricted to "proper" indefinite articles, i.e. those which appear as articles of indefinite descriptions, while the indefinite article after a definite article is tagged as PIDAT.

Same annotation, but different interpretation

E.g. the concept "auxiliary verb". In STTS, the tag VAFIN, explained as "auxiliary verb", is used for German haben "to have; to own" and sein "to be; to be defined by; to exist" in all uses. In the SFB632 annotation standard, however, VAUX is restricted to German haben and sein in auxiliary use only, while the copula sein "to be equal to" and the lexical uses of haben "to own" and sein "to exist" are tagged separately.

Different granularity of tag sets

The SFB538/E2 tag set assigns all nouns (proper nouns and common nouns) the same tag; the SFB632 annotation standard, designed for typological research, differentiates 2 types of nouns (common nouns and proper nouns); the Penn Treebank differentiates 4 types of nouns (common and proper nouns in singular and plural); the SUSANNE tag set for English differentiates approximately 63 types of nouns based on semantic and morphosyntactic properties; and in the Russian Uppsala corpus, we find 111 different tags for common and proper nouns according to morphological features.

Conceptual overlap

In languages with grammaticalized determiners, attributive possessive pronouns can be regarded as determiners, as they, like an article, serve to mark a nominal as a noun phrase (resp. determiner phrase). However, in the literal sense (and in traditional grammar), attributive possessive pronouns are "pro-nouns", i.e. replacements of names: they are characterized by their referentiality and are hence pronouns. Tag sets vary freely in whether attributive possessive pronouns are regarded as determiners (according to their syntactic function) or as pronouns (according to their semantic characterization).

All these problems are taken from the seemingly most elementary domain, the domain of part of speech tags; more problems arise as soon as morphology, syntax, or discourse phenomena are addressed.

In order to overcome such problems, terminological integration is necessary, i.e.

documentation of terminological differences
harmonization between different terminologies

To provide integrated access to terminologically heterogeneous resources, it is also necessary to provide an abstract model of linguistic reference terminology to which individual annotations refer, a so-called "terminological backbone".
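The idea of a terminological backbone can be illustrated with a minimal Python sketch. It uses the pronominal adverb tags from the STTS example above; the tag set labels and the concept name PronominalAdverb are illustrative assumptions, not identifiers taken from the actual ontologies.

```python
# Toy "terminological backbone": every (tag set, tag) pair refers to one
# reference concept. The tags come from the STTS example in the text; the
# tag set labels and the concept name are illustrative assumptions.
BACKBONE = {
    ("STTS-Stuttgart", "PROAV"): "PronominalAdverb",
    ("STTS-Tiger", "PAV"): "PronominalAdverb",
    ("STTS-Tuebingen", "PROP"): "PronominalAdverb",
}


def reference_concept(tagset, tag):
    """Return the reference concept a tag refers to, or None if it is undocumented."""
    return BACKBONE.get((tagset, tag))


def tags_for(concept):
    """Return all (tag set, tag) pairs that refer to the given reference concept."""
    return sorted(key for key, value in BACKBONE.items() if value == concept)


if __name__ == "__main__":
    print(reference_concept("STTS-Tiger", "PAV"))
    print(tags_for("PronominalAdverb"))
```

Because tags are always looked up together with their tag set, identical abbreviations with different meanings in different variants do not collide.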

Classical solutions are the standardization approach and the interlingua approach:

Standardization (cf. the EAGLES recommendations on morphosyntactic annotation)

Definition of a reference inventory of terms which must or may be considered by a standard-conformant annotation scheme. Concrete annotations are directly mapped onto reference terms or a disjunction of reference terms (Leech and Wilson 1996).

Interlingua (cf. the AMALGAM project)

From different annotation schemes, or tag sets, an abstracted representation is derived which subsumes all possible differences between the participating tag sets. Whenever no direct mapping of annotations (e.g. X and Y) from different annotation schemes (e.g. A and B) is possible, all possible combinations must be represented in the interlingua, i.e. (A:X,B:X), (A:X,B:Y), (A:Y,B:X), (A:Y,B:Y).
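The combinatorics of this worst case can be made concrete with a short Python sketch. It enumerates the interlingua categories needed when no direct mapping exists between any of the tags; the function name and the third scheme C are assumptions introduced only for illustration.

```python
from itertools import product


def interlingua_categories(schemes):
    """Enumerate every combination of tags across the given schemes.

    `schemes` maps a scheme name to its list of tags. This models the worst
    case described above, in which no tag can be mapped directly.
    """
    names = sorted(schemes)
    return [
        tuple(f"{name}:{tag}" for name, tag in zip(names, combination))
        for combination in product(*(schemes[name] for name in names))
    ]


two = interlingua_categories({"A": ["X", "Y"], "B": ["X", "Y"]})
# Adding a hypothetical third scheme C doubles the number of categories.
three = interlingua_categories({"A": ["X", "Y"], "B": ["X", "Y"], "C": ["X", "Y"]})

print(two)
print(len(two), len(three))
```

For two schemes with two tags each, this yields exactly the four pairs listed above; every further scheme multiplies the number of categories by its number of unmappable tags.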

Both solutions are limited in flexibility and scalability, and hence, both approaches are applicable only within a limited domain. The standardization approach relies on the existence of common grammatical categories and features in the languages for which standard-conformant tag sets are to be developed. Otherwise, it results in a projection of complexity (e.g. the standard entails predictions about grammatical categories for a standard-conformant tag set which are absent in a language). However, even the sheer existence of universal morphosyntactic categories has been questioned in typological research, and hence, the EAGLES-based standardization approach is unlikely to extend beyond "Standard Average European" languages.

The interlingua approach, by contrast, constructs an interlingua from the existing schemes and is thus less static than the standardization approach. However, the complexity of the interlingua grows monotonically with every new language or tag set considered, and hence, the general applicability of the interlingua approach is restricted by its limited scalability.

Therefore, the project is currently developing an ontology of linguistic annotations as a more flexible representation of a "terminological backbone".

An ontology-based approach

So far, we have developed an ontology of linguistic annotations with special consideration of the part of speech and morphological annotations existing in the participating Collaborative Research Centres (Schmidt et al. 2006, Chiarcos 2006c, Chiarcos 2006d, Chiarcos 2007).

The approach relies on the ontological reconstruction of annotation schemes based on guidelines and additional documentation in so-called "annotation models" (or "domain models").

Every annotation model represents one tag set or annotation scheme, with nonterminal nodes (concepts) representing conceptual categories as mentioned in the documentation or indicated in the document structure of the annotation guidelines, and terminal nodes (instances) representing concrete annotation values, or tags.

As an illustration, prototypes for the following annotation models are available in an HTML serialization:

STTS (POS tags, German) [owl] (Stuttgart, Tübingen and Tiger-Variant)
Tiger-Morphology (Morphology, POS tags inherited from STTS, German) [owl]
SUSANNE (POS tags with partial information about morphosyntax and lexical semantics, English) [owl]
Uppsala (POS tags and morphology, Russian) [owl]

With respect to morphosyntactic annotations, the OLiA annotation models currently comprise 16 annotation schemes applied to 42 languages (5 annotation models for English, 5 annotation models for German, 2 annotation models for Russian, one annotation model for Tibetan, one for Old High German, the Connexor annotation model for 10 European languages, one annotation model for a typologically-oriented annotation scheme applied to 29 languages). Annotation models for syntax and information structure/anaphora are currently under construction.

The concepts of these annotation models are linked to a common "reference model" which is based on the EAGLES recommendations for morphosyntax, and extended according to the needs of the participating annotation models, hence it is also referred to as "E(xtended)-EAGLES" ontology.

E-EAGLES ontology [owl]

The annotation models are then mapped onto the categories specified in the reference model by means of conceptual subsumption (rdfs:subClassOf, rdfs:subPropertyOf). This mapping is specified in separate "linking files", thus making both the reference model and the annotation models independent and self-contained ontologies.

STTS E-EAGLES linking [owl]
SUSANNE E-EAGLES linking [owl]
Uppsala E-EAGLES linking [owl]
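The division of labour between annotation model, linking file and reference model can be sketched in a few lines of Python. This is a simplified illustration, not a rendering of the actual OWL files: all class names are assumptions, and each class has at most one superclass here, whereas the OWL models allow multiple inheritance.

```python
# Annotation model: concrete tags are instances of annotation model classes.
ANNOTATION_MODEL = {"PAV": "stts:PronominalAdverb"}

# Linking file: annotation model classes are subclasses of reference classes.
LINKING = {"stts:PronominalAdverb": "ref:PronominalAdverb"}

# Reference model: its own class hierarchy, independent of any tag set.
REFERENCE_MODEL = {
    "ref:PronominalAdverb": "ref:Adverb",
    "ref:Adverb": "ref:MorphosyntacticCategory",
}


def reference_classes(tag):
    """Follow the subclass links from a tag up through the reference model."""
    classes = []
    current = LINKING.get(ANNOTATION_MODEL.get(tag))
    while current is not None:
        classes.append(current)
        current = REFERENCE_MODEL.get(current)
    return classes


if __name__ == "__main__":
    print(reference_classes("PAV"))
```

Since the three dictionaries never refer to each other internally, the linking can be replaced without touching either model, which mirrors the independence of the ontologies described above.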

The "reference model", however, does not specify authoritative definitions for existing terminology, but only a fairly traditional view on it. Hence, its primary function is not to provide prescriptive definitions of terms, but only to provide a reference point for the participating annotation models. Once a more reliable ontology of linguistic terminology has been developed (e.g. revised versions of the General Ontology of Linguistic Description (GOLD) or the grammis ontology), the reference model can be linked with it in the same way as the annotation models are linked with the reference model, and thus mediate between such an external reference model and the annotation models. In this sense, the reference model serves as an interface to the annotation models, and it might be better termed an "interface model".

An exemplary implementation of the linking of E-EAGLES with an extended version of GOLD, v.0.3, as an external reference model [owl]

Ontology-based corpus querying

Besides their purely documentary function, the specifications in the ontology can be used for tag-set neutral corpus querying. In essence, this means that expressions from the ontology can be directly used in corpus queries. As an example, a user may enter the query

PossessivePronoun and hasNumber(Singular) and hasGender(Neuter) and hasCase(Genitive)
instead of the SUSANNE tag
APPGh1

Of course, APPGh1 is shorter, but it is a cryptic and idiosyncratic abbreviation, and knowing the function of APPGh1 in SUSANNE does not help when searching for the corresponding items in, say, the Uppsala corpus, where the same query expands to

pronomen_pos_1p_gen_sg_neut_opl | pronomen_pos_2p_gen_sg_neut_opl | ...

In particular, this kind of ontology-based corpus querying allows researchers unfamiliar with a certain resource to take a first glance at a corpus with an unknown tag set without having to spend too much effort on locating and reading the annotation documentation. Hence, the barrier to re-using existing resources is substantially lowered.

For ontology-based corpus querying, we are developing the OntoClient, a Java package that works as a pre-processor for corpus queries. Given a query string, the OntoClient replaces each ontology-sensitive sub-string with the disjunction of the tags retrieved as instances which satisfy the criteria specified in that sub-string.

The output of the OntoClient is highly configurable, and thus, it can be easily applied to practically any kind of existing corpus query interface.

Currently, we have implemented a prototype for an ontology-sensitive CQP interface.
At the GLDV Frühjahrstagung 2007, Christian Chiarcos and Michael Götze presented the integration of the OntoClient with ANNIS.
At RANLP 2007, Georg Rehm, Richard Eckart and Christian Chiarcos will present the application of the OntoClient as a pre-processor for XQuery templates.
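The replacement step performed by such a pre-processor can be sketched as follows. This is a minimal Python illustration, not the OntoClient itself (which is a Java package): the curly-brace delimiter for ontology-sensitive sub-strings, the function names and the toy tag inventory are assumptions. The two Uppsala tags and the query features are taken from the example above; the noun tag is hypothetical.

```python
import re

# Toy tag inventory: tag -> set of ontology expressions the tag satisfies.
# The two pronoun tags are quoted in the text; "hypothetical_noun_tag" is
# invented here as a non-matching example.
UPPSALA_TOY = {
    "pronomen_pos_1p_gen_sg_neut_opl": {
        "PossessivePronoun", "hasNumber(Singular)",
        "hasGender(Neuter)", "hasCase(Genitive)",
    },
    "pronomen_pos_2p_gen_sg_neut_opl": {
        "PossessivePronoun", "hasNumber(Singular)",
        "hasGender(Neuter)", "hasCase(Genitive)",
    },
    "hypothetical_noun_tag": {"Noun", "hasNumber(Singular)"},
}


def expand(expression, inventory):
    """Return all tags that satisfy every conjunct of an 'A and B and ...' expression."""
    wanted = {part.strip() for part in expression.split(" and ")}
    return sorted(tag for tag, properties in inventory.items() if wanted <= properties)


def preprocess(query, inventory):
    """Replace each {ontology expression} in the query with a disjunction of tags."""
    def replace(match):
        return "(" + "|".join(expand(match.group(1), inventory)) + ")"
    return re.sub(r"\{(.+?)\}", replace, query)


if __name__ == "__main__":
    print(preprocess('[pos="{PossessivePronoun and hasCase(Genitive)}"]', UPPSALA_TOY))
```

Everything outside the braces is passed through unchanged, so the same replacement step can in principle sit in front of different query languages.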

OLiA Reference Model and system ontologies

Module	Phenomenon	OWL/DL models
OLiA Reference Model for morphosyntax, morphology and syntax	morphosyntax, morphology and syntax	http://purl.org/olia/olia.owl
OLiA Reference Model for discourse structure	discourse structure, discourse relations	t.b.a
OLiA Reference Model for information structure	information structure, information status, coreference	t.b.a
OLiA System Ontology	basic annotation data structures	http://purl.org/olia/system.owl
OLiA Top-Level Ontology	top-level concepts of the OLiA Reference Model for morphosyntax, morphology and syntax	http://purl.org/olia/olia-top.owl

Annotation Models

Multilingual Annotation Models for morphological, morphosyntactic and syntactic annotation

Tagset / NLP tool	Phenomenon	Languages	OWL/DL models
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	> 30 typologically different languages, including many African languages	Annotation Model, Linking Model
EAGLES recommendations (Leech and Wilson 1996)	morphosyntax	11 EU languages, incl. Romance, Germanic, Greek and Irish	Annotation Model, Linking Model
Connexor dependency parser	morphosyntax, morphology, dependency syntax	10 European languages, incl. Romance, Germanic and Uralic languages	Annotation Model, Linking Model
MULTEXT-East	morphosyntax, morphology	15 mostly Eastern European languages, incl. Slavic, Romance, Uralic languages and Persian	Annotation Model (common specifications), Linking Model; Annotation Model (all languages), see project page and below for individual languages
IL-POSTS tagset Baskaran et al. (2008)	morphosyntax	languages of the Indian subcontinent	Annotation Model, Linking Model
AnnCorra Bharati et al. (2006)	morphosyntax, chunks	languages of the Indian subcontinent	Annotation Model, Linking Model
IIIT tagset IIIT (2007)	morphosyntax	languages of the Indian subcontinent	Annotation Model, Linking Model

Annotation Models for the morphological, morphosyntactic and syntactic annotation of English

Tagset / NLP tool	Phenomenon	OWL/DL models
Brown corpus tagset	morphosyntax	Annotation Model, Linking Model
Connexor dependency parser	morphosyntax, morphology, dependency syntax	Annotation Model, Linking Model
EAGLES recommendations (English) (Leech and Wilson 1996)	morphosyntax	Annotation Model, Linking Model
GENIA corpus	morphosyntax	Annotation Model, Linking Model
MULTEXT-East (English)	morphosyntax	Annotation Model, Linking Model
Penn Treebank	morphosyntax	Annotation Model, Linking Model
Penn Treebank	syntax	Annotation Model, Linking Model
QTag	morphosyntax	Annotation Model, Linking Model
Stanford dependency parser	dependency syntax	Annotation Model, Linking Model
Susanne corpus	morphosyntax	Annotation Model, Linking Model

Annotation Models for the morphological, morphosyntactic and syntactic annotation of German

Tagset / NLP tool	Phenomenon	OWL/DL models
Connexor dependency parser	morphosyntax, morphology, dependency syntax	Annotation Model, Linking Model
EAGLES recommendations (German) (Leech and Wilson 1996)	morphosyntax	Annotation Model, Linking Model
Morphisto	morphology	Annotation Model, Linking Model
STTS	morphosyntax	Annotation Model, Linking Model
TIGER/NEGRA	morphology	Annotation Model, Linking Model
TIGER/NEGRA	constituent syntax	Annotation Model, Linking Model
TreeTagger Chunker	chunk labels	Linking Model
RFTagger	morphosyntax, morphology	t.b.a

Annotation Models for the morphological, morphosyntactic and syntactic annotation of other Germanic languages

Tagset / NLP tool	Phenomenon	Languages	OWL/DL models
EAGLES recommendations (Leech and Wilson 1996)	morphosyntax; inflectional morphology	Danish, Dutch, Swedish (and several non-Germanic languages)	Annotation Model, Linking Model
Connexor	morphosyntax, morphology, dependency syntax	Dutch, Swedish, Danish, Norwegian	Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Dutch (among other languages)	Annotation Model, Linking Model
MENOTA (incomplete)	morphosyntax	Old Norse	Annotation Model, Linking Model
T-CODEX	morphosyntax, syntax, information structure	Old High German	Annotation Model, Linking Model

Annotation Models for the morphological, morphosyntactic and syntactic annotation of Russian

Tagset / NLP tool	Phenomenon	OWL/DL models
Uppsala corpus tagset	morphosyntax, morphology	Annotation Model, Linking Model
Russian TreeTagger (Serge Sharoff)	morphosyntax	Annotation Model, Linking Model
MULTEXT-East for Russian	morphosyntax, morphology	Annotation Model, Linking Model

Annotation Models for the morphosyntactic annotation of other Slavic languages

Tagset / NLP tool	Languages	OWL/DL models
MULTEXT-East	Bulgarian	Annotation Model, Linking Model
MULTEXT-East	Czech	Annotation Model, Linking Model
MULTEXT-East	Macedonian	Annotation Model, Linking Model
MULTEXT-East	Polish	Annotation Model, Linking Model
MULTEXT-East	Slovak	Annotation Model, Linking Model
MULTEXT-East	Slovene	Annotation Model, Linking Model
MULTEXT-East	Resian (Slovene spoken in Italy)	Annotation Model, Linking Model
MULTEXT-East	Serbian	Annotation Model, Linking Model
MULTEXT-East	Ukrainian	Annotation Model, Linking Model

Annotation Models for the morphological, morphosyntactic and syntactic annotation of French

Tagset / NLP tool	Phenomenon	OWL/DL models
EAGLES recommendations (Leech and Wilson 1996)	morphosyntax	Annotation Model, Linking Model
French TreeTagger (Achim Stein)	morphosyntax	Annotation Model
Le Monde corpus (Abeillé et al. 2000)	morphosyntax	Annotation Model
Connexor	morphosyntax, morphology, dependency syntax	Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) for Canadian French (among other languages)	Annotation Model, Linking Model

Annotation Models for the morphological, morphosyntactic and syntactic annotation of other Romance languages

Tagset / NLP tool	Phenomenon	Languages	OWL/DL models
EAGLES recommendations (Leech and Wilson 1996)	morphosyntax	Catalan, Portuguese, Spanish	Annotation Model, Linking Model
Connexor	morphosyntax, morphology, dependency syntax	Spanish, Italian	Annotation Model, Linking Model
PAROLE Spanish/Catalan (http://nlp.lsi.upc.edu/freeling)	morphosyntax, inflectional morphology	Spanish, Catalan	Annotation Model
MULTEXT-East	morphosyntax, morphology	Romanian	Annotation Model, Linking Model

Annotation Models for the morphological, morphosyntactic and syntactic annotation of Uralic and Altaic languages

Tagset / NLP tool	Phenomenon	Languages	OWL/DL models
Connexor	morphosyntax, morphology, dependency syntax	Finnish	Annotation Model, Linking Model
MULTEXT-East	morphosyntax, morphology	Estonian	Annotation Model, Linking Model
MULTEXT-East	morphosyntax, morphology	Hungarian	Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Hungarian (among other languages)	Annotation Model, Linking Model
Turkish POS tagset (Oflazer et al. 2003)	morphosyntax	Turkish	Annotation Model

Annotation Models for the morphosyntactic annotation of other European languages

Tagset / NLP tool	Phenomenon	Languages	OWL/DL models
EAGLES recommendations (Leech and Wilson 1996)	morphosyntax	Greek, Irish (among other EU languages)	Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Georgian, Greek (among other languages)	Annotation Model, Linking Model
EUSTagger (Ezeiza et al. 1998)	morphosyntax	Basque	Annotation Model

Annotation Models for the morphosyntactic annotation of Indoiranian languages

Tagset / NLP tool	Phenomenon	Languages	OWL/DL models
Urdu EMILLE tagset Hardie (2003, 2004)	morphosyntax, inflectional morphology	Urdu	Annotation Model, Linking Model
Urdu tagset Sajjad (2007)	morphosyntax	Urdu	Annotation Model, Linking Model
IL-POSTS tagset Baskaran et al. (2008)	morphosyntax, inflectional morphology	Bangla, Hindi, Marathi, Sanskrit	Annotation Model, Linking Model
AnnCorra Bharati et al. (2006)	morphosyntax, chunks	Bangla, Hindi	Annotation Model, Linking Model
IIIT tagset IIIT (2007)	morphosyntax	Hindi, Marathi	Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Konkani (among other, unrelated languages)	Annotation Model, Linking Model
MULTEXT-East	morphosyntax	Farsi (Persian)	Annotation Model, Linking Model

Annotation Models for the morphosyntactic annotation of Dravidian languages

Tagset / NLP tool	Phenomenon	Languages	OWL/DL models
IL-POSTS tagset Baskaran et al. (2008)	morphosyntax	Kannada, Malayalam, Tamil, Telugu	Annotation Model, Linking Model
AnnCorra Bharati et al. (2006)	morphosyntax, chunks	Telugu, Tamil	Annotation Model, Linking Model
IIIT tagset IIIT (2007)	morphosyntax	Telugu	Annotation Model, Linking Model

Annotation Models for the morphological, morphosyntactic and syntactic annotation of Tibeto-Burman languages

Tagset / NLP tool	Phenomenon	Languages	OWL/DL models
Dzongkha tagset (Chungku et al. 2010)	morphosyntax	Dzongkha	Annotation Model, Linking Model
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Prinmi (among other, unrelated languages)	Annotation Model, Linking Model
Tübingen Tibetan Corpora (Wagner & Zeisler 2004)	morphosyntax, morphology, syntax	Tibetan (Old Tibetan, Classical Tibetan, Balti, Ladakh)	Annotation Model

Annotation Models for East Asian languages

Annotation scheme / Corpus	Phenomenon	Languages	OWL/DL models
Penn Chinese Treebank (Xia 2000)	morphosyntax	Chinese	Annotation Model
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Japanese (among other, unrelated languages)	Annotation Model, Linking Model

Annotation Models for Afroasiatic languages

Annotation scheme / Corpus	Phenomenon	Languages	OWL/DL models
Arabic tagset (Khoja 2001)	morphosyntax	Arabic	Annotation Model
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Chadic languages (including Guruntum, Tangale, Hausa)	Annotation Model, Linking Model
Hausa Internet Corpus (Chiarcos et al. 2011)	morphosyntax	Hausa	t.b.a

Annotation Models for the languages of Subsaharic Africa

Annotation scheme / Corpus	Phenomenon	Languages	OWL/DL models
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Gur and Kwa languages (including Aja, Dagbani, Buli, Byali, Ditammari, Fon, Foodo, Konni, Nateni, Waamma, Yom)	Annotation Model, Linking Model
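SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Chadic languages (including Guruntum, Tangale, Hausa)	Annotation Model, Linking Model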
Hausa Internet Corpus (Chiarcos et al. 2011)	morphosyntax	Hausa	t.b.a

Annotation Models for indigenous languages of the Americas, Australia and the Pacific

Annotation scheme / Corpus	Phenomenon	Languages	OWL/DL models
SFB632 annotation standard (Dipper et al. 2008)	parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)	Teribe, Yucatec Maya, Mawng, Niue	Annotation Model, Linking Model

Annotation Models for discourse annotations

Annotation scheme / Corpus	Phenomenon	Languages	OWL/DL models
ARRAU corpus	coreference	English	t.b.a
CRC 732, A3 annotations of the Stuttgarter Radio News Corpus	information status, pronominal coreference	German	t.b.a
OntoNotes	coreference	English	t.b.a
Penn Discourse Graphbank	discourse relations	English	t.b.a
Penn Discourse Treebank	connectives, discourse relations	English	t.b.a
Potsdam Coreference Scheme	coreference	English, German	t.b.a
RST Discourse Treebank	RST discourse relations and discourse segments	English	t.b.a

External Reference Models

Terminological repository	Original url	Local url	Linking Model
ISO TC37/SC4 Data Category Registry	http://www.isocat.org	t.b.a	t.b.a
GOLD	http://linguistics-ontology.org	t.b.a	t.b.a