OLiA ontologies

 

This page lists the ontologies that are currently available. Officially, none of them has been released yet. They will be released under a Creative Commons Attribution-ShareAlike licence as soon as a reference publication has appeared. Until then, feel free to use them, but please let me know if you do. As a reference, see the ontology-relevant publications under http://www.sfb632.uni-potsdam.de/~chiarcos, and some remarks on the background of the OLiA ontologies. Besides the ontologies listed below, there are a number of experimental ontologies, e.g., for further annotation schemes, for the linking with GOLD and the ISO TC37/SC4 Data Category Registry, and for additional phenomena (discourse, coreference). Please contact Christian Chiarcos (FIRSTNAME_IN_LOWERCASE [DOT] LASTNAME_IN_LOWERCASE [AT] web [DOT] de) if you are interested in these.

The OLiA architecture is a set of modular OWL/DL ontologies of three kinds: ontological models of annotation schemes (Annotation Models), an ontology of reference terms (the Reference Model), and ontologies (Linking Models) that implement subClassOf relationships between the two.
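The three-layer design can be illustrated with a minimal sketch. It assumes that a Linking Model boils down to a list of subClassOf axioms from Annotation Model classes to Reference Model classes; all class names below (`penn:NNS`, `olia:CommonNoun`, etc.) are hypothetical placeholders and are not taken from the actual OWL files.

```python
from collections import defaultdict

# Hypothetical subClassOf axioms, written as (subclass, superclass) pairs.
SUBCLASS_OF = [
    # Linking Model: Annotation Model class -> Reference Model class
    ("penn:NNS", "olia:CommonNoun"),
    ("penn:VBD", "olia:FiniteVerb"),
    # Reference Model: its own internal class hierarchy
    ("olia:CommonNoun", "olia:Noun"),
    ("olia:FiniteVerb", "olia:Verb"),
]


def superclasses(cls, axioms):
    """Return all direct and indirect superclasses of cls."""
    parents = defaultdict(set)
    for sub, sup in axioms:
        parents[sub].add(sup)
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for sup in parents[current]:
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def reference_concepts(tag_class, axioms, prefix="olia:"):
    """Reference Model concepts that an Annotation Model class is linked to."""
    return sorted(c for c in superclasses(tag_class, axioms) if c.startswith(prefix))


print(reference_concepts("penn:NNS", SUBCLASS_OF))
```

Because the link is expressed as subClassOf, a tag from any annotation scheme inherits the whole Reference Model hierarchy above its linked concept, which is what makes annotations from different tagsets comparable. In practice an OWL reasoner or RDF library would compute this closure; the plain-Python traversal here only shows the idea.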

 

Some remarks on viewing and browsing the ontologies: For browsing the OLiA ontologies, I recommend

Both ontology browsers accept the URLs given below (insert by copy and paste).

 

Overview

 

OLiA Reference Model and system ontologies

Module

phenomenon

OWL/DL models

OLiA Reference Model for morphosyntax, morphology and syntax

morphosyntax, morphology and syntax

http://purl.org/olia/olia.owl

OLiA Reference Model for discourse structure

discourse structure, discourse relations

t.b.a

OLiA Reference Model for information structure

information structure, information status, coreference

t.b.a

OLiA System Ontology

basic annotation data structures

http://purl.org/olia/system.owl

OLiA Top-Level Ontology

top-level concepts of the OLiA Reference Model for morphosyntax, morphology and syntax

http://purl.org/olia/olia-top.owl
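For scripted access, the table above can be kept as a small lookup structure. This is only a sketch: the dictionary name `MODULES` and the helper `available` are made up for illustration, and modules marked t.b.a have no URL yet, which is represented here as `None`.

```python
# Module name -> OWL/DL model URL as listed above (None = t.b.a).
MODULES = {
    "OLiA Reference Model for morphosyntax, morphology and syntax":
        "http://purl.org/olia/olia.owl",
    "OLiA Reference Model for discourse structure": None,
    "OLiA Reference Model for information structure": None,
    "OLiA System Ontology": "http://purl.org/olia/system.owl",
    "OLiA Top-Level Ontology": "http://purl.org/olia/olia-top.owl",
}


def available(modules):
    """Keep only the modules that already have a URL."""
    return {name: url for name, url in modules.items() if url is not None}


for name, url in available(MODULES).items():
    print(f"{name}: {url}")
```

The resulting URLs are the ones to paste into an ontology browser or to pass to an OWL/RDF library of your choice.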

 

Multilingual Annotation Models for morphological, morphosyntactic and syntactic annotation

tagset / NLP tool

phenomenon

languages

OWL/DL models

SFB632 annotation standard (Dipper et al. 2008)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

> 30 typologically different languages, including many African languages

Annotation Model, Linking Model

EAGLES recommendations
(Leech and Wilson 1996)

morphosyntax

11 EU languages, incl. Romance, Germanic, Greek and Irish

Annotation Model, Linking Model

Connexor dependency parser

morphosyntax, morphology, dependency syntax

10 European languages, incl. Romance, Germanic and Uralic languages

Annotation Model, Linking Model

MULTEXT-East

morphosyntax, morphology

15 mostly Eastern European languages, incl. Slavic, Romance, Uralic languages and Persian

Annotation Model (common specifications), Linking Model; Annotation Model (all languages), see project page and below for individual languages

IL-POSTS tagset
Baskaran et al. (2008)

morphosyntax

languages of the Indian subcontinent

Annotation Model, Linking Model

AnnCorra
Bharati et al. (2006)

morphosyntax, chunks

languages of the Indian subcontinent

Annotation Model, Linking Model

IIIT tagset
IIIT (2007)

morphosyntax

languages of the Indian subcontinent

Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of English

tagset / NLP tool

phenomenon

OWL/DL models

Brown corpus tagset

morphosyntax

Annotation Model, Linking Model

Connexor dependency parser

morphosyntax, morphology, dependency syntax

Annotation Model, Linking Model

EAGLES recommendations (English)
(Leech and Wilson 1996)

morphosyntax

Annotation Model, Linking Model

GENIA corpus

morphosyntax

Annotation Model, Linking Model

MULTEXT-East (English)

morphosyntax

Annotation Model, Linking Model

Penn Treebank

morphosyntax

Annotation Model, Linking Model

 

syntax

Annotation Model, Linking Model

QTag

morphosyntax

Annotation Model, Linking Model

Stanford dependency parser

dependency syntax

Annotation Model, Linking Model

Susanne corpus

morphosyntax

Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of German

tagset / NLP tool

phenomenon

OWL/DL models

Connexor dependency parser

morphosyntax, morphology, dependency syntax

Annotation Model, Linking Model

EAGLES recommendations (German)
(Leech and Wilson 1996)

morphosyntax

Annotation Model, Linking Model

Morphisto

morphology

Annotation Model, Linking Model

STTS

morphosyntax

Annotation Model, Linking Model

TIGER/NEGRA

morphology

Annotation Model, Linking Model

 

constituent syntax

Annotation Model, Linking Model

TreeTagger Chunker

chunk labels

Linking Model

RFTagger

morphosyntax, morphology

t.b.a

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of other Germanic languages

tagset / NLP tool

language

phenomenon

OWL/DL models

EAGLES recommendations
(Leech and Wilson 1996)

Danish, Dutch, Swedish (and several non-Germanic languages)

morphosyntax; inflectional morphology

Annotation Model, Linking Model

Connexor

Dutch, Swedish, Danish, Norwegian

morphosyntax, morphology, dependency syntax

Annotation Model, Linking Model

SFB632 annotation standard
(Dipper et al. 2008)

Dutch (among other languages)
(SFB 632, project D2)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

Annotation Model, Linking Model

MENOTA (incomplete)

Old Norse

morphosyntax

Annotation Model, Linking Model

T-CODEX

Old High German

morphosyntax, syntax, information structure

Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of Russian

tagset / NLP tool

phenomenon

OWL/DL models

Uppsala corpus tagset

morphosyntax, morphology

Annotation Model, Linking Model

Russian TreeTagger
(Serge Sharoff)

morphosyntax

Annotation Model, Linking Model

MULTEXT-East for Russian

morphosyntax, morphology

Annotation Model, Linking Model

 

Annotation Models for the morphosyntactic annotation of other Slavic languages

tagset / NLP tool

language

OWL/DL models

MULTEXT-East

Bulgarian

Annotation Model, Linking Model

 

Czech

Annotation Model, Linking Model

 

Macedonian

Annotation Model, Linking Model

 

Polish

Annotation Model, Linking Model

 

Slovak

Annotation Model, Linking Model

 

Slovene

Annotation Model, Linking Model

 

Resian (Slovene spoken in Italy)

Annotation Model, Linking Model

 

Serbian

Annotation Model, Linking Model

 

Ukrainian

Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of French

tagset / NLP tool

phenomenon

OWL/DL models

EAGLES recommendations
(Leech and Wilson 1996)

morphosyntax

Annotation Model, Linking Model

French TreeTagger
(Achim Stein)

morphosyntax

Annotation Model

Le Monde corpus
(Abeillé et al. 2000)

morphosyntax

Annotation Model

Connexor

morphosyntax, morphology, dependency syntax

Annotation Model, Linking Model

SFB632 annotation standard
(Dipper et al. 2008)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) for Canadian French (among other languages, SFB 632, project D2)

Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of other Romance languages

tagset

language

phenomenon

OWL/DL models

EAGLES recommendations
(Leech and Wilson 1996)

Catalan, Portuguese, Spanish

morphosyntax

Annotation Model, Linking Model

Connexor

Spanish, Italian

morphosyntax, morphology, dependency syntax

Annotation Model, Linking Model

PAROLE Spanish/Catalan
(http://nlp.lsi.upc.edu/freeling)

Spanish, Catalan

morphosyntax, inflectional morphology

Annotation Model

MULTEXT-East

Romanian

morphosyntax, morphology

Annotation Model, Linking Model

 

Annotation Models for the morphological, morphosyntactic and syntactic annotation of Uralic and Altaic languages

tagset

language

phenomenon

OWL/DL models

Connexor

Finnish

morphosyntax, morphology, dependency syntax

Annotation Model, Linking Model

MULTEXT-East

Estonian

morphosyntax, morphology

Annotation Model, Linking Model

 

Hungarian

morphosyntax, morphology

Annotation Model, Linking Model

SFB632 annotation standard
(Dipper et al. 2008)

Hungarian (among other languages)
(SFB 632, project D2)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

Annotation Model, Linking Model

Turkish POS tagset
(Oflazer et al. 2003)

Turkish

morphosyntax

Annotation Model

 

Annotation Models for the morphosyntactic annotation of other European languages

tagset

language

phenomenon

OWL/DL models

EAGLES recommendations
(Leech and Wilson 1996)

Greek, Irish (among other EU languages)

morphosyntax

Annotation Model, Linking Model

SFB632 annotation standard
(Dipper et al. 2008)

Georgian, Greek (among other languages)
(SFB 632, project D2)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

Annotation Model, Linking Model

EUSTagger
Ezeiza et al. (1998)

Basque

morphosyntax

Annotation Model

 

Annotation Models for the morphosyntactic annotation of Indo-Iranian languages

tagset

language

phenomenon

OWL/DL models

Urdu EMILLE tagset
Hardie (2003, 2004)

Urdu

morphosyntax, inflectional morphology

Annotation Model, Linking Model

Urdu tagset
Sajjad (2007)

Urdu

morphosyntax

Annotation Model, Linking Model

IL-POSTS tagset
Baskaran et al. (2008)

Bangla, Hindi, Marathi, Sanskrit

morphosyntax, inflectional morphology

Annotation Model, Linking Model

AnnCorra
Bharati et al. (2006)

Bangla, Hindi

morphosyntax, chunks

Annotation Model, Linking Model

IIIT tagset
IIIT (2007)

Hindi, Marathi

morphosyntax

Annotation Model, Linking Model

SFB632 annotation standard
(Dipper et al. 2008)

Konkani (among other, unrelated languages)
(SFB 632, project D2)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

Annotation Model, Linking Model

MULTEXT-East

Farsi (Persian)

morphosyntax

Annotation Model, Linking Model

 

Annotation Models for the morphosyntactic annotation of Dravidian languages

tagset

language

phenomenon

OWL/DL models

IL-POSTS tagset
Baskaran et al. (2008)

Kannada, Malayalam, Tamil, Telugu

morphosyntax

Annotation Model, Linking Model

AnnCorra
Bharati et al. (2006)

Telugu, Tamil

morphosyntax, chunks

Annotation Model, Linking Model

IIIT tagset
IIIT (2007)

Telugu

morphosyntax

Annotation Model, Linking Model

Annotation Models for the morphological, morphosyntactic and syntactic annotation of Tibeto-Burman languages

tagset

language

phenomenon

OWL/DL models

Dzongkha tagset
(Chungku et al. 2010)

Dzongkha

morphosyntax

Annotation Model, Linking Model

SFB632 annotation standard
(Dipper et al. 2008)

Prinmi (among other, unrelated languages)
(SFB 632, project D2)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

Annotation Model, Linking Model

Tübingen Tibetan Corpora
(Wagner & Zeisler 2004)

Tibetan (Old Tibetan, Classical Tibetan, Balti, Ladakhi)

morphosyntax, morphology, syntax

Annotation Model

 

Annotation Models for East Asian languages

annotation scheme / corpus

language

phenomenon

Annotation Model

Penn Chinese Treebank
(Xia 2000)

Chinese

morphosyntax

Annotation Model

SFB632 annotation standard
(Dipper et al. 2008)

Japanese (among other, unrelated languages)
(SFB 632, project D2)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

Annotation Model, Linking Model

 

Annotation Models for Afroasiatic languages

annotation scheme / corpus

language

phenomenon

Annotation Model

Arabic tagset
(Khoja 2001)

Arabic

morphosyntax

Annotation Model

SFB632 annotation standard
(Dipper et al. 2008)

Chadic languages (including Guruntum, Tangale, Hausa)
(SFB 632, project B2)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

Annotation Model, Linking Model

Hausa Internet Corpus
Chiarcos et al. (2011)

Hausa

morphosyntax

t.b.a

 

Annotation Models for the languages of Sub-Saharan Africa

annotation scheme / corpus

language

phenomenon

Annotation Model

SFB632 annotation standard
(Dipper et al. 2008)

Gur and Kwa languages (including Aja, Dagbani, Buli, Byali, Ditammari, Fon, Foodo, Konni, Nateni, Waamma, Yom)
(SFB 632, project B1)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

Annotation Model, Linking Model

Chadic languages (including Guruntum, Tangale, Hausa)
(SFB 632, project B2)

Hausa Internet Corpus
Chiarcos et al. (2011)

Hausa

morphosyntax

t.b.a

 

Annotation Models for indigenous languages of the Americas, Australia and the Pacific

annotation scheme / corpus

language

phenomenon

Annotation Model

SFB632 annotation standard
(Dipper et al. 2008)

Teribe, Yucatec Maya, Mawng, Niue
(SFB 632, project D2)

parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure)

Annotation Model, Linking Model

 

Annotation Models for discourse annotations

annotation scheme / corpus

language

phenomenon

Annotation Model

ARRAU corpus

English

coreference

t.b.a

CRC 732, A3 annotations of the Stuttgarter Radio News Corpus

German

information status, pronominal coreference

t.b.a

OntoNotes

English

coreference

t.b.a

Penn Discourse Graphbank

English

discourse relations

t.b.a

Penn Discourse Treebank

English

connectives, discourse relations

t.b.a

Potsdam Coreference Scheme

English, German

coreference

t.b.a

RST Discourse Treebank

English

RST discourse relations and discourse segments

t.b.a

 

External Reference Models

terminological repository

original url

local url

Linking Model

ISO TC37/SC4 Data Category Registry

http://www.isocat.org

t.b.a

t.b.a

GOLD

http://linguistics-ontology.org

t.b.a

t.b.a