This page lists the ontologies that are currently available. None of them has been officially released yet. They will be released under a Creative Commons Attribution-ShareAlike licence as soon as a reference publication has appeared. Until then, feel free to use them; a short notification would be appreciated if you do. For reference, see the ontology-related publications at http://www.sfb632.uni-potsdam.de/~chiarcos and the remarks on the background of the OLiA ontologies.

Besides the ontologies listed below, there are a number of experimental ontologies, e.g., covering further annotation schemes, the linking with GOLD and the ISO TC37/SC4 Data Category Registry, and additional phenomena (discourse, coreference). Please contact Christian Chiarcos (FIRSTNAME_IN_LOWERCASE [DOT] LASTNAME_IN_LOWERCASE [AT] web [DOT] de) if you are interested in these.
The OLiA architecture is a set of modular OWL/DL ontologies with ontological models of annotation schemes (Annotation Models) on the one hand, an ontology of reference terms (Reference Model) on the other hand, and ontologies (Linking Models) that implement subClassOf relationships between them.
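The three-layer design described above can be illustrated with a minimal sketch. It does not load the actual OWL files; it only models subClassOf statements as plain pairs. All identifiers (`tagsetA:NN`, `tagsetB:Nc`, `ref:CommonNoun`, and so on) are hypothetical placeholders, not names taken from the OLiA ontologies. The idea is that two unrelated tagsets become comparable once their Linking Models map them to the same Reference Model concept.

```python
from collections import defaultdict

# Each model is a list of (subclass, superclass) pairs.
# All class names below are hypothetical, for illustration only.
ANNOTATION_MODEL_A = [("tagsetA:NN", "tagsetA:Noun")]
ANNOTATION_MODEL_B = [("tagsetB:Nc", "tagsetB:N")]
REFERENCE_MODEL = [
    ("ref:CommonNoun", "ref:Noun"),
    ("ref:Noun", "ref:MorphosyntacticCategory"),
]
# Linking Models: subClassOf statements from annotation concepts
# to reference concepts.
LINKING_MODEL_A = [("tagsetA:NN", "ref:CommonNoun")]
LINKING_MODEL_B = [("tagsetB:Nc", "ref:CommonNoun")]


def build_graph(*models):
    """Merge several models into one child -> {parents} mapping."""
    parents = defaultdict(set)
    for model in models:
        for child, parent in model:
            parents[child].add(parent)
    return parents


def superclasses(parents, cls):
    """Return all direct and indirect superclasses of cls."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def reference_concepts(parents, cls):
    """Keep only the superclasses that belong to the Reference Model."""
    return {c for c in superclasses(parents, cls) if c.startswith("ref:")}


GRAPH = build_graph(
    ANNOTATION_MODEL_A,
    ANNOTATION_MODEL_B,
    REFERENCE_MODEL,
    LINKING_MODEL_A,
    LINKING_MODEL_B,
)

# Concepts shared by two tags from different (hypothetical) tagsets.
SHARED = reference_concepts(GRAPH, "tagsetA:NN") & reference_concepts(GRAPH, "tagsetB:Nc")
```

Because the modules are kept separate, an Annotation Model can be replaced or a new tagset added without touching the Reference Model; only the corresponding Linking Model has to be written.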
Some remarks on viewing and browsing the ontologies: For browsing the OLiA ontologies, I recommend

| Module | phenomenon | OWL/DL models |
|---|---|---|
| OLiA Reference Model for morphosyntax, morphology and syntax | morphosyntax, morphology and syntax | |
| OLiA Reference Model for discourse structure | discourse structure, discourse relations | t.b.a |
| OLiA Reference Model for information structure | information structure, information status, coreference | t.b.a |
| OLiA System Ontology | basic annotation data structures | |
| OLiA Top-Level Ontology | top-level concepts of the OLiA Reference Model for morphosyntax, morphology and syntax | |


| tagset / NLP tool | phenomenon | languages | OWL/DL models |
|---|---|---|---|
| SFB632 annotation standard (Dipper et al. 2008) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | > 30 typologically different languages, including many African languages | |
| EAGLES recommendations | morphosyntax | 11 EU languages, incl. Romance, Germanic, Greek and Irish | |
| Connexor dependency parser | morphosyntax, morphology, dependency syntax | 10 European languages, incl. Romance, Germanic and Uralic languages | |
| MULTEXT-East | morphosyntax, morphology | 15 mostly Eastern European languages, incl. Slavic, Romance, Uralic languages and Persian | Annotation Model (common specifications), Linking Model; Annotation Model (all languages), see project page and below for individual languages |
| IL-POSTS tagset | morphosyntax | languages of the Indian subcontinent | |
| AnnCorra | morphosyntax, chunks | languages of the Indian subcontinent | |
| IIIT tagset | morphosyntax | languages of the Indian subcontinent | |


| tagset / NLP tool | phenomenon | OWL/DL models |
|---|---|---|
| Brown corpus tagset | morphosyntax | |
| Connexor dependency parser | morphosyntax, morphology, dependency syntax | |
| EAGLES recommendations (English) | morphosyntax | |
| GENIA corpus | morphosyntax | |
| MULTEXT-East (English) | morphosyntax | |
| Penn Treebank | morphosyntax | |
| | syntax | |
| QTag | morphosyntax | |
| Stanford dependency parser | dependency syntax | |
| Susanne corpus | morphosyntax | |


| tagset / NLP tool | phenomenon | OWL/DL models |
|---|---|---|
| Connexor dependency parser | morphosyntax, morphology, dependency syntax | |
| EAGLES recommendations (German) | morphosyntax | |
| Morphisto | morphology | |
| STTS | morphosyntax | |
| TIGER/NEGRA | morphology | |
| | constituent syntax | |
| TreeTagger Chunker | chunk labels | |
| RFTagger | morphosyntax, morphology | t.b.a |


| tagset / NLP tool | language | phenomenon | OWL/DL models |
|---|---|---|---|
| EAGLES recommendations | Danish, Dutch, Swedish (and several non-Germanic languages) | morphosyntax; inflectional morphology | |
| Connexor | Dutch, Swedish, Danish, Norwegian | morphosyntax, morphology, dependency syntax | |
| SFB632 annotation standard | Dutch (among other languages) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | |
| MENOTA (incomplete) | Old Norse | morphosyntax | |
| T-CODEX | Old High German | morphosyntax, syntax, information structure | |


| tagset / NLP tool | phenomenon | OWL/DL models |
|---|---|---|
| Uppsala corpus tagset | morphosyntax, morphology | |
| Russian TreeTagger | morphosyntax | |
| MULTEXT-East for Russian | morphosyntax, morphology | |


| tagset / NLP tool | language | OWL/DL models |
|---|---|---|
| MULTEXT-East | Bulgarian | |
| | Czech | |
| | Macedonian | |
| | Polish | |
| | Slovak | |
| | Slovene | |
| | Resian (Slovene spoken in Italy) | |
| | Serbian | |
| | Ukrainian | |


| tagset / NLP tool | phenomenon | OWL/DL models |
|---|---|---|
| EAGLES recommendations | morphosyntax | |
| French TreeTagger | morphosyntax | |
| Le Monde corpus | morphosyntax | |
| Connexor | morphosyntax, morphology, dependency syntax | |
| SFB632 annotation standard | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) for Canadian French (among other languages, SFB 632, project D2) | |


| tagset | language | phenomenon | OWL/DL models |
|---|---|---|---|
| EAGLES recommendations | Catalan, Portuguese, Spanish | morphosyntax | |
| Connexor | Spanish, Italian | morphosyntax, morphology, dependency syntax | |
| PAROLE Spanish/Catalan | Spanish, Catalan | morphosyntax, inflectional morphology | |
| MULTEXT-East | Romanian | morphosyntax, morphology | |


| tagset | language | phenomenon | OWL/DL models |
|---|---|---|---|
| Connexor | Finnish | morphosyntax, morphology, dependency syntax | |
| MULTEXT-East | Estonian | morphosyntax, morphology | |
| | Hungarian | morphosyntax, morphology | |
| SFB632 annotation standard | Hungarian (among other languages) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | |
| Turkish POS tagset | Turkish | morphosyntax | |


| tagset | language | phenomenon | OWL/DL models |
|---|---|---|---|
| EAGLES recommendations | Greek, Irish (among other EU languages) | morphosyntax | |
| SFB632 annotation standard | Georgian, Greek (among other languages) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | |
| EUSTagger | Basque | morphosyntax | |


| tagset | language | phenomenon | OWL/DL models |
|---|---|---|---|
| Urdu EMILLE tagset | Urdu | morphosyntax, inflectional morphology | |
| Urdu tagset | Urdu | morphosyntax | |
| IL-POSTS tagset | Bangla, Hindi, Marathi, Sanskrit | morphosyntax, inflectional morphology | |
| AnnCorra | Bangla, Hindi | morphosyntax, chunks | |
| IIIT tagset | Hindi, Marathi | morphosyntax | |
| SFB632 annotation standard | Konkani (among other, unrelated languages) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | |
| MULTEXT-East | Farsi (Persian) | morphosyntax | |


| tagset | language | phenomenon | OWL/DL models |
|---|---|---|---|
| IL-POSTS tagset | Kannada, Malayalam, Tamil, Telugu | morphosyntax | |
| AnnCorra | Telugu, Tamil | morphosyntax, chunks | |
| IIIT tagset | Telugu | morphosyntax | |


| tagset | language | phenomenon | OWL/DL models |
|---|---|---|---|
| Dzongkha tagset | Dzongkha | morphosyntax | |
| SFB632 annotation standard | Prinmi (among other, unrelated languages) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | |
| Tübingen Tibetan Corpora | Tibetan (Old Tibetan, Classical Tibetan, Balti, Ladakh) | morphosyntax, morphology, syntax | |


| annotation scheme / corpus | language | phenomenon | Annotation Model |
|---|---|---|---|
| Penn Chinese Treebank | Chinese | morphosyntax | |
| SFB632 annotation standard | Japanese (among other, unrelated languages) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | |


| annotation scheme / corpus | language | phenomenon | Annotation Model |
|---|---|---|---|
| Arabic tagset | Arabic | morphosyntax | |
| SFB632 annotation standard | Chadic languages (including Guruntum, Tangale, Hausa) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | |
| Hausa Internet Corpus | Hausa | morphosyntax | t.b.a |


| annotation scheme / corpus | language | phenomenon | Annotation Model |
|---|---|---|---|
| SFB632 annotation standard | Gur and Kwa languages (including Aja, Dagbani, Buli, Byali, Ditammari, Fon, Foodo, Konni, Nateni, Waamma, Yom) | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | |
| | Chadic languages (including Guruntum, Tangale, Hausa) | | |
| Hausa Internet Corpus | Hausa | morphosyntax | t.b.a |


| annotation scheme / corpus | language | phenomenon | Annotation Model |
|---|---|---|---|
| SFB632 annotation standard | Teribe, Yucatec Maya, Mawng, Niue | parts of speech, glosses, chunk labels, grammatical functions (phonology, information structure) | |


| annotation scheme / corpus | language | phenomenon | Annotation Model |
|---|---|---|---|
| ARRAU corpus | English | coreference | t.b.a |
| CRC 732, A3 annotations of the Stuttgarter Radio News Corpus | German | information status, pronominal coreference | t.b.a |
| OntoNotes | English | coreference | t.b.a |
| Penn Discourse Graphbank | English | discourse relations | t.b.a |
| Penn Discourse Treebank | English | connectives, discourse relations | t.b.a |
| Potsdam Coreference Scheme | English, German | coreference | t.b.a |
| RST Discourse Treebank | English | RST discourse relations and discourse segments | t.b.a |

|
terminological repository |
original url |
local url |
Linking Model |
|
ISO TC37/SC4 Data Category Registry |
t.b.a |
t.b.a |
|
|
GOLD |
t.b.a |
t.b.a |