DanNet is a Danish lexical-semantic wordnet, i.e. a language resource in which the semantic relations between words are expressed in a formal language and thereby made usable for...
CST's modified version of the Brill tagger, a POS tagger written in C/C++.
NOMCO corpus
An annotated multimodal collection of conversations in Danish in which twelve pairs of participants talk to get to know each other. The participants were filmed while standing facing each other and talking...
DK-CLARIN Parallel Financial Corpus (da-en)
The DK-CLARIN Parallel Financial Corpus comprises 4.3 M Danish and 4.8 M English tokens from translated (parallel) documents, mainly annual reports, of the period 2002-2010 from...
CST Lemmatiser
CST's lemmatiser reduces each word in a text to its base form, the lemma.
The LSP (Language for Special Purposes) corpus consists of texts from seven selected domains. The DK-CLARIN LSP corpus comprises 11 M tokens from the period 2000-2010,...
Danish Similarity Data Set
The Danish similarity dataset is a gold-standard resource for evaluating Danish word embedding models. The dataset consists of 99 word pairs rated by 38 human judges...
DK-CLARIN Rapid Aligned Corpus 1993-2011 (da-en, da-de)
The aligned corpus consists of press releases from the European Commission Press Release Database (Rapid) harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The...
The SemDax Corpus is a Danish human-annotated corpus that relies on the combined wordnet and dictionary resources DanNet and Den Danske Ordbog, and is available through a CLARIN...
Dictionary for the CST Lemmatizer
Binary wordlists for the CST lemmatizer, as a supplement to the lemmatizer's rules. Works with both tagged and untagged input. Usage: cstlemma -d NAME-OF-WORDLIST.
CST's tokenisation and segmentation program
CST's tokenisation and segmentation program for text and RTF files. Splits a text into words and multi-word expressions.
The STO (SprogTeknologisk Ordbase) lexicon is a comprehensive computational lexicon of Danish developed for NLP/HLT applications. The syntax layer of the lexicon, presented here...
CST Mulinco
MULINCO - MUltiLINgual Corpus of the University of COpenhagen. Seven fairy tales by H.C. Andersen, texts by Edgar Allan Poe, Saxo's history of Denmark and EU treaties in several languages...