Ressourcer - sprogteknologi.dk

spaCY - statistiske modeller for dansk

Danish multi-task CNN trained on UD Danish DDT and DaNE. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Sources: Danish Universal...

Python
Source code

Hisia

ML Powered Danish Sentiment Model

Python
Source code

Danish Universal Dependencies DDT (UD_Danish-DDT)

The Danish Universal Dependencies treebank (Johannsen et al., 2015, UD-DDT) is a conversion of the Danish Dependency Treebank (Buch-Kromann et al. 2003) based on texts from...

coNLL-U

DanSpeech

Open-source Python-pakke til dansk talegenkendelse (tale-til-tekst). DanSpeech har arbejdet på at udvikle generelle talegenkendelsesmodeller siden 2018. Projektet har levet som...

Python

Bidirectional Long-Short Term Memory tagger

A toolkit for Part-of-Speech tagging and NER in DyNet. It has been tested on Danish, amongst other languages (for the UD POS tags in the UD_Danish-DDT version 1.1 and 2.3)...

Python

DK-CLARIN Rapid Aligned Corpus 1993-2011 (da-en, da-de)

The aligned corpus consists of press releases from the European Commission Press Relase Database (Rapid) harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The...

TXT
TMX

DUDS Jens Bille's Ballad Book, v. 1.1

DUDS Jens Bille’s Ballad Book belongs to a corpus of the oldest Danish ballad tradition. The corpus consists of 9 ballad books handed down from Renaissance ballad collectors...

XML

Johannes V Jensen Korpus

Elektroniske versioner af størstedelen af Johannes V. Jensens udgivelser. I regi af CLARIN-projektet og i samarbejde med rettighedshaverne, gjorde Jensen Forum i 2011...

HTML
PDF

Grundtvigs værker

Gruntvig's Works version 1,12. april 2018 contains N.F.S. Grundtvig's authorship. Corpus folder containing edited texts and OCR texts. Creator: Ravn, Kim Steen License:...

XML

Danish Similarity Data Set

The Danish similarity dataset is a gold standard resource for evaluation of Danish word embedding models. The dataset consists of 99 word pairs rated by 38 human judges...

CSV

DK-CLARIN Parallel Financial Corpus (da-en)

The DK-CLARIN Parallel Financial Corpus comprises 4.3 M Danish and 4.8 M English tokens from translated (parallel) documents, mainly annual reports, of the period 2002-2010 from...

XML

DK-CLARIN LSP Corpus

The LSP (Language for Special Purposes) corpus consists of texts from seven selected domains. The DK-CLARIN LSP corpus comprises 11 M tokens from the period 2000-2010,...

XML

CST's tokeniserings- og segmenteringsprogram

CST's tokeniserings- og segmenteringsprogram til tekst- og RTF-filer. Opdeler en tekst i ord og ordforbindelser

HTML

NOMCO corpus

En opmærket multimodal samling af samtaler på dansk hvor tolv deltagerpar taler sammen for at lære hinanden at kende. Deltagerne blev filmet mens de stod foran hinanden og talte...

XML

taggerXML

CST's modificerede udgave af BRILL-taggeren POS-tagger i C/C++

C/C++

Danish Named Entity Recognition data on top of the Danish Universal...

This resource is an annotation of four NER types (PER, ORG, LOC, MISC) on top of the UD_Danish-DDT data. Status: published and freely available since summer 2019 Reference:...

conll

Leipzig Corpora Collection

The Leipzig Corpora Collection provides different tools and data for download, which are protected by copyright. For more details please refer to our terms of usage....

TXT

Dansk Wikisource

Maskinlæsbar version af dumps fra den danske wikipedia kilder. Se https://foundation.wikimedia.org/wiki/Terms_of_Use

XML

Dansk Wikiquote

Maskinlæsbar version af dumps fra den danske wikipedias citater. Se https://foundation.wikimedia.org/wiki/Terms_of_Use

XML

Danish Gigaword

A billion-word corpus of Danish text. Split into many sections, and covering many dimensions of variation (spoken/written, formal/informal, modern/old, rigsdansk/dialect, and so...

TXT

65 ressourcer fundet