Resources - sprogteknologi.dk

DanPASS corpus (Danish Phonetically Annotated Spontaneous Speech)

The DanPASS corpus was developed for research and applied research purposes. It consists of non-scripted monologues and dialogues recorded by 27 speakers, comprising a total...

BIN
TXT

CDT - The Copenhagen Danish-English Dependency Treebank

The Copenhagen Dependency Treebanks are a set of treebanks for Danish, English, Spanish and Italian. The purpose of the Copenhagen Dependency Treebank project is to create...

TAG
ATAG

Bornholmersnak

Pronunciation of words in the Bornholm dialect. BCP-47: da-DK-bornholm.

HTML

Medical spelling dictionary (processed)

Medical spelling dictionary with terms in Danish, English and Latin. This dataset has been created within the framework of the European Language Resource Coordination (ELRC)...

TBX
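TBX (TermBase eXchange) is an XML format for terminology. As a rough illustration of how a multilingual TBX file like this one could be read, here is a minimal standard-library Python sketch. It assumes the common TBX layout in which each concept is a termEntry (conceptEntry in newer TBX versions) holding one langSet (langSec) per language with term elements inside; the function names and the toy sample in the usage note are hypothetical, and the real file may contain further elements that this sketch simply ignores.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def local_name(tag) -> str:
    """Return an element's tag without any '{namespace}' prefix."""
    if not isinstance(tag, str):  # e.g. comments, if a parser keeps them
        return ""
    return tag.rsplit("}", 1)[-1]

def tbx_entries(root: ET.Element) -> list[dict[str, list[str]]]:
    """Return one {language: [terms]} dict per concept entry in a parsed TBX tree."""
    entries = []
    for entry in root.iter():
        # Assumed layout: <termEntry> (TBX 2008) or <conceptEntry> (TBX v3).
        if local_name(entry.tag) not in ("termEntry", "conceptEntry"):
            continue
        terms: dict[str, list[str]] = {}
        for lang_set in entry:
            # One block per language: <langSet> (2008) or <langSec> (v3).
            if local_name(lang_set.tag) not in ("langSet", "langSec"):
                continue
            lang = lang_set.get(XML_LANG)
            if not lang:
                continue
            # <term> may sit inside <tig>, <ntig>/<termGrp> or <termSec>.
            for el in lang_set.iter():
                if local_name(el.tag) == "term":
                    terms.setdefault(lang, []).append("".join(el.itertext()).strip())
        if terms:
            entries.append(terms)
    return entries

def read_tbx(path: str) -> list[dict[str, list[str]]]:
    """Parse a TBX file from disk (the path is supplied by the caller)."""
    return tbx_entries(ET.parse(path).getroot())
```

For a toy entry with `<langSet xml:lang="da"><tig><term>hjerte</term></tig></langSet>` plus matching English and Latin blocks, `tbx_entries` would return `[{"da": ["hjerte"], "en": ["heart"], "la": ["cor"]}]`. Matching on local names keeps the sketch independent of whether the file declares a TBX namespace.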

Danish BERT

BERT (Bidirectional Encoder Representations from Transformers) is a deep neural network model used in Natural Language Processing. The network learns the grammar and semantics...

CKPT

EUIPO - Trade mark Guidelines (October 2017) (English-Danish) (Processed)

The EUIPO Guidelines are the main point of reference for users of the European Union trade mark system and professional advisers who want to make sure they have the latest...

TMX

ScandiNER

ScandiNER is a NER (named entity recognition) model built on the Norwegian model from the AI Lab of the National Library of Norway. The model is fine-tuned on a combined...

BIN

DCEP: Digital Corpus of the European Parliament

The Digital Corpus of the European Parliament (DCEP) contains the majority of the documents published on the European Parliament's official website. It comprises a variety of...

XML
SGML
TXT

CST STO

The STO (SprogTeknologisk Ordbase) lexicon is a comprehensive computational lexicon of Danish developed for NLP/HLT applications. The syntax layer of the lexicon, presented here...

LMF
CSV

CST Lemmatiser

CST's lemmatiser reduces each word in a text to its base form, the lemma.

C/C++
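The CST lemmatiser itself is a C/C++ program, and its method is not described in this entry. Purely to illustrate what lemmatisation means, the following hypothetical Python toy consults a small full-form lexicon for irregular words and otherwise falls back to naive suffix stripping. The lexicon entries, the suffix list and the function names are assumptions made for this example only.

```python
# Hypothetical toy lemmatiser for illustration only; NOT the CST lemmatiser's method.
# Irregular forms come from a tiny lexicon; other words get naive suffix stripping.

# Small full-form lexicon for irregular forms (illustrative entries).
LEXICON = {
    "gik": "gå",     # past tense of "gå" (to go)
    "børn": "barn",  # plural of "barn" (child)
    "bøger": "bog",  # plural of "bog" (book)
}

# Candidate inflectional suffixes, tried longest first (illustrative list).
SUFFIXES = ["erne", "ene", "er", "en", "et", "e"]

def lemmatise(word: str, min_stem: int = 3) -> str:
    """Guess the lemma of one Danish word form."""
    w = word.lower()
    if w in LEXICON:
        return LEXICON[w]
    for suffix in SUFFIXES:
        # Strip only if at least `min_stem` characters remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w

def lemmatise_text(text: str) -> list[str]:
    """Lemmatise each whitespace-separated token of a text."""
    return [lemmatise(token) for token in text.split()]

print(lemmatise_text("Børn gik til husene"))  # ['barn', 'gå', 'til', 'hus']
```

Naive suffix stripping over-strips: "bager" (baker) would come out as "bag". This is why practical lemmatisers rely on larger lexicons and on rules learned from annotated data.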

NST Pronunciation Lexicon for Danish

Originates from NST (Nordisk Språkteknologi), which went bankrupt in 2003. It is kept up to date by the Norwegian Language Bank at the National Library of Norway.

TXT

spaCy - statistical models for Danish

Danish multi-task CNN trained on UD Danish DDT and DaNE. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Sources: Danish Universal...

Python
Source code

Hisia

ML-powered Danish sentiment model

Python
Source code

Danish Universal Dependencies DDT (UD_Danish-DDT)

The Danish Universal Dependencies treebank (Johannsen et al., 2015, UD-DDT) is a conversion of the Danish Dependency Treebank (Buch-Kromann et al., 2003) based on texts from...

CoNLL-U
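CoNLL-U is a plain-text format with one token per line, ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with "#", and a blank line after each sentence. A minimal standard-library reader might look like the sketch below; the function name and the toy Danish sentence are illustrative and are not taken from the treebank.

```python
# Minimal CoNLL-U reader using only the standard library (illustrative sketch).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def read_conllu(text: str) -> list[list[dict]]:
    """Parse CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):          # comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token = dict(zip(FIELDS, cols))
        # Skip multiword-token ranges ("1-2") and empty nodes ("8.1"),
        # keeping only the syntactic words that carry an integer head.
        if "-" in token["id"] or "." in token["id"]:
            continue
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])  # 0 marks the root
        current.append(token)
    if current:                           # input may lack a final blank line
        sentences.append(current)
    return sentences

# Toy sentence (not from UD_Danish-DDT): "Hun læser en bog." (She reads a book.)
rows = [
    ["1", "Hun", "hun", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "læser", "læse", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "en", "en", "DET", "_", "_", "4", "det", "_", "_"],
    ["4", "bog", "bog", "NOUN", "_", "_", "2", "obj", "_", "SpaceAfter=No"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = "# text = Hun læser en bog.\n" + "\n".join("\t".join(r) for r in rows) + "\n\n"

for tok in read_conllu(SAMPLE)[0]:
    print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

Converting ID and HEAD to integers makes it easy to walk the dependency tree, for example by collecting every token whose head equals a given ID.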

Bilingual English-Danish parallel corpus from The Danish Medicines Agency

Contents of https://laegemiddelstyrelsen.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 22699 translation units between...

TMX
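TMX (Translation Memory eXchange) is an XML format in which each translation unit (tu) holds one variant (tuv) per language, with the text in a seg element. Assuming that standard layout, a minimal standard-library sketch for turning such a file into English-Danish sentence pairs could look like this; the function names are hypothetical and not part of any published tool for this corpus.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_units(root: ET.Element) -> list[dict[str, str]]:
    """Return one {language: segment text} dict per <tu> in a parsed TMX tree."""
    units = []
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions use a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() joins all text in <seg>, including text inside inline elements.
                variants[lang.lower()] = "".join(seg.itertext()).strip()
        if variants:
            units.append(variants)
    return units

def read_tmx(path: str) -> list[dict[str, str]]:
    """Parse a TMX file from disk and return its translation units."""
    return tmx_units(ET.parse(path).getroot())

def parallel_pairs(units, src="en", tgt="da"):
    """Yield (source, target) pairs for units that contain both language codes."""
    for u in units:
        if src in u and tgt in u:
            yield u[src], u[tgt]
```

Language codes are lower-cased, so a file that uses region subtags (for example "da-DK") needs `tgt="da-dk"`. For very large files, `ET.iterparse` can process units incrementally instead of building the whole tree in memory.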

JEX - JRC EuroVoc Indexer

JEX is multi-label classification software that automatically assigns a ranked list of the over six thousand descriptors (classes) from the controlled vocabulary of the EuroVoc...

Java

Bilingual English-Danish parallel corpus from Denmark National Space...

Contents of https://www.vikingeskibsmuseet.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 1939 translation units (EN-DA)....

TMX

Senda

A tool for fine-tuning NLP Transformers for sentiment analysis. It is released together with a set of models fine-tuned for sentiment analysis in Danish (published on Hugging Face)....

Python

NERDA

NERDA is a tool (released as a Python package) for fine-tuning NLP transformer models to identify persons, organizations, locations, etc. in texts (=Named-Entity...

Python
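Token-classification tools such as NERDA produce one label per token. Assuming the widely used BIO scheme (B- marks the first token of an entity, I- a continuation, O no entity) with DaNE-style types such as PER, ORG, LOC and MISC, a hypothetical helper for grouping those labels into entity spans could look like this. It is an illustration, not part of NERDA's API.

```python
# Hypothetical helper (not part of NERDA): group BIO tags into entity spans.

def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Return (entity_type, entity_text) spans from parallel token and BIO tag lists."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    cur_type, cur_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")   # "B-PER" -> ("B", "-", "PER")
        if prefix == "I" and label == cur_type:
            cur_tokens.append(token)            # continue the open span
            continue
        # Any other tag closes the open span, if there is one.
        if cur_type is not None:
            spans.append((cur_type, " ".join(cur_tokens)))  # tokens joined by spaces
            cur_type, cur_tokens = None, []
        if prefix in ("B", "I") and label:
            # B- starts a span; a stray I- without a matching open span is
            # treated leniently as if it were B-.
            cur_type, cur_tokens = label, [token]
    if cur_type is not None:                    # close a span that runs to the end
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans

tokens = ["Anne", "Hansen", "besøgte", "Aarhus", "Universitet", "i", "går"]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "O"]
print(bio_to_spans(tokens, tags))
# [('PER', 'Anne Hansen'), ('ORG', 'Aarhus Universitet')]
```

Treating a stray I- tag as the start of a new entity is a lenient design choice; a stricter evaluation might instead discard such tokens.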

DaCy

DaCy is a framework for processing Danish free text. In particular, it contains three language-processing pipelines for Danish free text. The pipelines come in three different sizes...

Python

200 resources found