Resources - sprogteknologi.dk

Røst-315M

RØST-315M is a speech recognition model based on the CoRal dataset, and the model is a product of the CoRal project. CoRal is a project that aims to produce datasets that are...

Safetensors

CoRal - Danish Conversational and Read-aloud Dataset

CoRal is a comprehensive Automatic Speech Recognition (ASR) dataset designed to capture the diversity of the Danish language across various dialects, accents, genders, and age...

Parquet

Danish Wikisource

Machine-readable version of the dumps from the Danish Wikisource. See https://foundation.wikimedia.org/wiki/Terms_of_Use

XML

The Norwegian Colossal Corpus

The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpora suitable for training large language models. We have done extensive cleaning on the...

JSON

Framenet DK

Dictionary (a frame lexicon) of the semantic frames of verbs and verbal nouns, based on the Berkeley FrameNet standard https://framenet.icsi.berkeley.edu/fndrupal/ (which...

CSV

DanNet

DanNet is a Danish lexical semantic wordnet; i.e. a language resource where the semantic relations between words are expressed in a formal language and thereby made usable for...

CSV
OWL

DCEP: Digital Corpus of the European Parliament

The Digital Corpus of the European Parliament (DCEP) contains the majority of the documents published on the European Parliament's official website. It comprises a variety of...

XML
SGML
TXT

Bilingual English-Danish parallel corpus from The Danish Medicines Agency

Contents of https://laegemiddelstyrelsen.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 22699 translation units between...

TMX

Bilingual English-Danish parallel corpus from The Viking Ship Museum website

Contents of https://www.vikingeskibsmuseet.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 12403 translation units (EN-...

TMX

Bilingual English-Danish parallel corpus from The Danish Environmental...

Contents of https://eng.mst.dk/ and https://mst.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created...

TMX

Bilingual English-Danish parallel corpus from Danmarks Statistik website

Contents of https://www.dst.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...

TMX

Bilingual English-Danish parallel corpus from National Museum of Denmark website

Contents of https://natmus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...

TMX

Bilingual English-Danish parallel corpus from Odense Municipality website

Contents of https://www.odense.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework...

TMX

Bilingual English-Danish parallel corpus from The Agency for Culture and...

Contents of https://slks.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of the...

TMX

Bilingual English-Danish parallel corpus from The Danish Gambling Authority website

Contents of https://spillemyndigheden.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...

TMX

Bilingual English-Danish parallel corpus from The Danish Nature Agency website

Contents of https://naturstyrelsen.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...

TMX

Bilingual English-Danish parallel corpus from The Geological Survey of...

Contents of http://www.geus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...

TMX

Bilingual Danish-English parallel corpus from the State Audit Office...

Contents of the http://rigsrevisionen.dk/ website were downloaded, aligned and converted into a parallel corpus. This dataset has been created within the framework of the European Language...

TMX

Bilingual English-Danish parallel corpus from VisitDenmark - The official...

Contents of https://www.visitdenmark.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...

TMX

Bilingual English-Danish parallel corpus from Aarhus 2017 - European Capital...

Contents of http://www.aarhus2017.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...

TMX

25 resources found