-
RØST-315M is a speech recognition model based on the CoRal dataset and is a product of the CoRal project. CoRal is a project that aims to produce datasets that are...
- Safetensors
-
CoRal is a comprehensive Automatic Speech Recognition (ASR) dataset designed to capture the diversity of the Danish language across various dialects, accents, genders, and age...
- Parquet
-
Machine-readable version of dumps from the Danish Wikipedia sources. See https://foundation.wikimedia.org/wiki/Terms_of_Use
- XML
-
The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpuses suitable for training large language models. We have done extensive cleaning on the...
- JSON
-
Dictionary (a frame lexicon) of the semantic frames of verbs and verbal nouns, following the Berkeley FrameNet standard https://framenet.icsi.berkeley.edu/fndrupal/ (which...
- CSV
-
DanNet is a Danish lexical semantic wordnet; i.e. a language resource where the semantic relations between words are expressed in a formal language and thereby made usable for...
- CSV
- OWL
-
The Digital Corpus of the European Parliament (DCEP) contains the majority of the documents published on the European Parliament's official website. It comprises a variety of...
- XML
- SGML
- TXT
-
Contents of https://laegemiddelstyrelsen.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 22699 translation units between...
- TMX
-
Contents of https://www.vikingeskibsmuseet.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 12403 translation units (EN-...
- TMX
-
Contents of https://eng.mst.dk/ and https://mst.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created...
- TMX
-
Contents of https://www.dst.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
- TMX
-
Contents of https://natmus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
- TMX
-
Contents of https://www.odense.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework...
- TMX
-
Contents of https://slks.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of the...
- TMX
-
Contents of https://spillemyndigheden.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
- TMX
-
Contents of https://naturstyrelsen.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
- TMX
-
Contents of http://www.geus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
- TMX
-
Contents of the http://rigsrevisionen.dk/ website were downloaded, aligned, and converted into a parallel corpus. This dataset has been created within the framework of the European Language...
- TMX
-
Contents of https://www.visitdenmark.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
- TMX
-
Contents of http://www.aarhus2017.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
- TMX