-
"The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpuses suitable for training large language models. We have done extensive cleaning on the...
-
Listen indeholder opslagsordene i ODS (og ODS-S) på nettet ordnet.dk/ods samt de bøjningsformer der er registreret til brug for ordbogens søgefunktion. Listen er TAB-separeret...
- HTML
-
Dansk etsproget korpus på 3,708,693 sætninger, med indholdet på www.retsinformation.dk.
-
Contents of https://laegemiddelstyrelsen.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 22699 translation units between...
- ZIP
-
Contents of https://www.dst.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of https://natmus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of https://www.odense.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework...
-
Contents of https://slks.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of the...
-
Contents of https://spillemyndigheden.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
-
Contents of https://naturstyrelsen.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
-
Contents of http://www.geus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of http://rigsrevisionen.dk/ website downloaded, aligned and converted into parallel corpus This dataset has been created within the framework of the European Language...
-
Contents of https://www.dma.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of https://uk.fm.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of http://um.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of the...
-
Contents of http://www.aarhus2017.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
-
Contents of https://www.visitvejle.com were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
-
Maskinlæsbar version af dumps fra den danske wikipedia kilder. Se https://foundation.wikimedia.org/wiki/Terms_of_Use
-
Maskinlæsbar version af dumps fra den danske wikipedias citater. Se https://foundation.wikimedia.org/wiki/Terms_of_Use
-
Maskinlæsbar version af dumps fra den danske wikipedia. Se https://foundation.wikimedia.org/wiki/Terms_of_Use