-
"The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpuses suitable for training large language models. We have done extensive cleaning on the...
-
En liste bestående af alle opslagsord (lemmaer) fra Den Danske Ordbog (DDO). Listen er TAB-separeret og rummer fire felter: opslagsord, evt. homogranummer, ordklasse, artiklens...
- none
-
Listen indeholder opslagsordene i ODS (og ODS-S) på nettet ordnet.dk/ods. Listen er TAB-separeret og rummer fire felter: opslagsform, evt. homografnummer, ordklasse og artiklens...
- HTML
-
En liste som indeholder alle bøjningsformer af opslagsordene i Den Danske Ordbog (DDO). Listen indeholder opslagsordene i DDO på nettet (ordnet.dk/ddo) samt de bøjningsformer,...
- HTML
-
Listen indeholder opslagsordene i ODS (og ODS-S) på nettet ordnet.dk/ods samt de bøjningsformer der er registreret til brug for ordbogens søgefunktion. Listen er TAB-separeret...
- HTML
-
Dansk etsproget korpus på 3,708,693 sætninger, med indholdet på www.retsinformation.dk.
-
En word2vec2 model, som er trænet på omtrent 1300 timers dansk taledata fra podcasts og lydbøger. Modellen er trænet på 16kHz taledata, hvilket også er formatet, der skal...
-
ScandEval er en benchmarking platform for sprogmodeller på dansk, norsk (både bokmål og nynorsk), svensk, islandsk og færøsk. Den indeholder først og fremmest en benchmarking...
-
ScandiNER er en NER (named entity recognition) model, som er bygget på den norske model fra det norske nationalbiblioteks AI labbet. Modellen er fin tunet på et kombineret...
-
Contents of https://eng.mst.dk/ and https://mst.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created...
-
Contents of https://laegemiddelstyrelsen.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 22699 translation units between...
- ZIP
-
Contents of https://www.dst.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of https://natmus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of https://www.odense.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework...
-
Contents of https://slks.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of the...
-
Contents of https://spillemyndigheden.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
-
Contents of https://naturstyrelsen.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...
-
Contents of http://www.geus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...
-
Contents of http://rigsrevisionen.dk/ website downloaded, aligned and converted into parallel corpus This dataset has been created within the framework of the European Language...
-
Contents of https://www.dma.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...