Resources - sprogteknologi.dk

Danish ELECTRA

ELECTRA model pretrained on Danish, using 17.5 GB of data. You can read more about the ELECTRA training method in this research paper: ELECTRA: Pre-training Text Encoders as Discriminators...

BIN

KlimaBERT

KlimaBERT is a tool that can identify and analyze political quotes related to climate. The model works best when used on official texts from...

BIN

RøBÆRTa

RøBÆRTa is a Danish pretrained RoBERTa language model. RøBÆRTa was trained on the Danish mC4 dataset as part of the Flax community week. The model is trained to guess a...

BIN

Danish ConvBERT

ConvBERT models in two different sizes, pretrained on Danish text data (approximately 17.5 GB of data). The ELECTRA pretraining method was used for pretraining. ConvBERT is a...

BIN

Bilingual English-Danish parallel corpus from The Viking Ship Museum website

Contents of https://www.vikingeskibsmuseet.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. Contains 12403 translation units (EN-...

TMX

Danish Legal monolingual corpus from the contents of the retsinformation.dk web site

Danish legal monolingual corpus from the contents of the retsinformation.dk web site. This dataset has been created within the framework of the European Language Resource...

TXT

COVID-19 EC-EUROPA v1 dataset. Bilingual (EN-DA)

Bilingual (EN-DA) corpus acquired from website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020). Contains 2803 translation units (DA-EN).

TMX

Covid-19 EUR-LEX dataset. Bilingual (EN-DA)

Bilingual (EN-DA) corpus acquired from website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020). Contains 21238 translation units (DA-EN).

TMX

COVID-19 EUROPARL dataset v2. Bilingual (EN-DA)

Bilingual (EN-DA) corpus acquired from the website (https://www.europarl.europa.eu/) of the European Parliament (9th May 2020). Contains 633 translation units (DA-EN).

TMX

COVID-19 ANTIBIOTIC dataset. Bilingual (EN-DA)

This dataset has been generated out of public content available through the portal (https://antibiotic.ecdc.europa.eu/) of the European Centre for Disease Prevention and Control...

TMX

COVID-19 EU presscorner v2 dataset. Bilingual (EN-DA)

Bilingual (EN-DA) corpus acquired from website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). Contains 6261 translation units (DA-EN).

TMX

Bilingual corpus made out of PDF documents from the European Medicines...

EN-DA bilingual corpus made out of PDF documents from the European Medicines Agency (EMEA), https://www.ema.europa.eu (February 2020). Attribution details: This dataset has...

TMX

Compilation of Danish-English parallel corpora resources used for training...

This bilingual corpus is built from a number of different corpora drawn from selected public and private corpora and has been used to train NTEU (Neural Translation for the...

TMX

Bilingual English-Danish parallel corpus from the official Nordic cooperation website

Contents of the Nordic Co-operation web site http://www.norden.org were downloaded and converted into a parallel corpus. This dataset has been created within the framework of the...

TMX

Bilingual English-Danish parallel corpus from The Danish Environmental...

Contents of https://eng.mst.dk/ and https://mst.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created...

TMX

Bilingual English-Danish parallel corpus from Danmarks Statistik website

Contents of https://www.dst.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...

TMX

Bilingual English-Danish parallel corpus from National Museum of Denmark website

Contents of https://natmus.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of...

TMX

Bilingual English-Danish parallel corpus from Odense Municipality website

Contents of https://www.odense.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework...

TMX

Bilingual English-Danish parallel corpus from The Agency for Culture and...

Contents of https://slks.dk were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the framework of the...

TMX

Bilingual English-Danish parallel corpus from The Danish Gambling Authority website

Contents of https://spillemyndigheden.dk/ were crawled, aligned on document and sentence level and converted into a parallel corpus. This dataset has been created within the...

TMX

200 resources found