62 ressourcer fundet

Tags: Tekst

Filtrér resultater
  • DaNE adds NER annotations to the The Danish Universal Dependencies Treebank (UD-DDT). The Danish UD treebank (Johannsen et al., 2015, UD-DDT) is a conversion of the Danish...
    • ZIP
  • DanNet is a Danish lexical semantic wordnet; i.e. a language resource where the semantic relations between words are expressed in a formal language and thereby made usable for...
    • CSV
    • OWL
  • The aligned corpus consists of press releases from the European Commission Press Relase Database (Rapid) harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The...
  • En opmærket multimodal samling af samtaler på dansk hvor tolv deltagerpar taler sammen for at lære hinanden at kende. Deltagerne blev filmet mens de stod foran hinanden og talte...
  • Gruntvig's Works version 1,12. april 2018 contains N.F.S. Grundtvig's authorship. Corpus folder containing edited texts and OCR texts. Creator: Ravn, Kim Steen License:...
  • Tekster fra Arkiv for Dansk Litteratur (ADL). Ældre dansk litteratur. Licens: https://github.com/Det-Kongelige-Bibliotek/access-digital-objects/blob/master/LICENSE
  • The SemDax Corpus is a Danish human-annotated corpus relying on the combined wordnet and dictionary resources: DanNet and Den Danske Ordbog, and available through a CLARIN...
  • Danish multi-task CNN trained on UD Danish DDT and DaNE. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Sources: Danish Universal...
  • Digitalisering og opmærkning af trusselsbreve til projektet 'Truslers sprog og genre', der bygger på en innovativ kombination af sprogvidenskab og genrestudier med det formål at...
  • Elektroniske versioner af størstedelen af Johannes V. Jensens udgivelser. I regi af CLARIN-projektet og i samarbejde med rettighedshaverne, gjorde Jensen Forum i...
  • Binary wordlists for the CST lemmatizer as suplement to the rules of the lemmatizer. Works with both tagged and untagged input. Use: cstlemma -d NAME-OF-WORDLIST License:...
  • The Danish similarity dataset is a gold standard resource for evaluation of Danish word embedding models. The dataset consists of 99 word pairs rated by 38 human judges...
  • DK-CLARIN Reference Corpus of General Danish has been collected as part of DK-CLARIN project, WP2.1, 2008 - 2011. All texts are in XML TEIP5 format (TEIP5DKCLARIN-format), with...
  • DUDS Jens Bille’s Ballad Book belongs to a corpus of the oldest Danish ballad tradition. The corpus consists of 9 ballad books handed down from Renaissance ballad collectors...
  • The DK-CLARIN Parallel Financial Corpus comprises 4.3 M Danish and 4.8 M English tokens from translated (parallel) documents, mainly annual reports, of the period 2002-2010 from...
  • The LSP (Language for Special Purposes) corpus consists of texts from seven selected domains. The DK-CLARIN LSP corpus comprises 11 M tokens from the period 2000-2010,...
  • The STO (SprogTeknologisk Ordbase) lexicon is a comprehensive computational lexicon of Danish developed for NLP/HLT applications. The syntax layer of the lexicon, presented here...
  • 135 mio parallelsætninger (1620 sprogpar - 85 sprog) fra Wikipedia. License: The mined data is distributed under the Creative Commons Attribution-ShareAlike license. Please...
  • ePAROLE (udgivet i 2015) er en revideret version af PAROLE-DK. Den adskiller sig fra den ældre version primært ved, at der er brugt et nyudviklet, mere stringent tag-sæt: ePOS....
  • CST's modificerede udgave af BRILL-taggeren POS-tagger i C/C++