Tag: linguistics

There are 20 datasets tagged with linguistics:

  • Language Commons
    • License: None (not openly licensed)
  • This database contains more than 3,000 notices on major linguistic books on grammar, from Antiquity to now. Major books will progressively be digitized and made available through the...
  • About TalkBank: The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of the...
  • MOCHA-TIMIT
    • License: None (not openly licensed)
    Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. When created: November 1999....
  • The SPECIALIST Lexicon
    • License: None (not openly licensed)
    The SPECIALIST lexicon is a large syntactic lexicon of biomedical and general English. Coverage includes both commonly occurring English words and biomedical vocabulary. The lexicon entry...
  • The Speech Accent Archive
    • License: None (not openly licensed)
    From website: The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same...
  • A lexical database documenting translations among lexemes of language varieties.
  • WordNet-like concept network developed at MIT. ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually leave...
  • The IBL Corpus
    • License: None (not openly licensed)
    The IBL Corpus was collected by the University of Plymouth and the University of Edinburgh as part of the EPSRC-funded project IBL, Instruction-based Learning for Mobile Robots...
  • DBpedia Spotlight is a tool for annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through...
  • From website: WordNet is a lexical database for English that has been widely adopted in artificial intelligence and computational linguistics for a variety of practical...
  • The French TimeBank consists of a set of 109 journalistic articles from 7 different sub-genres annotated according to the ISO-TimeML standard, adapted for the French language....
  • GeoWordNet is a semantic resource built from the full integration of WordNet, GeoNames and the Italian part of MultiWordNet. GeoWordNet Public Dataset contains 3,698,238 entities,...
  • This is a recipe to train word n-gram language models using the newswire text provided in the English Gigaword corpus (1200M words of NYT, APW, AFE, XIE). It also prepares dictionaries...
  • Resources, including corpora and software, for processing the Hungarian language. Language resources: The Hunglish Corpus is a sentence-aligned Hungarian-English parallel corpus...
  • RDF conversion of Princeton's package:wordnet, version 3.0. With many links to package:w3c-wordnet, package:lexvo and the Dutch package:cornetto.
  • Overview from home page: The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romanic...
  • Dutch lexical database, similar to WordNet but with more semantic relations. Links to package:vu-wordnet and package:w3c-wordnet. When this dataset is used for research purposes,...