There are 20 datasets tagged with linguistics:
-
-
-
This database contains more than 3,000 notices on major linguistic books on grammar, from Antiquity to now. Major books will progressively be digitized and made available through the...
-
-
About About TalkBank: The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of the...
-
About Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. When created: November 1999....
-
The SPECIALIST lexicon is a large syntactic lexicon of biomedical and general English. Coverage includes both commonly occurring English words and biomedical vocabulary. The lexicon entry...
-
From website: The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same...
-
A lexical database documenting translations among lexemes of language varieties.
-
WordNet-like concept network developed at MIT ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually leave...
-
About The IBL Corpus was collected by the University of Plymouth and the University of Edinburgh as part of the EPSRC funded project IBL, Instruction-based Learning for Mobile Robots...
-
DBpedia Spotlight is a tool for annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through...
-
About From website: WordNet is a lexical database for English that has been widely adopted in artificial intelligence and computational linguistics for a variety of practical...
-
The French TimeBank consists of a set of 109 journalistic articles from 7 different sub-genres annotated according to the ISO-TimeML standard, adapted for the French language....
-
GeoWordNet is a semantic resource built from the full integration of WordNet, GeoNames and the Italian part of MultiWordNet. GeoWordNet Public Dataset contains 3,698,238 entities,...
-
This is a recipe to train word n-gram language models using the newswire text provided in the English Gigaword corpus (1200M words of NYT, APW, AFE, XIE). It also prepares dictionaries...
-
Resources, including corpora and software, for processing Hungarian language. Language resources The Hunglish Corpus is a sentence-aligned Hungarian-English parallel corpus...
-
RDF conversion of Princeton's package:wordnet, version 3.0. With many links to package:w3c-wordnet, package:lexvo and the Dutch package:cornetto.
-
Description Overview from home page: The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romanic...
-
Dutch lexical database, similar to WordNet but with more semantic relations. Links to package:vu-wordnet and package:w3c-wordnet. When this dataset is used for research purposes,...