| NLI | Antonyms, quantities, spelling, word overlap, negation, length | English | 7596 | Automatic | |
| NLI | Compositionality | English | 44010 | Automatic | |
| NLI | Antonyms, hyper/hyponyms | English | 6279 | Semi-automatic | |
| NLI | Diverse semantics | English | 550 | Manual | |
| NLI | Lexical inference | English | 8193 | Semi-automatic | |
| NLI | Diverse | English | 570K | Manual, semi-automatic, automatic | |
| MT | Word sense disambiguation | German→English/French | 13900 | Semi-automatic | |
| MT | Morphology | English→Czech/Latvian | 18500 | Automatic | |
| MT | Polarity, verb-particle constructions, agreement, transliteration | English→German | 97000 | Automatic | |
| MT | Discourse | English→French | 400 | Manual | |
| MT | Morpho-syntax, syntax, lexicon | English↔French | 108+506 | Manual | |
| MT | Diverse | English↔German | 10000 | Manual | |
| MT | Discourse | English→German | 4627 | Automatic | Test sets created using oracles, an alternative to challenge sets. The method can be applied to different language pairs and datasets. |
| MT | Coreference, pronouns | English→German | 12000 | Automatic | |
| LM | Subject-verb agreement | English | ∼1.35M | Automatic | |
| LM | Number agreement | English, Russian, Hebrew, Italian | ∼10K | Automatic | |
| Coreference | Gender bias | English | 720 | Semi-automatic | |
| Coreference | Gender bias | English | 3160 | Semi-automatic | |
| Seq2Seq | Compositionality | English | 20910 | Automatic | |
| POS tagging | Noun-verb ambiguity | English | 32654 | Semi-automatic | |
| NLI | Psychometric assessment | English | 180 | Manual | |
| Sentiment | Psychometric assessment | English | 134 | Manual | |