High-quality, ethically sourced off-the-shelf datasets for training, fine-tuning, and evaluating AI models.
High-accuracy data verified by experts
Coverage in 50+ languages & dialects
Structured, clean & model-ready formats
GDPR-compliant and ethically sourced
| Datasets | Domain | Type | Language | Size | Format | Details |
|---|---|---|---|---|---|---|
| Albanian (Albania) audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV, FLAC format. | Finance | Audio | sq | 38 audio hours | WAV, FLAC | Get Dataset |
| Albanian (Albania) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | sq | 94 audio hours | WAV, MP3 | Get Dataset |
| Arabic (Bahrain) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, FLAC format. | Call center | Audio | ar | 873 audio hours | FLAC | Get Dataset |
| Arabic (Egypt) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | ar | 277 audio hours | WAV, MP3 | Get Dataset |
| Arabic (MSA) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | ar | 1242 audio hours | WAV, MP3 | Get Dataset |
| Arabic (Oman) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, FLAC, MP3 format. | Call center | Audio | ar | 125 audio hours | WAV, FLAC, MP3 | Get Dataset |
| Arabic (Saudi) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, FLAC, MP3 format. | Call center | Audio | ar | 265 audio hours | WAV, FLAC, MP3 | Get Dataset |
| Arabic (UAE) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, FLAC format. | Call center | Audio | ar | 500 audio hours | FLAC | Get Dataset |
| Arabic-Chinese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | ar-zh | 1.5M pairs | JSON, TSV | Get Dataset |
| Arabic-English parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | ar-en | 2.7M pairs | JSON, TSV | Get Dataset |
| Arabic-German parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | ar-de | 1.5M pairs | JSON, TSV | Get Dataset |
| Arabic-Italian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | ar-it | 1.5M pairs | JSON, TSV | Get Dataset |
| Arabic-Japanese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | ar-ja | 1.5M pairs | JSON, TSV | Get Dataset |
| Arabic-Korean parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | ar-ko | 1.5M pairs | JSON, TSV | Get Dataset |
| Arabic-Portuguese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | ar-pt | 1.5M pairs | JSON, TSV | Get Dataset |
| Arabic-Russian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | ar-ru | 1.5M pairs | JSON, TSV | Get Dataset |
| Arabic-Spanish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | ar-es | 1.5M pairs | JSON, TSV | Get Dataset |
| Armenian (Armenia) audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format. | Healthcare | Audio | hy | 65 audio hours | WAV | Get Dataset |
| Azerbaijani (Azerbaijan) audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV, MP3 format. | Call center | Audio | az | 29 audio hours | WAV, MP3 | Get Dataset |
| Azerbaijani (Azerbaijan) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | az | 98 audio hours | WAV, MP3 | Get Dataset |
| Balochi (Balochistan) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | bal | 158 audio hours | WAV, MP3 | Get Dataset |
| Bosnian (Bosnia) audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format. | Travel | Audio | bs | 60 audio hours | WAV | Get Dataset |
| Bulgarian (Bulgaria) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | bg | 130 audio hours | WAV, MP3 | Get Dataset |
| Bulgarian-Croatian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-hr | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Czech parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-cs | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Danish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-da | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Dutch parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-nl | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Estonian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-et | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Finnish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-fi | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-French parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-fr | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-German parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-de | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Hungarian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-hu | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Irish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-ga | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Italian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-it | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Latvian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-lv | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Lithuanian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-lt | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Maltese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-mt | 294.2k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Polish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-pl | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Portuguese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-pt | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Slovak parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-sk | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Slovenian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-sl | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Spanish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-es | 97.3k pairs | JSON, TSV | Get Dataset |
| Bulgarian-Swedish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | bg-sv | 97.3k pairs | JSON, TSV | Get Dataset |
| Burmese (Myanmar) audio dataset with dual speaker(s), mono channel, 16+ kHz sampling rate, WAV format. | Call center | Audio | my | 81 audio hours | WAV | Get Dataset |
| Burmese (Myanmar) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | my | 122 audio hours | WAV, MP3 | Get Dataset |
| Cantonese (China) audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, FLAC, WAV format. | Finance | Audio | yue | 161 audio hours | FLAC, WAV | Get Dataset |
| Cantonese (China) audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, FLAC, WAV format. | Call center | Audio | yue | 165 audio hours | FLAC, WAV | Get Dataset |
| Cantonese (China) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | yue | 116 audio hours | WAV, MP3 | Get Dataset |
| Catalan (Catalunya) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | ca | 200 audio hours | WAV, MP3 | Get Dataset |
| Chinese (China) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | zh | 216 audio hours | WAV, MP3 | Get Dataset |
| Chinese-English parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | zh-en | 1.6M pairs | JSON, TSV | Get Dataset |
| Chinese-Korean parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | zh-ko | 3.8M pairs | JSON, TSV | Get Dataset |
| Croatian (Croatia) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | hr | 80 audio hours | WAV, MP3 | Get Dataset |
| Croatian-Estonian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | hr-et | 97.3k pairs | JSON, TSV | Get Dataset |
| Croatian-Hungarian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | hr-hu | 6.0M pairs | JSON, TSV | Get Dataset |
| Croatian-Maltese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | hr-mt | 294.2k pairs | JSON, TSV | Get Dataset |
| Croatian-Polish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | hr-pl | 5.9M pairs | JSON, TSV | Get Dataset |
| Croatian-Romanian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | hr-ro | 5.9M pairs | JSON, TSV | Get Dataset |
| Croatian-Slovak parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | hr-sk | 5.9M pairs | JSON, TSV | Get Dataset |
| Croatian-Slovenian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | hr-sl | 5.9M pairs | JSON, TSV | Get Dataset |
| Czech (Czech Republic) audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV format. | Call center | Audio | cs | 78 audio hours | WAV | Get Dataset |
| Czech (Czech Republic) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | cs | 103 audio hours | WAV, MP3 | Get Dataset |
| Czech-Croatian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-hr | 6.0M pairs | JSON, TSV | Get Dataset |
| Czech-English parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-en | 5.9M pairs | JSON, TSV | Get Dataset |
| Czech-Estonian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-et | 97.3k pairs | JSON, TSV | Get Dataset |
| Czech-Hungarian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-hu | 5.9M pairs | JSON, TSV | Get Dataset |
| Czech-Latvian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-lv | 97.3k pairs | JSON, TSV | Get Dataset |
| Czech-Lithuanian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-lt | 97.3k pairs | JSON, TSV | Get Dataset |
| Czech-Maltese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-mt | 294.2k pairs | JSON, TSV | Get Dataset |
| Czech-Polish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-pl | 5.9M pairs | JSON, TSV | Get Dataset |
| Czech-Romanian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-ro | 5.9M pairs | JSON, TSV | Get Dataset |
| Czech-Slovak parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-sk | 5.9M pairs | JSON, TSV | Get Dataset |
| Czech-Slovenian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-sl | 6.0M pairs | JSON, TSV | Get Dataset |
| Czech-Swedish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | cs-sv | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish (Denmark) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV format. | Call center | Audio | da | 110 audio hours | WAV | Get Dataset |
| Danish (Denmark) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | da | 150 audio hours | WAV, MP3 | Get Dataset |
| Danish-Croatian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-hr | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Czech parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-cs | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Dutch parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-nl | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-English parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-en | 101.8k pairs | JSON, TSV | Get Dataset |
| Danish-Estonian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-et | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Finnish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-fi | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-French parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-fr | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-German parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-de | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Hungarian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-hu | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Italian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-it | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Latvian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-lv | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Lithuanian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-lt | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Maltese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-mt | 294.2k pairs | JSON, TSV | Get Dataset |
| Danish-Polish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-pl | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Portuguese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-pt | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Slovak parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-sk | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Slovenian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-sl | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Spanish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-es | 97.3k pairs | JSON, TSV | Get Dataset |
| Danish-Swedish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | da-sv | 97.3k pairs | JSON, TSV | Get Dataset |
| Dutch (Netherlands) audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format. | Call center | Audio | nl | 140 audio hours | WAV, FLAC | Get Dataset |
| Dutch (Netherlands) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | nl | 210 audio hours | WAV, MP3 | Get Dataset |
| Dutch-Croatian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-hr | 97.3k pairs | JSON, TSV | Get Dataset |
| Dutch-Czech parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-cs | 97.3k pairs | JSON, TSV | Get Dataset |
| Dutch-English parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-en | 2.8M pairs | JSON, TSV | Get Dataset |
| Dutch-Estonian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-et | 97.3k pairs | JSON, TSV | Get Dataset |
| Dutch-Latvian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-lv | 97.3k pairs | JSON, TSV | Get Dataset |
| Dutch-Lithuanian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-lt | 97.3k pairs | JSON, TSV | Get Dataset |
| Dutch-Maltese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-mt | 294.2k pairs | JSON, TSV | Get Dataset |
| Dutch-Polish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-pl | 97.3k pairs | JSON, TSV | Get Dataset |
| Dutch-Slovenian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-sl | 97.3k pairs | JSON, TSV | Get Dataset |
| Dutch-Spanish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-es | 97.3k pairs | JSON, TSV | Get Dataset |
| Dutch-Swedish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | nl-sv | 97.3k pairs | JSON, TSV | Get Dataset |
| English (African) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | en | 79 audio hours | WAV, MP3 | Get Dataset |
| English (Arabic) audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, MP3 format. | General | Audio | en | 58 audio hours | MP3 | Get Dataset |
| English (Australia) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | en | 86 audio hours | WAV, MP3 | Get Dataset |
| English (India) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | Healthcare | Audio | en | 190 audio hours | WAV, MP3 | Get Dataset |
| English (India) audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format. | General | Audio | en | 274 audio hours | WAV, MP3 | Get Dataset |
| English (India) audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format. | Call center | Audio | en | 155 audio hours | WAV, MP3 | Get Dataset |
| English (Mixed) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | Far-field General | Audio | en | 4 audio hours | WAV, MP3 | Get Dataset |
| English (Mixed) audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format. | Call center | Audio | en | 518 audio hours | WAV, MP3 | Get Dataset |
| English (Mixed) audio dataset with multi speaker(s), dual channel, 24+ kHz sampling rate, WAV, MP3 format. | General | Audio | en | 25 audio hours | WAV, MP3 | Get Dataset |
| English (Mixed) audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, WAV, MP3 format. | Automotive | Audio | en | 4 audio hours | WAV, MP3 | Get Dataset |
| English (Mixed) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | en | 1042 audio hours | WAV, MP3 | Get Dataset |
| English (UK) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | en | 526 audio hours | WAV, MP3 | Get Dataset |
| English (UK) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | Healthcare | Audio | en | 85 audio hours | WAV, MP3 | Get Dataset |
| English (UK) audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | en | 335 audio hours | WAV, MP3 | Get Dataset |
| English (US) audio dataset with dual speaker(s), mono channel, 16+ kHz sampling rate, MP3 format. | Finance | Audio | en | 115 audio hours | MP3 | Get Dataset |
| English (US) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | en | 285 audio hours | WAV, MP3 | Get Dataset |
| English (US) audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | Call center | Audio | en | 137 audio hours | WAV, MP3 | Get Dataset |
| English (US) audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | en | 342 audio hours | WAV, MP3 | Get Dataset |
| English-Albanian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-sq | 25.6M pairs | JSON, TSV | Get Dataset |
| English-Arabic parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ar | 26.2M pairs | JSON, TSV | Get Dataset |
| English-Armenian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-hy | 9.5M pairs | JSON, TSV | Get Dataset |
| English-Bosnian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-bs | 11.6M pairs | JSON, TSV | Get Dataset |
| English-Bulgarian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-bg | 25.2M pairs | JSON, TSV | Get Dataset |
| English-Cantonese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-yue | 1.1M pairs | JSON, TSV | Get Dataset |
| English-Catalan parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ca | 15.2M pairs | JSON, TSV | Get Dataset |
| English-Chinese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-zh | 33.3M pairs | JSON, TSV | Get Dataset |
| English-Croatian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-hr | 18.1M pairs | JSON, TSV | Get Dataset |
| English-Czech parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-cs | 22.1M pairs | JSON, TSV | Get Dataset |
| English-Danish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-da | 55.9M pairs | JSON, TSV | Get Dataset |
| English-Dutch parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-nl | 34.9M pairs | JSON, TSV | Get Dataset |
| English-Estonian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-et | 25.6M pairs | JSON, TSV | Get Dataset |
| English-Finnish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-fi | 22.6M pairs | JSON, TSV | Get Dataset |
| English-French parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-fr | 37.5M pairs | JSON, TSV | Get Dataset |
| English-Georgian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ka | 18.5M pairs | JSON, TSV | Get Dataset |
| English-German parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-de | 35.7M pairs | JSON, TSV | Get Dataset |
| English-Greek parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-el | 18.7M pairs | JSON, TSV | Get Dataset |
| English-Hebrew parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-iw | 25.9M pairs | JSON, TSV | Get Dataset |
| English-Hindi parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-hi | 12.9M pairs | JSON, TSV | Get Dataset |
| English-Hungarian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-hu | 25.7M pairs | JSON, TSV | Get Dataset |
| English-Icelandic parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-is | 551 pairs | JSON, TSV | Get Dataset |
| English-Indonesian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-id | 19.9M pairs | JSON, TSV | Get Dataset |
| English-Irish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ga | 1.1M pairs | JSON, TSV | Get Dataset |
| English-Italian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-it | 37.3M pairs | JSON, TSV | Get Dataset |
| English-Japanese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ja | 24.1M pairs | JSON, TSV | Get Dataset |
| English-Korean parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ko | 28.4M pairs | JSON, TSV | Get Dataset |
| English-Kyrgyz parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ky | 4.4M pairs | JSON, TSV | Get Dataset |
| English-Latvian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-lv | 22.4M pairs | JSON, TSV | Get Dataset |
| English-Lithuanian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-lt | 22.6M pairs | JSON, TSV | Get Dataset |
| English-Malay parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ms | 19.2M pairs | JSON, TSV | Get Dataset |
| English-Maltese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-mt | 32.7k pairs | JSON, TSV | Get Dataset |
| English-Norwegian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-no | 20.1M pairs | JSON, TSV | Get Dataset |
| English-Persian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-fa | 20.2M pairs | JSON, TSV | Get Dataset |
| English-Polish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-pl | 37.5M pairs | JSON, TSV | Get Dataset |
| English-Portuguese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-pt | 23.6M pairs | JSON, TSV | Get Dataset |
| English-Romanian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ro | 26.6M pairs | JSON, TSV | Get Dataset |
| English-Russian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-ru | 38.1M pairs | JSON, TSV | Get Dataset |
| English-Serbian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-sr | 10.8M pairs | JSON, TSV | Get Dataset |
| English-Slovak parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-sk | 26.1M pairs | JSON, TSV | Get Dataset |
| English-Slovenian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-sl | 38.5M pairs | JSON, TSV | Get Dataset |
| English-Spanish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-es | 36.4M pairs | JSON, TSV | Get Dataset |
| English-Swedish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-sv | 39.5M pairs | JSON, TSV | Get Dataset |
| English-Taiwanese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-tw | 32.2M pairs | JSON, TSV | Get Dataset |
| English-Thai parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-th | 22.0M pairs | JSON, TSV | Get Dataset |
| English-Traditional Chinese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-zhTW | 244.0k pairs | JSON, TSV | Get Dataset |
| English-Turkish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-tr | 22.9M pairs | JSON, TSV | Get Dataset |
| English-Ukrainian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-uk | 12.8M pairs | JSON, TSV | Get Dataset |
| English-Vietnamese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | en-vi | 13.3M pairs | JSON, TSV | Get Dataset |
| Estonian (Estonia) audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format. | Call center | Audio | et | 75 audio hours | WAV | Get Dataset |
| Estonian (Estonia) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | et | 140 audio hours | WAV, MP3 | Get Dataset |
| Estonian-Finnish parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | et-fi | 97.3k pairs | JSON, TSV | Get Dataset |
| Estonian-Maltese parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | et-mt | 294.2k pairs | JSON, TSV | Get Dataset |
| Euskara (Basque) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | eu | 150 audio hours | WAV, MP3 | Get Dataset |
| Filipino (Philippines) audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV, MP3 format. | Call center | Audio | fil | 103 audio hours | WAV, MP3 | Get Dataset |
| Filipino (Philippines) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | fil | 166 audio hours | WAV, MP3 | Get Dataset |
| Finnish (Finland) audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format. | General | Audio | fi | 350 audio hours | WAV, MP3 | Get Dataset |
| Finnish-Croatian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | fi-hr | 97.3k pairs | JSON, TSV | Get Dataset |
| Finnish-Czech parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | fi-cs | 97.3k pairs | JSON, TSV | Get Dataset |
| Finnish-Dutch parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | fi-nl | 97.3k pairs | JSON, TSV | Get Dataset |
| Finnish-Latvian parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining. | General | Text | fi-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Finnish-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fi-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Finnish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fi-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
Finnish-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fi-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Finnish-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fi-sl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Finnish-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fi-es | 97.3k pairs | JSON, TSV | Get Dataset |
|
Finnish-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fi-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
French (Canada)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | fr | 106 audio hours | WAV, MP3 | Get Dataset |
|
French (France)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
|
Call center | Audio | fr | 74 audio hours | WAV, FLAC | Get Dataset |
|
French (France)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | fr | 446 audio hours | WAV, MP3 | Get Dataset |
|
French-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-nl | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-en | 311.9k pairs | JSON, TSV | Get Dataset |
|
French-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-fi | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-de | 1.9M pairs | JSON, TSV | Get Dataset |
|
French-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-ga | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-it | 29.0M pairs | JSON, TSV | Get Dataset |
|
French-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
French-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-pt | 1.7M pairs | JSON, TSV | Get Dataset |
|
French-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-sk | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-sl | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-es | 97.3k pairs | JSON, TSV | Get Dataset |
|
French-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fr-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Galician (Galicia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | gl | 75 audio hours | WAV, MP3 | Get Dataset |
|
Georgian (Georgia)
audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format.
|
Call center | Audio | ka | 9 audio hours | WAV | Get Dataset |
|
German (Austria)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | de | 182 audio hours | WAV, MP3 | Get Dataset |
|
German (Germany)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
|
Call center | Audio | de | 95 audio hours | WAV, FLAC | Get Dataset |
|
German (Germany)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | de | 550 audio hours | WAV, MP3 | Get Dataset |
|
German-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-nl | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-en | 242.3k pairs | JSON, TSV | Get Dataset |
|
German-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-fi | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-fr | 32.8M pairs | JSON, TSV | Get Dataset |
|
German-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-hu | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-it | 1.6M pairs | JSON, TSV | Get Dataset |
|
German-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-mt | 461.6k pairs | JSON, TSV | Get Dataset |
|
German-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-pt | 1.7M pairs | JSON, TSV | Get Dataset |
|
German-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-sk | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-sl | 97.3k pairs | JSON, TSV | Get Dataset |
|
German-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-es | 1.7M pairs | JSON, TSV | Get Dataset |
|
German-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | de-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek (Greece)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV format.
|
Call center | Audio | el | 45 audio hours | WAV | Get Dataset |
|
Greek (Greece)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | el | 165 audio hours | WAV, MP3 | Get Dataset |
|
Greek-Bulgarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-bg | 97.2k pairs | JSON, TSV | Get Dataset |
|
Greek-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-da | 97.2k pairs | JSON, TSV | Get Dataset |
|
Greek-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-nl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-fi | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-fr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-de | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-hu | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-ga | 97.2k pairs | JSON, TSV | Get Dataset |
|
Greek-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-it | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-mt | 391.4k pairs | JSON, TSV | Get Dataset |
|
Greek-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-pt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-ro | 97.2k pairs | JSON, TSV | Get Dataset |
|
Greek-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-sk | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-sl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-es | 97.3k pairs | JSON, TSV | Get Dataset |
|
Greek-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | el-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Hindi (India)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | hi | 600 audio hours | WAV, MP3 | Get Dataset |
|
Hindi-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hi-en | 246.0k pairs | JSON, TSV | Get Dataset |
|
Hungarian (Hungary)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV format.
|
Finance | Audio | hu | 50 audio hours | WAV | Get Dataset |
|
Hungarian (Hungary)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | hu | 111 audio hours | WAV, MP3 | Get Dataset |
|
Hungarian-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
Hungarian-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-nl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Hungarian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Hungarian-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-fi | 97.3k pairs | JSON, TSV | Get Dataset |
|
Hungarian-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-it | 97.3k pairs | JSON, TSV | Get Dataset |
|
Hungarian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Hungarian-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Hungarian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-mt | 293.8k pairs | JSON, TSV | Get Dataset |
|
Hungarian-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-pl | 6.0M pairs | JSON, TSV | Get Dataset |
|
Hungarian-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-ro | 5.9M pairs | JSON, TSV | Get Dataset |
|
Hungarian-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-sk | 6.0M pairs | JSON, TSV | Get Dataset |
|
Hungarian-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-sl | 6.0M pairs | JSON, TSV | Get Dataset |
|
Hungarian-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-es | 97.3k pairs | JSON, TSV | Get Dataset |
|
Hungarian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | hu-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Indonesian (Indonesia)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV, MP3 format.
|
Finance | Audio | id | 70 audio hours | WAV, MP3 | Get Dataset |
|
Indonesian (Indonesia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | id | 79 audio hours | WAV, MP3 | Get Dataset |
|
Indonesian-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | id-en | 2.5M pairs | JSON, TSV | Get Dataset |
|
Indonesian-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | id-fr | 1.5M pairs | JSON, TSV | Get Dataset |
|
Indonesian-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | id-de | 1.5M pairs | JSON, TSV | Get Dataset |
|
Indonesian-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | id-pt | 1.5M pairs | JSON, TSV | Get Dataset |
|
Indonesian-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | id-ru | 1.5M pairs | JSON, TSV | Get Dataset |
|
Indonesian-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | id-es | 1.5M pairs | JSON, TSV | Get Dataset |
|
Irish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-da | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-nl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-fi | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-de | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-hu | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-it | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
Irish-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-pt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-sk | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-sl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-es | 97.3k pairs | JSON, TSV | Get Dataset |
|
Irish-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ga-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian (Italy)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
|
Legal | Audio | it | 96 audio hours | WAV, FLAC | Get Dataset |
|
Italian (Italy)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | it | 520 audio hours | WAV, MP3 | Get Dataset |
|
Italian-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-nl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-en | 734.7k pairs | JSON, TSV | Get Dataset |
|
Italian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-fi | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-de | 1.1M pairs | JSON, TSV | Get Dataset |
|
Italian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
Italian-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-pt | 1.5M pairs | JSON, TSV | Get Dataset |
|
Italian-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-sl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-es | 97.3k pairs | JSON, TSV | Get Dataset |
|
Italian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | it-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Japanese (Japan)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
|
Call center | Audio | ja | 264 audio hours | WAV, FLAC | Get Dataset |
|
Japanese (Japan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | ja | 440 audio hours | WAV, MP3 | Get Dataset |
|
Japanese-Chinese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ja-zh | 4.0M pairs | JSON, TSV | Get Dataset |
|
Japanese-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ja-en | 2.6M pairs | JSON, TSV | Get Dataset |
|
Japanese-Korean
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ja-ko | 3.9M pairs | JSON, TSV | Get Dataset |
|
Javanese (Indonesia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | jv | 46 audio hours | WAV, MP3 | Get Dataset |
|
Khmer (Cambodia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | km | 61 audio hours | WAV, MP3 | Get Dataset |
|
Korean (South Korea)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
|
Gaming | Audio | ko | 20 audio hours | WAV, FLAC | Get Dataset |
|
Korean (South Korea)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | ko | 260 audio hours | WAV, MP3 | Get Dataset |
|
Korean (South Korea)
monolingual data for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ko | 2.0M pairs | JSON, TSV | Get Dataset |
|
Korean-Chinese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ko-zh | 536.8k pairs | JSON, TSV | Get Dataset |
|
Korean-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ko-en | 2.4M pairs | JSON, TSV | Get Dataset |
|
Korean-Japanese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ko-ja | 536.8k pairs | JSON, TSV | Get Dataset |
|
Kyrgyz (Kyrgyzstan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | ky | 41 audio hours | WAV, MP3 | Get Dataset |
|
Lao (Laos)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | lo | 11 audio hours | WAV, MP3 | Get Dataset |
|
Latvian (Latvia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | lv | 33 audio hours | WAV, MP3 | Get Dataset |
|
Latvian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | lv-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Latvian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | lv-mt | 391.4k pairs | JSON, TSV | Get Dataset |
|
Lithuanian (Lithuania)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | lt | 37 audio hours | WAV, MP3 | Get Dataset |
|
Lithuanian-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | lt-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Lithuanian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | lt-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Lithuanian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | lt-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Lithuanian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | lt-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
Lithuanian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | lt-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Malay (Malaysia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | ms | 115 audio hours | WAV, MP3 | Get Dataset |
|
Maltese-Bulgarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-bg | 718.3k pairs | JSON, TSV | Get Dataset |
|
Maltese-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-hr | 833.4k pairs | JSON, TSV | Get Dataset |
|
Maltese-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-cs | 792.6k pairs | JSON, TSV | Get Dataset |
|
Maltese-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-da | 815.5k pairs | JSON, TSV | Get Dataset |
|
Maltese-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-nl | 774.8k pairs | JSON, TSV | Get Dataset |
|
Maltese-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-en | 718.3k pairs | JSON, TSV | Get Dataset |
|
Maltese-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-et | 815.4k pairs | JSON, TSV | Get Dataset |
|
Maltese-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-fi | 815.3k pairs | JSON, TSV | Get Dataset |
|
Maltese-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-fr | 876.5k pairs | JSON, TSV | Get Dataset |
|
Maltese-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-de | 792.7k pairs | JSON, TSV | Get Dataset |
|
Maltese-Greek
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-el | 735.9k pairs | JSON, TSV | Get Dataset |
|
Maltese-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-hu | 833.1k pairs | JSON, TSV | Get Dataset |
|
Maltese-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-ga | 774.9k pairs | JSON, TSV | Get Dataset |
|
Maltese-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-it | 815.7k pairs | JSON, TSV | Get Dataset |
|
Maltese-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-lv | 718.2k pairs | JSON, TSV | Get Dataset |
|
Maltese-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-lt | 815.3k pairs | JSON, TSV | Get Dataset |
|
Maltese-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-pl | 833.5k pairs | JSON, TSV | Get Dataset |
|
Maltese-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-pt | 815.5k pairs | JSON, TSV | Get Dataset |
|
Maltese-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-ro | 695.4k pairs | JSON, TSV | Get Dataset |
|
Maltese-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-sk | 815.5k pairs | JSON, TSV | Get Dataset |
|
Maltese-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-sl | 685.4k pairs | JSON, TSV | Get Dataset |
|
Maltese-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-es | 1.4M pairs | JSON, TSV | Get Dataset |
|
Maltese-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | mt-sv | 815.5k pairs | JSON, TSV | Get Dataset |
|
Norwegian (Norway)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
Healthcare | Audio | no | 78 audio hours | WAV, MP3 | Get Dataset |
|
Norwegian (Norway)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | no | 158 audio hours | WAV, MP3 | Get Dataset |
|
Pashto (Pakistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | ps | 61 audio hours | WAV, MP3 | Get Dataset |
|
Persian-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | fa-en | 4.3M pairs | JSON, TSV | Get Dataset |
|
Polish (Poland)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | pl | 221 audio hours | WAV, MP3 | Get Dataset |
|
Polish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Polish-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Polish-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-fr | 1.5M pairs | JSON, TSV | Get Dataset |
|
Polish-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-de | 1.5M pairs | JSON, TSV | Get Dataset |
|
Polish-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-it | 1.5M pairs | JSON, TSV | Get Dataset |
|
Polish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Polish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
Polish-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-ro | 5.9M pairs | JSON, TSV | Get Dataset |
|
Polish-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-ru | 1.5M pairs | JSON, TSV | Get Dataset |
|
Polish-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-sk | 5.9M pairs | JSON, TSV | Get Dataset |
|
Polish-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-sl | 6.0M pairs | JSON, TSV | Get Dataset |
|
Polish-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-es | 1.5M pairs | JSON, TSV | Get Dataset |
|
Polish-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pl-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese (Brazil)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
|
Call center | Audio | pt | 140 audio hours | WAV, FLAC | Get Dataset |
|
Portuguese (Portugal)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
|
Call center | Audio | pt | 140 audio hours | WAV, FLAC | Get Dataset |
|
Portuguese (Portugal)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | pt | 236 audio hours | WAV, MP3 | Get Dataset |
|
Portuguese-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-nl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-en | 2.3M pairs | JSON, TSV | Get Dataset |
|
Portuguese-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-fi | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-fr | 3.7M pairs | JSON, TSV | Get Dataset |
|
Portuguese-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-de | 1.3M pairs | JSON, TSV | Get Dataset |
|
Portuguese-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-hu | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-it | 2.0M pairs | JSON, TSV | Get Dataset |
|
Portuguese-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-mt | 294.1k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-ro | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-sk | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-sl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-es | 97.3k pairs | JSON, TSV | Get Dataset |
|
Portuguese-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | pt-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian (Romania)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | ro | 123 audio hours | WAV, MP3 | Get Dataset |
|
Romanian-Bulgarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-bg | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-da | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-nl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-fi | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-fr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-de | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-hu | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-ga | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-it | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-mt | 391.5k pairs | JSON, TSV | Get Dataset |
|
Romanian-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-sk | 6.0M pairs | JSON, TSV | Get Dataset |
|
Romanian-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-sl | 6.0M pairs | JSON, TSV | Get Dataset |
|
Romanian-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-es | 97.3k pairs | JSON, TSV | Get Dataset |
|
Romanian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ro-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Russian (Russia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | ru | 301 audio hours | WAV, MP3 | Get Dataset |
|
Russian-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | ru-en | 869.1k pairs | JSON, TSV | Get Dataset |
|
Serbian (Serbia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | sr | 80 audio hours | WAV, MP3 | Get Dataset |
|
Sinhala (Sri Lanka)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | si | 50 audio hours | WAV, MP3 | Get Dataset |
|
Slovak (Slovakia)
audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format.
|
Call center | Audio | sk | 72 audio hours | WAV | Get Dataset |
|
Slovak-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-nl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-fi | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-it | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
Slovak-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-sl | 6.0M pairs | JSON, TSV | Get Dataset |
|
Slovak-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-es | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovak-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sk-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovenian-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sl-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovenian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sl-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovenian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sl-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovenian-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sl-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Slovenian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sl-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
Slovenian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sl-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Spanish (Argentine)
audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | es | 27 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Latin American)
audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | es | 686 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Latin American)
audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format.
|
Call center | Audio | es | 56 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Latin American)
audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | es | 565 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Mexican)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | es | 49 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Mixed)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | es | 695 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Spain)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | es | 1789 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Spain)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
Call center | Audio | es | 343 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Spain)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
Legal | Audio | es | 25 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Spain)
audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format.
|
Sports | Audio | es | 106 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Spain)
audio dataset with single speaker(s), dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | es | 43 audio hours | WAV, MP3 | Get Dataset |
|
Spanish (Spain)
audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, MP3 format.
|
Finance | Audio | es | 45 audio hours | MP3 | Get Dataset |
|
Spanish (Spain)
audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | es | 1356 audio hours | WAV, MP3 | Get Dataset |
|
Spanish-Basque
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-eu | 20M pairs | JSON, TSV | Get Dataset |
|
Spanish-Catalan
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-ca | 23.2M pairs | JSON, TSV | Get Dataset |
|
Spanish-Chinese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-zh | 10.1k pairs | JSON, TSV | Get Dataset |
|
Spanish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Spanish-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-cs | 97.3k pairs | JSON, TSV | Get Dataset |
|
Spanish-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-nl | 4.6M pairs | JSON, TSV | Get Dataset |
|
Spanish-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-en | 20M pairs | JSON, TSV | Get Dataset |
|
Spanish-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Spanish-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-fr | 49.3M pairs | JSON, TSV | Get Dataset |
|
Spanish-Galician
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-gl | 20M pairs | JSON, TSV | Get Dataset |
|
Spanish-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-de | 26.3M pairs | JSON, TSV | Get Dataset |
|
Spanish-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-it | 29.6M pairs | JSON, TSV | Get Dataset |
|
Spanish-Korean
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-ko | 1.3M pairs | JSON, TSV | Get Dataset |
|
Spanish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Spanish-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-lt | 97.3k pairs | JSON, TSV | Get Dataset |
|
Spanish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
Spanish-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-pl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Spanish-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-pt | 27.2M pairs | JSON, TSV | Get Dataset |
|
Spanish-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-ru | 16.2M pairs | JSON, TSV | Get Dataset |
|
Spanish-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-sl | 97.3k pairs | JSON, TSV | Get Dataset |
|
Spanish-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-sv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Spanish-Valencian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | es-va | 12.4M pairs | JSON, TSV | Get Dataset |
|
Swedish (Sweden)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
Healthcare | Audio | sv | 80 audio hours | WAV, MP3 | Get Dataset |
|
Swedish (Sweden)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | sv | 183 audio hours | WAV, MP3 | Get Dataset |
|
Swedish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sv-hr | 97.3k pairs | JSON, TSV | Get Dataset |
|
Swedish-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sv-et | 97.3k pairs | JSON, TSV | Get Dataset |
|
Swedish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sv-lv | 97.3k pairs | JSON, TSV | Get Dataset |
|
Swedish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | sv-mt | 294.2k pairs | JSON, TSV | Get Dataset |
|
Tajik (Tajikistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | tg | 39 audio hours | WAV, MP3 | Get Dataset |
|
Thai (Thailand)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | th | 165 audio hours | WAV, MP3 | Get Dataset |
|
Thai-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | th-en | 2.1M pairs | JSON, TSV | Get Dataset |
|
Turkish (Turkey)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
|
Call center | Audio | tr | 135 audio hours | WAV, FLAC | Get Dataset |
|
Turkish (Turkey)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | tr | 195 audio hours | WAV, MP3 | Get Dataset |
|
Turkish-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | tr-en | 1.8M pairs | JSON, TSV | Get Dataset |
|
Turkmen (Turkmenistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | tk | 33 audio hours | WAV, MP3 | Get Dataset |
|
Ukrainian (Ukraine)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | uk | 60 audio hours | WAV, MP3 | Get Dataset |
|
Urdu (Pakistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | ur | 190 audio hours | WAV, MP3 | Get Dataset |
|
Uzbek (Uzbekistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | uz | 68 audio hours | WAV, MP3 | Get Dataset |
|
Valencian (Valencia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | ca-val | 125 audio hours | WAV, MP3 | Get Dataset |
|
Vietnamese (Vietnam)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
Call center | Audio | vi | 167 audio hours | WAV, MP3 | Get Dataset |
|
Vietnamese (Vietnam)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
|
General | Audio | vi | 167 audio hours | WAV, MP3 | Get Dataset |
|
Vietnamese-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | vi-en | 1.8M pairs | JSON, TSV | Get Dataset |
|
Vietnamese-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | vi-fr | 1.5M pairs | JSON, TSV | Get Dataset |
|
Vietnamese-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | vi-de | 1.5M pairs | JSON, TSV | Get Dataset |
|
Vietnamese-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | vi-pt | 1.5M pairs | JSON, TSV | Get Dataset |
|
Vietnamese-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | vi-ru | 1.5M pairs | JSON, TSV | Get Dataset |
|
Vietnamese-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
|
General | Text | vi-es | 1.5M pairs | JSON, TSV | Get Dataset |
|
Sound of A/C machines
Average 1200 seconds per clip, 18 audio files, 44+ kHz.
|
General | Noise | - | 6 audio hours | WAV + JSON | Get Dataset |
|
Sound of water spraying gently, like a bidet in use
Average 30 seconds per clip, 720 audio files, 44+ kHz.
|
General | Noise | - | 6 audio hours | WAV + JSON | Get Dataset |
|
Loud, mechanical grinding noise from a coffee grinder
Average 120 seconds per clip, 120 audio files, 44+ kHz.
|
General | Noise | - | 4 audio hours | WAV + JSON | Get Dataset |
|
Sound of plugging or connecting cables on a table
Average 10 seconds per clip, 1080 audio files, 44+ kHz.
|
General | Noise | - | 3 audio hours | WAV + JSON | Get Dataset |
|
Classic fizzy pssst of opening a soda can
Average 10 seconds per clip, 1080 audio files, 44+ kHz.
|
General | Noise | - | 3 audio hours | WAV + JSON | Get Dataset |
|
Fast, rhythmic shuffling of playing cards
Average 30 seconds per clip, 720 audio files, 44+ kHz.
|
General | Noise | - | 6 audio hours | WAV + JSON | Get Dataset |
|
Indistinct clicking, likely from a pen or mouse
Average 10 seconds per clip, 720 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Paper being forcefully crumpled, like tossing a failed idea
Average 15 seconds per clip, 960 audio files, 44+ kHz.
|
General | Noise | - | 4 audio hours | WAV + JSON | Get Dataset |
|
Soft sloshing and whirring sounds from a running dishwasher
Average 1200 seconds per clip, 48 audio files, 44+ kHz.
|
General | Noise | - | 16 audio hours | WAV + JSON | Get Dataset |
|
Clicking latch followed by creaking and closing door sounds
Average 12 seconds per clip, 900 audio files, 44+ kHz.
|
General | Noise | - | 3 audio hours | WAV + JSON | Get Dataset |
|
Rubbing, towel, or hand-dryer sounds from drying hands
Average 10 seconds per clip, 1080 audio files, 44+ kHz.
|
General | Noise | - | 3 audio hours | WAV + JSON | Get Dataset |
|
Steady buzzing sound of an electric razor in use
Average 30 seconds per clip, 600 audio files, 44+ kHz.
|
General | Noise | - | 5 audio hours | WAV + JSON | Get Dataset |
|
Mechanical hum and soft chime of an elevator operating
Average 15 seconds per clip, 480 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Subtle suction noise as a fridge door opens and closes
Average 10 seconds per clip, 360 audio files, 44+ kHz.
|
General | Noise | - | 1 audio hour | WAV + JSON | Get Dataset |
|
The sound of a refrigerator
Average 3600 seconds per clip, 15 audio files, 44+ kHz.
|
General | Noise | - | 15 audio hours | WAV + JSON | Get Dataset |
|
Sharp sizzling sound, like food cooking in hot oil
Average 30 seconds per clip, 1440 audio files, 44+ kHz.
|
General | Noise | - | 12 audio hours | WAV + JSON | Get Dataset |
|
Loud whirring of a garden blower moving air forcefully
Average 15 seconds per clip, 1200 audio files, 44+ kHz.
|
General | Noise | - | 5 audio hours | WAV + JSON | Get Dataset |
|
Clink of a glass cup being placed gently on a surface
Average 10 seconds per clip, 720 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Light clatter of eyeglasses being set down on a table
Average 60 seconds per clip, 300 audio files, 44+ kHz.
|
General | Noise | - | 5 audio hours | WAV + JSON | Get Dataset |
|
Strong, steady airflow and motor hum from a hair dryer
Average 120 seconds per clip, 360 audio files, 44+ kHz.
|
General | Noise | - | 12 audio hours | WAV + JSON | Get Dataset |
|
Metallic jingle or clink of keys being handled or dropped
Average 10 seconds per clip, 720 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Gurgling and hissing sounds from a Krups/Nespresso or similar espresso machine brewing coffee
Average 25 seconds per clip, 288 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Sharp click of a light switch being flipped on or off
Average 15 seconds per clip, 960 audio files, 44+ kHz.
|
General | Noise | - | 4 audio hours | WAV + JSON | Get Dataset |
|
Low humming with occasional beeps, typical of a microwave heating food
Average 60 seconds per clip, 600 audio files, 44+ kHz.
|
General | Noise | - | 10 audio hours | WAV + JSON | Get Dataset |
|
Soft clink as a cup is placed inside a microwave
Average 10 seconds per clip, 360 audio files, 44+ kHz.
|
General | Noise | - | 1 audio hour | WAV + JSON | Get Dataset |
|
Mixed or undefined background noises not easily categorized
Average 60 seconds per clip, 900 audio files, 44+ kHz.
|
General | Noise | - | 15 audio hours | WAV + JSON | Get Dataset |
|
Soft flipping and rustling as papers are being counted or sorted
Average 20 seconds per clip, 720 audio files, 44+ kHz.
|
General | Noise | - | 4 audio hours | WAV + JSON | Get Dataset |
|
Repetitive clicking sound of a pen being pressed
Average 10 seconds per clip, 1080 audio files, 44+ kHz.
|
General | Noise | - | 3 audio hours | WAV + JSON | Get Dataset |
|
Electrical click and hum from plugging or unplugging a device
Average 10 seconds per clip, 3600 audio files, 44+ kHz.
|
General | Noise | - | 10 audio hours | WAV + JSON | Get Dataset |
|
Steady stream of hot water being poured into a teacup
Average 15 seconds per clip, 480 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Continuous pouring sound, likely water into a container or sink
Average 60 seconds per clip, 660 audio files, 44+ kHz.
|
General | Noise | - | 11 audio hours | WAV + JSON | Get Dataset |
|
Sound of a water or swimming pool pump and filters
Average 1800 seconds per clip, 10 audio files, 44+ kHz.
|
General | Noise | - | 5 audio hours | WAV + JSON | Get Dataset |
|
Buzzing electric sound from a handheld shaver
Average 120 seconds per clip, 285 audio files, 44+ kHz.
|
General | Noise | - | 9.5 audio hours | WAV + JSON | Get Dataset |
|
Continuous stream and splash of water from a shower
Average 150 seconds per clip, 192 audio files, 44+ kHz.
|
General | Noise | - | 8 audio hours | WAV + JSON | Get Dataset |
|
Soft sipping sound from drinking a beverage
Average 10 seconds per clip, 720 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Muffled footsteps on a carpeted floor
Average 15 seconds per clip, 960 audio files, 44+ kHz.
|
General | Noise | - | 4 audio hours | WAV + JSON | Get Dataset |
|
Ambient sound of an empty house
Average 60 seconds per clip, 600 audio files, 44+ kHz.
|
General | Noise | - | 10 audio hours | WAV + JSON | Get Dataset |
|
Snoring and sleeping sounds
Average 30 seconds per clip, 360 audio files, 44+ kHz.
|
General | Noise | - | 3 audio hours | WAV + JSON | Get Dataset |
|
Light clink and swirl of coffee being stirred
Average 15 seconds per clip, 960 audio files, 44+ kHz.
|
General | Noise | - | 4 audio hours | WAV + JSON | Get Dataset |
|
Popping or snapping sound of a tablet container being opened
Average 10 seconds per clip, 360 audio files, 44+ kHz.
|
General | Noise | - | 1 audio hour | WAV + JSON | Get Dataset |
|
Steady flow of water from a running tap
Average 120 seconds per clip, 330 audio files, 44+ kHz.
|
General | Noise | - | 11 audio hours | WAV + JSON | Get Dataset |
|
Toilet flushing sound
Average 30 seconds per clip, 540 audio files, 44+ kHz.
|
General | Noise | - | 4.5 audio hours | WAV + JSON | Get Dataset |
|
Fast or rhythmic tapping of keys while typing at a home office
Average 15 seconds per clip, 960 audio files, 44+ kHz.
|
General | Noise | - | 4 audio hours | WAV + JSON | Get Dataset |
|
Water splashing and rubbing sounds from washing hands
Average 10 seconds per clip, 1620 audio files, 44+ kHz.
|
General | Noise | - | 4.5 audio hours | WAV + JSON | Get Dataset |
|
Water being poured into a cup, increasing in pitch as it fills
Average 10 seconds per clip, 360 audio files, 44+ kHz.
|
General | Noise | - | 1 audio hour | WAV + JSON | Get Dataset |
|
Crackling pop and twist from opening or closing a water bottle
Average 20 seconds per clip, 180 audio files, 44+ kHz.
|
General | Noise | - | 1 audio hour | WAV + JSON | Get Dataset |
|
General sound of water running, could be a faucet, stream, or pipe
Average 120 seconds per clip, 480 audio files, 44+ kHz.
|
General | Noise | - | 16 audio hours | WAV + JSON | Get Dataset |
|
Back-and-forth zip sound of opening or closing a zipper
Average 8 seconds per clip, 900 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Sound of a coffee machine (Nespresso or similar) or cafetiere
Average 20 seconds per clip, 360 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Dog panting and licking
Average 10 seconds per clip, 360 audio files, 44+ kHz.
|
General | Noise | - | 1 audio hour | WAV + JSON | Get Dataset |
|
Oil sizzling while frying food
Average 40 seconds per clip, 180 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |
|
Glasses being placed on a shelf
Average 30 seconds per clip, 240 audio files, 44+ kHz.
|
General | Noise | - | 2 audio hours | WAV + JSON | Get Dataset |