AI Datasets Marketplace

High-quality, ethically sourced, off-the-shelf datasets for training, fine-tuning, and evaluating AI models.

🛡️
Human-verified

High-accuracy data verified by experts

🌍
Diverse & Global

Coverage in 50+ languages & dialects

✏️
Ready to Deploy

Structured, clean & model-ready formats

📄
Secure & Compliant

GDPR-compliant and ethically sourced

Datasets Domain Type Language Size Format Details
🎧
Albanian (Albania)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV, FLAC format.
Finance Audio sq 38 audio hours WAV, FLAC Get Dataset
🎧
Albanian (Albania)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio sq 94 audio hours WAV, MP3 Get Dataset
🎧
Arabic (Bahrain)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, FLAC format.
Call center Audio ar 873 audio hours FLAC Get Dataset
🎧
Arabic (Egypt)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ar 277 audio hours WAV, MP3 Get Dataset
🎧
Arabic (MSA)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ar 1242 audio hours WAV, MP3 Get Dataset
🎧
Arabic (Oman)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, FLAC, MP3 format.
Call center Audio ar 125 audio hours WAV, FLAC, MP3 Get Dataset
🎧
Arabic (Saudi)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, FLAC, MP3 format.
Call center Audio ar 265 audio hours WAV, FLAC, MP3 Get Dataset
🎧
Arabic (UAE)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, FLAC format.
Call center Audio ar 500 audio hours FLAC Get Dataset
📄
Arabic-Chinese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ar-zh 1.5M pairs JSON, TSV Get Dataset
📄
Arabic-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ar-en 2.7M pairs JSON, TSV Get Dataset
📄
Arabic-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ar-de 1.5M pairs JSON, TSV Get Dataset
📄
Arabic-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ar-it 1.5M pairs JSON, TSV Get Dataset
📄
Arabic-Japanese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ar-ja 1.5M pairs JSON, TSV Get Dataset
📄
Arabic-Korean
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ar-ko 1.5M pairs JSON, TSV Get Dataset
📄
Arabic-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ar-pt 1.5M pairs JSON, TSV Get Dataset
📄
Arabic-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ar-ru 1.5M pairs JSON, TSV Get Dataset
📄
Arabic-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ar-es 1.5M pairs JSON, TSV Get Dataset
🎧
Armenian (Armenia)
audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format.
Healthcare Audio hy 65 audio hours WAV Get Dataset
🎧
Azerbaijani (Azerbaijan)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV, MP3 format.
Call center Audio az 29 audio hours WAV, MP3 Get Dataset
🎧
Azerbaijani (Azerbaijan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio az 98 audio hours WAV, MP3 Get Dataset
🎧
Balochi (Balochistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio bal 158 audio hours WAV, MP3 Get Dataset
🎧
Bosnian (Bosnia)
audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format.
Travel Audio bs 60 audio hours WAV Get Dataset
🎧
Bulgarian (Bulgaria)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio bg 130 audio hours WAV, MP3 Get Dataset
📄
Bulgarian-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-hr 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-cs 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-da 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-nl 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-et 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-fi 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-fr 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-de 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-hu 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-ga 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-it 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-lv 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-lt 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-mt 294.2k pairs JSON, TSV Get Dataset
📄
Bulgarian-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-pl 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-pt 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-sk 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-sl 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-es 97.3k pairs JSON, TSV Get Dataset
📄
Bulgarian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text bg-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Burmese (Myanmar)
audio dataset with dual speaker(s), mono channel, 16+ kHz sampling rate, WAV format.
Call center Audio my 81 audio hours WAV Get Dataset
🎧
Burmese (Myanmar)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio my 122 audio hours WAV, MP3 Get Dataset
🎧
Cantonese (China)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, FLAC, WAV format.
Finance Audio yue 161 audio hours FLAC, WAV Get Dataset
🎧
Cantonese (China)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, FLAC, WAV format.
Call center Audio yue 165 audio hours FLAC, WAV Get Dataset
🎧
Cantonese (China)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio yue 116 audio hours WAV, MP3 Get Dataset
🎧
Catalan (Catalunya)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ca 200 audio hours WAV, MP3 Get Dataset
🎧
Chinese (China)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio zh 216 audio hours WAV, MP3 Get Dataset
📄
Chinese-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text zh-en 1.6M pairs JSON, TSV Get Dataset
📄
Chinese-Korean
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text zh-ko 3.8M pairs JSON, TSV Get Dataset
🎧
Croatian (Croatia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio hr 80 audio hours WAV, MP3 Get Dataset
📄
Croatian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hr-et 97.3k pairs JSON, TSV Get Dataset
📄
Croatian-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hr-hu 6.0M pairs JSON, TSV Get Dataset
📄
Croatian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hr-mt 294.2k pairs JSON, TSV Get Dataset
📄
Croatian-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hr-pl 5.9M pairs JSON, TSV Get Dataset
📄
Croatian-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hr-ro 5.9M pairs JSON, TSV Get Dataset
📄
Croatian-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hr-sk 5.9M pairs JSON, TSV Get Dataset
📄
Croatian-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hr-sl 5.9M pairs JSON, TSV Get Dataset
🎧
Czech (Czech Republic)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV format.
Call center Audio cs 78 audio hours WAV Get Dataset
🎧
Czech (Czech Republic)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio cs 103 audio hours WAV, MP3 Get Dataset
📄
Czech-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-hr 6.0M pairs JSON, TSV Get Dataset
📄
Czech-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-en 5.9M pairs JSON, TSV Get Dataset
📄
Czech-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-et 97.3k pairs JSON, TSV Get Dataset
📄
Czech-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-hu 5.9M pairs JSON, TSV Get Dataset
📄
Czech-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-lv 97.3k pairs JSON, TSV Get Dataset
📄
Czech-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-lt 97.3k pairs JSON, TSV Get Dataset
📄
Czech-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-mt 294.2k pairs JSON, TSV Get Dataset
📄
Czech-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-pl 5.9M pairs JSON, TSV Get Dataset
📄
Czech-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-ro 5.9M pairs JSON, TSV Get Dataset
📄
Czech-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-sk 5.9M pairs JSON, TSV Get Dataset
📄
Czech-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-sl 6.0M pairs JSON, TSV Get Dataset
📄
Czech-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text cs-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Danish (Denmark)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV format.
Call center Audio da 110 audio hours WAV Get Dataset
🎧
Danish (Denmark)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio da 150 audio hours WAV, MP3 Get Dataset
📄
Danish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-hr 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-cs 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-nl 97.3k pairs JSON, TSV Get Dataset
📄
Danish-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-en 101.8k pairs JSON, TSV Get Dataset
📄
Danish-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-et 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-fi 97.3k pairs JSON, TSV Get Dataset
📄
Danish-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-fr 97.3k pairs JSON, TSV Get Dataset
📄
Danish-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-de 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-hu 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-it 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-lv 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-lt 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-mt 294.2k pairs JSON, TSV Get Dataset
📄
Danish-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-pl 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-pt 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-sk 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-sl 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-es 97.3k pairs JSON, TSV Get Dataset
📄
Danish-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text da-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Dutch (Netherlands)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
Call center Audio nl 140 audio hours WAV, FLAC Get Dataset
🎧
Dutch (Netherlands)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio nl 210 audio hours WAV, MP3 Get Dataset
📄
Dutch-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-hr 97.3k pairs JSON, TSV Get Dataset
📄
Dutch-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-cs 97.3k pairs JSON, TSV Get Dataset
📄
Dutch-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-en 2.8M pairs JSON, TSV Get Dataset
📄
Dutch-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-et 97.3k pairs JSON, TSV Get Dataset
📄
Dutch-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-lv 97.3k pairs JSON, TSV Get Dataset
📄
Dutch-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-lt 97.3k pairs JSON, TSV Get Dataset
📄
Dutch-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-mt 294.2k pairs JSON, TSV Get Dataset
📄
Dutch-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-pl 97.3k pairs JSON, TSV Get Dataset
📄
Dutch-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-sl 97.3k pairs JSON, TSV Get Dataset
📄
Dutch-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-es 97.3k pairs JSON, TSV Get Dataset
📄
Dutch-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text nl-sv 97.3k pairs JSON, TSV Get Dataset
🎧
English (African)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio en 79 audio hours WAV, MP3 Get Dataset
🎧
English (Arabic)
audio dataset with a single speaker, mono channel, 16+ kHz sampling rate, MP3 format.
General Audio en 58 audio hours MP3 Get Dataset
🎧
English (Australia)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio en 86 audio hours WAV, MP3 Get Dataset
🎧
English (India)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
Healthcare Audio en 190 audio hours WAV, MP3 Get Dataset
🎧
English (India)
audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format.
General Audio en 274 audio hours WAV, MP3 Get Dataset
🎧
English (India)
audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format.
Call center Audio en 155 audio hours WAV, MP3 Get Dataset
🎧
English (Mixed)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
Far-field General Audio en 4 audio hours WAV, MP3 Get Dataset
🎧
English (Mixed)
audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format.
Call center Audio en 518 audio hours WAV, MP3 Get Dataset
🎧
English (Mixed)
audio dataset with multiple speakers, dual channel, 24+ kHz sampling rate, WAV, MP3 format.
General Audio en 25 audio hours WAV, MP3 Get Dataset
🎧
English (Mixed)
audio dataset with a single speaker, mono channel, 16+ kHz sampling rate, WAV, MP3 format.
Automotive Audio en 4 audio hours WAV, MP3 Get Dataset
🎧
English (Mixed)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio en 1042 audio hours WAV, MP3 Get Dataset
🎧
English (UK)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio en 526 audio hours WAV, MP3 Get Dataset
🎧
English (UK)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
Healthcare Audio en 85 audio hours WAV, MP3 Get Dataset
🎧
English (UK)
audio dataset with a single speaker, mono channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio en 335 audio hours WAV, MP3 Get Dataset
🎧
English (US)
audio dataset with dual speaker(s), mono channel, 16+ kHz sampling rate, MP3 format.
Finance Audio en 115 audio hours MP3 Get Dataset
🎧
English (US)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio en 285 audio hours WAV, MP3 Get Dataset
🎧
English (US)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
Call center Audio en 137 audio hours WAV, MP3 Get Dataset
🎧
English (US)
audio dataset with a single speaker, mono channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio en 342 audio hours WAV, MP3 Get Dataset
📄
English-Albanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-sq 25.6M pairs JSON, TSV Get Dataset
📄
English-Arabic
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ar 26.2M pairs JSON, TSV Get Dataset
📄
English-Armenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-hy 9.5M pairs JSON, TSV Get Dataset
📄
English-Bosnian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-bs 11.6M pairs JSON, TSV Get Dataset
📄
English-Bulgarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-bg 25.2M pairs JSON, TSV Get Dataset
📄
English-Cantonese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-yue 1.1M pairs JSON, TSV Get Dataset
📄
English-Catalan
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ca 15.2M pairs JSON, TSV Get Dataset
📄
English-Chinese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-zh 33.3M pairs JSON, TSV Get Dataset
📄
English-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-hr 18.1M pairs JSON, TSV Get Dataset
📄
English-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-cs 22.1M pairs JSON, TSV Get Dataset
📄
English-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-da 55.9M pairs JSON, TSV Get Dataset
📄
English-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-nl 34.9M pairs JSON, TSV Get Dataset
📄
English-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-et 25.6M pairs JSON, TSV Get Dataset
📄
English-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-fi 22.6M pairs JSON, TSV Get Dataset
📄
English-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-fr 37.5M pairs JSON, TSV Get Dataset
📄
English-Georgian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ka 18.5M pairs JSON, TSV Get Dataset
📄
English-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-de 35.7M pairs JSON, TSV Get Dataset
📄
English-Greek
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-el 18.7M pairs JSON, TSV Get Dataset
📄
English-Hebrew
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-iw 25.9M pairs JSON, TSV Get Dataset
📄
English-Hindi
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-hi 12.9M pairs JSON, TSV Get Dataset
📄
English-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-hu 25.7M pairs JSON, TSV Get Dataset
📄
English-Icelandic
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-is 551 pairs JSON, TSV Get Dataset
📄
English-Indonesian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-id 19.9M pairs JSON, TSV Get Dataset
📄
English-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ga 1.1M pairs JSON, TSV Get Dataset
📄
English-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-it 37.3M pairs JSON, TSV Get Dataset
📄
English-Japanese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ja 24.1M pairs JSON, TSV Get Dataset
📄
English-Korean
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ko 28.4M pairs JSON, TSV Get Dataset
📄
English-Kyrgyz
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ky 4.4M pairs JSON, TSV Get Dataset
📄
English-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-lv 22.4M pairs JSON, TSV Get Dataset
📄
English-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-lt 22.6M pairs JSON, TSV Get Dataset
📄
English-Malay
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ms 19.2M pairs JSON, TSV Get Dataset
📄
English-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-mt 32.7k pairs JSON, TSV Get Dataset
📄
English-Norwegian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-no 20.1M pairs JSON, TSV Get Dataset
📄
English-Persian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-fa 20.2M pairs JSON, TSV Get Dataset
📄
English-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-pl 37.5M pairs JSON, TSV Get Dataset
📄
English-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-pt 23.6M pairs JSON, TSV Get Dataset
📄
English-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ro 26.6M pairs JSON, TSV Get Dataset
📄
English-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-ru 38.1M pairs JSON, TSV Get Dataset
📄
English-Serbian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-sr 10.8M pairs JSON, TSV Get Dataset
📄
English-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-sk 26.1M pairs JSON, TSV Get Dataset
📄
English-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-sl 38.5M pairs JSON, TSV Get Dataset
📄
English-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-es 36.4M pairs JSON, TSV Get Dataset
📄
English-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-sv 39.5M pairs JSON, TSV Get Dataset
📄
English-Taiwanese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-tw 32.2M pairs JSON, TSV Get Dataset
📄
English-Thai
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-th 22.0M pairs JSON, TSV Get Dataset
📄
English-Traditional Chinese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-zhTW 244.0k pairs JSON, TSV Get Dataset
📄
English-Turkish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-tr 22.9M pairs JSON, TSV Get Dataset
📄
English-Ukrainian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-uk 12.8M pairs JSON, TSV Get Dataset
📄
English-Vietnamese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text en-vi 13.3M pairs JSON, TSV Get Dataset
🎧
Estonian (Estonia)
audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format.
Call center Audio et 75 audio hours WAV Get Dataset
🎧
Estonian (Estonia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio et 140 audio hours WAV, MP3 Get Dataset
📄
Estonian-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text et-fi 97.3k pairs JSON, TSV Get Dataset
📄
Estonian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text et-mt 294.2k pairs JSON, TSV Get Dataset
🎧
Euskara (Basque)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio eu 150 audio hours WAV, MP3 Get Dataset
🎧
Filipino (Philippines)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV, MP3 format.
Call center Audio fil 103 audio hours WAV, MP3 Get Dataset
🎧
Filipino (Philippines)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio fil 166 audio hours WAV, MP3 Get Dataset
🎧
Finnish (Finland)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio fi 350 audio hours WAV, MP3 Get Dataset
📄
Finnish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-hr 97.3k pairs JSON, TSV Get Dataset
📄
Finnish-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-cs 97.3k pairs JSON, TSV Get Dataset
📄
Finnish-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-nl 97.3k pairs JSON, TSV Get Dataset
📄
Finnish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-lv 97.3k pairs JSON, TSV Get Dataset
📄
Finnish-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-lt 97.3k pairs JSON, TSV Get Dataset
📄
Finnish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-mt 294.2k pairs JSON, TSV Get Dataset
📄
Finnish-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-pl 97.3k pairs JSON, TSV Get Dataset
📄
Finnish-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-sl 97.3k pairs JSON, TSV Get Dataset
📄
Finnish-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-es 97.3k pairs JSON, TSV Get Dataset
📄
Finnish-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fi-sv 97.3k pairs JSON, TSV Get Dataset
🎧
French (Canada)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio fr 106 audio hours WAV, MP3 Get Dataset
🎧
French (France)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
Call center Audio fr 74 audio hours WAV, FLAC Get Dataset
🎧
French (France)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio fr 446 audio hours WAV, MP3 Get Dataset
📄
French-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-hr 97.3k pairs JSON, TSV Get Dataset
📄
French-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-cs 97.3k pairs JSON, TSV Get Dataset
📄
French-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-nl 97.3k pairs JSON, TSV Get Dataset
📄
French-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-en 311.9k pairs JSON, TSV Get Dataset
📄
French-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-et 97.3k pairs JSON, TSV Get Dataset
📄
French-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-fi 97.3k pairs JSON, TSV Get Dataset
📄
French-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-de 1.9M pairs JSON, TSV Get Dataset
📄
French-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-ga 97.3k pairs JSON, TSV Get Dataset
📄
French-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-it 29.0M pairs JSON, TSV Get Dataset
📄
French-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-lv 97.3k pairs JSON, TSV Get Dataset
📄
French-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-lt 97.3k pairs JSON, TSV Get Dataset
📄
French-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-mt 294.2k pairs JSON, TSV Get Dataset
📄
French-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-pl 97.3k pairs JSON, TSV Get Dataset
📄
French-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-pt 1.7M pairs JSON, TSV Get Dataset
📄
French-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-sk 97.3k pairs JSON, TSV Get Dataset
📄
French-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-sl 97.3k pairs JSON, TSV Get Dataset
📄
French-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-es 97.3k pairs JSON, TSV Get Dataset
📄
French-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fr-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Galician (Galicia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio gl 75 audio hours WAV, MP3 Get Dataset
🎧
Georgian (Georgia)
audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format.
Call center Audio ka 9 audio hours WAV Get Dataset
🎧
German (Austria)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio de 182 audio hours WAV, MP3 Get Dataset
🎧
German (Germany)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
Call center Audio de 95 audio hours WAV, FLAC Get Dataset
🎧
German (Germany)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio de 550 audio hours WAV, MP3 Get Dataset
📄
German-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-hr 97.3k pairs JSON, TSV Get Dataset
📄
German-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-cs 97.3k pairs JSON, TSV Get Dataset
📄
German-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-nl 97.3k pairs JSON, TSV Get Dataset
📄
German-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-en 242.3k pairs JSON, TSV Get Dataset
📄
German-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-et 97.3k pairs JSON, TSV Get Dataset
📄
German-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-fi 97.3k pairs JSON, TSV Get Dataset
📄
German-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-fr 32.8M pairs JSON, TSV Get Dataset
📄
German-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-hu 97.3k pairs JSON, TSV Get Dataset
📄
German-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-it 1.6M pairs JSON, TSV Get Dataset
📄
German-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-lv 97.3k pairs JSON, TSV Get Dataset
📄
German-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-lt 97.3k pairs JSON, TSV Get Dataset
📄
German-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-mt 461.6k pairs JSON, TSV Get Dataset
📄
German-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-pl 97.3k pairs JSON, TSV Get Dataset
📄
German-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-pt 1.7M pairs JSON, TSV Get Dataset
📄
German-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-sk 97.3k pairs JSON, TSV Get Dataset
📄
German-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-sl 97.3k pairs JSON, TSV Get Dataset
📄
German-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-es 1.7M pairs JSON, TSV Get Dataset
📄
German-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text de-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Greek (Greece)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV format.
Call center Audio el 45 audio hours WAV Get Dataset
🎧
Greek (Greece)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio el 165 audio hours WAV, MP3 Get Dataset
📄
Greek-Bulgarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-bg 97.2k pairs JSON, TSV Get Dataset
📄
Greek-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-hr 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-cs 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-da 97.2k pairs JSON, TSV Get Dataset
📄
Greek-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-nl 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-et 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-fi 97.3k pairs JSON, TSV Get Dataset
📄
Greek-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-fr 97.3k pairs JSON, TSV Get Dataset
📄
Greek-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-de 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-hu 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-ga 97.2k pairs JSON, TSV Get Dataset
📄
Greek-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-it 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-lv 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-lt 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-mt 391.4k pairs JSON, TSV Get Dataset
📄
Greek-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-pl 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-pt 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-ro 97.2k pairs JSON, TSV Get Dataset
📄
Greek-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-sk 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-sl 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-es 97.3k pairs JSON, TSV Get Dataset
📄
Greek-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text el-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Hindi (India)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio hi 600 audio hours WAV, MP3 Get Dataset
📄
Hindi-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hi-en 246.0k pairs JSON, TSV Get Dataset
🎧
Hungarian (Hungary)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV format.
Finance Audio hu 50 audio hours WAV Get Dataset
🎧
Hungarian (Hungary)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio hu 111 audio hours WAV, MP3 Get Dataset
📄
Hungarian-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-cs 97.3k pairs JSON, TSV Get Dataset
📄
Hungarian-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-nl 97.3k pairs JSON, TSV Get Dataset
📄
Hungarian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-et 97.3k pairs JSON, TSV Get Dataset
📄
Hungarian-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-fi 97.3k pairs JSON, TSV Get Dataset
📄
Hungarian-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-it 97.3k pairs JSON, TSV Get Dataset
📄
Hungarian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-lv 97.3k pairs JSON, TSV Get Dataset
📄
Hungarian-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-lt 97.3k pairs JSON, TSV Get Dataset
📄
Hungarian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-mt 293.8k pairs JSON, TSV Get Dataset
📄
Hungarian-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-pl 6.0M pairs JSON, TSV Get Dataset
📄
Hungarian-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-ro 5.9M pairs JSON, TSV Get Dataset
📄
Hungarian-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-sk 6.0M pairs JSON, TSV Get Dataset
📄
Hungarian-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-sl 6.0M pairs JSON, TSV Get Dataset
📄
Hungarian-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-es 97.3k pairs JSON, TSV Get Dataset
📄
Hungarian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text hu-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Indonesian (Indonesia)
audio dataset with dual speaker(s), dual channel, 24+ kHz sampling rate, WAV, MP3 format.
Finance Audio id 70 audio hours WAV, MP3 Get Dataset
🎧
Indonesian (Indonesia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio id 79 audio hours WAV, MP3 Get Dataset
📄
Indonesian-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text id-en 2.5M pairs JSON, TSV Get Dataset
📄
Indonesian-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text id-fr 1.5M pairs JSON, TSV Get Dataset
📄
Indonesian-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text id-de 1.5M pairs JSON, TSV Get Dataset
📄
Indonesian-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text id-pt 1.5M pairs JSON, TSV Get Dataset
📄
Indonesian-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text id-ru 1.5M pairs JSON, TSV Get Dataset
📄
Indonesian-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text id-es 1.5M pairs JSON, TSV Get Dataset
📄
Irish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-hr 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-cs 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-da 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-nl 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-et 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-fi 97.3k pairs JSON, TSV Get Dataset
📄
Irish-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-de 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-hu 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-it 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-lv 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-lt 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-mt 294.2k pairs JSON, TSV Get Dataset
📄
Irish-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-pl 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-pt 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-sk 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-sl 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-es 97.3k pairs JSON, TSV Get Dataset
📄
Irish-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ga-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Italian (Italy)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
Legal Audio it 96 audio hours WAV, FLAC Get Dataset
🎧
Italian (Italy)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio it 520 audio hours WAV, MP3 Get Dataset
📄
Italian-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-hr 97.3k pairs JSON, TSV Get Dataset
📄
Italian-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-cs 97.3k pairs JSON, TSV Get Dataset
📄
Italian-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-nl 97.3k pairs JSON, TSV Get Dataset
📄
Italian-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-en 734.7k pairs JSON, TSV Get Dataset
📄
Italian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-et 97.3k pairs JSON, TSV Get Dataset
📄
Italian-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-fi 97.3k pairs JSON, TSV Get Dataset
📄
Italian-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-de 1.1M pairs JSON, TSV Get Dataset
📄
Italian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-lv 97.3k pairs JSON, TSV Get Dataset
📄
Italian-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-lt 97.3k pairs JSON, TSV Get Dataset
📄
Italian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-mt 294.2k pairs JSON, TSV Get Dataset
📄
Italian-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-pl 97.3k pairs JSON, TSV Get Dataset
📄
Italian-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-pt 1.5M pairs JSON, TSV Get Dataset
📄
Italian-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-sl 97.3k pairs JSON, TSV Get Dataset
📄
Italian-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-es 97.3k pairs JSON, TSV Get Dataset
📄
Italian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text it-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Japanese (Japan)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, FLAC, WAV format.
Call center Audio ja 264 audio hours FLAC, WAV Get Dataset
🎧
Japanese (Japan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ja 440 audio hours WAV, MP3 Get Dataset
📄
Japanese-Chinese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ja-zh 4.0M pairs JSON, TSV Get Dataset
📄
Japanese-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ja-en 2.6M pairs JSON, TSV Get Dataset
📄
Japanese-Korean
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ja-ko 3.9M pairs JSON, TSV Get Dataset
🎧
Javanese (Indonesia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio jv 46 audio hours WAV, MP3 Get Dataset
🎧
Khmer (Cambodia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio km 61 audio hours WAV, MP3 Get Dataset
🎧
Korean (South Korea)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
Gaming Audio ko 20 audio hours WAV, FLAC Get Dataset
🎧
Korean (South Korea)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ko 260 audio hours WAV, MP3 Get Dataset
📄
Korean (South Korea)
monolingual data for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ko 2.0M pairs JSON, TSV Get Dataset
📄
Korean-Chinese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ko-zh 536.8k pairs JSON, TSV Get Dataset
📄
Korean-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ko-en 2.4M pairs JSON, TSV Get Dataset
📄
Korean-Japanese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ko-ja 536.8k pairs JSON, TSV Get Dataset
🎧
Kyrgyz (Kyrgyzstan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ky 41 audio hours WAV, MP3 Get Dataset
🎧
Lao (Laos)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio lo 11 audio hours WAV, MP3 Get Dataset
🎧
Latvian (Latvia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio lv 33 audio hours WAV, MP3 Get Dataset
📄
Latvian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text lv-et 97.3k pairs JSON, TSV Get Dataset
📄
Latvian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text lv-mt 391.4k pairs JSON, TSV Get Dataset
🎧
Lithuanian (Lithuania)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio lt 37 audio hours WAV, MP3 Get Dataset
📄
Lithuanian-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text lt-hr 97.3k pairs JSON, TSV Get Dataset
📄
Lithuanian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text lt-et 97.3k pairs JSON, TSV Get Dataset
📄
Lithuanian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text lt-lv 97.3k pairs JSON, TSV Get Dataset
📄
Lithuanian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text lt-mt 294.2k pairs JSON, TSV Get Dataset
📄
Lithuanian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text lt-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Malay (Malaysia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ms 115 audio hours WAV, MP3 Get Dataset
📄
Maltese-Bulgarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-bg 718.3k pairs JSON, TSV Get Dataset
📄
Maltese-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-hr 833.4k pairs JSON, TSV Get Dataset
📄
Maltese-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-cs 792.6k pairs JSON, TSV Get Dataset
📄
Maltese-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-da 815.5k pairs JSON, TSV Get Dataset
📄
Maltese-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-nl 774.8k pairs JSON, TSV Get Dataset
📄
Maltese-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-en 718.3k pairs JSON, TSV Get Dataset
📄
Maltese-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-et 815.4k pairs JSON, TSV Get Dataset
📄
Maltese-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-fi 815.3k pairs JSON, TSV Get Dataset
📄
Maltese-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-fr 876.5k pairs JSON, TSV Get Dataset
📄
Maltese-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-de 792.7k pairs JSON, TSV Get Dataset
📄
Maltese-Greek
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-el 735.9k pairs JSON, TSV Get Dataset
📄
Maltese-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-hu 833.1k pairs JSON, TSV Get Dataset
📄
Maltese-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-ga 774.9k pairs JSON, TSV Get Dataset
📄
Maltese-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-it 815.7k pairs JSON, TSV Get Dataset
📄
Maltese-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-lv 718.2k pairs JSON, TSV Get Dataset
📄
Maltese-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-lt 815.3k pairs JSON, TSV Get Dataset
📄
Maltese-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-pl 833.5k pairs JSON, TSV Get Dataset
📄
Maltese-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-pt 815.5k pairs JSON, TSV Get Dataset
📄
Maltese-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-ro 695.4k pairs JSON, TSV Get Dataset
📄
Maltese-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-sk 815.5k pairs JSON, TSV Get Dataset
📄
Maltese-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-sl 685.4k pairs JSON, TSV Get Dataset
📄
Maltese-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-es 1.4M pairs JSON, TSV Get Dataset
📄
Maltese-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text mt-sv 815.5k pairs JSON, TSV Get Dataset
🎧
Norwegian (Norway)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
Healthcare Audio no 78 audio hours WAV, MP3 Get Dataset
🎧
Norwegian (Norway)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio no 158 audio hours WAV, MP3 Get Dataset
🎧
Pashto (Pakistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ps 61 audio hours WAV, MP3 Get Dataset
📄
Persian-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text fa-en 4.3M pairs JSON, TSV Get Dataset
🎧
Polish (Poland)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio pl 221 audio hours WAV, MP3 Get Dataset
📄
Polish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-hr 97.3k pairs JSON, TSV Get Dataset
📄
Polish-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-et 97.3k pairs JSON, TSV Get Dataset
📄
Polish-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-fr 1.5M pairs JSON, TSV Get Dataset
📄
Polish-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-de 1.5M pairs JSON, TSV Get Dataset
📄
Polish-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-it 1.5M pairs JSON, TSV Get Dataset
📄
Polish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-lv 97.3k pairs JSON, TSV Get Dataset
📄
Polish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-mt 294.2k pairs JSON, TSV Get Dataset
📄
Polish-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-ro 5.9M pairs JSON, TSV Get Dataset
📄
Polish-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-ru 1.5M pairs JSON, TSV Get Dataset
📄
Polish-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-sk 5.9M pairs JSON, TSV Get Dataset
📄
Polish-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-sl 6.0M pairs JSON, TSV Get Dataset
📄
Polish-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-es 1.5M pairs JSON, TSV Get Dataset
📄
Polish-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pl-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Portuguese (Brazil)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
Call center Audio pt 140 audio hours WAV, FLAC Get Dataset
🎧
Portuguese (Portugal)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
Call center Audio pt 140 audio hours WAV, FLAC Get Dataset
🎧
Portuguese (Portugal)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio pt 236 audio hours WAV, MP3 Get Dataset
📄
Portuguese-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-hr 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-cs 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-nl 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-en 2.3M pairs JSON, TSV Get Dataset
📄
Portuguese-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-et 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-fi 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-fr 3.7M pairs JSON, TSV Get Dataset
📄
Portuguese-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-de 1.3M pairs JSON, TSV Get Dataset
📄
Portuguese-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-hu 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-it 2.0M pairs JSON, TSV Get Dataset
📄
Portuguese-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-lv 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-lt 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-mt 294.1k pairs JSON, TSV Get Dataset
📄
Portuguese-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-pl 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Romanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-ro 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-sk 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-sl 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-es 97.3k pairs JSON, TSV Get Dataset
📄
Portuguese-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text pt-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Romanian (Romania)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ro 123 audio hours WAV, MP3 Get Dataset
📄
Romanian-Bulgarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-bg 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-hr 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-cs 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Danish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-da 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-nl 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-et 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-fi 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-fr 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-de 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Hungarian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-hu 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Irish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-ga 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-it 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-lv 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-lt 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-mt 391.5k pairs JSON, TSV Get Dataset
📄
Romanian-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-pl 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Slovak
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-sk 6.0M pairs JSON, TSV Get Dataset
📄
Romanian-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-sl 6.0M pairs JSON, TSV Get Dataset
📄
Romanian-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-es 97.3k pairs JSON, TSV Get Dataset
📄
Romanian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ro-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Russian (Russia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ru 301 audio hours WAV, MP3 Get Dataset
📄
Russian-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text ru-en 869.1k pairs JSON, TSV Get Dataset
🎧
Serbian (Serbia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio sr 80 audio hours WAV, MP3 Get Dataset
🎧
Sinhala (Sri Lanka)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio si 50 audio hours WAV, MP3 Get Dataset
🎧
Slovak (Slovakia)
audio dataset with dual speaker(s), mono channel, 24+ kHz sampling rate, WAV format.
Call center Audio sk 72 audio hours WAV Get Dataset
📄
Slovak-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-hr 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-cs 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-nl 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-et 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Finnish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-fi 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-it 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-lv 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-lt 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-mt 294.2k pairs JSON, TSV Get Dataset
📄
Slovak-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-pl 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-sl 6.0M pairs JSON, TSV Get Dataset
📄
Slovak-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-es 97.3k pairs JSON, TSV Get Dataset
📄
Slovak-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sk-sv 97.3k pairs JSON, TSV Get Dataset
📄
Slovenian-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sl-hr 97.3k pairs JSON, TSV Get Dataset
📄
Slovenian-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sl-et 97.3k pairs JSON, TSV Get Dataset
📄
Slovenian-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sl-lv 97.3k pairs JSON, TSV Get Dataset
📄
Slovenian-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sl-lt 97.3k pairs JSON, TSV Get Dataset
📄
Slovenian-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sl-mt 294.2k pairs JSON, TSV Get Dataset
📄
Slovenian-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sl-sv 97.3k pairs JSON, TSV Get Dataset
🎧
Spanish (Argentine)
audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio es 27 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Latin American)
audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format.
General Audio es 686 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Latin American)
audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format.
Call center Audio es 56 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Latin American)
audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio es 565 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Mexican)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio es 49 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Mixed)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio es 695 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Spain)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio es 1789 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Spain)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
Call center Audio es 343 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Spain)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
Legal Audio es 25 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Spain)
audio dataset with dual speaker(s), mono/dual channel, 24+ kHz sampling rate, WAV, MP3 format.
Sports Audio es 106 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Spain)
audio dataset with single speaker(s), dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio es 43 audio hours WAV, MP3 Get Dataset
🎧
Spanish (Spain)
audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, MP3 format.
Finance Audio es 45 audio hours MP3 Get Dataset
🎧
Spanish (Spain)
audio dataset with single speaker(s), mono channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio es 1356 audio hours WAV, MP3 Get Dataset
📄
Spanish-Basque
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-eu 20M pairs JSON, TSV Get Dataset
📄
Spanish-Catalan
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-ca 23.2M pairs JSON, TSV Get Dataset
📄
Spanish-Chinese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-zh 10.1k pairs JSON, TSV Get Dataset
📄
Spanish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-hr 97.3k pairs JSON, TSV Get Dataset
📄
Spanish-Czech
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-cs 97.3k pairs JSON, TSV Get Dataset
📄
Spanish-Dutch
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-nl 4.6M pairs JSON, TSV Get Dataset
📄
Spanish-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-en 20M pairs JSON, TSV Get Dataset
📄
Spanish-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-et 97.3k pairs JSON, TSV Get Dataset
📄
Spanish-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-fr 49.3M pairs JSON, TSV Get Dataset
📄
Spanish-Galician
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-gl 20M pairs JSON, TSV Get Dataset
📄
Spanish-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-de 26.3M pairs JSON, TSV Get Dataset
📄
Spanish-Italian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-it 29.6M pairs JSON, TSV Get Dataset
📄
Spanish-Korean
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-ko 1.3M pairs JSON, TSV Get Dataset
📄
Spanish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-lv 97.3k pairs JSON, TSV Get Dataset
📄
Spanish-Lithuanian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-lt 97.3k pairs JSON, TSV Get Dataset
📄
Spanish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-mt 294.2k pairs JSON, TSV Get Dataset
📄
Spanish-Polish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-pl 97.3k pairs JSON, TSV Get Dataset
📄
Spanish-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-pt 27.2M pairs JSON, TSV Get Dataset
📄
Spanish-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-ru 16.2M pairs JSON, TSV Get Dataset
📄
Spanish-Slovenian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-sl 97.3k pairs JSON, TSV Get Dataset
📄
Spanish-Swedish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-sv 97.3k pairs JSON, TSV Get Dataset
📄
Spanish-Valencian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text es-va 12.4M pairs JSON, TSV Get Dataset
🎧
Swedish (Sweden)
audio dataset with dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
Healthcare Audio sv 80 audio hours WAV, MP3 Get Dataset
🎧
Swedish (Sweden)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio sv 183 audio hours WAV, MP3 Get Dataset
📄
Swedish-Croatian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sv-hr 97.3k pairs JSON, TSV Get Dataset
📄
Swedish-Estonian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sv-et 97.3k pairs JSON, TSV Get Dataset
📄
Swedish-Latvian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sv-lv 97.3k pairs JSON, TSV Get Dataset
📄
Swedish-Maltese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text sv-mt 294.2k pairs JSON, TSV Get Dataset
🎧
Tajik (Tajikistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio tg 39 audio hours WAV, MP3 Get Dataset
🎧
Thai (Thailand)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio th 165 audio hours WAV, MP3 Get Dataset
📄
Thai-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text th-en 2.1M pairs JSON, TSV Get Dataset
🎧
Turkish (Turkey)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, FLAC format.
Call center Audio tr 135 audio hours WAV, FLAC Get Dataset
🎧
Turkish (Turkey)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio tr 195 audio hours WAV, MP3 Get Dataset
📄
Turkish-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text tr-en 1.8M pairs JSON, TSV Get Dataset
🎧
Turkmen (Turkmenistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio tk 33 audio hours WAV, MP3 Get Dataset
🎧
Ukrainian (Ukraine)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio uk 60 audio hours WAV, MP3 Get Dataset
🎧
Urdu (Pakistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ur 190 audio hours WAV, MP3 Get Dataset
🎧
Uzbek (Uzbekistan)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio uz 68 audio hours WAV, MP3 Get Dataset
🎧
Valencian (Valencia)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio ca-val 125 audio hours WAV, MP3 Get Dataset
🎧
Vietnamese (Vietnam)
audio dataset with dual speaker(s), dual channel, 16+ kHz sampling rate, WAV, MP3 format.
Call center Audio vi 167 audio hours WAV, MP3 Get Dataset
🎧
Vietnamese (Vietnam)
audio dataset with single/dual speaker(s), mono/dual channel, 16+ kHz sampling rate, WAV, MP3 format.
General Audio vi 167 audio hours WAV, MP3 Get Dataset
📄
Vietnamese-English
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text vi-en 1.8M pairs JSON, TSV Get Dataset
📄
Vietnamese-French
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text vi-fr 1.5M pairs JSON, TSV Get Dataset
📄
Vietnamese-German
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text vi-de 1.5M pairs JSON, TSV Get Dataset
📄
Vietnamese-Portuguese
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text vi-pt 1.5M pairs JSON, TSV Get Dataset
📄
Vietnamese-Russian
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text vi-ru 1.5M pairs JSON, TSV Get Dataset
📄
Vietnamese-Spanish
parallel corpus for neural machine translation, cross-lingual NLP, and bilingual text mining.
General Text vi-es 1.5M pairs JSON, TSV Get Dataset
🔊
Sound of A/C machines
Average 1200 seconds per clip, 18 audio files, 44+ kHz.
General Noise - 6 audio hours WAV + JSON Get Dataset
🔊
Sound of water spraying gently, like a bidet in use
Average 30 seconds per clip, 720 audio files, 44+ kHz.
General Noise - 6 audio hours WAV + JSON Get Dataset
🔊
Loud, mechanical grinding noise from a coffee grinder
Average 120 seconds per clip, 120 audio files, 44+ kHz.
General Noise - 4 audio hours WAV + JSON Get Dataset
🔊
Sound of plugging or connecting cables on a table
Average 10 seconds per clip, 1080 audio files, 44+ kHz.
General Noise - 3 audio hours WAV + JSON Get Dataset
🔊
Classic fizzy pssst of opening a soda can
Average 10 seconds per clip, 1080 audio files, 44+ kHz.
General Noise - 3 audio hours WAV + JSON Get Dataset
🔊
Fast, rhythmic shuffling of playing cards
Average 30 seconds per clip, 720 audio files, 44+ kHz.
General Noise - 6 audio hours WAV + JSON Get Dataset
🔊
Indistinct clicking, likely from a pen or mouse
Average 10 seconds per clip, 720 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Paper being forcefully crumpled, like tossing a failed idea
Average 15 seconds per clip, 960 audio files, 44+ kHz.
General Noise - 4 audio hours WAV + JSON Get Dataset
🔊
Soft sloshing and whirring sounds from a running dishwasher
Average 1200 seconds per clip, 48 audio files, 44+ kHz.
General Noise - 16 audio hours WAV + JSON Get Dataset
🔊
Clicking latch followed by creaking and closing door sounds
Average 12 seconds per clip, 900 audio files, 44+ kHz.
General Noise - 3 audio hours WAV + JSON Get Dataset
🔊
Rubbing sounds from a towel or hand dryer while drying hands
Average 10 seconds per clip, 1080 audio files, 44+ kHz.
General Noise - 3 audio hours WAV + JSON Get Dataset
🔊
Steady buzzing sound of an electric razor in use
Average 30 seconds per clip, 600 audio files, 44+ kHz.
General Noise - 5 audio hours WAV + JSON Get Dataset
🔊
Mechanical hum and soft chime of an elevator operating
Average 15 seconds per clip, 480 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Subtle suction noise as a fridge door opens and closes
Average 10 seconds per clip, 360 audio files, 44+ kHz.
General Noise - 1 audio hour WAV + JSON Get Dataset
🔊
The sound of a refrigerator
Average 3600 seconds per clip, 15 audio files, 44+ kHz.
General Noise - 15 audio hours WAV + JSON Get Dataset
🔊
Sharp sizzling sound, like food cooking in hot oil
Average 30 seconds per clip, 1440 audio files, 44+ kHz.
General Noise - 12 audio hours WAV + JSON Get Dataset
🔊
Loud whirring of a garden blower moving air forcefully
Average 15 seconds per clip, 1200 audio files, 44+ kHz.
General Noise - 5 audio hours WAV + JSON Get Dataset
🔊
Clink of a glass cup being placed gently on a surface
Average 10 seconds per clip, 720 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Light clatter of eyeglasses being set down on a table
Average 60 seconds per clip, 300 audio files, 44+ kHz.
General Noise - 5 audio hours WAV + JSON Get Dataset
🔊
Strong, steady airflow and motor hum from a hair dryer
Average 120 seconds per clip, 360 audio files, 44+ kHz.
General Noise - 12 audio hours WAV + JSON Get Dataset
🔊
Metallic jingle or clink of keys being handled or dropped
Average 10 seconds per clip, 720 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Gurgling and hissing sounds from a Krups/Nespresso or similar espresso machine brewing coffee
Average 25 seconds per clip, 288 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Sharp click of a light switch being flipped on or off
Average 15 seconds per clip, 960 audio files, 44+ kHz.
General Noise - 4 audio hours WAV + JSON Get Dataset
🔊
Low humming with occasional beeps, typical of a microwave heating food
Average 60 seconds per clip, 600 audio files, 44+ kHz.
General Noise - 10 audio hours WAV + JSON Get Dataset
🔊
Soft clink as a cup is placed inside a microwave
Average 10 seconds per clip, 360 audio files, 44+ kHz.
General Noise - 1 audio hour WAV + JSON Get Dataset
🔊
Mixed or undefined background noises not easily categorized
Average 60 seconds per clip, 900 audio files, 44+ kHz.
General Noise - 15 audio hours WAV + JSON Get Dataset
🔊
Soft flipping and rustling as papers are being counted or sorted
Average 20 seconds per clip, 720 audio files, 44+ kHz.
General Noise - 4 audio hours WAV + JSON Get Dataset
🔊
Repetitive clicking sound of a pen being pressed
Average 10 seconds per clip, 1080 audio files, 44+ kHz.
General Noise - 3 audio hours WAV + JSON Get Dataset
🔊
Electrical click and hum from plugging or unplugging a device
Average 10 seconds per clip, 3600 audio files, 44+ kHz.
General Noise - 10 audio hours WAV + JSON Get Dataset
🔊
Steady stream of hot water being poured into a teacup
Average 15 seconds per clip, 480 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Continuous pouring sound, likely water into a container or sink
Average 60 seconds per clip, 660 audio files, 44+ kHz.
General Noise - 11 audio hours WAV + JSON Get Dataset
🔊
Sound of a water or swimming pool pump and filters
Average 1800 seconds per clip, 10 audio files, 44+ kHz.
General Noise - 5 audio hours WAV + JSON Get Dataset
🔊
Buzzing electric sound from a handheld shaver
Average 120 seconds per clip, 285 audio files, 44+ kHz.
General Noise - 9.5 audio hours WAV + JSON Get Dataset
🔊
Continuous stream and splash of water from a shower
Average 150 seconds per clip, 192 audio files, 44+ kHz.
General Noise - 8 audio hours WAV + JSON Get Dataset
🔊
Soft sipping sound from drinking a beverage
Average 10 seconds per clip, 720 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Muffled footsteps on a carpeted floor
Average 15 seconds per clip, 960 audio files, 44+ kHz.
General Noise - 4 audio hours WAV + JSON Get Dataset
🔊
Ambient sound of an empty house
Average 60 seconds per clip, 600 audio files, 44+ kHz.
General Noise - 10 audio hours WAV + JSON Get Dataset
🔊
Snoring and sleeping sounds
Average 30 seconds per clip, 360 audio files, 44+ kHz.
General Noise - 3 audio hours WAV + JSON Get Dataset
🔊
Light clink and swirl of coffee being stirred
Average 15 seconds per clip, 960 audio files, 44+ kHz.
General Noise - 4 audio hours WAV + JSON Get Dataset
🔊
Popping or snapping sound of a tablet container being opened
Average 10 seconds per clip, 360 audio files, 44+ kHz.
General Noise - 1 audio hour WAV + JSON Get Dataset
🔊
Steady flow of water from a running tap
Average 120 seconds per clip, 330 audio files, 44+ kHz.
General Noise - 11 audio hours WAV + JSON Get Dataset
🔊
Toilet flushing sound
Average 30 seconds per clip, 540 audio files, 44+ kHz.
General Noise - 4.5 audio hours WAV + JSON Get Dataset
🔊
Fast or rhythmic tapping of keys while typing at a home office
Average 15 seconds per clip, 960 audio files, 44+ kHz.
General Noise - 4 audio hours WAV + JSON Get Dataset
🔊
Water splashing and rubbing sounds from washing hands
Average 10 seconds per clip, 1620 audio files, 44+ kHz.
General Noise - 4.5 audio hours WAV + JSON Get Dataset
🔊
Water being poured into a cup, increasing in pitch as it fills
Average 10 seconds per clip, 360 audio files, 44+ kHz.
General Noise - 1 audio hour WAV + JSON Get Dataset
🔊
Crackling pop and twist from opening or closing a water bottle
Average 20 seconds per clip, 180 audio files, 44+ kHz.
General Noise - 1 audio hour WAV + JSON Get Dataset
🔊
General sound of water running, could be a faucet, stream, or pipe
Average 120 seconds per clip, 480 audio files, 44+ kHz.
General Noise - 16 audio hours WAV + JSON Get Dataset
🔊
Back-and-forth zip sound of opening or closing a zipper
Average 8 seconds per clip, 900 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Sound of a coffee machine (Nespresso or similar) or cafetiere
Average 20 seconds per clip, 360 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Dog panting and licking
Average 10 seconds per clip, 360 audio files, 44+ kHz.
General Noise - 1 audio hour WAV + JSON Get Dataset
🔊
Oil sizzling while frying food
Average 40 seconds per clip, 180 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset
🔊
Glasses being placed on a shelf
Average 30 seconds per clip, 240 audio files, 44+ kHz.
General Noise - 2 audio hours WAV + JSON Get Dataset