Arabic Scripted Monologue
About the Dataset
716 hours
This audio dataset contains 716 hours of Arabic speech data recorded by native Arabic speakers from Egypt, Jordan, the United Arab Emirates, and Yemen.
There are 219 hours of Arabic (Egypt) scripted monologue with the following domain distribution per dataset:
- 25.4 hours of Banking
- 55.98 hours of Generic domain
- 45.65 hours of Insurance
- 46.33 hours of Retail
- 45.88 hours of Telecommunication
There are 216 hours of Arabic (Jordan) scripted monologue with the following domain distribution per dataset:
- 22.2 hours of Automotive
- 56.1 hours of Banking
- 44.65 hours of Insurance
- 46.52 hours of Retail
- 46.6 hours of Telecommunication
There are 72 hours of Arabic (United Arab Emirates) scripted monologue with the following domain distribution per dataset:
- 10.08 hours of Automotive
- 15.83 hours of Banking
- 15.95 hours of Insurance
- 15.55 hours of Retail
- 15.22 hours of Telecommunication
There are 209 hours of Arabic (Yemen) scripted monologue with the following domain distribution per dataset:
- 20.87 hours of Automotive
- 46.98 hours of Banking
- 47.12 hours of Insurance
- 47.13 hours of Retail
- 47.08 hours of Telecommunication
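As a quick sanity check on the figures above, the per-domain hours can be summed per locale. This is a minimal sketch: the dictionary simply copies the numbers from the lists, and the names `DOMAIN_HOURS` and `locale_totals` are illustrative, not part of any Defined.ai tooling. The itemized hours add up to slightly more than the headline totals (219, 216, 72, and 209), which appear to be rounded down.

```python
# Domain hours per locale, copied from the lists above.
# The names used here are illustrative only.
DOMAIN_HOURS = {
    "Egypt": {
        "Banking": 25.4, "Generic": 55.98, "Insurance": 45.65,
        "Retail": 46.33, "Telecommunication": 45.88,
    },
    "Jordan": {
        "Automotive": 22.2, "Banking": 56.1, "Insurance": 44.65,
        "Retail": 46.52, "Telecommunication": 46.6,
    },
    "United Arab Emirates": {
        "Automotive": 10.08, "Banking": 15.83, "Insurance": 15.95,
        "Retail": 15.55, "Telecommunication": 15.22,
    },
    "Yemen": {
        "Automotive": 20.87, "Banking": 46.98, "Insurance": 47.12,
        "Retail": 47.13, "Telecommunication": 47.08,
    },
}


def locale_totals(domain_hours):
    """Return the summed hours per locale, rounded to two decimals."""
    return {
        locale: round(sum(hours.values()), 2)
        for locale, hours in domain_hours.items()
    }


totals = locale_totals(DOMAIN_HOURS)
for locale, hours in totals.items():
    print(f"{locale}: {hours:.2f} hours")
print(f"All locales: {sum(totals.values()):.2f} hours")
```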
Speakers are presented with a prompt (script) and asked to read it aloud while recording. Clients receive the audio recording, the prompt, and information about the speaker. The audio is recorded on-device, typically at 16 kHz, 16-bit. We also provide information on the device used for each recording.
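Because the format is described as typical rather than guaranteed, it can be useful to check each recording before training. The sketch below assumes the recordings are delivered as uncompressed PCM WAV files, which this page does not state; the function name `check_wav_format` and its return keys are hypothetical. It uses only Python's standard `wave` module.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_sample_width=2):
    """Read a PCM WAV header and compare it with the typical 16 kHz, 16-bit format.

    Assumes the file is an uncompressed WAV; the delivery format is an
    assumption, not something stated in the dataset description.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()      # samples per second
        width = wav.getsampwidth()     # bytes per sample (2 bytes = 16-bit)
        channels = wav.getnchannels()
        frames = wav.getnframes()

    return {
        "sample_rate_hz": rate,
        "bit_depth": width * 8,
        "channels": channels,
        "duration_s": frames / rate if rate else 0.0,
        "matches_expected": rate == expected_rate
        and width == expected_sample_width,
    }
```

Calling `check_wav_format` on a file path returns a small dictionary; recordings where `matches_expected` is false can then be resampled or set aside, depending on the training pipeline.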
The dataset is covered by Defined.ai's standard license agreement. The license agreement is perpetual and allows for the commercialization of all models built on the data.
Samples
- 5-minute sample, Arabic (Egypt). A transcription of the sample is also available
- 5-minute sample, Arabic (Jordan). A transcription of the sample is also available
- 5-minute sample, Arabic (United Arab Emirates). A transcription of the sample is also available
- 5-minute sample, Arabic (Yemen). A transcription of the sample is also available