Arabic Spontaneous Dialogue
About the Dataset
This audio dataset contains 308 hours of Arabic speech data in the Banking, Insurance, Retail, and Telecommunication domains, recorded by speakers from Egypt, Jordan, the UAE, and Yemen.
There are 102 hours of Arabic (Egypt) human-to-human audio, distributed across domains as follows:
- 20.85 hours of Banking
- 30.65 hours of Insurance
- 24.32 hours of Retail
- 26.88 hours of Telecommunication
There are 90 hours of Arabic (Jordan) human-to-human audio, distributed across domains as follows:
- 24.15 hours of Banking
- 23.88 hours of Insurance
- 21.32 hours of Retail
- 21.5 hours of Telecommunication
There are 25 hours of Arabic (UAE) human-to-human audio, distributed across domains as follows:
- 13.05 hours of Retail
- 12.68 hours of Telecommunication
There are 91 hours of Arabic (Yemen) human-to-human audio, distributed across domains as follows:
- 33.55 hours of Banking
- 31 hours of Insurance
- 11.25 hours of Retail
- 15.53 hours of Telecommunication
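The per-domain figures above can be tallied per country and per domain with a short script. This is only a sketch: the `HOURS` table and the function names are illustrative and not part of the dataset's delivery. The per-domain hours add up to slightly more than the whole-hour totals quoted above, presumably because the quoted totals are rounded down; that reading is an assumption.

```python
# Per-domain hours copied from the lists above, keyed by country.
# The structure and names here are illustrative, not a dataset manifest.
HOURS = {
    "Egypt": {"Banking": 20.85, "Insurance": 30.65,
              "Retail": 24.32, "Telecommunication": 26.88},
    "Jordan": {"Banking": 24.15, "Insurance": 23.88,
               "Retail": 21.32, "Telecommunication": 21.5},
    "UAE": {"Retail": 13.05, "Telecommunication": 12.68},
    "Yemen": {"Banking": 33.55, "Insurance": 31.0,
              "Retail": 11.25, "Telecommunication": 15.53},
}


def hours_by_country(hours=HOURS):
    """Sum the per-domain hours for each country."""
    return {country: round(sum(domains.values()), 2)
            for country, domains in hours.items()}


def hours_by_domain(hours=HOURS):
    """Sum the hours for each domain across all countries."""
    totals = {}
    for domains in hours.values():
        for domain, value in domains.items():
            totals[domain] = totals.get(domain, 0.0) + value
    return {domain: round(value, 2) for domain, value in totals.items()}


if __name__ == "__main__":
    for country, total in hours_by_country().items():
        print(f"{country}: {total:.2f} h")
    for domain, total in hours_by_domain().items():
        print(f"{domain}: {total:.2f} h")
```

A breakdown like this can help when planning domain-balanced training and evaluation splits, since the UAE portion covers only Retail and Telecommunication.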
Defined.ai creates scenarios for its crowd members to follow, which they study beforehand. They then record a conversation in which one speaker plays the agent and the other plays out the scenario with spontaneous content. Recordings are made over telephony and saved at 8 kHz, 16 bits per channel. The content is then transcribed.
The dataset is covered by Defined.ai's standard license agreement. The license agreement is perpetual and allows for the commercialization of all models built on the data.
Other characteristics:
- Audio format: WAV
- Recording environment: noisy, silent
- Bits per sample: 16
- Communication band: broadband
- Sample rate: 8 kHz
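After downloading a sample, the audio characteristics listed above can be checked against the file header. The following is a minimal sketch using Python's standard `wave` module. The function names and the example file name are hypothetical, and it assumes the files are uncompressed PCM WAV, which the page does not state explicitly.

```python
import wave


def describe_wav(path):
    """Read basic header fields from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,  # bytes -> bits
            "duration_s": frames / rate if rate else 0.0,
        }


def matches_spec(info, sample_rate_hz=8000, bits_per_sample=16):
    """Check a file against the sample rate and bit depth listed above."""
    return (info["sample_rate_hz"] == sample_rate_hz
            and info["bits_per_sample"] == bits_per_sample)


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py sample.wav  (file name is a placeholder)
    for wav_path in sys.argv[1:]:
        info = describe_wav(wav_path)
        status = "OK" if matches_spec(info) else "MISMATCH"
        print(wav_path, info, status)
```

The channel count is reported but not checked, because the page does not say how many channels each recording has.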
Samples
Short samples:
- 5-minute sample of Arabic (Egypt). Transcription for the sample is also available.
- 5-minute sample of Arabic (Jordan). Transcription for the sample is also available.
- 5-minute sample of Arabic (UAE). Transcription for the sample is also available.
- 5-minute sample of Arabic (Yemen). Transcription for the sample is also available.