English Black American Spontaneous Dialogue

English
Banking
Telecommunication
Retail
Insurance

About the Dataset

201 hours

This audio dataset contains 201 hours of English Black American speech data in the Insurance, Banking, Retail, and Telecommunication domains, recorded by native English speakers from the US.

Domain distribution within the dataset:

  • 49 hours of Banking
  • 56 hours of Insurance
  • 51 hours of Retail
  • 45 hours of Telecommunication

Defined.ai creates scenarios, which our crowd members study beforehand. Two speakers then record a conversation: one plays the agent, and the other “plays out” the scenario with spontaneous content. The recording is made over telephony and saved at 8kHz, 16 bits per channel. The recordings are then transcribed.

The dataset is covered by Defined.ai's standard license agreement. The license agreement is perpetual and allows for the commercialization of all models built on the data.

Other characteristics:

  • Audio format: WAV
  • Recording environment: noisy, silent
  • Bits per sample: 16
  • Communication band: broadband
  • Sample rate: 8kHz
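
As a quick sanity check, the properties listed above (WAV format, 16 bits per sample, 8kHz sample rate) can be read from a file's header. The following is a minimal sketch using Python's standard-library wave module. The helper names describe_wav and matches_spec are illustrative assumptions, not part of any Defined.ai tooling, and the layout and naming of files in the delivered dataset are not specified here.

```python
import wave

# Expected values, taken from the characteristics listed above.
EXPECTED_SAMPLE_RATE_HZ = 8000
EXPECTED_BITS_PER_SAMPLE = 16


def describe_wav(path):
    """Return basic header properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            # getsampwidth() returns bytes per sample, so convert to bits.
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate if rate else 0.0,
        }


def matches_spec(info):
    """Check a header summary against the listed sample rate and bit depth."""
    return (
        info["sample_rate_hz"] == EXPECTED_SAMPLE_RATE_HZ
        and info["bits_per_sample"] == EXPECTED_BITS_PER_SAMPLE
    )
```

Calling matches_spec(describe_wav(path)) on a recording would return True only when the header reports 8000 Hz and 16-bit samples. The channel count is reported but not checked, since the description above does not state how many channels each file contains.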

Samples

Short Sample

A 5-minute audio sample from the telecommunication domain is available. A transcription of the sample is also available.
