## About the Dataset
### 2006 hours

The dataset contains 1800 hours of Dutch (Netherlands) human-to-human audio, distributed across domains as follows:
- 482.17 hours of Banking
- 420.43 hours of Insurance
- 415.4 hours of Retail
- 475.53 hours of Telecommunication
- 7 hours of Generic data with no specific domain

The dataset also contains 206 hours of Dutch (Belgium) human-to-human audio, distributed across domains as follows:
- 58.6 hours of Banking
- 53.65 hours of Insurance
- 48.12 hours of Retail
- 46.37 hours of Telecommunication
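As a quick arithmetic check on the two lists above, the sketch below sums the per-domain figures. Added up as listed, they come to about 1800.53 hours for Dutch (Netherlands) and 206.74 hours for Dutch (Belgium), roughly 2007.27 hours in total, so the headline totals appear to be rounded. The dictionary keys (`nl-NL`, `nl-BE`, `Generic`) are labels chosen for this example only.

```python
# Per-domain hours as listed above for each locale.
# The keys are illustrative labels, not identifiers used by the dataset itself.
HOURS = {
    "nl-NL": {
        "Banking": 482.17,
        "Insurance": 420.43,
        "Retail": 415.4,
        "Telecommunication": 475.53,
        "Generic": 7.0,
    },
    "nl-BE": {
        "Banking": 58.6,
        "Insurance": 53.65,
        "Retail": 48.12,
        "Telecommunication": 46.37,
    },
}


def locale_totals(hours: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum the per-domain hours for each locale, rounded to 2 decimals."""
    return {loc: round(sum(domains.values()), 2) for loc, domains in hours.items()}


def domain_shares(domains: dict[str, float]) -> dict[str, float]:
    """Return each domain's share of one locale's hours, as a percentage."""
    total = sum(domains.values())
    return {name: round(100 * h / total, 1) for name, h in domains.items()}


if __name__ == "__main__":
    totals = locale_totals(HOURS)
    print(totals)                          # per-locale sums
    print(round(sum(totals.values()), 2))  # grand total across both locales
    print(domain_shares(HOURS["nl-BE"]))   # percentage split for Belgium
```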

Defined.ai creates scenarios for our crowd members to follow, which they study beforehand. Each conversation is then recorded by two speakers: one plays the agent, while the other plays out the scenario with spontaneous content. Recordings are made over telephony and saved at 8 kHz with 16 bits per channel, and are then transcribed.
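The recording parameters above fix the raw data rate: 8,000 samples per second at 16 bits is 128 kbit/s, or 57.6 MB per hour, for each channel. The sketch below turns that into a storage estimate, assuming uncompressed PCM and ignoring the small WAV header. The number of channels per file is not stated on this page, so the two-channel figure (one channel per speaker) is an assumption.

```python
SECONDS_PER_HOUR = 3600


def pcm_bytes_per_hour(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM payload for one hour of audio, ignoring the WAV header."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * SECONDS_PER_HOUR


def corpus_gigabytes(hours: float, sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Approximate uncompressed size of `hours` of audio, in decimal GB (1e9 bytes)."""
    return hours * pcm_bytes_per_hour(sample_rate_hz, bits_per_sample, channels) / 1e9


if __name__ == "__main__":
    # 8 kHz x 16 bit = 128 kbit/s, i.e. 57.6 MB per hour for each channel.
    print(pcm_bytes_per_hour(8_000, 16, channels=1))
    # ASSUMPTION: 2 channels (one per speaker); the channel count is not stated.
    print(corpus_gigabytes(2006, 8_000, 16, channels=2))
```

Under the two-channel assumption, the full 2006 hours come to roughly 231 GB of raw audio; with a single channel per file it would be about half that.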

The dataset is covered by [Defined.ai's standard license agreement](https://www.defined.ai/dataset/data-licence-agreement). The license agreement is perpetual and allows for the commercialization of all models built on the data.

Other characteristics:

- Audio format: WAV
- Recording environment: noisy, silent
- Bits per sample: 16
- Communication band: narrowband (telephony)
- Sample rate: 8 kHz
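To confirm that a downloaded file matches the characteristics listed above, a minimal sketch using Python's standard-library `wave` module follows. The helper names (`check_wav`, `make_silent_wav`) are illustrative, and the silent in-memory file only stands in for a real recording. Note that `wave` reads uncompressed PCM only, so a file in another encoding would raise `wave.Error`.

```python
import io
import wave

# Expected values, taken from the characteristics listed above.
EXPECTED_RATE_HZ = 8_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def check_wav(source) -> dict:
    """Read a WAV header and report whether it matches 8 kHz / 16-bit audio.

    `source` may be a filesystem path or a binary file-like object.
    Raises wave.Error if the file is not uncompressed PCM.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate_hz": rate,
        "bits_per_sample": width * 8,
        "channels": channels,
        "duration_s": frames / rate,
        "matches_spec": rate == EXPECTED_RATE_HZ
        and width == EXPECTED_SAMPLE_WIDTH_BYTES,
    }


def make_silent_wav(seconds: float, channels: int = 2) -> io.BytesIO:
    """Hypothetical demo helper: build an in-memory 8 kHz, 16-bit WAV of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        wav.setframerate(EXPECTED_RATE_HZ)
        frame = b"\x00\x00" * channels  # one 16-bit zero sample per channel
        wav.writeframes(frame * int(EXPECTED_RATE_HZ * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav(make_silent_wav(1.5)))
```

For a real file, pass its local path instead, e.g. `check_wav(path)`.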

## Metadata Distribution

### Dutch Netherlands
![Spontaneous_Dutch_NL_Gender.png](https://prdstrapimediastorage.blob.core.windows.net/prdstrapimediastorage/assets/Spontaneous_Dutch_NL_Gender_499fba8252.png)
![Spontaneous_Dutch_NL_Age.png](https://prdstrapimediastorage.blob.core.windows.net/prdstrapimediastorage/assets/Spontaneous_Dutch_NL_Age_afe6299b62.png)
![Spontaneous_Dutch_NL_Accents.png](https://prdstrapimediastorage.blob.core.windows.net/prdstrapimediastorage/assets/Spontaneous_Dutch_NL_Accents_44a1f1248b.png)

### Dutch Belgium
![Spontaneous_Dutch_Belgium_Accents.png](https://prdstrapimediastorage.blob.core.windows.net/prdstrapimediastorage/assets/Spontaneous_Dutch_Belgium_Accents_d0860e237e.png)
![Spontaneous_Dutch_Belgium_Age.png](https://prdstrapimediastorage.blob.core.windows.net/prdstrapimediastorage/assets/Spontaneous_Dutch_Belgium_Age_31d8b8f96e.png)
![Spontaneous_Dutch_Belgium_Gender.png](https://prdstrapimediastorage.blob.core.windows.net/prdstrapimediastorage/assets/Spontaneous_Dutch_Belgium_Gender_64543e96b6.png)

## Samples
- [5-minute audio sample of Dutch (Netherlands)](https://defineddata.blob.core.windows.net/samples/AFI_nl-nl_multi-spontaneous_banking_30m_v01_Sample/Audio/d1e6583c-2784-4313-870d-9e33945d0347.wav?se=2024-06-15T15%3A41%3A43Z&sp=r&sv=2020-06-12&ss=b&srt=o&sig=9Up/5pl4MpD43uo894zcuQeljVAjAx2WqwdPiVMCmOU%3D). Transcription for the sample is also [available](https://prdstrapimediastorage.blob.core.windows.net/prdstrapimediastorage/assets/Spontaneous_Dutch_NL_Short_Transcription_398c531ad6.tsv?updated_at=2023-06-06T14:22:28.417Z).
- [5-minute audio sample of Dutch (Belgium)](https://defineddata.blob.core.windows.net/samples/AFI_nl-be_multi-spontaneous_banking_30m_v01_Sample/Audio/a1eab93b-d54f-4096-8cdc-d261ff1ac34b.wav?se=2024-06-15T22%3A50%3A21Z&sp=r&sv=2020-06-12&ss=b&srt=o&sig=FNk66NQXoZdLBxlZaYgGVvWAM6mGKbWO3wKT8F2TZXs%3D). Transcription for the sample is also [available](https://prdstrapimediastorage.blob.core.windows.net/prdstrapimediastorage/assets/Spontaneous_Dutch_Belgium_Short_Transcription_65e0be44d3.tsv?updated_at=2023-06-06T14:39:45.723Z).
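The sample transcriptions are distributed as `.tsv` files. Their column layout is not documented on this page, so the sketch below assumes only that a file is UTF-8 text with a header row, and reads whatever columns it finds. The function names are hypothetical.

```python
import csv
from typing import Iterable


def read_transcription_tsv(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse tab-separated transcription rows, using the first row as the header.

    ASSUMPTION: the file has a header row; no specific column names are assumed.
    QUOTE_NONE keeps quote characters inside transcripts as literal text.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def load_transcription(path: str) -> list[dict[str, str]]:
    """Open a local .tsv file and parse it; utf-8-sig also tolerates a leading BOM."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return read_transcription_tsv(f)
```

Each returned row is a dictionary keyed by the file's own header names, so printing `rows[0].keys()` is a quick way to discover the actual layout.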

### Download 30-minute samples of this Dutch dataset
- [AFI_nl-nl_multi-spontaneous_banking_30m_v01_Sample.zip](https://defineddata.blob.core.windows.net/samples/AFI_nl-nl_multi-spontaneous_banking_30m_v01_Sample/AFI_nl-nl_multi-spontaneous_banking_30m_v01_Sample.zip?se=2024-06-15T15%3A41%3A43Z&sp=r&sv=2020-06-12&ss=b&srt=o&sig=9Up/5pl4MpD43uo894zcuQeljVAjAx2WqwdPiVMCmOU%3D).
- [AFI_nl-be_multi-spontaneous_banking_30m_v01_Sample.zip](https://defineddata.blob.core.windows.net/samples/AFI_nl-be_multi-spontaneous_banking_30m_v01_Sample/AFI_nl-be_multi-spontaneous_banking_30m_v01_Sample.zip?se=2024-06-15T22%3A50%3A21Z&sp=r&sv=2020-06-12&ss=b&srt=o&sig=FNk66NQXoZdLBxlZaYgGVvWAM6mGKbWO3wKT8F2TZXs%3D).
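Once a sample archive has been downloaded, a sketch like the following can confirm that it holds roughly 30 minutes of audio by summing the duration of every `.wav` member. The internal layout of the ZIP is not documented here; the sample URLs suggest an `Audio/` folder, but the code relies only on the `.wav` extension. It also assumes uncompressed PCM WAV files that Python's `wave` module can read, and the function name is hypothetical.

```python
import io
import wave
import zipfile


def wav_durations_in_zip(zip_source) -> dict[str, float]:
    """Return {member name: duration in seconds} for every .wav file in a ZIP archive.

    `zip_source` may be a path or a binary file-like object.
    """
    durations = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".wav"):
                continue  # skip transcriptions, metadata, folders, etc.
            # Buffer each member in memory so the wave module gets a seekable file.
            data = io.BytesIO(zf.read(name))
            with wave.open(data, "rb") as wav:
                durations[name] = wav.getnframes() / wav.getframerate()
    return durations
```

For example, `sum(wav_durations_in_zip(path).values()) / 60` gives the total in minutes for a local copy of one of the archives above.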
