Portuguese Scripted Monologue
About the Dataset
1091 hours
This audio dataset contains 720 hours of Portuguese speech data recorded by native Portuguese speakers from Portugal, and 371 hours recorded by speakers from Brazil.
The Portuguese (Portugal) portion consists of 720 hours of scripted monologue in the generic domain.
The Portuguese (Brazil) portion consists of 371 hours of scripted monologue, distributed across domains as follows:
- 56.58 hours of Banking
- 150.52 hours of Generic domain
- 69.52 hours of Insurance
- 43.12 hours of Retail
- 51.92 hours of Telecommunication
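As a quick sanity check on the figures above, the per-domain hours can be summed and turned into shares. This is a minimal sketch: the names `BRAZIL_DOMAIN_HOURS`, `PORTUGAL_HOURS`, and `domain_shares` are illustrative and not part of any delivered tooling. The listed domains sum to 371.66 hours, which is consistent with the 371 hours quoted for Brazil and the 1091-hour total.

```python
# Hours per domain for the Portuguese (Brazil) portion, copied from the list above.
BRAZIL_DOMAIN_HOURS = {
    "Banking": 56.58,
    "Generic": 150.52,
    "Insurance": 69.52,
    "Retail": 43.12,
    "Telecommunication": 51.92,
}

# Portuguese (Portugal) portion, generic domain only.
PORTUGAL_HOURS = 720.0


def domain_shares(hours_by_domain):
    """Return each domain's fraction of the total hours."""
    total = sum(hours_by_domain.values())
    return {domain: hours / total for domain, hours in hours_by_domain.items()}


brazil_total = sum(BRAZIL_DOMAIN_HOURS.values())
shares = domain_shares(BRAZIL_DOMAIN_HOURS)

if __name__ == "__main__":
    print(f"Brazil total: {brazil_total:.2f} hours")
    print(f"Overall total: {PORTUGAL_HOURS + brazil_total:.2f} hours")
    for domain, share in shares.items():
        print(f"{domain}: {share:.1%}")
```

The generic domain accounts for roughly 40% of the Brazilian portion, with the remainder split across the four vertical domains.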
Speakers are presented with a prompt (script) and asked to read it aloud while recording. Clients receive the audio recording, the prompt, and information about the speaker. The audio is recorded on-device, typically at 16 kHz, 16-bit. We also provide information on the device used for each recording.
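Because the audio is "typically" 16 kHz, 16-bit, it may be worth checking each file before training on it. The sketch below uses Python's standard `wave` module and assumes the recordings are delivered as WAV files, which this description does not state. The helper names `check_wav_format` and `make_silent_wav` are hypothetical.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 2 bytes per sample = 16-bit


def check_wav_format(source):
    """Return (sample_rate_hz, bit_depth, matches_expected) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    matches = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
    return rate, width * 8, matches


def make_silent_wav(rate=16_000, width=2, seconds=1):
    """Build an in-memory mono WAV of silence, used here only to demonstrate the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (rate * width * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav_format(make_silent_wav()))            # synthetic 16 kHz, 16-bit file
    print(check_wav_format(make_silent_wav(rate=8_000)))  # synthetic 8 kHz file
```

A file that does not match is not necessarily faulty, since the format is described only as typical; flagged files can be resampled or reviewed separately.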
The dataset is covered by Defined.ai's standard license agreement. The license agreement is perpetual and allows for the commercialization of all models built on the data.
Metadata Distribution
Portuguese (Portugal)
Portuguese (Brazil)
Short Audio Samples
- 5-minute sample PT_PT. A transcription of the sample is also available.
- 5-minute sample PT_BR. A transcription of the sample is also available.