I am a huge fan of computer interfaces, especially non-textual ones. Spoken words can be a powerful interface to digital services, and Amazon Web Services offers various services and products that work with audio. To identify the language of spoken words and extract them as text, you can use Amazon Transcribe to analyze audio files.
Generate Audio Files
On macOS, you can use the say application on the command line for text-to-speech functionality. Of course, you can save the audio stream to a local file. Additionally, say supports multiple languages and different voice styles.
First, let’s take a look at the available voice styles in English and German. To use language detection of Amazon Transcribe, we want at least two different audio files.
# Show available voice styles in English
$ > say -v \? | grep en_US
Samantha en_US # Hello! My name is Samantha.
[ … ]
# Show available voice styles in German
$ > say -v \? | grep de_DE
Anna de_DE # Hallo! Ich heiße Anna.
[ … ]
For example, pick Samantha and Anna and use a simple sentence. To mix things up, use the English voice style and include the word German; have the German voice say English in its sentence.
# Use English voice style
$ > say -v Samantha "hello world! this is not german. I hope it gets detected correctly." \
-o 001.aiff
# Use German voice style
$ > say -v Anna "hallo welt! das ist kein englisch. Ich hoffe das wird korrekt erkannt." \
-o 002.aiff
To use the audio files in Amazon Transcribe, they need to be in one of the supported file formats: AMR, FLAC, M4A, MP3, MP4, Ogg, WebM, or WAV. On macOS, you can use lame to easily convert audio files on the command line.
# Install lame with Homebrew
$ > brew install lame
Now, you can convert the .aiff files to MP3:
# Convert English audio file to MP3
$ > lame -m m 001.aiff 001.mp3
# Convert German audio file to MP3
$ > lame -m m 002.aiff 002.mp3
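If you end up with more than two recordings, converting them one by one gets tedious. Here is a minimal sketch of a loop that applies the same lame call to every .aiff file in a directory. The helper name `convert_all_aiff` is made up for this example, and the sketch assumes lame is installed as shown above.

```shell
# Convert every .aiff file in a directory to a mono MP3 next to it.
# convert_all_aiff is a hypothetical helper name, not part of lame.
convert_all_aiff() {
  dir="${1:-.}"
  for src in "$dir"/*.aiff; do
    # Skip the unexpanded pattern when the directory has no .aiff files
    [ -e "$src" ] || continue
    # Same options as above: -m m selects mono output
    lame -m m "$src" "${src%.aiff}.mp3"
  done
}

# Usage: convert_all_aiff .
```

The `${src%.aiff}` expansion strips the extension, so 001.aiff becomes 001.mp3.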
S3 Storage for Amazon Transcribe
To have the files ready for Amazon Transcribe, they need to be stored on Amazon S3. You can use the AWS CLI to create an S3 Bucket in your local region:
# Create S3 Bucket in eu-central-1 (Frankfurt)
$ > aws s3api create-bucket \
--bucket example-amazon-transcribe-bucket-name \
--region eu-central-1 \
--create-bucket-configuration LocationConstraint=eu-central-1
{
"Location": "http://example-amazon-transcribe-bucket-name.s3.amazonaws.com/"
}
Next, upload the .MP3 files to Amazon S3:
# Copy English audio file to S3
$ > aws s3 cp 001.mp3 s3://example-amazon-transcribe-bucket-name
# Copy German audio file to S3
$ > aws s3 cp 002.mp3 s3://example-amazon-transcribe-bucket-name
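With more files, a single recursive copy can replace the individual uploads. The following is a sketch, not the only way to do it: `upload_mp3s` is a hypothetical helper, and the bucket name defaults to the example bucket from this post. The `--exclude "*"` filter followed by `--include "*.mp3"` limits the copy to MP3 files.

```shell
# Upload all .mp3 files from a directory to the bucket in one call.
# upload_mp3s is a hypothetical helper; the default bucket is the example name.
upload_mp3s() {
  dir="${1:-.}"
  bucket="${2:-example-amazon-transcribe-bucket-name}"
  # Filters apply in order: exclude everything, then re-include MP3 files
  aws s3 cp "$dir" "s3://${bucket}" \
    --recursive \
    --exclude "*" \
    --include "*.mp3"
}

# Usage: upload_mp3s .
```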
Amazon Transcribe
When working with Amazon Transcribe, you use so-called Transcription Jobs to perform the required tasks. These features are available in the AWS Management Console in your web browser, the AWS CLI, and, of course, the AWS Software Development Kits.
# Start Transcription Job for English audio file
$ > aws transcribe start-transcription-job \
--region "eu-central-1" \
--media "MediaFileUri=s3://example-amazon-transcribe-bucket-name/001.mp3" \
--transcription-job-name "transcribe-001" \
--identify-language
# Start Transcription Job for German audio file
$ > aws transcribe start-transcription-job \
--region "eu-central-1" \
--media "MediaFileUri=s3://example-amazon-transcribe-bucket-name/002.mp3" \
--transcription-job-name "transcribe-002" \
--identify-language
{
"TranscriptionJob": {
"TranscriptionJobName": "transcribe-002",
"TranscriptionJobStatus": "IN_PROGRESS",
"Media": {
"MediaFileUri": "s3://example-amazon-transcribe-bucket-name/002.mp3"
},
"StartTime": "2023-12-18T19:30:09.624000+01:00",
"CreationTime": "2023-12-18T19:30:09.594000+01:00",
"IdentifyLanguage": true
}
}
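Both commands differ only in the file name and the job name, so they can be wrapped in a small function. This is a sketch under a few assumptions: `start_language_job` is a hypothetical helper, the job name is derived from the file name (001.mp3 becomes transcribe-001), and the region and bucket are the example values used above.

```shell
# Start a Transcription Job with language identification for one S3 object.
# start_language_job is a hypothetical helper; defaults match this post's examples.
start_language_job() {
  key="$1"
  bucket="${2:-example-amazon-transcribe-bucket-name}"
  # Derive the job name from the file name without its extension
  name="transcribe-${key%.*}"
  aws transcribe start-transcription-job \
    --region "eu-central-1" \
    --media "MediaFileUri=s3://${bucket}/${key}" \
    --transcription-job-name "$name" \
    --identify-language
}

# Usage: for f in 001.mp3 002.mp3; do start_language_job "$f"; done
```

Keep in mind that job names need to be unique within your account and region, so re-running the helper for the same file requires deleting the old job or picking a different name.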
A newly started job has its TranscriptionJobStatus set to IN_PROGRESS. Next, retrieve the job’s status using the AWS CLI again:
# Get Transcription Job details
$ > aws transcribe get-transcription-job \
--region "eu-central-1" \
--transcription-job-name transcribe-002
{
"TranscriptionJob": {
"TranscriptionJobName": "transcribe-002",
"TranscriptionJobStatus": "IN_PROGRESS",
"LanguageCode": "de-DE",
"IdentifiedLanguageScore": 0.9969637393951416,
[ … ]
}
}
If you don’t need the transcribed text, you can already read the identified language while the job is still IN_PROGRESS. In addition to the language code, you also get a confidence score for the identification.
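If the language code and score are all you are after, the AWS CLI's global `--query` option can filter the response so you don't have to scan the full JSON. A minimal sketch, assuming the example region from above; `job_language` is a hypothetical helper name. With `--output text`, both values should end up on a single line.

```shell
# Print the identified language code and its score for a job.
# job_language is a hypothetical helper; the region matches this post's examples.
job_language() {
  aws transcribe get-transcription-job \
    --region "eu-central-1" \
    --transcription-job-name "$1" \
    --query "TranscriptionJob.[LanguageCode,IdentifiedLanguageScore]" \
    --output text
}

# Usage: job_language transcribe-002
```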
To extract the text, wait for the COMPLETED status:
# Get Transcription Job details
$ > aws transcribe get-transcription-job \
--region "eu-central-1" \
--transcription-job-name transcribe-002
{
"TranscriptionJob": {
"TranscriptionJobName": "transcribe-002",
"TranscriptionJobStatus": "COMPLETED",
"LanguageCode": "de-DE",
"MediaSampleRateHertz": 22050,
"MediaFormat": "mp3",
"Media": {
"MediaFileUri": "s3://example-amazon-transcribe-bucket-name/002.mp3"
},
"Transcript": {
"TranscriptFileUri": "https://s3.eu-central-1.amazonaws.com/aws-transcribe-eu-central-1-prod/420048798477/8967C8C2-D397…"
},
"StartTime": "2023-12-18T19:36:39.650000+01:00",
"CreationTime": "2023-12-18T19:36:39.629000+01:00",
"CompletionTime": "2023-12-18T19:36:53.736000+01:00",
"Settings": {
"ChannelIdentification": false,
"ShowAlternatives": false
},
"IdentifyLanguage": true,
"IdentifiedLanguageScore": 0.9969637393951416
}
}
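Instead of re-running the command by hand until the status changes, you can poll for it. The following is a rough sketch rather than a production-ready waiter: `wait_for_transcription` is a hypothetical helper, the five-second default interval is an arbitrary choice, and there is no overall timeout.

```shell
# Poll a Transcription Job until it is COMPLETED (returns 0) or FAILED (returns 1).
# wait_for_transcription is a hypothetical helper; the interval is in seconds.
wait_for_transcription() {
  job="$1"
  interval="${2:-5}"
  while :; do
    # Ask only for the status field and print it as plain text
    status=$(aws transcribe get-transcription-job \
      --region "eu-central-1" \
      --transcription-job-name "$job" \
      --query "TranscriptionJob.TranscriptionJobStatus" \
      --output text) || return 1
    case "$status" in
      COMPLETED) return 0 ;;
      FAILED)    return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage: wait_for_transcription transcribe-002 && echo "done"
```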
Because transcription is an asynchronous action, Amazon Transcribe stores the extracted information in a file on Amazon S3. The URL is available at Transcript.TranscriptFileUri, and you can download the file using cURL on the command line:
# Download Transcript.TranscriptFileUri file
$ > curl https://s3.eu-central-1.amazonaws.com/aws-transcribe-eu-central-1-prod/420048798477/8967C8C… \
-o data.json
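Copying the long URL by hand is error-prone, and as far as I can tell the link is only valid for a limited time. As a sketch, you can read the URL from the job and pass it straight to cURL. `download_transcript` is a hypothetical helper, and it assumes the job is already COMPLETED.

```shell
# Look up a job's transcript URL and download the file.
# download_transcript is a hypothetical helper; the output defaults to data.json.
download_transcript() {
  job="$1"
  out="${2:-data.json}"
  uri=$(aws transcribe get-transcription-job \
    --region "eu-central-1" \
    --transcription-job-name "$job" \
    --query "TranscriptionJob.Transcript.TranscriptFileUri" \
    --output text) || return 1
  # --fail makes cURL return a non-zero exit code on HTTP errors
  curl --fail --silent --show-error "$uri" -o "$out"
}

# Usage: download_transcript transcribe-002 data.json
```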
Finally, use jq to extract the transcribed text.
$ > cat data.json | jq -r ".results.transcripts"
[
{
"transcript": "Hallo Welt, das ist kein Englisch, ich hoffe, das wird korrekt erkannt."
}
]
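The output above is still a JSON array. If you only want the plain sentence, a slightly longer jq filter can unwrap it. This is a small sketch; `print_transcript` is a hypothetical helper that assumes the file layout shown above.

```shell
# Print only the transcribed text from a downloaded transcript file.
# print_transcript is a hypothetical helper; the file defaults to data.json.
print_transcript() {
  # Iterate over the transcripts array and print each text as a raw string
  jq -r '.results.transcripts[].transcript' "${1:-data.json}"
}

# Usage: print_transcript data.json
```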
BAM, that’s it! 🎉
Summary
You used Amazon Transcribe to identify language in audio files! Thanks to the say command on macOS, you can easily generate local audio files with text-to-speech. With lame, you converted the audio files to MP3 format.
After uploading files to Amazon S3, you triggered Transcription Jobs in Amazon Transcribe to identify language and extract text from audio files. Well done!