AWS: Identify Language in Audio Files with Amazon Transcribe

December 18th, 2023

I am a huge fan of computer interfaces, especially non-textual ones. Spoken words can be a powerful interface to digital services, and Amazon Web Services offers various services and products that work with audio. To identify the language of spoken words and extract textual information, you can use Amazon Transcribe to analyze audio files.

Generate Audio Files

On macOS, you can use the say command on the command line for text-to-speech. It can save the audio stream to a local file, and it supports multiple languages and voice styles.

First, let’s take a look at the available voice styles in English and German. To try out the language detection of Amazon Transcribe, we want at least two audio files in different languages.

# Show available voice styles in English
$ > say -v \? | grep en_US

Samantha            en_US    # Hello! My name is Samantha.

[…]
# Show available voice styles in German
$ > say -v \? | grep de_DE

Anna                de_DE    # Hallo! Ich heiße Anna.

[…]

For example, pick Samantha and Anna and use a simple sentence. To mix things up, use the English voice style and include the word German; have the German voice say English in its sentence.

# Use English voice style
$ > say -v Samantha "hello world! this is not german. I hope it gets detected correctly." \
    -o 001.aiff
	
# Use German voice style
$ > say -v Anna "hallo welt! das ist kein englisch. Ich hoffe das wird korrekt erkannt." \
    -o 002.aiff

To use the audio files in Amazon Transcribe, they need to be in one of the supported file formats: AMR, FLAC, M4A, MP3, MP4, Ogg, WebM, or WAV. On macOS, you can use lame to convert audio files on the command line.

# Install lame with Homebrew
$ > brew install lame

Now, you can convert .aiff files to .MP3:

# Convert English audio file to MP3
$ > lame -m m 001.aiff 001.mp3

# Convert German audio file to MP3
$ > lame -m m 002.aiff 002.mp3
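With more than two recordings, converting each file by hand gets tedious. The following is a minimal sketch of a loop that does the same conversion for every .aiff file in the current directory. The function name convert_aiff_to_mp3 is made up for this example, and it assumes lame is installed and on your PATH.

```shell
# Convert every .aiff file in the current directory to a mono MP3.
# convert_aiff_to_mp3 is a hypothetical helper name, not part of lame.
convert_aiff_to_mp3() {
    for source_file in *.aiff; do
        # Skip the unexpanded pattern when no .aiff files exist
        [ -e "$source_file" ] || continue
        # Same call as above: -m m selects mono output
        lame -m m "$source_file" "${source_file%.aiff}.mp3"
    done
}
```

After defining the function in your shell, run convert_aiff_to_mp3 in the directory that contains the .aiff files.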

S3 Storage for Amazon Transcribe

To have the files ready for Amazon Transcribe, they need to be stored on Amazon S3. You can use the AWS CLI to create an S3 Bucket in your local region:

# Create S3 Bucket in eu-central-1 (Frankfurt)
$ > aws s3api create-bucket \
	--bucket example-amazon-transcribe-bucket-name \
	--region eu-central-1 \
	--create-bucket-configuration LocationConstraint=eu-central-1
	
{
    "Location": "http://example-amazon-transcribe-bucket-name.s3.amazonaws.com/"
}

Next, upload the .MP3 files to Amazon S3:

# Copy English audio file to S3
$ > aws s3 cp 001.mp3 s3://example-amazon-transcribe-bucket-name

# Copy German audio file to S3
$ > aws s3 cp 002.mp3 s3://example-amazon-transcribe-bucket-name

Amazon Transcribe

When working with Amazon Transcribe, you use so-called Transcription Jobs to perform the required tasks. These features are available in the AWS Management Console in your web browser, the AWS CLI, and, of course, the AWS Software Development Kits.

# Start Transcription Job for English audio file
$ > aws transcribe start-transcription-job \
    --region "eu-central-1" \
    --media "MediaFileUri=s3://example-amazon-transcribe-bucket-name/001.mp3" \
    --transcription-job-name "transcribe-001" \
    --identify-language

# Start Transcription Job for German audio file
$ > aws transcribe start-transcription-job \
    --region "eu-central-1" \
    --media "MediaFileUri=s3://example-amazon-transcribe-bucket-name/002.mp3" \
    --transcription-job-name "transcribe-002" \
    --identify-language

{
    "TranscriptionJob": {
        "TranscriptionJobName": "transcribe-002",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "Media": {
            "MediaFileUri": "s3://example-amazon-transcribe-bucket-name/002.mp3"
        },
        "StartTime": "2023-12-18T19:30:09.624000+01:00",
        "CreationTime": "2023-12-18T19:30:09.594000+01:00",
        "IdentifyLanguage": true
    }
}
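If you already know which languages may show up in your audio, you can narrow the candidates with the --language-options parameter of start-transcription-job, which is used together with --identify-language. The following is a sketch under that assumption. The function name start_job_with_language_options is made up for this example, and the region and the two language codes simply mirror the ones used in this post.

```shell
# Start a Transcription Job that only considers English (US) and German.
# start_job_with_language_options is a hypothetical helper name.
# Arguments: <job name> <S3 URI of the media file>
start_job_with_language_options() {
    job_name="$1"
    media_uri="$2"
    aws transcribe start-transcription-job \
        --region "eu-central-1" \
        --media "MediaFileUri=${media_uri}" \
        --transcription-job-name "$job_name" \
        --identify-language \
        --language-options "en-US" "de-DE"
}
```

For example: start_job_with_language_options "transcribe-003" "s3://example-amazon-transcribe-bucket-name/001.mp3". The job name transcribe-003 is only a placeholder; job names need to be unique within your account.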

Newly started jobs have their TranscriptionJobStatus set to IN_PROGRESS. Next, retrieve a job’s status using the AWS CLI:

# Get Transcription Job details
$ > aws transcribe get-transcription-job \
    --region "eu-central-1" \
    --transcription-job-name transcribe-002
{
    "TranscriptionJob": {
        "TranscriptionJobName": "transcribe-002",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "de-DE",
        "IdentifiedLanguageScore": 0.9969637393951416,

        […]
    }
}

If you don’t need the transcribed text, you can already read the identified language from an IN_PROGRESS job. In addition to the language code, you get a confidence score for the identification.
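When only the language code and its score matter, the global --query option of the AWS CLI can filter the response with a JMESPath expression. This is a minimal sketch: the function name get_identified_language is made up for this example, and the region is the one used throughout this post.

```shell
# Print only the identified language code and its score for a job.
# get_identified_language is a hypothetical helper name.
# Argument: <job name>
get_identified_language() {
    aws transcribe get-transcription-job \
        --region "eu-central-1" \
        --transcription-job-name "$1" \
        --query "TranscriptionJob.[LanguageCode, IdentifiedLanguageScore]" \
        --output text
}
```

Calling get_identified_language transcribe-002 should print both values on a single line, separated by a tab.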

To extract the text, wait for COMPLETED status:

# Get Transcription Job details
$ > aws transcribe get-transcription-job \
    --region "eu-central-1" \
    --transcription-job-name transcribe-002

{
    "TranscriptionJob": {
        "TranscriptionJobName": "transcribe-002",
        "TranscriptionJobStatus": "COMPLETED",
        "LanguageCode": "de-DE",
        "MediaSampleRateHertz": 22050,
        "MediaFormat": "mp3",
        "Media": {
            "MediaFileUri": "s3://example-amazon-transcribe-bucket-name/002.mp3"
        },
        "Transcript": {
            "TranscriptFileUri": "https://s3.eu-central-1.amazonaws.com/aws-transcribe-eu-central-1-prod/420048798477/8967C8C2-D397…"
        },
        "StartTime": "2023-12-18T19:36:39.650000+01:00",
        "CreationTime": "2023-12-18T19:36:39.629000+01:00",
        "CompletionTime": "2023-12-18T19:36:53.736000+01:00",
        "Settings": {
            "ChannelIdentification": false,
            "ShowAlternatives": false
        },
        "IdentifyLanguage": true,
        "IdentifiedLanguageScore": 0.9969637393951416
    }
}
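Instead of re-running the command by hand until the status changes, you can poll in a loop. The following is a sketch, not an official waiter: the function name wait_for_transcription_job and the POLL_INTERVAL variable are made up for this example, and it assumes the job eventually reaches either COMPLETED or FAILED.

```shell
# Poll a Transcription Job until it is COMPLETED or FAILED.
# wait_for_transcription_job and POLL_INTERVAL are hypothetical names.
# Arguments: <job name> [region, defaults to eu-central-1]
wait_for_transcription_job() {
    job_name="$1"
    region="${2:-eu-central-1}"
    while true; do
        # Ask only for the status field; abort if the CLI call itself fails
        job_status="$(aws transcribe get-transcription-job \
            --region "$region" \
            --transcription-job-name "$job_name" \
            --query "TranscriptionJob.TranscriptionJobStatus" \
            --output text)" || return 1
        case "$job_status" in
            COMPLETED) echo "$job_status"; return 0 ;;
            FAILED)    echo "$job_status" >&2; return 1 ;;
        esac
        # Any other status (for example IN_PROGRESS): wait and try again
        sleep "${POLL_INTERVAL:-5}"
    done
}
```

For example, wait_for_transcription_job transcribe-002 returns once the job has finished, so follow-up commands can be chained with &&.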

Because transcription is asynchronous, Amazon Transcribe writes the result to a file on Amazon S3. Its URL is available at Transcript.TranscriptFileUri, and you can download the file with cURL:

# Download Transcript.TranscriptFileUri file
$ > curl "https://s3.eu-central-1.amazonaws.com/aws-transcribe-eu-central-1-prod/420048798477/8967C8C…" \
    -o data.json
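To avoid copying the long URL by hand, you can read it from the job details and pass it straight to cURL. This is a minimal sketch: the function name download_transcript is made up for this example, and it assumes the job is already COMPLETED so that Transcript.TranscriptFileUri is set.

```shell
# Look up the transcript URL of a completed job and download the file.
# download_transcript is a hypothetical helper name.
# Arguments: <job name> <output file>
download_transcript() {
    transcript_uri="$(aws transcribe get-transcription-job \
        --region "eu-central-1" \
        --transcription-job-name "$1" \
        --query "TranscriptionJob.Transcript.TranscriptFileUri" \
        --output text)" || return 1
    # Quote the URL: it may contain characters the shell would interpret
    curl --silent --show-error --fail "$transcript_uri" -o "$2"
}
```

For example: download_transcript transcribe-002 data.json.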

Finally, use jq to extract the transcribed text.

$ > cat data.json | jq -r ".results.transcripts"

[
  {
    "transcript": "Hallo Welt, das ist kein Englisch, ich hoffe, das wird korrekt erkannt."
  }
]
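If you only want the plain text without the surrounding JSON, you can extend the jq filter to select the transcript field of each entry. A small sketch, assuming the file has the structure shown above; the function name extract_transcript is made up for this example.

```shell
# Print the transcribed text of every entry in .results.transcripts.
# extract_transcript is a hypothetical helper name.
# Argument: <path to the downloaded transcript file>
extract_transcript() {
    # -r prints raw strings instead of JSON-quoted ones
    jq -r ".results.transcripts[].transcript" "$1"
}
```

For example: extract_transcript data.json.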

BAM, that’s it! 🎉

Summary

You used Amazon Transcribe to identify the language in audio files! Thanks to the say command on macOS, you can easily generate local audio files with text-to-speech. With lame, you converted the audio files to MP3 format.

After uploading files to Amazon S3, you triggered Transcription Jobs in Amazon Transcribe to identify language and extract text from audio files. Well done!