Amazon Polly: Complete Developer Guide

Amazon Polly is AWS's text-to-speech service, known for its low latency, extensive language support, and deep integration with the AWS ecosystem. If you're building on AWS, Polly is a natural choice.

This guide covers everything you need to integrate Amazon Polly into your application.

Why Amazon Polly?

Amazon Polly stands out for several reasons:

Low Latency: Fast time to first audio (around 100ms in typical conditions), well suited to real-time use
Neural Voices: High-quality neural TTS with natural intonation
Speaking Styles: Newscaster and conversational styles
AWS Integration: Seamless with Lambda, S3, and other services
SSML Support: Fine-grained control over speech synthesis (tag support varies by engine)

Getting Started

Setup

Create an AWS account
Create IAM credentials with Polly access
Configure AWS CLI or SDK

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"   # the variable boto3 reads for its default region
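The IAM step above needs a policy attached to your credentials. As a minimal sketch, assuming you only call the synchronous API and list voices, the policy document can be built in Python and passed to IAM. The action list is an assumption to adjust for your use: asynchronous synthesis typically also needs polly:StartSpeechSynthesisTask, polly:GetSpeechSynthesisTask, and s3:PutObject on the output bucket.

```python
import json

# Assumed least-privilege policy for synchronous synthesis and voice listing.
# Extend the Action list if you use asynchronous synthesis tasks.
POLLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["polly:SynthesizeSpeech", "polly:DescribeVoices"],
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(POLLY_POLICY, indent=2)
print(policy_json)
```

A call such as boto3.client('iam').create_policy(PolicyName='polly-basic', PolicyDocument=policy_json) would register it; the policy name here is hypothetical.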

Basic Text-to-Speech

import boto3
 
polly = boto3.client('polly')
 
response = polly.synthesize_speech(
    Text="Hello, this is Amazon Polly.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural"
)
 
with open("output.mp3", "wb") as f:
    f.write(response['AudioStream'].read())
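Reading the whole AudioStream into memory is fine for short clips. A slightly more robust sketch streams the body to disk and picks the file extension from the response's ContentType (Polly reports audio/mpeg for mp3, audio/ogg for ogg_vorbis, audio/pcm for pcm, and application/x-json-stream for speech marks). The save_response helper and its extension table are illustrative names, not part of boto3.

```python
import shutil
from typing import Any, BinaryIO, Mapping

# Illustrative mapping from the ContentType Polly returns to a file extension
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/ogg": ".ogg",
    "audio/pcm": ".pcm",
    "application/x-json-stream": ".json",
}

def save_response(response: Mapping[str, Any], basename: str) -> str:
    """Stream a synthesize_speech response to disk and return the file path."""
    ext = EXTENSIONS.get(response.get("ContentType", ""), ".bin")
    path = basename + ext
    stream: BinaryIO = response["AudioStream"]
    with open(path, "wb") as f:
        # Copy in fixed-size blocks instead of reading the whole body at once
        shutil.copyfileobj(stream, f)
    return path
```

With the response from the example above, save_response(response, "output") would write output.mp3.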

Streaming Synthesis

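For real-time applications, request raw PCM and consume the response body in chunks as it arrives instead of waiting for the whole clip:

import boto3
 
polly = boto3.client('polly')
 
response = polly.synthesize_speech(
    Text="This audio streams directly from Polly.",
    OutputFormat="pcm",      # signed 16-bit, mono, little-endian samples
    SampleRate="16000",      # PCM supports "8000" and "16000"
    VoiceId="Joanna",
    Engine="neural"
)
 
# Stream chunk by chunk; writing to a file stands in for your
# audio player's write() call here
audio_stream = response['AudioStream']
with open("output.pcm", "wb") as f:
    for chunk in audio_stream.iter_chunks(chunk_size=4096):
        f.write(chunk)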
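Raw PCM has no header, so most players cannot open it directly. Polly's PCM output is signed 16-bit, mono, little-endian, so a sketch like the following, using only Python's standard wave module, can wrap the chunks in a WAV container. pcm_to_wav is a hypothetical helper name, and sample_rate must match the SampleRate you requested (16000 is Polly's PCM default).

```python
import wave
from typing import BinaryIO, Iterable

def pcm_to_wav(chunks: Iterable[bytes], out: BinaryIO, sample_rate: int = 16000) -> int:
    """Wrap raw Polly PCM chunks in a WAV container; return the audio bytes written.

    Assumes Polly's PCM layout: signed 16-bit samples, 1 channel, little-endian.
    """
    written = 0
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
            written += len(chunk)
    return written
```

For a PCM response, opening output.wav in "wb" mode and calling pcm_to_wav(response['AudioStream'].iter_chunks(), f) would produce a playable file.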

Voice Types

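Amazon Polly offers several engines (standard, neural, long-form, and generative). This guide focuses on the two most widely used:

Engine	Quality	Approx. Latency	Price
Standard	Good	~50ms	$4/1M chars
Neural	Very Good	~100ms	$16/1M chars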

Standard Voices

Fast and cost-effective:

response = polly.synthesize_speech(
    Text="Using standard voice.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="standard"
)

Neural Voices

Higher quality with natural intonation:

response = polly.synthesize_speech(
    Text="Using neural voice.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural"
)
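Not every voice supports every engine, and a synthesize_speech call that pairs a voice with an engine it lacks is rejected. Each entry returned by describe_voices lists its engines under SupportedEngines, so a small helper (pick_engine, a hypothetical name) can choose the best available engine before synthesizing. The default preference order is an assumption.

```python
from typing import Any, Mapping, Optional, Sequence

def pick_engine(
    voice: Mapping[str, Any],
    preferred: Sequence[str] = ("neural", "standard"),
) -> Optional[str]:
    """Return the first preferred engine the voice supports, or None.

    `voice` is one entry from describe_voices()['Voices'].
    """
    supported = set(voice.get("SupportedEngines", []))
    for engine in preferred:
        if engine in supported:
            return engine
    return None
```

For example, look up Joanna in polly.describe_voices(LanguageCode="en-US")["Voices"] and pass pick_engine(voice) as the Engine argument.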

Speaking Styles

Polly offers speaking styles for certain neural voices, selected with the <amazon:domain> SSML tag (neural engine only):

Newscaster Style

Perfect for news, podcasts, and professional content:

ssml = """
<speak>
    <amazon:domain name="news">
        Today's top story: AI voice technology continues to advance rapidly.
    </amazon:domain>
</speak>
"""
 
response = polly.synthesize_speech(
    TextType="ssml",
    Text=ssml,
    OutputFormat="mp3",
    VoiceId="Matthew",
    Engine="neural"
)

Conversational Style

Natural dialogue for assistants:

ssml = """
<speak>
    <amazon:domain name="conversational">
        Hey there! How can I help you today?
    </amazon:domain>
</speak>
"""

Supported voices for styles:

Newscaster: Matthew, Joanna (US English), Lupe (US Spanish), Amy (British English)
Conversational: Matthew, Joanna (US English)
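Because a style is applied through SSML, any &, <, or > in user-supplied text must be escaped, or Polly rejects the request as invalid SSML. A minimal sketch that escapes the text with the standard library and checks the voice against the lists above follows; domain_ssml and STYLE_VOICES are hypothetical names, and the voice table should be re-checked against the current Polly documentation.

```python
from xml.sax.saxutils import escape

# Voice/style pairs as listed in this guide; verify against the Polly docs.
STYLE_VOICES = {
    "news": {"Matthew", "Joanna", "Lupe", "Amy"},
    "conversational": {"Matthew", "Joanna"},
}

def domain_ssml(text: str, style: str, voice_id: str) -> str:
    """Wrap plain text in an <amazon:domain> block, escaping XML special characters."""
    voices = STYLE_VOICES.get(style)
    if voices is None:
        raise ValueError(f"unknown style: {style!r}")
    if voice_id not in voices:
        raise ValueError(f"{voice_id} does not support the {style!r} style")
    return (
        "<speak>"
        f'<amazon:domain name="{style}">{escape(text)}</amazon:domain>'
        "</speak>"
    )
```

The result goes into synthesize_speech with TextType="ssml" and Engine="neural", as in the newscaster example.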

SSML Support

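Amazon Polly has extensive SSML support, though some tags, such as <emphasis>, the whispered effect, and the pitch attribute of <prosody>, work only with standard voices.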

Basic SSML

ssml = """
<speak>
    Hello! <break time="500ms"/>
    Welcome to <emphasis level="strong">Amazon Polly</emphasis>.
</speak>
"""
 
response = polly.synthesize_speech(
    TextType="ssml",
    Text=ssml,
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="standard"  # <emphasis> is not supported by neural voices
)

Pronunciation Control

ssml = """
<speak>
    You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>,
    I say <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme>.
</speak>
"""

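Prosody Control

Neural voices honor the rate and volume attributes of <prosody>; the pitch attribute is supported only by standard voices.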

ssml = """
<speak>
    <prosody rate="slow" pitch="-10%">
        Speaking slowly with a lower pitch.
    </prosody>
    <prosody rate="fast" pitch="+10%">
        Speaking quickly with a higher pitch.
    </prosody>
</speak>
"""

Whispered Speech

The whispered effect is available only with standard voices:

ssml = """
<speak>
    <amazon:effect name="whispered">
        This is a secret message.
    </amazon:effect>
</speak>
"""

Long-Form Synthesis

SynthesizeSpeech accepts up to 3,000 billed characters per request. For longer content, use asynchronous synthesis, which writes the finished audio to S3:

import boto3
import time
 
polly = boto3.client('polly')
 
# Start async task; Polly writes the result to your S3 bucket
response = polly.start_speech_synthesis_task(
    Text="Very long text content here...",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural",
    OutputS3BucketName="your-bucket",
    OutputS3KeyPrefix="audio/"
)
 
task_id = response['SynthesisTask']['TaskId']
 
# Poll for completion: scheduled -> inProgress -> completed or failed
deadline = time.time() + 15 * 60  # give up after 15 minutes
while time.time() < deadline:
    task = polly.get_speech_synthesis_task(TaskId=task_id)['SynthesisTask']
    status = task['TaskStatus']
 
    if status == 'completed':
        print(f"Audio available at: {task['OutputUri']}")
        break
    elif status == 'failed':
        print(f"Task failed: {task.get('TaskStatusReason', 'unknown reason')}")
        break
 
    time.sleep(5)
else:
    print("Timed out waiting for the synthesis task")
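If you would rather stay with synchronous calls, for example to play each part as soon as it is ready, you can split long text yourself. The sketch below assumes plain text (no SSML) and the 3,000-character per-request limit; it prefers sentence boundaries and falls back to word boundaries. chunk_text is a hypothetical helper, and each chunk would be synthesized with its own synthesize_speech call.

```python
import re
from typing import List

def chunk_text(text: str, limit: int = 3000) -> List[str]:
    """Split plain text into pieces of at most `limit` characters.

    Prefers sentence boundaries, falls back to whitespace, and hard-cuts
    only single words longer than `limit`.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        pieces = [sentence] if len(sentence) <= limit else sentence.split()
        for piece in pieces:
            # Hard-cut any single token that is still too long
            while len(piece) > limit:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(piece[:limit])
                piece = piece[limit:]
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

The sentence regex is deliberately simple and will also split after abbreviations such as "Dr."; that only affects where chunks break, not whether they fit the limit.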

Speech Marks

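Speech marks return timing metadata (word, sentence, viseme, or SSML mark positions) instead of audio, so request them in a separate call that uses the same text, voice, and engine as the audio: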

import json
 
response = polly.synthesize_speech(
    Text="Hello, this is Amazon Polly speaking.",
    OutputFormat="json",
    VoiceId="Joanna",
    Engine="neural",
    SpeechMarkTypes=["word", "sentence"]
)
 
# Speech marks arrive as newline-delimited JSON objects
raw = response['AudioStream'].read().decode('utf-8')
marks = [json.loads(line) for line in raw.splitlines() if line.strip()]
 
for mark in marks:
    print(f"{mark['type']}: '{mark.get('value', '')}' at {mark['time']}ms")
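Word marks carry only a start time. For captions or word highlighting you usually also want an end time; one common approximation, sketched below, takes each word's end as the next word's start. word_timings is a hypothetical helper, total_ms (the clip duration) is an optional input you would supply yourself, and each word's span includes any pause that follows it.

```python
import json
from typing import Dict, List, Optional, Tuple

def word_timings(raw: str, total_ms: Optional[int] = None) -> List[Tuple[int, Optional[int], str]]:
    """Turn newline-delimited speech marks into (start_ms, end_ms, word) tuples.

    Each word ends where the next word starts; the last word ends at
    total_ms if given, otherwise None.
    """
    marks: List[Dict] = [json.loads(line) for line in raw.splitlines() if line.strip()]
    words = [m for m in marks if m["type"] == "word"]
    result = []
    for i, mark in enumerate(words):
        end = words[i + 1]["time"] if i + 1 < len(words) else total_ms
        result.append((mark["time"], end, mark["value"]))
    return result
```

Sentence marks are skipped here; they could be grouped the same way to build one caption line per sentence.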

Pricing

Amazon Polly uses character-based pricing:

Engine	Price per 1M characters
Standard	$4.00
Neural	$16.00

Free Tier

5 million characters/month (Standard)
1 million characters/month (Neural)
Valid for 12 months from signup
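Because pricing is linear in billed characters (SSML tags are not billed), a monthly estimate is simple arithmetic. This sketch hard-codes the prices and free-tier allowances listed above, which are assumptions to verify against the current AWS pricing page; monthly_cost is a hypothetical helper.

```python
# Prices (USD per 1M characters) and monthly free-tier allowances as listed
# in this guide; check the current AWS pricing page before relying on them.
PRICE_PER_MILLION = {"standard": 4.00, "neural": 16.00}
FREE_CHARS_PER_MONTH = {"standard": 5_000_000, "neural": 1_000_000}

def monthly_cost(characters: int, engine: str, within_free_tier: bool = False) -> float:
    """Estimate one month's Polly bill in USD for `characters` billed characters."""
    billable = characters
    if within_free_tier:
        billable = max(0, characters - FREE_CHARS_PER_MONTH[engine])
    return billable / 1_000_000 * PRICE_PER_MILLION[engine]
```

For example, 2 million neural characters cost $32 normally, or (2M - 1M) / 1M x $16 = $16 in a month still covered by the free tier.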

AWS Integration Examples

Lambda Function

import boto3
import base64
 
# Create the client once per container so warm invocations reuse it
polly = boto3.client('polly')
 
def lambda_handler(event, context):
    text = event.get('text', 'Hello from Lambda!')
 
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId="Joanna",
        Engine="neural"
    )
 
    audio_base64 = base64.b64encode(
        response['AudioStream'].read()
    ).decode('utf-8')
 
    # Behind an API Gateway REST API, add audio/mpeg to the API's binary
    # media types so the base64 body is decoded before reaching the client
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'audio/mpeg'},
        'body': audio_base64,
        'isBase64Encoded': True
    }
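The handler above reads event['text'], which works for direct invocation. Behind an API Gateway proxy integration, the request payload instead arrives as a JSON string in event['body'], possibly base64-encoded. Here is a sketch that accepts both shapes and rejects oversized input before calling Polly; extract_text and the 3,000-character cap are assumptions.

```python
import base64
import json
from typing import Any, Mapping

MAX_CHARS = 3000  # assumed cap, matching SynthesizeSpeech's per-request limit

def extract_text(event: Mapping[str, Any], default: str = "Hello from Lambda!") -> str:
    """Return 'text' from a direct invocation or an API Gateway proxy event."""
    body = event.get("body")
    if body:
        # Proxy integrations deliver the payload as a (possibly base64) JSON string
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        payload = json.loads(body)
    else:
        # Direct invocation: the event itself is the payload
        payload = event
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    text = payload.get("text") or default
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    return text
```

In the handler, text = extract_text(event) would replace the event.get('text', ...) line, and a ValueError could be turned into a 400 response.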

S3 + CloudFront

Store and serve generated audio:

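import boto3
import hashlib
from botocore.exceptions import ClientError
 
polly = boto3.client('polly')
s3 = boto3.client('s3')
 
BUCKET = "your-bucket"
CDN_DOMAIN = "your-cdn.cloudfront.net"
 
def get_or_create_audio(text, voice_id="Joanna", engine="neural"):
    # Cache key covers every input that changes the audio; the separator
    # keeps inputs like ("ab", "c") and ("a", "bc") from colliding
    cache_key = hashlib.sha256(f"{engine}|{voice_id}|{text}".encode()).hexdigest()
    s3_key = f"audio/{cache_key}.mp3"
    url = f"https://{CDN_DOMAIN}/{s3_key}"
 
    # Return the cached copy if it already exists
    try:
        s3.head_object(Bucket=BUCKET, Key=s3_key)
        return url
    except ClientError as err:
        # A missing object is reported as 404; re-raise anything else
        if err.response['Error']['Code'] not in ('404', 'NoSuchKey', 'NotFound'):
            raise
 
    # Generate new audio
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine
    )
 
    # Upload to S3
    s3.put_object(
        Bucket=BUCKET,
        Key=s3_key,
        Body=response['AudioStream'].read(),
        ContentType="audio/mpeg"
    )
 
    return url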

Language Support

Amazon Polly supports 30+ languages:

Language	Voices	Neural Support
English (US)	8+	Yes
English (UK)	4+	Yes
Spanish	6+	Yes
French	4+	Yes
German	4+	Yes
Japanese	2+	Yes
Portuguese	4+	Yes

List Available Voices

response = polly.describe_voices(LanguageCode="en-US")
 
for voice in response['Voices']:
    engines = voice.get('SupportedEngines', [])
    print(f"{voice['Id']} - {voice['Gender']} - Engines: {engines}")
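describe_voices returns results in pages, including a NextToken when more are available, so a single call may not list every voice. Below is a sketch of a generator that follows the token; iter_voices is a hypothetical name, and any filters (such as LanguageCode or Engine, both describe_voices parameters) are passed straight through.

```python
from typing import Any, Dict, Iterator

def iter_voices(polly: Any, **filters: Any) -> Iterator[Dict[str, Any]]:
    """Yield every voice from describe_voices, following NextToken pages.

    `filters` are passed through unchanged, e.g. LanguageCode="en-US".
    """
    kwargs = dict(filters)
    while True:
        page = polly.describe_voices(**kwargs)
        yield from page.get("Voices", [])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```

Usage: for voice in iter_voices(polly, LanguageCode="en-US", Engine="neural"): print(voice['Id']).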

Comparison with Alternatives

Feature	Amazon Polly	Google Cloud	OpenAI TTS
Voice Quality	Good	Very Good	Very Good
Latency	~100ms	~200ms	~400ms
Speaking Styles	Yes	No	No
SSML Support	Full	Full	None
Free Tier	Generous	Generous	None

Latency figures are approximate and vary with region, text length, and network conditions. For a detailed comparison, see our Voice AI API Comparison Guide.

When to Choose Amazon Polly

Amazon Polly is ideal when:

Low latency is your priority
You're building on AWS
You need speaking styles (newscaster, conversational)
Cost efficiency at scale matters

Consider alternatives if:

Voice quality is the top priority (ElevenLabs)
You need maximum language coverage (Azure Speech)
You want the simplest integration (OpenAI TTS)

Conclusion

Amazon Polly offers low latency, strong AWS integration, and distinctive speaking styles. While its voice quality trails ElevenLabs, it's an excellent fit for real-time applications and AWS-native projects.

This article is part of our Voice AI API Comparison series. Explore guides for Google Cloud TTS, Azure Speech, and more.


Related Articles

Best Voice AI APIs in 2025: Complete Comparison Guide

Google Cloud Text-to-Speech: Complete Developer Guide

Azure Speech Service: Complete Developer Guide