Amazon Polly synthesizes speech from text (Text-to-Speech):
# Synthesize speech
aws polly synthesize-speech \
--text "Hello, how are you?" \
--output-format mp3 \
--voice-id Joanna \
--region <REGION> \
speech.mp3
# List all voices
aws polly describe-voices \
--region <REGION>
# Or filter by language
aws polly describe-voices \
--language-code en-US \
--region <REGION>
Results will include:
import boto3


class PollyService:
    def __init__(self, region: str = 'ap-southeast-1'):
        self.client = boto3.client('polly', region_name=region)

    def synthesize_speech(self, text: str, voice_id: str = 'Joanna') -> bytes:
        """Synthesize speech from text"""
        response = self.client.synthesize_speech(
            Text=text,
            OutputFormat='mp3',
            VoiceId=voice_id,
            Engine='neural'  # Or 'standard'
        )
        return response['AudioStream'].read()

    def save_speech(self, text: str, filename: str, voice_id: str = 'Joanna'):
        """Synthesize and save speech to file"""
        audio = self.synthesize_speech(text, voice_id)
        with open(filename, 'wb') as f:
            f.write(audio)
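The `save_speech` method above reads the whole audio stream into memory before writing it. For longer audio, a chunked copy may be preferable. The following is a minimal sketch, assuming the `AudioStream` value is a file-like object with a `read(size)` method (as botocore's `StreamingBody` is); the helper name `write_audio_stream` and the chunk size are illustrative, not part of the original class.

```python
from typing import BinaryIO


def write_audio_stream(stream: BinaryIO, filename: str, chunk_size: int = 8192) -> int:
    """Copy a file-like audio stream to disk in chunks and return the bytes written.

    Hypothetical helper: `stream` is assumed to behave like response['AudioStream'].
    """
    written = 0
    with open(filename, "wb") as f:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # An empty read means the stream is exhausted
                break
            f.write(chunk)
            written += len(chunk)
    return written
```

With a real response this would be called as `write_audio_stream(response['AudioStream'], 'speech.mp3')`.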
# Get specific voice information
# (describe-voices has no voice ID option, so filter the result with --query)
aws polly describe-voices \
--query "Voices[?Id=='Joanna']" \
--region <REGION>
# Or all voices
aws polly describe-voices \
--region <REGION> \
--output table
# View successful synthesis requests (HTTP 2XX responses)
aws cloudwatch get-metric-statistics \
--namespace AWS/Polly \
--metric-name 2XXCount \
--dimensions Name=Operation,Value=SynthesizeSpeech \
--start-time 2026-05-01T00:00:00Z \
--end-time 2026-05-02T00:00:00Z \
--period 3600 \
--statistics Sum \
--region <REGION>
# View client errors (HTTP 4XX responses; use 5XXCount for server errors)
aws cloudwatch get-metric-statistics \
--namespace AWS/Polly \
--metric-name 4XXCount \
--dimensions Name=Operation,Value=SynthesizeSpeech \
--start-time 2026-05-01T00:00:00Z \
--end-time 2026-05-02T00:00:00Z \
--period 3600 \
--statistics Sum \
--region <REGION>
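The same CloudWatch queries can be issued from Python. The sketch below only builds the keyword arguments for boto3's `get_metric_statistics` and totals the returned datapoints; it makes no AWS call itself. The helper names are hypothetical, the metric name is supplied by the caller, and the optional `Operation` dimension is an assumption about how Polly groups its metrics.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_polly_metric_query(
    metric_name: str,
    start: datetime,
    end: datetime,
    period: int = 3600,
    operation: Optional[str] = None,
) -> Dict[str, Any]:
    """Build kwargs for cloudwatch.get_metric_statistics (hypothetical helper)."""
    query: Dict[str, Any] = {
        "Namespace": "AWS/Polly",
        "MetricName": metric_name,
        "StartTime": start,
        "EndTime": end,
        "Period": period,
        "Statistics": ["Sum"],
    }
    if operation is not None:
        # Assumption: Polly metrics are grouped by an "Operation" dimension
        query["Dimensions"] = [{"Name": "Operation", "Value": operation}]
    return query


def total_sum(response: Dict[str, Any]) -> float:
    """Add up the 'Sum' statistic across all datapoints in a response."""
    return sum(point["Sum"] for point in response.get("Datapoints", []))
```

Typical use would be `total_sum(boto3.client('cloudwatch').get_metric_statistics(**query))`, where `query` comes from `build_polly_metric_query`.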
Tips to reduce costs:
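# Use SSML to adjust delivery; SSML tags do not count as billed characters
import boto3

client = boto3.client('polly', region_name='ap-southeast-1')

# The prosody rate takes a named value (e.g. "slow") or a percentage
ssml_text = """
<speak>
<prosody rate="90%">
Hello, how are you today?
</prosody>
</speak>
"""
response = client.synthesize_speech(
    Text=ssml_text,
    TextType='ssml',
    OutputFormat='mp3',
    VoiceId='Joanna'
)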
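When the spoken text comes from user input rather than a literal, characters such as `&` and `<` need to be escaped, otherwise the SSML document is not well-formed. A minimal sketch of a builder that does this is shown below; the function name `build_ssml` and the default rate of `90%` are illustrative assumptions.

```python
from xml.sax.saxutils import escape


def build_ssml(text: str, rate: str = "90%") -> str:
    """Wrap plain text in <speak><prosody>, escaping XML special characters.

    Hypothetical helper; `rate` is passed through unchanged.
    """
    # escape() replaces &, < and > with their XML entities
    return f'<speak><prosody rate="{rate}">{escape(text)}</prosody></speak>'
```

The result can be passed as `Text` together with `TextType='ssml'` in a `synthesize_speech` call.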
Issue: Voice not available
# Check available voices
aws polly describe-voices \
--region <REGION>
# Try different voice
aws polly synthesize-speech \
--text "Hello" \
--voice-id Matthew \
--output-format mp3 \
--region <REGION> \
speech.mp3
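Rather than retrying voices by hand, a fallback can be chosen from the `Voices` list that `describe-voices` returns. The sketch below assumes each entry carries the `Id`, `LanguageCode`, and `SupportedEngines` fields of the DescribeVoices response; `pick_voice` is a hypothetical helper and the sample data in the usage note is illustrative, not real API output.

```python
from typing import Any, Dict, List, Optional


def pick_voice(
    voices: List[Dict[str, Any]],
    preferred_id: str,
    language_code: str,
    engine: str = "standard",
) -> Optional[str]:
    """Return preferred_id if usable, else the first voice matching language and engine."""
    candidates = [
        v["Id"]
        for v in voices
        if v.get("LanguageCode") == language_code
        and engine in v.get("SupportedEngines", [])
    ]
    if preferred_id in candidates:
        return preferred_id
    # Fall back to the first match, or None when nothing fits
    return candidates[0] if candidates else None
```

For example, given a list where only `Matthew` supports the requested engine for `en-US`, `pick_voice(voices, "Joanna", "en-US", "neural")` would return `"Matthew"`.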
Issue: Text too long
# Limit text length (max 3000 billed characters per request)
# Uses bash substring expansion to keep the first 3000 characters
text="Your text here"
text="${text:0:3000}"
aws polly synthesize-speech \
--text "$text" \
--voice-id Joanna \
--output-format mp3 \
--region <REGION> \
speech.mp3
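Truncating discards everything past the limit. An alternative is to split the text into several requests and synthesize each piece. The following is a minimal sketch, assuming a plain character count is an acceptable stand-in for the service's billed-character count; `split_text` is a hypothetical helper that prefers sentence boundaries and hard-splits only when a single sentence exceeds the limit.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 3000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent in its own `synthesize_speech` call and the resulting audio written out in order.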
Issue: Unsupported format
# Supported formats include: mp3, ogg_vorbis, pcm, json (json is for speech marks)
# Try different format
aws polly synthesize-speech \
--text "Hello" \
--voice-id Joanna \
--output-format ogg_vorbis \
--region <REGION> \
speech.ogg
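To catch an unsupported format before calling the service, the format can be validated locally and mapped to a file extension. This is a sketch under stated assumptions: the table covers only the four formats discussed here, the API value for Ogg output is taken to be `ogg_vorbis`, and the extensions are conventions chosen for illustration rather than anything Polly requires.

```python
# Hypothetical mapping from Polly OutputFormat values to file extensions
FORMAT_EXTENSIONS = {
    "mp3": ".mp3",
    "ogg_vorbis": ".ogg",
    "pcm": ".pcm",
    "json": ".json",
}


def output_filename(stem: str, output_format: str) -> str:
    """Return stem plus the extension for output_format, or raise ValueError."""
    if output_format not in FORMAT_EXTENSIONS:
        supported = ", ".join(sorted(FORMAT_EXTENSIONS))
        raise ValueError(
            f"Unsupported format {output_format!r}; expected one of: {supported}"
        )
    return stem + FORMAT_EXTENSIONS[output_format]
```

Calling `output_filename("speech", "ogg")` raises a `ValueError` locally instead of failing at the API.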
You have completed the AI & Voice section! Continue to CI/CD Pipeline to set up deployment automation.