Internship Report > Workshop > AI & Voice Integration > Amazon Polly

Amazon Polly

Overview

Amazon Polly synthesizes speech from text (Text-to-Speech):

Voices: Multiple English voices (Joanna, Matthew, etc.)
Output: MP3, OGG, PCM
Speed: Real-time synthesis

Synthesize Speech

# Synthesize speech
aws polly synthesize-speech \
  --text "Hello, how are you?" \
  --output-format mp3 \
  --voice-id Joanna \
  --region <REGION> \
  speech.mp3

List Available Voices

# List all voices
aws polly describe-voices \
  --region <REGION>

# Or filter by language
aws polly describe-voices \
  --language-code en-US \
  --region <REGION>

Results will include:

VoiceId (Joanna, Matthew, etc.)
Name, LanguageCode
Gender, SupportedEngines

Polly API Example (Python)

import boto3

class PollyService:
    def __init__(self, region: str = 'ap-southeast-1'):
        self.client = boto3.client('polly', region_name=region)
    
    def synthesize_speech(self, text: str, voice_id: str = 'Joanna') -> bytes:
        """Synthesize speech from text"""
        
        response = self.client.synthesize_speech(
            Text=text,
            OutputFormat='mp3',
            VoiceId=voice_id,
            Engine='neural'  # Or 'standard'
        )
        
        return response['AudioStream'].read()
    
    def save_speech(self, text: str, filename: str, voice_id: str = 'Joanna'):
        """Synthesize and save speech to file"""
        
        audio = self.synthesize_speech(text, voice_id)
        
        with open(filename, 'wb') as f:
            f.write(audio)

Describe Voices

# Get specific voice information
aws polly describe-voices \
  --voice-ids Joanna \
  --region <REGION>

# Or all voices
aws polly describe-voices \
  --region <REGION> \
  --output table

Monitor Polly Metrics

# View synthesis requests
aws cloudwatch get-metric-statistics \
  --namespace AWS/Polly \
  --metric-name RequestCount \
  --start-time 2026-05-01T00:00:00Z \
  --end-time 2026-05-02T00:00:00Z \
  --period 3600 \
  --statistics Sum \
  --region <REGION>

# View errors
aws cloudwatch get-metric-statistics \
  --namespace AWS/Polly \
  --metric-name UserErrors \
  --start-time 2026-05-01T00:00:00Z \
  --end-time 2026-05-02T00:00:00Z \
  --period 3600 \
  --statistics Sum \
  --region <REGION>

Cost Optimization

Tips to reduce costs:

Use standard engine instead of neural
Cache audio files
Use SSML for optimization
Limit text length

SSML Support

# Use SSML for optimization
ssml_text = """
<speak>
  <prosody rate="0.9">
    Hello, how are you today?
  </prosody>
</speak>
"""

response = client.synthesize_speech(
    Text=ssml_text,
    TextType='ssml',
    OutputFormat='mp3',
    VoiceId='Joanna'
)

Troubleshooting

Issue: Voice not available

# Check available voices
aws polly describe-voices \
  --region <REGION>

# Try different voice
aws polly synthesize-speech \
  --text "Hello" \
  --voice-id Matthew \
  --output-format mp3 \
  --region <REGION> \
  speech.mp3

Issue: Text too long

# Limit text length (max 3000 characters)
text = "Your text here"[:3000]

aws polly synthesize-speech \
  --text "$text" \
  --voice-id Joanna \
  --output-format mp3 \
  --region <REGION> \
  speech.mp3

Issue: Unsupported format

# Supported formats: mp3, ogg, pcm, json
# Try different format
aws polly synthesize-speech \
  --text "Hello" \
  --voice-id Joanna \
  --output-format ogg \
  --region <REGION> \
  speech.ogg

Next Steps

You have completed the AI & Voice section! Continue to CI/CD Pipeline to set up deployment automation.

Checklist

Polly voices available
Speech synthesis test successful
Audio file created
SSML support working
Cost optimization applied