Integrate voice for speaking practice: streaming transcription (browser → Transcribe WebSocket), AI response generation, and text-to-speech.
| Day | Task | Complete |
|---|---|---|
| Mon | Transcribe Streaming: presigned WebSocket URL generator (backend); browser → Transcribe direct connection; real-time transcription handling | ✅ |
| Tue | Polly TTS: neural voices integration (Joanna, Matthew); S3 storage + presigned URLs; audio synthesis endpoint | ✅ |
| Wed | Voice UI: audio recording (MediaRecorder API); streaming transcription integration; audio playback components; recording controls (start/stop/preview) | ✅ |
| Thu | Voice Integration: connect recording → transcription → AI → TTS flow; error handling for connection issues; audio format compatibility | ✅ |
| Fri | Testing: voice recording flow; transcription accuracy; audio playback; cross-browser testing (Chrome, Safari) | ✅ |
1. Streaming Transcription:
2. Text-to-Speech:
3. Voice UI:
4. Testing:
Next: Sprint 4 (final testing, deployment, and demo)