OpenAI TTS API: Complete Developer Guide
Learn how to use OpenAI's text-to-speech API for voice generation. Includes code examples, voice options, pricing, and integration best practices.
OpenAI's text-to-speech (TTS) API follows the same conventions as OpenAI's other APIs: the same SDK, the same API key, and a single request that turns text into audio. If your application already calls GPT models, adding voice output takes only a few lines of code.
This guide covers everything you need to integrate OpenAI TTS into your application.
Why OpenAI TTS?
OpenAI's TTS offering has several advantages:
- Simplicity: Uses the same SDK, API key, and request style as OpenAI's GPT endpoints
- Reliability: OpenAI's proven infrastructure and uptime
- Quality: High-quality voices with natural intonation
- Integration: A natural fit for apps that already use OpenAI
- Pricing: Pay-as-you-go, billed per input character
Getting Started
Authentication
The SDK reads your existing OpenAI API key from the `OPENAI_API_KEY` environment variable:
```bash
export OPENAI_API_KEY="your-api-key"
```
Basic Text-to-Speech
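```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, this is a test of the OpenAI TTS API.",
)
# Write the complete audio (MP3 by default) to disk
response.write_to_file("output.mp3")
```
Streaming Audio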
For real-time applications, stream the response so audio is written as it is generated:
```python
from openai import OpenAI

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="This audio will stream as it generates.",
) as response:
    # Chunks are written to the file as they arrive
    response.stream_to_file("output.mp3")
```
Available Models
OpenAI offers two TTS models:
| Model | Latency | Quality | Price |
|---|---|---|---|
| tts-1 | Lower | Good | $0.015/1K chars |
| tts-1-hd | Higher | Better | $0.030/1K chars |
Model Selection
```python
# For real-time applications
model = "tts-1"

# For highest quality (podcasts, audiobooks)
model = "tts-1-hd"
```
Voice Options
OpenAI provides six built-in voices:
| Voice | Description | Best For |
|---|---|---|
| alloy | Neutral, balanced | General purpose |
| echo | Warm, conversational | Assistants, chat |
| fable | Expressive, British | Narration, storytelling |
| onyx | Deep, authoritative | Professional content |
| nova | Friendly, upbeat | Customer-facing apps |
| shimmer | Clear, gentle | Accessibility, instructions |
Choosing a Voice
```python
from openai import OpenAI

client = OpenAI()

# Generate a short sample in each voice to compare them side by side
voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
for voice in voices:
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=f"Hello, I am the {voice} voice.",
    )
    response.write_to_file(f"{voice}.mp3")
```
Output Formats
OpenAI TTS supports multiple audio formats:
```python
# MP3 (default)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
    response_format="mp3",
)

# Other formats: opus, aac, flac, wav, pcm
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
    response_format="opus",  # smaller file size
)
```
Format Recommendations
| Format | Use Case |
|---|---|
| mp3 | General purpose, web |
| opus | Streaming, low bandwidth |
| aac | iOS applications |
| flac | High quality archival |
| wav | Professional audio editing |
| pcm | Real-time processing |
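The `pcm` format returns raw samples with no file header, so most players cannot open it directly. The sketch below wraps PCM bytes in a WAV header using Python's standard `wave` module. It assumes 24 kHz, 16-bit signed little-endian samples, which matches how OpenAI describes its `pcm` output, plus a single (mono) channel, which is an assumption. Check the current API reference before relying on it. The `pcm_to_wav` helper is hypothetical and not part of the OpenAI SDK.

```python
import io
import wave

# Assumed PCM layout (verify against the current API reference):
# 24 kHz sample rate, 16-bit signed little-endian samples, one channel.
SAMPLE_RATE = 24_000
SAMPLE_WIDTH_BYTES = 2
CHANNELS = 1


def pcm_to_wav(pcm_bytes: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV (RIFF) header so ordinary players can open them."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

In practice you would pass `response.content` from a request made with `response_format="pcm"`. The standard WAV header adds 44 bytes in front of the samples.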
Speed Control
Adjust the speaking speed:
```python
# Slower speech (0.25 to 1.0; the default is 1.0)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="This will be spoken slowly.",
    speed=0.75,
)

# Faster speech (1.0 to 4.0)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="This will be spoken quickly.",
    speed=1.5,
)
```
Pricing
OpenAI uses simple per-character pricing:
| Model | Price per 1M characters |
|---|---|
| tts-1 | $15.00 |
| tts-1-hd | $30.00 |
Example costs:
- 1,000 words (~5,000 chars): $0.075 (tts-1)
- 10,000 words (~50,000 chars): $0.75 (tts-1)
- Audiobook (~500,000 chars): $7.50 (tts-1)
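The example costs follow directly from the per-character rates. This minimal sketch assumes the prices in the table above and the 4,096-character per-request limit. Prices can change, so check OpenAI's pricing page. `estimate_tts_cost` is a hypothetical helper.

```python
import math

# Prices per 1M input characters, taken from the table above (check current pricing).
PRICE_PER_MILLION_CHARS = {"tts-1": 15.00, "tts-1-hd": 30.00}
MAX_CHARS_PER_REQUEST = 4096


def estimate_tts_cost(num_chars: int, model: str = "tts-1") -> tuple[float, int]:
    """Return (estimated cost in USD, minimum number of requests) for num_chars of input."""
    cost = num_chars * PRICE_PER_MILLION_CHARS[model] / 1_000_000
    min_requests = math.ceil(num_chars / MAX_CHARS_PER_REQUEST)
    return cost, min_requests


cost, requests = estimate_tts_cost(500_000)
print(f"${cost:.2f} over at least {requests} requests")
```

Under the per-request limit, the audiobook example needs at least 123 requests. Splitting at sentence boundaries can push the count slightly higher because chunks are not filled exactly to the limit.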
Best Practices
Optimize for Latency
- Use `tts-1` for real-time applications
- Use streaming for faster time-to-first-audio
- Keep requests short; the API accepts at most 4,096 characters per request
Improve Quality
- Add punctuation for natural pacing
- Use `tts-1-hd` for published content
- Split long text at paragraph or sentence boundaries rather than mid-sentence
Cost Optimization
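- Cache frequently used audio
- Use `tts-1` unless quality is critical
- Use opus to shrink files for storage and delivery (API pricing is per input character, so the output format does not change the API cost)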
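Billing is per input character, so re-synthesizing the same text with the same settings costs the same every time. Storing the first result avoids paying again. This minimal sketch assumes a local directory cache keyed on a hash of every request parameter. `cached_speech`, `cache_key`, the `tts_cache` directory, and the injected `synthesize` callable are all hypothetical. In a real app, `synthesize` would wrap the SDK call shown in the comment.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

CACHE_DIR = Path("tts_cache")  # hypothetical cache location


def cache_key(params: dict) -> str:
    """Stable hash of every request parameter that affects the audio."""
    canonical = json.dumps(params, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_speech(params: dict, synthesize: Callable[[dict], bytes],
                  cache_dir: Path = CACHE_DIR) -> bytes:
    """Return cached audio for params, calling synthesize(params) only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    suffix = params.get("response_format", "mp3")
    path = cache_dir / f"{cache_key(params)}.{suffix}"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(params)
    path.write_bytes(audio)
    return audio


# Usage with the real API (requires an API key, so not run here):
# audio = cached_speech(
#     {"model": "tts-1", "voice": "alloy", "input": "Welcome back!"},
#     lambda p: client.audio.speech.create(**p).content,
# )
```

Injecting `synthesize` keeps the cache logic independent of the SDK, which makes it easy to test and to swap in the streaming variant.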
Integration Examples
With Chat Completions
Create a voice-enabled assistant:
```python
from openai import OpenAI

client = OpenAI()

# Get a text response from a chat model
chat_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Explain quantum computing briefly."}
    ],
)
text = chat_response.choices[0].message.content

# Convert the reply to speech (the input must stay within 4,096 characters)
audio_response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input=text,
)
audio_response.write_to_file("explanation.mp3")
```
Web Application
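Keep the API key on the server. `process.env` does not exist in the browser, and sending the key to client code would expose it to anyone who opens the page. Instead, call OpenAI from your backend and have the browser request audio from your own route. The `/api/speech` route and its `{ text }` body are hypothetical; wire `synthesizeSpeech` to that route in your server framework:
```javascript
// Server (Node.js 18+ has a global fetch): the API key never leaves the server.
async function synthesizeSpeech(text) {
  const response = await fetch('https://api.openai.com/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'tts-1', voice: 'alloy', input: text }),
  });
  if (!response.ok) throw new Error(`TTS request failed: ${response.status}`);
  return Buffer.from(await response.arrayBuffer()); // MP3 bytes by default
}

// Browser: call your backend route (hypothetical '/api/speech') and play the result.
const res = await fetch('/api/speech', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Hello from the browser!' }),
});
const audioBlob = await res.blob();
const audio = new Audio(URL.createObjectURL(audioBlob));
audio.play();
```
Limitations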
Be aware of these constraints:
- No voice cloning: Cannot create custom voices
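- No SSML: Prosody control is limited to punctuation and the `speed` parameter
- Character limit: 4,096 characters per request
- Not conversational: Each request converts a complete text input to audio; it is not a live, two-way voice interface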
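Several of these limits, together with the options listed earlier, can be checked locally before a request is sent. A bad value then fails with an immediate, descriptive exception instead of an API error. This is a minimal sketch: the allowed values mirror the models, voices, formats, and speed range described in this guide and may lag behind the current API. `speech_params` is a hypothetical helper.

```python
MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def speech_params(text: str, model: str = "tts-1", voice: str = "alloy",
                  response_format: str = "mp3", speed: float = 1.0) -> dict:
    """Validate options against this guide's limits and return kwargs for the API call."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1 to {MAX_INPUT_CHARS} characters")
    return {"model": model, "voice": voice, "input": text,
            "response_format": response_format, "speed": speed}


# Usage with the SDK (requires an API key, so not run here):
# client.audio.speech.create(**speech_params("Hello!", voice="nova", speed=1.25))
```

Keeping the allowed values in one place also makes them easy to update if OpenAI adds models or voices.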
Comparison with Alternatives
| Feature | OpenAI TTS | ElevenLabs | Google Cloud |
|---|---|---|---|
| Voice Quality | Very Good | Excellent | Very Good |
| Voice Cloning | No | Yes | Limited |
| SSML Support | No | Limited | Yes |
| Latency | ~400ms | ~300ms | ~200ms |
| Languages | 50+ | 29+ | 40+ |
For a detailed comparison, see our Voice AI API Comparison Guide.
When to Choose OpenAI TTS
OpenAI TTS is ideal when:
- You're already using OpenAI APIs
- You want simple, reliable TTS
- You need quick integration
- Budget is a consideration
Consider alternatives if:
- You need voice cloning (ElevenLabs)
- You need fine-grained SSML control (Google Cloud)
- You need lowest latency (Amazon Polly)
Conclusion
OpenAI TTS offers a straightforward, high-quality text-to-speech solution that integrates seamlessly with other OpenAI services. While it lacks advanced features like voice cloning, its simplicity and reliability make it an excellent choice for many applications.
This article is part of our Voice AI API Comparison series. Explore guides for ElevenLabs, Azure Speech, and more.