OpenAI TTS API: Complete Developer Guide

OpenAI's text-to-speech (TTS) API follows the same request patterns as the company's other APIs. If your application already uses GPT, adding voice takes only a few extra lines of code.

This guide covers everything you need to integrate OpenAI TTS into your application.

Why OpenAI TTS?

OpenAI's TTS offering has several advantages:

Simplicity: Same API patterns as GPT, easy to integrate
Reliability: OpenAI's proven infrastructure and uptime
Quality: High-quality voices with natural intonation
Integration: Perfect for apps already using OpenAI
Pricing: Competitive pay-per-use model

Getting Started

Authentication

Use your existing OpenAI API key:

export OPENAI_API_KEY="your-api-key"

Basic Text-to-Speech

from openai import OpenAI
 
client = OpenAI()
 
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, this is a test of the OpenAI TTS API."
)
 
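# write_to_file saves the fully downloaded audio; stream_to_file is
# deprecated on non-streaming responses in recent SDK versions.
response.write_to_file("output.mp3")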

Streaming Audio

For real-time applications, stream the response so audio is written as it arrives rather than after generation finishes:

from openai import OpenAI
 
client = OpenAI()
 
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="This audio will stream as it generates."
) as response:
    response.stream_to_file("output.mp3")
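
If you want to forward audio to a player or socket instead of a file, you can consume the stream chunk by chunk. The sketch below is a minimal, SDK-independent helper: write_stream is a hypothetical name, and it accepts any iterable of byte chunks. It also records how long the first chunk took to arrive, which is a rough proxy for time-to-first-audio.

```python
import time
from typing import BinaryIO, Iterable, Optional


def write_stream(chunks: Iterable[bytes], sink: BinaryIO) -> tuple[int, Optional[float]]:
    """Copy byte chunks into sink as they arrive.

    Returns (total_bytes_written, seconds_until_first_chunk).
    The delay is None when the stream produced no data.
    """
    start = time.perf_counter()
    first_chunk_delay: Optional[float] = None
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty keep-alive chunks
        if first_chunk_delay is None:
            first_chunk_delay = time.perf_counter() - start
        sink.write(chunk)
        total += len(chunk)
    return total, first_chunk_delay
```

Inside the with block of the streaming example, the chunks would typically come from the response object's byte iterator (response.iter_bytes() in the Python SDK; treat that name as an assumption to confirm against your installed version).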

Available Models

OpenAI offers two TTS models:

Model	Latency	Quality	Price
tts-1	Lower	Good	$0.015/1K chars
tts-1-hd	Higher	Better	$0.030/1K chars

Model Selection

# For real-time applications
model = "tts-1"
 
# For highest quality (podcasts, audiobooks)
model = "tts-1-hd"

Voice Options

OpenAI provides six built-in voices:

Voice	Description	Best For
alloy	Neutral, balanced	General purpose
echo	Warm, conversational	Assistants, chat
fable	Expressive, British	Narration, storytelling
onyx	Deep, authoritative	Professional content
nova	Friendly, upbeat	Customer-facing apps
shimmer	Clear, gentle	Accessibility, instructions

Choosing a Voice

# Test different voices
voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
 
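for voice in voices:
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=f"Hello, I am the {voice} voice."
    )
    # write_to_file saves the fully downloaded audio; stream_to_file is
    # deprecated on non-streaming responses in recent SDK versions.
    response.write_to_file(f"{voice}.mp3")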

Output Formats

OpenAI TTS supports multiple audio formats:

# MP3 (default)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
    response_format="mp3"
)
 
# Other formats: opus, aac, flac, wav, pcm
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
    response_format="opus"  # Smaller file size
)

Format Recommendations

Format	Use Case
mp3	General purpose, web
opus	Streaming, low bandwidth
aac	iOS applications
flac	High quality archival
wav	Professional audio editing
pcm	Real-time processing
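
The pcm format has no header, so most players cannot open it directly. As a sketch, the helper below wraps raw PCM bytes in a WAV container using only the standard library. The defaults (24 kHz, 16-bit, mono) are an assumption about what the API returns for pcm; confirm them against the current API reference before relying on them. pcm_to_wav is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(
    pcm_bytes: bytes,
    sample_rate: int = 24_000,  # assumed sample rate of the pcm output
    channels: int = 1,          # assumed mono
    sample_width: int = 2,      # assumed 16-bit samples (2 bytes each)
) -> bytes:
    """Wrap headerless PCM samples in a WAV container and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

This keeps pcm useful for low-latency processing while still letting you save a playable file when needed.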

Speed Control

Adjust the speaking speed with the speed parameter, which accepts values from 0.25 to 4.0 (1.0 is normal speed):

# Slower speech (0.25 to 1.0)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="This will be spoken slowly.",
    speed=0.75
)
 
# Faster speech (1.0 to 4.0)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="This will be spoken quickly.",
    speed=1.5
)
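
If the speed value comes from user input, it may be worth clamping it before sending the request. A minimal sketch, assuming the 0.25 to 4.0 range noted in the comments above; clamp_speed is a hypothetical helper, not part of the SDK.

```python
MIN_SPEED = 0.25  # slowest supported speed, per the range quoted above
MAX_SPEED = 4.0   # fastest supported speed, per the range quoted above


def clamp_speed(speed: float) -> float:
    """Return speed limited to the supported range."""
    return max(MIN_SPEED, min(MAX_SPEED, float(speed)))
```

The result can be passed directly as the speed argument, for example speed=clamp_speed(user_value).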

Pricing

OpenAI uses simple per-character pricing:

Model	Price per 1M characters
tts-1	$15.00
tts-1-hd	$30.00

Example costs:

1,000 words (~5,000 chars): $0.075 (tts-1)
10,000 words (~50,000 chars): $0.75 (tts-1)
Audiobook (~500,000 chars): $7.50 (tts-1)
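
The arithmetic behind these figures is characters × price ÷ 1,000,000. A small estimator, sketched below, makes it easy to check a budget before sending text. The prices are copied from the table above and may change, and estimate_cost is a hypothetical helper name.

```python
from decimal import Decimal

# USD per 1M characters, taken from the pricing table in this guide.
PRICE_PER_MILLION_CHARS = {
    "tts-1": Decimal("15.00"),
    "tts-1-hd": Decimal("30.00"),
}


def estimate_cost(num_chars: int, model: str = "tts-1") -> Decimal:
    """Estimate the USD cost of synthesizing num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    if model not in PRICE_PER_MILLION_CHARS:
        raise ValueError(f"unknown model: {model}")
    return Decimal(num_chars) * PRICE_PER_MILLION_CHARS[model] / Decimal(1_000_000)
```

For example, estimate_cost(len(text), "tts-1-hd") gives the cost of a single request at the higher-quality tier.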

Best Practices

Optimize for Latency

Use tts-1 for real-time applications
Use streaming for faster time-to-first-audio
Keep each request within the 4,096-character limit
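
Text longer than the limit has to be split across several requests. The sketch below shows one possible approach: it packs whole sentences into chunks of at most 4,096 characters and only cuts mid-sentence when a single sentence exceeds the limit. split_text is a hypothetical helper, and the sentence regex is deliberately simple.

```python
import re

MAX_CHARS = 4096  # per-request character limit quoted in this guide


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single oversized sentence is cut at the limit (possibly mid-word).
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the input of its own request, and the resulting audio files played or concatenated in order.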

Improve Quality

Add punctuation for natural pacing
Use tts-1-hd for published content
Break long text into paragraphs

Cost Optimization

Cache frequently used audio
Use tts-1 unless quality is critical
Use the opus format to reduce storage and bandwidth (pricing is per character, so the format does not change the request cost)
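
Caching can be as simple as keying audio files on a hash of everything that affects the output. The sketch below is one way to do it on disk. cache_key and cached_speech are hypothetical helpers, and synthesize stands in for whatever function you write around the speech API call to return audio bytes.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, model: str, voice: str,
              response_format: str = "mp3", speed: float = 1.0) -> str:
    """Hash every parameter that changes the generated audio."""
    payload = "\x1f".join([model, voice, response_format, f"{speed:.2f}", text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_speech(text: str, synthesize: Callable[..., bytes], cache_dir: Path, *,
                  model: str = "tts-1", voice: str = "alloy",
                  response_format: str = "mp3", speed: float = 1.0) -> Path:
    """Return the path to cached audio, calling synthesize only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = cache_key(text, model, voice, response_format, speed)
    path = cache_dir / f"{key}.{response_format}"
    if not path.exists():
        audio = synthesize(text=text, model=model, voice=voice,
                           response_format=response_format, speed=speed)
        path.write_bytes(audio)
    return path
```

Because identical text, voice, and settings map to the same file, repeated prompts such as greetings or menu options are billed only once.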

Integration Examples

With Chat Completions

Create a voice-enabled assistant:

from openai import OpenAI
 
client = OpenAI()
 
# Get text response
chat_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Explain quantum computing briefly."}
    ]
)
 
text = chat_response.choices[0].message.content
 
# Convert to speech
audio_response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input=text
)
 
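# write_to_file saves the fully downloaded audio; stream_to_file is
# deprecated on non-streaming responses in recent SDK versions.
audio_response.write_to_file("explanation.mp3")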

Web Application

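// Keep the API key on the server: process.env is not available in browsers,
// and a key shipped in client-side code can be read by anyone.
// Send this request from your backend (or a proxy route) and return the audio.
const response = await fetch('https://api.openai.com/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'tts-1',
    voice: 'alloy',
    input: 'Hello from the browser!',
  }),
});
 
// Fail early on HTTP errors instead of trying to play an error payload
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}
 
// In the browser: turn the returned audio into a playable object URL
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();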

Limitations

Be aware of these constraints:

No voice cloning: Cannot create custom voices
No SSML: Limited prosody control
Character limit: 4,096 characters per request
No real-time conversation: Streaming reduces time-to-first-audio, but the API is not designed for live, two-way dialogue

Comparison with Alternatives

Feature	OpenAI TTS	ElevenLabs	Google Cloud
Voice Quality	Very Good	Excellent	Very Good
Voice Cloning	No	Yes	Limited
SSML Support	No	Limited	Yes
Latency	~400ms	~300ms	~200ms
Languages	50+	29+	40+

For a detailed comparison, see our Voice AI API Comparison Guide.

When to Choose OpenAI TTS

OpenAI TTS is ideal when:

You're already using OpenAI APIs
You want simple, reliable TTS
You need quick integration
Budget is a consideration

Consider alternatives if:

You need voice cloning (ElevenLabs)
You need fine-grained SSML control (Google Cloud)
You need lowest latency (Amazon Polly)

Conclusion

OpenAI TTS offers a straightforward, high-quality text-to-speech solution that integrates seamlessly with other OpenAI services. While it lacks advanced features like voice cloning, its simplicity and reliability make it an excellent choice for many applications.

This article is part of our Voice AI API Comparison series. Explore guides for ElevenLabs, Azure Speech, and more.
