Technology · 5 min read

OpenAI TTS API: Complete Developer Guide

Learn how to use OpenAI's text-to-speech API for voice generation. Includes code examples, voice options, pricing, and integration best practices.

Tags: openai tts, openai text to speech, openai voice api, openai audio api, gpt tts

OpenAI's text-to-speech API follows the same request patterns as the company's other APIs. If you're already using GPT in your application, adding voice takes only a few extra lines of code.

This guide covers everything you need to integrate OpenAI TTS into your application.

Why OpenAI TTS?

OpenAI's TTS offering has several advantages:

  • Simplicity: Same API patterns as GPT, easy to integrate
  • Reliability: OpenAI's proven infrastructure and uptime
  • Quality: High-quality voices with natural intonation
  • Integration: Perfect for apps already using OpenAI
  • Pricing: Competitive pay-per-use model

Getting Started

Authentication

Use your existing OpenAI API key:

export OPENAI_API_KEY="your-api-key"

Basic Text-to-Speech

from openai import OpenAI
 
client = OpenAI()
 
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, this is a test of the OpenAI TTS API."
)
 
# Write the complete audio to disk (stream_to_file is deprecated for non-streaming responses)
response.write_to_file("output.mp3")

Streaming Audio

For real-time applications:

from openai import OpenAI
 
client = OpenAI()
 
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="This audio will stream as it generates."
) as response:
    response.stream_to_file("output.mp3")
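
When the audio should go to a player or a network socket rather than a file, the streamed response can be consumed chunk by chunk. The following is a minimal sketch, assuming the SDK's streamed response exposes `iter_bytes()`; the helper name `stream_speech` and the `on_chunk` callback are illustrative and not part of the SDK.

```python
from typing import Callable


def stream_speech(client, text: str, on_chunk: Callable[[bytes], None],
                  model: str = "tts-1", voice: str = "alloy",
                  chunk_size: int = 4096) -> int:
    """Stream synthesized audio to `on_chunk` and return the total bytes received.

    `client` is assumed to be an `openai.OpenAI` instance.
    """
    total = 0
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,
        input=text,
    ) as response:
        # Chunks are handed over as they arrive instead of after the full download.
        for chunk in response.iter_bytes(chunk_size=chunk_size):
            on_chunk(chunk)
            total += len(chunk)
    return total
```

A caller might pass `on_chunk=audio_buffer.extend` with a `bytearray`, or a function that forwards each chunk to a websocket.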

Available Models

OpenAI offers two TTS models:

| Model | Latency | Quality | Price |
|-------|---------|---------|-------|
| tts-1 | Lower | Good | $0.015/1K chars |
| tts-1-hd | Higher | Better | $0.030/1K chars |

Model Selection

# For real-time applications
model = "tts-1"
 
# For highest quality (podcasts, audiobooks)
model = "tts-1-hd"

Voice Options

OpenAI provides six built-in voices:

| Voice | Description | Best For |
|-------|-------------|----------|
| alloy | Neutral, balanced | General purpose |
| echo | Warm, conversational | Assistants, chat |
| fable | Expressive, British | Narration, storytelling |
| onyx | Deep, authoritative | Professional content |
| nova | Friendly, upbeat | Customer-facing apps |
| shimmer | Clear, gentle | Accessibility, instructions |

Choosing a Voice

# Test different voices
voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
 
for voice in voices:
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=f"Hello, I am the {voice} voice."
    )
    # Save one sample file per voice for side-by-side listening
    response.write_to_file(f"{voice}.mp3")

Output Formats

OpenAI TTS supports multiple audio formats:

# MP3 (default)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
    response_format="mp3"
)
 
# Other formats: opus, aac, flac, wav, pcm
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
    response_format="opus"  # Smaller file size
)

Format Recommendations

| Format | Use Case |
|--------|----------|
| mp3 | General purpose, web |
| opus | Streaming, low bandwidth |
| aac | iOS applications |
| flac | High quality archival |
| wav | Professional audio editing |
| pcm | Real-time processing |
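
The `pcm` format is raw samples with no header, so most players cannot open it directly. Below is a minimal sketch of wrapping such data in a WAV container using Python's standard `wave` module. The sample parameters (24 kHz, 16-bit, mono) are an assumption about the API's PCM output and should be checked against the current API reference; `pcm_to_wav` is an illustrative helper name.

```python
import io
import wave

# Assumed parameters of the API's raw PCM output; verify against the API reference.
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
CHANNELS = 1            # mono


def pcm_to_wav(pcm_bytes: bytes) -> bytes:
    """Wrap headerless PCM samples in a WAV container and return the WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```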

Speed Control

Adjust the speaking speed:

# Slower speech (0.25 to 1.0)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="This will be spoken slowly.",
    speed=0.75
)
 
# Faster speech (1.0 to 4.0)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="This will be spoken quickly.",
    speed=1.5
)
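
The comments in the example above imply a valid range of 0.25 to 4.0, with 1.0 as normal speed. A small sketch that rejects out-of-range values before a request is sent could look like this; `validate_speed` is a hypothetical helper, not part of the SDK.

```python
MIN_SPEED = 0.25
MAX_SPEED = 4.0


def validate_speed(speed: float) -> float:
    """Return `speed` unchanged if it is within the supported range, else raise."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    return speed
```

Failing locally avoids spending a network round trip on a request that would be rejected anyway.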

Pricing

OpenAI uses simple per-character pricing:

| Model | Price per 1M characters |
|-------|-------------------------|
| tts-1 | $15.00 |
| tts-1-hd | $30.00 |

Example costs:

  • 1,000 words (~5,000 chars): $0.075 (tts-1)
  • 10,000 words (~50,000 chars): $0.75 (tts-1)
  • Audiobook (~500,000 chars): $7.50 (tts-1)
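
The arithmetic behind these examples is character count times the per-million price. A minimal sketch, using the prices from the table above (which may change over time); `estimate_cost` is an illustrative helper name.

```python
from decimal import Decimal

# USD per 1M characters, copied from the pricing table above.
PRICE_PER_MILLION_CHARS = {
    "tts-1": Decimal("15.00"),
    "tts-1-hd": Decimal("30.00"),
}


def estimate_cost(char_count: int, model: str = "tts-1") -> Decimal:
    """Estimate the USD cost of synthesizing `char_count` characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return PRICE_PER_MILLION_CHARS[model] * char_count / Decimal(1_000_000)
```

Calling `estimate_cost(len(text))` before a batch job gives a quick budget check; `Decimal` is used so that small amounts are not distorted by floating-point rounding.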

Best Practices

Optimize for Latency

  1. Use tts-1 for real-time applications
  2. Use streaming for faster time-to-first-audio
  3. Keep each request within the 4,096-character limit and split longer text across several requests
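
Text longer than the per-request limit has to be divided before it is sent. The following is one possible sketch that prefers sentence boundaries so that each chunk ends at a natural pause; the helper name `split_text` and the simple punctuation-based sentence detection are assumptions for illustration.

```python
import re

MAX_INPUT_CHARS = 4096  # per-request character limit cited in this guide


def split_text(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; a single sentence longer than `limit`
    is cut at the limit as a last resort.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one request on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request and the resulting audio segments played or concatenated in order.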

Improve Quality

  1. Add punctuation for natural pacing
  2. Use tts-1-hd for published content
  3. Break long text into paragraphs

Cost Optimization

  1. Cache frequently used audio
  2. Use tts-1 unless quality is critical
  3. Use the opus format to cut storage and bandwidth (API charges are per character, so the output format does not change the API bill)
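
Caching can be as simple as storing each audio file under a hash of its input. Here is a minimal sketch of a disk cache; the names `cached_speech`, `synthesize`, and `key_prefix` are hypothetical, and `synthesize` stands in for any function that calls the API and returns audio bytes.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_speech(text: str, synthesize: Callable[[str], bytes],
                  cache_dir: Path, key_prefix: str = "tts-1:alloy") -> bytes:
    """Return audio for `text`, calling `synthesize` only on a cache miss.

    `key_prefix` should encode every setting that changes the audio
    (model, voice, speed, format) so different settings never share an entry.
    """
    digest = hashlib.sha256(f"{key_prefix}:{text}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{digest}.audio"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

This pays off most for fixed phrases such as greetings, menu prompts, and error messages, which are otherwise billed again on every request.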

Integration Examples

With Chat Completions

Create a voice-enabled assistant:

from openai import OpenAI
 
client = OpenAI()
 
# Get text response
chat_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Explain quantum computing briefly."}
    ]
)
 
text = chat_response.choices[0].message.content
 
# Convert to speech
audio_response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input=text
)
 
# Save the spoken answer (stream_to_file is deprecated for non-streaming responses)
audio_response.write_to_file("explanation.mp3")

Web Application

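// Note: process.env is only available server-side. In a real web app, send this
// request from your backend and relay the audio, so the API key never reaches the browser.
const response = await fetch('https://api.openai.com/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'tts-1',
    voice: 'alloy',
    input: 'Hello from the browser!',
  }),
});
 
// Stop early on HTTP errors instead of trying to play an error body
if (!response.ok) {
  throw new Error(`TTS request failed: ${response.status}`);
}
 
// Play the returned audio in the browser
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();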

Limitations

Be aware of these constraints:

  • No voice cloning: Cannot create custom voices
  • No SSML: Limited prosody control
  • Character limit: 4,096 characters per request
  • No live conversation: Converts finished text to speech; not designed for two-way, real-time dialogue

Comparison with Alternatives

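| Feature | OpenAI TTS | ElevenLabs | Google Cloud |
|---------|------------|------------|--------------|
| Voice Quality | Very Good | Excellent | Very Good |
| Voice Cloning | No | Yes | Limited |
| SSML Support | No | Limited | Yes |
| Latency | ~400ms | ~300ms | ~200ms |
| Languages | 50+ | 29+ | 40+ |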

For a detailed comparison, see our Voice AI API Comparison Guide.

When to Choose OpenAI TTS

OpenAI TTS is ideal when:

  • You're already using OpenAI APIs
  • You want simple, reliable TTS
  • You need quick integration
  • Budget is a consideration

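Consider alternatives if:

  • You need voice cloning or custom voices
  • You need SSML or fine-grained prosody control
  • You are building live, two-way voice conversation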

Conclusion

OpenAI TTS offers a straightforward, high-quality text-to-speech solution that integrates seamlessly with other OpenAI services. While it lacks advanced features like voice cloning, its simplicity and reliability make it an excellent choice for many applications.


This article is part of our Voice AI API Comparison series. Explore guides for ElevenLabs, Azure Speech, and more.
