PersonaPlex-7B: Full-Duplex Speech-to-Speech Model Guide
PersonaPlex-7B is an open source speech-to-speech model designed for real-time conversational AI. Unlike traditional cascaded pipelines (speech recognition → language model → text-to-speech), it processes speech end-to-end, enabling natural full-duplex conversations with interruption handling.
Overview
PersonaPlex-7B represents a new approach to voice AI:
- Speech-to-speech: Direct audio processing without intermediate text
- Full-duplex: Listen and respond simultaneously
- Interruption handling: Natural turn-taking and barge-in support
- Context awareness: Maintains conversation history and emotional state
- 7B parameters: Optimized for real-time inference
Architecture
```
┌─────────────┐      ┌──────────────────┐      ┌─────────────┐
│ User Speech │─────►│  PersonaPlex-7B  │─────►│ AI Response │
│   (Audio)   │◄─────│ Speech-to-Speech │◄─────│   (Audio)   │
└─────────────┘      └──────────────────┘      └─────────────┘
       ▲                      │                       │
       │                      ▼                       │
       │             ┌────────────────┐               │
       └─────────────│  Full-Duplex   │◄──────────────┘
                     │ Stream Manager │
                     └────────────────┘
```
Hardware Requirements
| Configuration | VRAM | Latency | Concurrent Users |
|---|---|---|---|
| Minimum | 16GB | ~500ms | 1 |
| Recommended | 24GB | ~300ms | 2-4 |
| Production | 40GB+ | ~200ms | 8+ |
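For deployment scripts, the table can be folded into a small helper that maps available VRAM to a tier. A minimal sketch (tier names and thresholds come straight from the table above; the commented PyTorch line for querying actual VRAM is an illustrative extra, not part of the personaplex API):

```python
def recommend_tier(vram_gb: float) -> str:
    """Map available VRAM (GB) to a PersonaPlex-7B deployment tier.

    Thresholds follow the hardware requirements table.
    """
    if vram_gb >= 40:
        return "production"   # ~200ms latency, 8+ concurrent users
    if vram_gb >= 24:
        return "recommended"  # ~300ms latency, 2-4 concurrent users
    if vram_gb >= 16:
        return "minimum"      # ~500ms latency, single user
    return "insufficient"     # below the documented minimum

# To check the local GPU with PyTorch:
# vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
```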
Quick Start
Installation
```bash
pip install personaplex-7b
```

Basic Usage
```python
from personaplex import PersonaPlex7B

# Initialize the model
model = PersonaPlex7B.from_pretrained("personaplex/personaplex-7b")

# Create a conversation session
session = model.create_session()

# Process audio input and get response
response_audio = session.process(input_audio)
```

Streaming Conversation
```python
import asyncio
from personaplex import PersonaPlex7B, AudioStream

async def conversation():
    model = PersonaPlex7B.from_pretrained("personaplex/personaplex-7b")
    session = model.create_session()

    # Create bidirectional audio streams
    input_stream = AudioStream.from_microphone()
    output_stream = AudioStream.to_speaker()

    # Run full-duplex conversation
    async for response_chunk in session.stream(input_stream):
        await output_stream.write(response_chunk)

asyncio.run(conversation())
```

Full-Duplex Features
Interruption Handling
PersonaPlex-7B naturally handles interruptions:
```python
session = model.create_session(
    interruption_threshold=0.3,    # Sensitivity (0-1)
    fade_on_interrupt=True,        # Gracefully fade out
    interrupt_response="adaptive"  # or "immediate", "delayed"
)
```

Turn-Taking
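Turn-taking ultimately hinges on deciding when the user has finished speaking. The model handles this internally, but as a toy illustration of what `end_of_turn_detection` and `min_response_delay` trade off, here is a pause detector over per-frame audio energies (all names and thresholds are illustrative, not part of the personaplex API):

```python
def end_of_turn(frame_energies, silence_threshold=0.05, min_silence_frames=30):
    """Toy pause detector: the turn ends once the trailing run of quiet
    frames (energy below `silence_threshold`) is long enough.

    With 10 ms frames, min_silence_frames=30 waits for a ~300 ms pause.
    """
    run = 0
    for energy in reversed(frame_energies):
        if energy < silence_threshold:
            run += 1
        else:
            break
    return run >= min_silence_frames
```

A longer silence window makes the agent less likely to talk over slow speakers, at the cost of added response latency.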
Configure natural conversation flow:
```python
session = model.create_session(
    end_of_turn_detection="auto",  # Automatic pause detection
    min_response_delay=100,        # ms before responding
    backchanneling=True            # Enable "uh-huh", "I see" etc.
)
```

Voice Configuration
Built-in Voices
```python
# List available voices
voices = model.list_voices()
# ['aria', 'marcus', 'elena', 'kai', ...]

# Use a specific voice
session = model.create_session(voice="aria")
```

Voice Cloning
Clone a voice for personalized agents:
```python
custom_voice = model.clone_voice(
    reference_audio="agent_voice.wav",
    name="my_agent"
)
session = model.create_session(voice=custom_voice)
```

System Prompts
Guide the AI's behavior with system prompts:
```python
session = model.create_session(
    system_prompt="""You are a helpful customer support agent for TechCorp.
    Be concise, friendly, and solution-oriented.
    If you don't know something, offer to connect the user with a human agent."""
)
```

WebSocket Server
Deploy as a real-time WebSocket server:
```python
from personaplex import PersonaPlex7B, WebSocketServer

model = PersonaPlex7B.from_pretrained("personaplex/personaplex-7b")
server = WebSocketServer(model, port=8765)

# Start serving
server.run()
```

Client connection:
```javascript
const ws = new WebSocket('ws://localhost:8765');

const mediaRecorder = new MediaRecorder(audioStream);
mediaRecorder.ondataavailable = (e) => ws.send(e.data);
ws.onmessage = (e) => playAudio(e.data);
```

Production Deployment
Docker Compose
```yaml
version: '3.8'
services:
  personaplex:
    image: personaplex/personaplex-7b:latest
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - MODEL_CACHE=/models
    volumes:
      - ./models:/models
    ports:
      - "8765:8765"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Load Balancing
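Because sessions are long-lived WebSocket connections, the reverse proxy must forward the upgrade handshake and keep idle connections open. A minimal sketch of the `nginx.conf` that the compose file below mounts (the upstream name matches the compose service; the timeout value is an assumption):

```nginx
events {}

http {
    upstream personaplex_pool {
        # Docker's embedded DNS balances across service replicas.
        server personaplex:8765;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://personaplex_pool;
            # Headers required for the WebSocket upgrade handshake.
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            # Voice sessions are long-lived; raise the idle timeout.
            proxy_read_timeout 3600s;
        }
    }
}
```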
For high-traffic deployments:
```yaml
services:
  personaplex:
    deploy:
      replicas: 4
    # ... rest of config
  nginx:
    image: nginx
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
```

Performance Optimization
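A useful baseline before tuning anything: the weights themselves dominate memory. At fp16 (2 bytes per parameter), 7B parameters occupy roughly 13 GiB before activations and KV cache, which is why the 16 GB minimum in the hardware table is tight. A quick back-of-envelope check (weight-only, illustrative):

```python
BYTES_PER_GIB = 1024**3

def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight-only footprint; ignores activations and KV cache."""
    return n_params * bytes_per_param / BYTES_PER_GIB

fp16_gib = weight_memory_gib(7e9, 2)  # ~13 GiB of weights at fp16
int8_gib = weight_memory_gib(7e9, 1)  # ~6.5 GiB after int8 quantization
```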
Quantization
Reduce memory usage with quantization:
```python
model = PersonaPlex7B.from_pretrained(
    "personaplex/personaplex-7b",
    quantization="int8"  # ~50% memory reduction
)
```

Speculative Decoding
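Speculative decoding pairs the full model with a cheap draft model: the draft proposes several tokens, and the full model verifies them in a single pass, keeping the longest agreeing prefix. To build intuition, here is a toy, greedy sketch of the verify-and-accept loop (the deterministic `target`/`draft` callables are illustrative stand-ins, not the personaplex implementation):

```python
def speculative_step(target, draft, prefix, k=4):
    """One greedy speculative-decoding step over toy deterministic models.

    `target` and `draft` each map a token sequence to its next token.
    The cheap draft proposes k tokens; the expensive target verifies
    them, keeping the longest agreeing prefix plus one target token
    (a correction on the first mismatch, or a bonus if all k agree).
    """
    # Draft phase: propose k tokens autoregressively.
    proposed, seq = [], list(prefix)
    for _ in range(k):
        token = draft(seq)
        proposed.append(token)
        seq.append(token)

    # Verify phase: the target checks each proposal in order.
    accepted, seq = [], list(prefix)
    for token in proposed:
        expected = target(seq)
        if token == expected:
            accepted.append(token)
            seq.append(token)
        else:
            accepted.append(expected)  # target's correction; stop here
            break
    else:
        accepted.append(target(seq))  # all k accepted: one bonus token
    return accepted
```

The speedup comes from the verify phase: the target scores all proposals in one forward pass instead of one pass per token.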
Enable faster inference:
```python
model = PersonaPlex7B.from_pretrained(
    "personaplex/personaplex-7b",
    speculative_decoding=True,
    draft_model="personaplex/personaplex-draft"
)
```

Comparison with Alternatives
| Feature | PersonaPlex-7B | Moshi | GPT-4o Realtime |
|---|---|---|---|
| Open Source | Yes | Yes | No |
| Full-Duplex | Yes | Yes | Yes |
| Self-hostable | Yes | Yes | No |
| Voice Cloning | Yes | Limited | No |
| Languages | English | EN/FR | 50+ |
For more options, see our Open Source Voice AI Models comparison.
When to Use PersonaPlex-7B
Choose PersonaPlex-7B when you need:
- Full-duplex conversational AI
- Self-hosted deployment with data privacy
- Custom voice cloning
- Natural interruption handling
Consider alternatives when you need:
- Multilingual support → Qwen3-TTS
- Lowest latency → Moshi
- Simple TTS → Fish Speech
- Managed infrastructure → PersonaPlex API
Conclusion
PersonaPlex-7B brings state-of-the-art conversational AI capabilities to the open source community. Its full-duplex architecture and natural conversation handling make it ideal for voice agents, AI companions, and interactive applications.
This article is part of our Open Source Voice AI Models series.