PersonaPlex-7B: Full-Duplex Speech-to-Speech Model Guide
PersonaPlex-7B is an open source speech-to-speech model designed for real-time conversational AI. Unlike traditional cascaded pipelines (speech recognition → language model → text-to-speech), it processes speech end-to-end, enabling natural full-duplex conversations with interruption handling.
Overview
PersonaPlex-7B represents a new approach to voice AI:
- Speech-to-speech: Direct audio processing without intermediate text
- Full-duplex: Listen and respond simultaneously
- Interruption handling: Natural turn-taking and barge-in support
- Context awareness: Maintains conversation history and emotional state
- 7B parameters: Optimized for real-time inference
Architecture
```
┌─────────────┐      ┌──────────────────┐      ┌─────────────┐
│ User Speech │─────►│  PersonaPlex-7B  │─────►│ AI Response │
│   (Audio)   │◄─────│ Speech-to-Speech │◄─────│   (Audio)   │
└─────────────┘      └──────────────────┘      └─────────────┘
       ▲                      │                       │
       │                      ▼                       │
       │             ┌────────────────┐               │
       └─────────────│  Full-Duplex   │◄──────────────┘
                     │ Stream Manager │
                     └────────────────┘
```
Hardware Requirements
| Configuration | VRAM | Latency | Concurrent Users |
|---|---|---|---|
| Minimum | 16GB | ~500ms | 1 |
| Recommended | 24GB | ~300ms | 2-4 |
| Production | 40GB+ | ~200ms | 8+ |
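For deployment scripts, the table can be folded into a small helper that maps available VRAM to a tier. A minimal sketch (tier names and thresholds come straight from the table above; the commented PyTorch line for querying actual VRAM is an illustrative extra, not part of the personaplex API):

```python
def recommend_tier(vram_gb: float) -> str:
    """Map available VRAM (GB) to a PersonaPlex-7B deployment tier.

    Thresholds follow the hardware requirements table.
    """
    if vram_gb >= 40:
        return "production"   # ~200ms latency, 8+ concurrent users
    if vram_gb >= 24:
        return "recommended"  # ~300ms latency, 2-4 concurrent users
    if vram_gb >= 16:
        return "minimum"      # ~500ms latency, single user
    return "insufficient"     # below the documented minimum

# To check the local GPU with PyTorch:
# vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
```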
Quick Start
Installation
```bash
pip install personaplex-7b
```

Basic Usage
```python
from personaplex import PersonaPlex7B

# Initialize the model
model = PersonaPlex7B.from_pretrained("personaplex/personaplex-7b")

# Create a conversation session
session = model.create_session()

# Process audio input and get response
response_audio = session.process(input_audio)
```

Streaming Conversation
```python
import asyncio
from personaplex import PersonaPlex7B, AudioStream

async def conversation():
    model = PersonaPlex7B.from_pretrained("personaplex/personaplex-7b")
    session = model.create_session()

    # Create bidirectional audio streams
    input_stream = AudioStream.from_microphone()
    output_stream = AudioStream.to_speaker()

    # Run full-duplex conversation
    async for response_chunk in session.stream(input_stream):
        await output_stream.write(response_chunk)

asyncio.run(conversation())
```

Full-Duplex Features
Interruption Handling
PersonaPlex-7B naturally handles interruptions:
```python
session = model.create_session(
    interruption_threshold=0.3,    # Sensitivity (0-1)
    fade_on_interrupt=True,        # Gracefully fade out
    interrupt_response="adaptive"  # or "immediate", "delayed"
)
```

Turn-Taking
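Turn-taking ultimately hinges on deciding when the user has finished speaking. The model handles this internally, but as a toy illustration of what `end_of_turn_detection` and `min_response_delay` trade off, here is a pause detector over per-frame audio energies (all names and thresholds are illustrative, not part of the personaplex API):

```python
def end_of_turn(frame_energies, silence_threshold=0.05, min_silence_frames=30):
    """Toy pause detector: the turn ends once the trailing run of quiet
    frames (energy below `silence_threshold`) is long enough.

    With 10 ms frames, min_silence_frames=30 waits for a ~300 ms pause.
    """
    run = 0
    for energy in reversed(frame_energies):
        if energy < silence_threshold:
            run += 1
        else:
            break
    return run >= min_silence_frames
```

A longer silence window makes the agent less likely to talk over slow speakers, at the cost of added response latency.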
Configure natural conversation flow:
```python
session = model.create_session(
    end_of_turn_detection="auto",  # Automatic pause detection
    min_response_delay=100,        # ms before responding
    backchanneling=True            # Enable "uh-huh", "I see" etc.
)
```

Voice Configuration
Built-in Voices
```python
# List available voices
voices = model.list_voices()
# ['aria', 'marcus', 'elena', 'kai', ...]

# Use a specific voice
session = model.create_session(voice="aria")
```

Voice Cloning
Clone a voice for personalized agents:
```python
custom_voice = model.clone_voice(
    reference_audio="agent_voice.wav",
    name="my_agent"
)
session = model.create_session(voice=custom_voice)
```

System Prompts
Guide the AI's behavior with system prompts:
```python
session = model.create_session(
    system_prompt="""You are a helpful customer support agent for TechCorp.
    Be concise, friendly, and solution-oriented.
    If you don't know something, offer to connect the user with a human agent."""
)
```

WebSocket Server
Deploy as a real-time WebSocket server:
```python
from personaplex import PersonaPlex7B, WebSocketServer

model = PersonaPlex7B.from_pretrained("personaplex/personaplex-7b")
server = WebSocketServer(model, port=8765)

# Start serving
server.run()
```

Client connection:
```javascript
const ws = new WebSocket('ws://localhost:8765');

const mediaRecorder = new MediaRecorder(audioStream);
mediaRecorder.ondataavailable = (e) => ws.send(e.data);
ws.onmessage = (e) => playAudio(e.data);
```

Production Deployment
Docker Compose
```yaml
version: '3.8'
services:
  personaplex:
    image: personaplex/personaplex-7b:latest
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - MODEL_CACHE=/models
    volumes:
      - ./models:/models
    ports:
      - "8765:8765"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Load Balancing
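Because sessions are long-lived WebSocket connections, the reverse proxy must forward the upgrade handshake and keep idle connections open. A minimal sketch of the `nginx.conf` that the compose file below mounts (the upstream name matches the compose service; the timeout value is an assumption):

```nginx
events {}

http {
    upstream personaplex_pool {
        # Docker's embedded DNS balances across service replicas.
        server personaplex:8765;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://personaplex_pool;
            # Headers required for the WebSocket upgrade handshake.
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            # Voice sessions are long-lived; raise the idle timeout.
            proxy_read_timeout 3600s;
        }
    }
}
```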
For high-traffic deployments:
```yaml
services:
  personaplex:
    deploy:
      replicas: 4
    # ... rest of config
  nginx:
    image: nginx
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
```

Performance Optimization
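A useful baseline before tuning anything: the weights themselves dominate memory. At fp16 (2 bytes per parameter), 7B parameters occupy roughly 13 GiB before activations and KV cache, which is why the 16 GB minimum in the hardware table is tight. A quick back-of-envelope check (weight-only, illustrative):

```python
BYTES_PER_GIB = 1024**3

def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight-only footprint; ignores activations and KV cache."""
    return n_params * bytes_per_param / BYTES_PER_GIB

fp16_gib = weight_memory_gib(7e9, 2)  # ~13 GiB of weights at fp16
int8_gib = weight_memory_gib(7e9, 1)  # ~6.5 GiB after int8 quantization
```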
Quantization
Reduce memory usage with quantization:
```python
model = PersonaPlex7B.from_pretrained(
    "personaplex/personaplex-7b",
    quantization="int8"  # ~50% memory reduction
)
```

Speculative Decoding
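Speculative decoding pairs the full model with a cheap draft model: the draft proposes several tokens, and the full model verifies them in a single pass, keeping the longest agreeing prefix. To build intuition, here is a toy, greedy sketch of the verify-and-accept loop (the deterministic `target`/`draft` callables are illustrative stand-ins, not the personaplex implementation):

```python
def speculative_step(target, draft, prefix, k=4):
    """One greedy speculative-decoding step over toy deterministic models.

    `target` and `draft` each map a token sequence to its next token.
    The cheap draft proposes k tokens; the expensive target verifies
    them, keeping the longest agreeing prefix plus one target token
    (a correction on the first mismatch, or a bonus if all k agree).
    """
    # Draft phase: propose k tokens autoregressively.
    proposed, seq = [], list(prefix)
    for _ in range(k):
        token = draft(seq)
        proposed.append(token)
        seq.append(token)

    # Verify phase: the target checks each proposal in order.
    accepted, seq = [], list(prefix)
    for token in proposed:
        expected = target(seq)
        if token == expected:
            accepted.append(token)
            seq.append(token)
        else:
            accepted.append(expected)  # target's correction; stop here
            break
    else:
        accepted.append(target(seq))  # all k accepted: one bonus token
    return accepted
```

The speedup comes from the verify phase: the target scores all proposals in one forward pass instead of one pass per token.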
Enable faster inference:
```python
model = PersonaPlex7B.from_pretrained(
    "personaplex/personaplex-7b",
    speculative_decoding=True,
    draft_model="personaplex/personaplex-draft"
)
```

Comparison with Alternatives
| Feature | PersonaPlex-7B | Moshi | GPT-4o Realtime |
|---|---|---|---|
| Open Source | Yes | Yes | No |
| Full-Duplex | Yes | Yes | Yes |
| Self-hostable | Yes | Yes | No |
| Voice Cloning | Yes | Limited | No |
| Languages | English | EN/FR | 50+ |
For more options, see our Open Source Voice AI Models comparison.
When to Use PersonaPlex-7B
Choose PersonaPlex-7B when you need:
- Full-duplex conversational AI
- Self-hosted deployment with data privacy
- Custom voice cloning
- Natural interruption handling
Consider alternatives when you need:
- Multilingual support → Qwen3-TTS
- Lowest latency → Moshi
- Simple TTS → Fish Speech
- Managed infrastructure → PersonaPlex API
Conclusion
PersonaPlex-7B brings state-of-the-art conversational AI capabilities to the open source community. Its full-duplex architecture and natural conversation handling make it ideal for voice agents, AI companions, and interactive applications.
This article is part of our Open Source Voice AI Models series.