Offline Multi‑Model TTS Generation Pipeline
GPU‑accelerated, multiprocessing pipeline for high‑throughput speech synthesis. Designed for reproducibility, offline operation, and robust logging.
Scope: integration of multiple open‑source TTS models for offline, high‑throughput inference; focus on orchestration, batching, logging, and GPU utilization. Models are third‑party.
Pipeline In Action

ElevenLabs API Integration
In addition to offline pipelines, I built a lightweight client around the ElevenLabs API for rapid dataset generation and sound effect prototyping. This tool lets me batch-generate audio samples from text or CSV inputs, track metadata in logs, and export results in standard formats.
- Supports batch input via .csv or .tsv
- Logs generation status (success/failure, filenames, durations)
- Configurable voice/model settings with simple CLI options
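The batch flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual client: `read_batch`, `run_batch`, the `id` and `text` column names, the `.mp3` extension, and the log fields are all assumptions, and `generate` stands in for whatever call wraps the ElevenLabs API. The logged `seconds` value is the elapsed time of that call, which may differ from the audio duration the real tool records.

```python
import csv
import time
from pathlib import Path


def read_batch(path):
    """Yield one dict per row from a .csv or .tsv file that has a header row."""
    path = Path(path)
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle, delimiter=delimiter)


def run_batch(rows, generate, out_dir, log_path):
    """Call generate(text) for each row, save the bytes, and log the outcome."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fields = ["id", "status", "filename", "seconds", "error"]
    with open(log_path, "w", newline="", encoding="utf-8") as handle:
        log = csv.DictWriter(handle, fieldnames=fields)
        log.writeheader()
        for row in rows:
            record = {"id": row["id"], "status": "ok", "filename": "",
                      "seconds": 0.0, "error": ""}
            start = time.perf_counter()
            try:
                audio = generate(row["text"])  # hypothetical callable returning bytes
                target = out_dir / f"{row['id']}.mp3"
                target.write_bytes(audio)
                record["filename"] = target.name
            except Exception as exc:  # one bad row should not stop the batch
                record["status"] = "failed"
                record["error"] = str(exc)
            record["seconds"] = round(time.perf_counter() - start, 3)
            log.writerow(record)
```

Passing `generate` in as a parameter keeps the file handling and logging testable without network access; a stub that returns fixed bytes is enough to exercise the whole loop.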
Audio Samples
Dia model — style-conditioned generation
ElevenLabs API — sound effect (Subway leaving station)
Meta Audiocraft - music generation
Technologies Used
Pipeline Diagram
Supported Models
Each script creates a new TTS model generator.
Models are third‑party; this project focuses on orchestration & batch generation.
Problem & Motivation
Generating large volumes of high‑quality speech is slow, expensive, and hard to reproduce at scale. This project builds a robust, offline‑first pipeline capable of synthesizing tens of thousands of samples while keeping GPU utilization high and logs auditable.
Where It’s Useful
- Dataset creation & augmentation
- Voice cloning and style‑conditioned synthesis
- Evaluation of deepfake detection on non‑speech sounds
Technical Highlights
Models and tokenizers from local HF cache; deterministic runs; no external API.
Workers load models once; shared queues; back‑pressure; graceful failure + retries.
CUDA device pinning via env; per‑worker memory control; safe KV‑cache handling.
Ref text/audio + target text; per‑sample metadata; idempotent resume.
Per‑sample CSV log, tqdm live progress, aggregate stats and error reports.
Audio‑ID stitching, normalization, optional filters, consistent filenames.
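Two of the highlights above, offline operation with device pinning via environment variables and graceful failure with retries, can be sketched as below. This is an illustration under stated assumptions, not the project's code: `worker_env`, `apply_worker_env`, and `with_retries` are hypothetical names, and the round-robin GPU assignment is one possible policy. `CUDA_VISIBLE_DEVICES`, `HF_HUB_OFFLINE`, and `TRANSFORMERS_OFFLINE` are the standard variables read by CUDA, `huggingface_hub`, and `transformers`.

```python
import os
import time


def worker_env(worker_index, gpu_ids):
    """Return env vars that pin one worker to one GPU and keep it offline."""
    gpu = gpu_ids[worker_index % len(gpu_ids)]  # round-robin over the given GPUs
    return {
        "CUDA_VISIBLE_DEVICES": str(gpu),  # the worker sees only this device
        "HF_HUB_OFFLINE": "1",             # huggingface_hub makes no network calls
        "TRANSFORMERS_OFFLINE": "1",       # transformers uses the local cache only
    }


def apply_worker_env(worker_index, gpu_ids):
    """Call at the top of a worker, before CUDA is initialized
    (safest: before importing torch)."""
    os.environ.update(worker_env(worker_index, gpu_ids))


def with_retries(task, attempts=3, delay_s=0.0):
    """Run task() up to `attempts` times; return (result, error_message)."""
    last_error = ""
    for attempt in range(1, attempts + 1):
        try:
            return task(), ""
        except Exception as exc:
            last_error = f"attempt {attempt}: {exc}"
            time.sleep(delay_s)
    return None, last_error
```

Returning the error message instead of raising lets a worker record the failure in the per-sample log and move on to the next item.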
Files & Artifacts
Sample Records
sample_id | text | ref_audio | status | latency_ms |
---|---|---|---|---|
cv_en_39586712 | The quick brown fox… | speaker_12.wav | ok | 712 |
cv_en_40953339 | Hello from the pipeline… | speaker_02.wav | ok | 689 |
cv_en_39586770 | KV cache edge case… | speaker_45.wav | skipped | 0 |
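Records like these are enough to drive an idempotent resume and a simple aggregate report. The sketch below assumes the log is stored as a CSV file with the same column names as the table; `load_log`, `pending_ids`, and `summarize` are hypothetical helpers, and treating every status other than `ok` as still pending is an assumption about the resume policy.

```python
import csv
from collections import Counter
from statistics import mean


def load_log(path):
    """Read the per-sample CSV log into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def pending_ids(all_ids, log_rows):
    """Idempotent resume: keep only samples that have no 'ok' record yet."""
    done = {row["sample_id"] for row in log_rows if row["status"] == "ok"}
    return [sample_id for sample_id in all_ids if sample_id not in done]


def summarize(log_rows):
    """Count records per status and average latency over successful samples."""
    counts = Counter(row["status"] for row in log_rows)
    latencies = [int(row["latency_ms"]) for row in log_rows if row["status"] == "ok"]
    return {
        "counts": dict(counts),
        "mean_latency_ms": mean(latencies) if latencies else None,
    }
```

For the three rows shown, `pending_ids` would return only `cv_en_39586770`, and the mean latency over the two `ok` rows is 700.5 ms.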
Replace with live data or hydrate from a JSON/CSV API route.
This page showcases system design, scale, and engineering details recruiters care about. Audio clips and files above are placeholders—swap in your real artifacts when ready.