epub2tts-vibevoice

epub2tts-vibevoice is a free and open-source Python app that creates a full-featured audiobook from an epub or text file using realistic text-to-speech from VibeVoice. A CUDA-compatible GPU or Apple Silicon is recommended; CPU-only mode works but is significantly slower.

Features

Creates standard format M4B audiobook file
Automatic chapter break detection
Embeds cover art if specified
Resumes where it left off if interrupted
Uses VibeVoice for high-quality, natural-sounding speech
Supports paragraph-level processing (no sentence splitting needed)
NOTE: epub file must be DRM-free

Usage

NOTE: To choose where the NLTK tokenizer data (about 50 MB) is stored, set an environment variable: export NLTK_DATA="your/path/to/nltk_data"

OPTIONAL - activate the virtual environment, if you are using one

source .venv/bin/activate

FIRST - extract epub contents to text and cover image to png:

epub2tts-vibevoice mybook.epub

Then edit mybook.txt: replace headings such as # Part 1 with the desired chapter names, and remove front matter such as the table of contents and anything else you do not want read aloud. Note: the first two lines can be Title: and Author: lines, which are then used in the audiobook metadata.
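
To make the expected layout of the edited text file concrete, here is a minimal Python sketch that reads the optional Title: and Author: lines and splits the rest into chapters at lines starting with "# ". It only illustrates the format described above and is not the parser epub2tts-vibevoice actually uses; the Book class and parse_book_text function are hypothetical names, and treating every non-empty line as one paragraph is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """Hypothetical container for the parsed text file."""
    title: str = ""
    author: str = ""
    # Each entry is (chapter_name, [paragraphs]).
    chapters: list = field(default_factory=list)


def parse_book_text(text: str) -> Book:
    """Split edited book text into metadata and chapters (illustrative only)."""
    book = Book()
    lines = text.splitlines()

    # Optional metadata: only the first two lines are inspected.
    if lines and lines[0].startswith("Title:"):
        book.title = lines.pop(0)[len("Title:"):].strip()
    if lines and lines[0].startswith("Author:"):
        book.author = lines.pop(0)[len("Author:"):].strip()

    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between paragraphs
        if stripped.startswith("# "):
            # A "# ..." line starts a new chapter.
            book.chapters.append((stripped[2:].strip(), []))
        else:
            # Text before the first heading goes into an unnamed chapter.
            if not book.chapters:
                book.chapters.append(("", []))
            book.chapters[-1][1].append(stripped)
    return book


sample = (
    "Title: My Book\n"
    "Author: Jane Doe\n"
    "\n"
    "# Chapter One\n"
    "First paragraph.\n"
    "\n"
    "# Chapter Two\n"
    "Second paragraph.\n"
)
book = parse_book_text(sample)
print(book.title, "|", book.author, "|", [name for name, _ in book.chapters])
```

Running a check like this on your edited file before starting a long synthesis job is a cheap way to confirm that the chapter names and metadata look the way you intended.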

Read text to audiobook:

epub2tts-vibevoice mybook.txt --speaker Carter --cover mybook.png

Specify a speaker with --speaker <name>. Available speakers include:
- English: Carter, Davis, Emma, Frank, Grace, Mike, Samuel
- Other languages: Available in DE, FR, IT, JP, KR, NL, PL, PT, ES

All options

-h, --help - show this help message and exit
--speaker <name> - VibeVoice speaker to use (default: Carter)
--model_path <path> - Path to VibeVoice model (default: microsoft/VibeVoice-Realtime-0.5B)
--cover image.[jpg|png] - Image to use for cover
--notitles - Do not read chapter titles when creating audiobook
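
The options above map naturally onto Python's argparse. The sketch below mirrors the documented flags and defaults so you can see how they combine; it is not copied from the project's source, and the positional argument name sourcefile is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="epub2tts-vibevoice",
        description="Create an audiobook from an epub or text file.",
    )
    # Assumed name for the input file argument; the README only shows its position.
    parser.add_argument("sourcefile", help="epub or text file to process")
    parser.add_argument("--speaker", default="Carter",
                        help="VibeVoice speaker to use")
    parser.add_argument("--model_path",
                        default="microsoft/VibeVoice-Realtime-0.5B",
                        help="Path to VibeVoice model")
    parser.add_argument("--cover", default=None,
                        help="Image to use for cover (jpg or png)")
    parser.add_argument("--notitles", action="store_true",
                        help="Do not read chapter titles")
    return parser


# Equivalent to: epub2tts-vibevoice mybook.txt --speaker Emma --cover mybook.png
args = build_parser().parse_args(
    ["mybook.txt", "--speaker", "Emma", "--cover", "mybook.png"]
)
print(args.sourcefile, args.speaker, args.model_path, args.cover, args.notitles)
```

Options you omit fall back to the documented defaults, so only the input file is strictly required.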

Deactivate virtual environment

deactivate

Reporting bugs

Thank you in advance for reporting any bugs/issues you encounter! If you are having issues, please first search the existing issues to see whether anyone else has run into something similar.

If you've found something new, please open an issue and be sure to include:

The full command you executed
The platform (Linux, Windows, macOS)
Your Python version if not using Docker
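
If it helps, the platform and Python version can be collected with a few lines of standard-library Python. This is only a convenience sketch; bug_report_info is a hypothetical helper and not part of epub2tts-vibevoice.

```python
import platform


def bug_report_info(command: str) -> str:
    """Format the details requested above for pasting into an issue."""
    return "\n".join([
        f"Command: {command}",
        f"Platform: {platform.system()} {platform.release()}",
        f"Python: {platform.python_version()}",
    ])


# Replace the string with the full command you actually executed.
report = bug_report_info(
    "epub2tts-vibevoice mybook.txt --speaker Carter --cover mybook.png"
)
print(report)
```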

Release notes

20251214: Initial release with VibeVoice support

Install

Required Python version is 3.11.

NOTE: To choose where the NLTK tokenizer data (about 50 MB) is stored, set an environment variable: export NLTK_DATA="your/path/to/nltk_data"

MAC INSTALLATION

This installation requires Python 3.11 and Homebrew.

# Install dependencies
brew install mecab espeak pyenv ffmpeg

# Install epub2tts-vibevoice
git clone https://github.com/aedocw/epub2tts-vibevoice
cd epub2tts-vibevoice
pyenv install 3.11
pyenv local 3.11

# Create and activate virtual environment
python -m venv .venv && source .venv/bin/activate

# Install with uv (recommended)
pip install uv
uv pip install wheel
uv pip install flash-attn --no-build-isolation
uv pip install .

# Or install with pip
# pip install wheel
# pip install flash-attn --no-build-isolation
# pip install .

LINUX INSTALLATION

These instructions target Ubuntu 22.04 or later (20.04 showed some dependency issues), but they should work on most distributions if you substitute the appropriate package manager commands. Make sure ffmpeg is installed before use.

# Install dependencies
sudo apt install espeak-ng ffmpeg python3-venv

# Clone the repo
git clone https://github.com/aedocw/epub2tts-vibevoice
cd epub2tts-vibevoice

# Create and activate virtual environment
python3 -m venv .venv && source .venv/bin/activate

# Install with uv (recommended)
pip install uv
uv pip install wheel
uv pip install flash-attn --no-build-isolation
uv pip install .

# Or install with pip
# pip install wheel
# pip install flash-attn --no-build-isolation
# pip install .

WINDOWS INSTALLATION

Running epub2tts-vibevoice in WSL2 with Ubuntu 22.04 is the easiest approach.

Follow the Linux installation instructions in WSL2.

Updating

cd to the repo directory
git pull
If you installed epub2tts-vibevoice in a virtual environment, activate it with "source .venv/bin/activate"
uv pip install . --upgrade (or pip install . --upgrade if not using uv)

Requirements

Python 3.11
CUDA-compatible GPU (NVIDIA) or Apple Silicon for best performance
CPU-only mode is supported but will be significantly slower
ffmpeg (for M4B creation)
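
A quick way to check these requirements before a long run is a short script like the one below. It is a sketch, not part of epub2tts-vibevoice: check_requirements is a hypothetical helper, and it assumes PyTorch (torch) is the library used to reach the GPU, reporting the accelerator as unknown if torch cannot be imported.

```python
import shutil
import sys


def check_requirements() -> dict:
    """Report whether the documented requirements appear to be met."""
    results = {
        "python_3_11": sys.version_info[:2] == (3, 11),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "accelerator": "unknown (torch not installed)",
    }
    try:
        import torch  # assumption: PyTorch is the GPU backend
    except ImportError:
        return results

    # Older torch builds may lack the mps backend module, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if torch.cuda.is_available():
        results["accelerator"] = "cuda"
    elif mps is not None and mps.is_available():
        results["accelerator"] = "mps"
    else:
        results["accelerator"] = "cpu"
    return results


status = check_requirements()
for name, value in status.items():
    print(f"{name}: {value}")
```

An accelerator value of cpu means synthesis should still work, just much more slowly, as noted above.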

VibeVoice Model

This application uses Microsoft VibeVoice for text-to-speech synthesis.

What's installed automatically:

VibeVoice package (installed from GitHub during setup to include voice files)
Pre-extracted voice files for 25+ speakers in multiple languages
The VibeVoice-Realtime-0.5B model (downloaded from HuggingFace on first use)

Note: The PyPI vibevoice package doesn't include voice files, so we install directly from GitHub.

Voice prompts are provided in pre-extracted embedded format (.pt files) included with VibeVoice. For custom voice creation, please refer to the VibeVoice documentation.

Author

Christopher Aedo

Website: aedo.dev
GitHub: @aedocw

Contributing

Contributions, issues and feature requests are welcome!

Show your support

Give a star if this project helped you!

License

This project uses VibeVoice, which is released under its own license terms. Please review the VibeVoice license for details.
