eel-brah/kokorodoki

KokoroDoki: Real-Time Text-to-Speech (TTS)

KokoroDoki is a real-time Text-to-Speech application supporting multiple languages and voices. It runs locally on your laptop, using either the CPU or CUDA for GPU acceleration.

It is powered by Kokoro-82M, an open-weight TTS model that delivers high-quality, natural-sounding speech and rivals larger models while remaining lightweight and highly responsive.

Whether you need to listen to articles, generate audio files, or just want text spoken back in real time, KokoroDoki has you covered, with Console, GUI, CLI, and Daemon modes to suit any workflow.

🎧 Voice Demo - af_heart

Voice.Demo.mp4

Features

  • 🌍 Multilingual Support Supports English (US/UK), Spanish, French, Hindi, Italian, Brazilian Portuguese, Japanese, Mandarin Chinese.

  • 🗣️ Diverse Voice Selection Switch between a range of expressive voices.

  • 🎛️ Interactive Playback Control Pause, resume, skip, rewind, or change voice/language on the fly.

  • 🗌 Clipboard Monitoring (Daemon Mode) Automatically speak out any copied text.

  • 📂 Export to WAV Save generated audio for offline use or editing.

  • 🎬 SRT Subtitle Support Generate timed audio from SRT subtitle files with perfect synchronization.

  • 📟 Console Mode An interactive terminal mode with commands to control playback, switch voices/languages, and more.

  • 🖥️ GUI Mode A GUI with support for multiple themes and sentence highlighting.

  • 🛠️ Daemon Runs as a background service. Accepts commands/text from client.py.

  • </> CLI Modes One-shot text-to-speech conversion from input text or file.

  • ⚡ Fast & Real-time Designed for low-latency performance with quick response and smooth voice output.

  • 🧠 CPU & GPU Supports CPU and CUDA-based acceleration.

⚠️ Note: While CPU mode is supported, performance is significantly better with CUDA-enabled GPUs. Without CUDA, real-time responsiveness and speed may be reduced.
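The CPU/CUDA choice in the note above can be made automatically before launching. This is a minimal sketch, assuming PyTorch (listed under Core Dependencies) is what reports CUDA availability; the helper name `pick_device` is hypothetical and not part of KokoroDoki.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu"."""
    try:
        import torch  # installed through requirements.txt
    except ImportError:
        return "cpu"  # no PyTorch in this interpreter, so fall back to CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # Print a command line that uses the documented --device option.
    print(f"kokorodoki --device {pick_device()}")
```

Run it with the interpreter from the virtual environment so that the same PyTorch build is inspected.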

Floating Tool Bar

💡 Check out floatingtoolbar by markd89, a Python/Qt GUI alternative to hotkeys for Daemon Mode.

🌐 Language & Voice Availability

| Language | Voices |
| --- | --- |
| American English | 20 |
| British English | 8 |
| Spanish | 3 |
| French | 1 |
| Hindi | 4 |
| Italian | 2 |
| Brazilian Portuguese | 3 |
| Japanese | 5 |
| Mandarin Chinese | 8 |

🚀 Setup

Requirements

Make sure you have the following installed (see below for a guide):

  • Python: 3.12
  • CUDA Toolkit: cuda-toolkit
  • Audio & Voice Tools: portaudio, espeak
  • Build Tools: python3.12-dev, python3.12-virtualenv
  • Python interface to the Tcl/Tk GUI toolkit: python3.12-tkinter
  • Clipboard Support (for Daemon Mode): xclip (X11) or wl-clipboard (Wayland)
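Before installing, it can help to see which of the requirements above are already present. The following is a rough sketch, not a tool shipped with KokoroDoki: `check_requirements` is a hypothetical helper, and `ctypes.util.find_library` may miss libraries installed in non-standard locations.

```python
import ctypes.util
import shutil
import sys


def check_requirements() -> dict[str, bool]:
    """Report which documented prerequisites this machine appears to have."""
    return {
        # The setup steps target Python 3.12.
        "python 3.12": sys.version_info[:2] == (3, 12),
        # portaudio is a shared library, so look it up by name.
        "portaudio": ctypes.util.find_library("portaudio") is not None,
        # espeak installs an executable of the same name.
        "espeak": shutil.which("espeak") is not None,
        # xclip (X11) or wl-paste from wl-clipboard (Wayland).
        "clipboard tool": any(
            shutil.which(tool) is not None for tool in ("xclip", "wl-paste")
        ),
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name:15} {'ok' if found else 'missing'}")
```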

Setup Instructions

0. Windows or Docker

For Windows instructions, see windows.md. For Docker, see docker.md.

1. Dependencies

For Debian-based/Ubuntu systems:

sudo apt update && sudo apt install -y \
  python3.12 python3.12-dev python3.12-venv python3.12-tk \
  portaudio19-dev espeak wl-clipboard

For Fedora/RHEL/CentOS systems:

sudo dnf install -y \
  python3.12 python3.12-devel python3-virtualenv python3.12-tkinter \
  portaudio espeak wl-clipboard

2. Install CUDA Toolkit

Follow the official guide to install the CUDA Toolkit compatible with your GPU and OS.

3. Installation

# Clone the repository
cd ~/.local/bin
git clone https://github.com/eel-brah/kokorodoki kdoki

# Set up a virtual environment
mkdir -p ~/.venvs
cd ~/.venvs
python3.12 -m venv kdvenv
source kdvenv/bin/activate

# Install Python requirements
pip install -r ~/.local/bin/kdoki/requirements.txt

# Exit the virtual environment
deactivate

# Copy the wrapper
cp ~/.local/bin/kdoki/kokorodoki ~/.local/bin
chmod +x ~/.local/bin/kokorodoki

# Copy the wrapper for the client (Daemon mode)
cp ~/.local/bin/kdoki/doki ~/.local/bin
chmod +x ~/.local/bin/doki

4. Integrate with systemd

# Integrate with systemd for better control (Daemon mode)
# Run these from the directory that contains generate_kokorodoki_service.sh
chmod +x generate_kokorodoki_service.sh
./generate_kokorodoki_service.sh

systemctl --user daemon-reload
systemctl --user enable kokorodoki.service
systemctl --user start kokorodoki.service
systemctl --user status kokorodoki.service

# To test it:
# Copy some text like "Integrate with systemd for better control"
# Then run:
doki

# Create custom shortcuts for a more convenient experience

# Send from clipboard
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py

# To trigger Ctrl+C before sending clipboard (No need to copy text manually)
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py -c

# Stop playback 
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --stop

# Pause playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --pause

# Resume playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --resume

# Skip to next sentence
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --next

# Go back a sentence
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --back

5. Optional: Language Support Packages

# Run these inside the virtual environment (source ~/.venvs/kdvenv/bin/activate)

# For Japanese
pip install pyopenjtalk

# For Mandarin Chinese
pip install ordered_set pypinyin jieba cn2an

Usage

Examples

# Launch with default mode (Console mode) and default settings
kokorodoki

# The first run may take some time as it downloads the AI model.

# OR, if you didn't follow the setup steps above:
# Use the path to your virtual environment and the path to where you cloned
/path/to/venv/bin/python3.12 /path/to/kokorodoki/src/main.py

# Show help
kokorodoki -h

# List available voices
kokorodoki --list-voices

# Choose your settings: Spanish, voice ef_dora, with 1.2x speed
kokorodoki -l e -v ef_dora -s 1.2

# Launch GUI mode
kokorodoki --gui

# Launch daemon mode
kokorodoki --daemon

# Copy some text to clipboard, then run:
doki

# Change voice
doki -v af_sky

# Copy text again and play with:
doki

# Generate and save audio to a file
kokorodoki -l b -v bf_lily -t "Generate audio file" -o output.wav

# Generate timed audio from SRT subtitle file (auto-detected)
kokorodoki -f subtitles.srt -o timed_audio.wav

Command-Line Arguments

Info

  • --list-languages Show all supported languages.
  • --list-voices [LANG] Show all available voices. Optionally filter by language.
  • --themes Show available GUI themes.

Configuration

  • --language, -l Set the initial language (e.g., a for American English).
  • --voice, -v Set the initial voice (e.g., af_heart).
  • --speed, -s Set the initial speed (range: 0.5 to 2.0).
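The examples in this README pair each language code with a voice whose name starts with the same letter (`-l e -v ef_dora`, `-l b -v bf_lily`), and the speed must stay within 0.5 to 2.0. The sketch below checks both before a command is launched. The prefix rule is inferred from those examples and is an assumption, and `validate_settings` is a hypothetical helper, not part of KokoroDoki.

```python
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # range documented for --speed


def validate_settings(language: str, voice: str, speed: float) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Assumption: the first letter of a voice name is its language code.
    if not voice.startswith(language):
        problems.append(f"voice {voice!r} does not look like a {language!r} voice")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed {speed} is outside {MIN_SPEED}-{MAX_SPEED}")
    return problems


if __name__ == "__main__":
    print(validate_settings("e", "ef_dora", 1.2))  # no problems
    print(validate_settings("a", "bf_lily", 2.5))  # mismatched voice, bad speed
```

For an authoritative list of valid combinations, `kokorodoki --list-voices` remains the reference.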

Modes

  • --gui, -g Run in GUI mode.
  • --daemon Run in daemon mode.
  • --text, -t Supply a text string for CLI mode.
  • --file, -f Supply a text file or SRT subtitle file (SRT files detected automatically).
  • --output, -o Specify output .wav file path.
  • --all Test all voices for the selected language (only with --text or text files).

Other Options

  • --device Set the computation device (cuda or cpu).
  • --port Set the port for daemon mode (default: 5561).
  • --theme Set GUI theme (default: darkly).
  • --history-off Disable saving command history.
  • --verbose, -V Enable verbose output.
  • --ctrl_c_off, -c Disable Ctrl+C from stopping playback.

1/4. 🖥️ Console Mode (Interactive Terminal)

Run an interactive terminal interface for real-time TTS, featuring playback control and input history.

kokorodoki

> Hello world!       # Read this line
> !help              # Show help
> !lang b            # Switch to British English
> !voice bf_emma     # Use a specific voice
> !speed 1.5         # Adjust speed
> !pause             # Pause playback
> !resume            # Resume
> !quit              # Exit

Available Commands

🎵 Playback Control

| Command | Description | Example |
| --- | --- | --- |
| !stop, !s | Stop current playback | !stop |
| !pause, !p | Pause playback | !pause |
| !resume, !r | Resume playback | !resume |
| !next, !n | Skip to next sentence | !next |
| !back, !b | Go to previous sentence | !back |

🎚️ Audio Settings

| Command | Description | Example |
| --- | --- | --- |
| !lang <code> | Change language | !lang b |
| !voice <name> | Change voice | !voice af_bella |
| !speed <value> | Set playback speed (0.5-2.0) | !speed 1.5 |

🧠 Information

| Command | Description | Example |
| --- | --- | --- |
| !list_langs | List available languages | !list_langs |
| !list_voices | List voices for the current language | !list_voices |
| !list_all_voices | List voices for all languages | !list_all_voices |
| !status | Show current settings | !status |

🛠 Interface

| Command | Description | Example |
| --- | --- | --- |
| !clear | Clear the screen | !clear |
| !clear_history | Clear command history | !clear_history |
| !verbose | Toggle verbose mode | !verbose |
| !ctrlc | Change behavior of Ctrl+C | !ctrlc |
| !help, !h | Show this help message | !help |
| !quit, !q | Exit the program | !quit |
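In Console Mode, a line starting with `!` is a command and any other line is text to be read aloud. The sketch below shows one way such input could be classified, with the short aliases taken from the tables above. It illustrates the convention only and is not KokoroDoki's actual parser; `parse_console_line` is a hypothetical name.

```python
# Short aliases listed in the command tables of this section.
ALIASES = {
    "s": "stop", "p": "pause", "r": "resume", "n": "next",
    "b": "back", "h": "help", "q": "quit",
}


def parse_console_line(line: str) -> tuple[str, str]:
    """Classify one console input line.

    Returns ("say", text) for plain text, or (command, argument) for a
    line starting with "!". The argument is "" when none is given.
    """
    stripped = line.strip()
    if not stripped.startswith("!"):
        return ("say", stripped)
    # Split "!speed 1.5" into the command name and the rest of the line.
    name, _, argument = stripped[1:].partition(" ")
    return (ALIASES.get(name, name), argument.strip())


if __name__ == "__main__":
    for example in ("Hello world!", "!lang b", "!speed 1.5", "!p"):
        print(example, "->", parse_console_line(example))
```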

2/4. 🖌️ GUI Mode

Launch a graphical interface.

kokorodoki --gui

Themes:

  1. darkly
  2. cyborg
  3. solar
  4. vapor

Use --theme <number> to set it.

3/4. 🛠️ Daemon Mode (Background Service)

Runs in the background and receives text or commands via client.py.

kokorodoki --daemon 

To send text, copy it to the clipboard and run client.py or doki:

~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py

or

doki

Pause and resume playback

doki --pause
doki --resume

🧾 Client Arguments & Daemon Commands

# Playback Controls 
--stop                  Stop reading
--pause                 Pause reading
--resume                Resume reading
--next                  Skip a sentence
--back                  Go back one sentence
--status                Get current settings
--exit                  Stop reading

# Change settings
--language, -l          Change language ('a' for American English)
--voice, -v             Change voice
--speed, -s             Change speed (range: 0.5-2.0)

# Info
--list-languages        List available languages
--list-voices [LANG]    List available voices, optionally filtered by language

# Config
--port                  Use this option if the kokorodoki daemon runs on a non-default port
--ctrl-c                To trigger Ctrl+C before sending clipboard
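The client talks to the daemon over a socket, on port 5561 unless `--port` says otherwise. When `doki` appears to do nothing, a quick reachability check can tell whether anything is listening. This sketch assumes the daemon accepts TCP connections on localhost, which this README does not state explicitly; `daemon_is_listening` is a hypothetical helper, and a connection that is opened and closed immediately may still be noticed by the daemon.

```python
import socket

DEFAULT_PORT = 5561  # default documented for --port


def daemon_is_listening(port: int = DEFAULT_PORT, host: str = "127.0.0.1",
                        timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if daemon_is_listening() else "not reachable"
    print(f"kokorodoki daemon on port {DEFAULT_PORT}: {state}")
```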

Recommended: Use with systemd for Better Control

See the 'Integrate with systemd' section in the installation guide above for steps.

Recommended: Create Custom Shortcuts

To improve your experience, it's recommended to create custom keyboard shortcuts for frequently used commands.

Below are some useful commands you can bind to shortcuts:

# Send from clipboard
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py

# Stop playback 
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --stop

# Pause playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --pause

# Resume playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --resume

# Skip to next sentence
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --next

# Go back a sentence
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --back

You can assign these to keyboard shortcuts using your desktop environment's keyboard settings (e.g., GNOME → Settings → Keyboard → Custom Shortcuts).
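The shortcut commands above differ only in their final flag, so they can be generated for any install location. A minimal sketch, assuming the same layout as this guide (`bin/python3.12` inside the virtual environment and `src/client.py` inside the clone); `shortcut_commands` is a hypothetical helper. It prints absolute paths, which is convenient because some shortcut dialogs may not expand `~`.

```python
import shlex
from pathlib import Path

# Label -> extra client.py flags, as listed in the commands above.
ACTIONS = {
    "send clipboard": [],
    "stop": ["--stop"],
    "pause": ["--pause"],
    "resume": ["--resume"],
    "next sentence": ["--next"],
    "previous sentence": ["--back"],
}


def shortcut_commands(venv: Path, repo: Path) -> dict[str, str]:
    """Build the command string to paste into each custom shortcut."""
    python = venv / "bin" / "python3.12"
    client = repo / "src" / "client.py"
    return {
        label: shlex.join([str(python), str(client), *flags])
        for label, flags in ACTIONS.items()
    }


if __name__ == "__main__":
    home = Path.home()
    commands = shortcut_commands(home / ".venvs" / "kdvenv",
                                 home / ".local" / "bin" / "kdoki")
    for label, command in commands.items():
        print(f"{label:18} {command}")
```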

4/4. ⚡ CLI Mode (One-Shot)

Use the command-line interface for quick, one-off tasks:

# Speak text
python src/main.py --text "Hello, world!"              
# Speak text from a file and save the audio to a file
python src/main.py --file input.txt --output out.wav   
# Generate timed audio from SRT subtitle file (auto-detected)
python src/main.py --file subtitles.srt --output timed_audio.wav
# Speak using all available voices of a specific language
python src/main.py -l a --text "All voices" --all      
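One-shot mode lends itself to scripting. The sketch below converts several text files to WAV files by calling the `kokorodoki` wrapper once per file with the `-l`, `-v`, `-f`, and `-o` options documented above. It assumes `kokorodoki` is on your PATH; `export_command` and the script name in the usage comment are hypothetical. Each call starts a new process, so the model is loaded again for every file.

```python
from pathlib import Path


def export_command(text_file: Path, out_dir: Path,
                   language: str = "a", voice: str = "af_heart") -> list[str]:
    """Build the argv for converting one text file to a WAV file."""
    output = out_dir / text_file.with_suffix(".wav").name
    return ["kokorodoki", "-l", language, "-v", voice,
            "-f", str(text_file), "-o", str(output)]


if __name__ == "__main__":
    import subprocess
    import sys

    # Usage: python batch_export.py OUT_DIR FILE [FILE ...]
    if len(sys.argv) >= 3:
        out_dir = Path(sys.argv[1])
        out_dir.mkdir(parents=True, exist_ok=True)
        for name in sys.argv[2:]:
            # check=True stops the batch at the first failing conversion.
            subprocess.run(export_command(Path(name), out_dir), check=True)
```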

🎬 SRT Subtitle Support

KokoroDoki supports SRT subtitle files for generating timed audio. Key benefits:

  • Perfect Timing: Generate audio that matches subtitle timestamps exactly
  • Professional Content: Create synchronized voiceovers for videos
  • Accessibility: Convert subtitled content into audio format

SRT File Format Example:

1
00:00:00,000 --> 00:00:03,000
Hello and welcome to our demonstration.

2
00:00:04,000 --> 00:00:07,500
This is how SRT subtitles work with timed audio.

3
00:00:08,500 --> 00:00:12,000
Each subtitle entry will be spoken at the correct time.

Usage:

# Generate timed audio from SRT file (auto-detected)
kokorodoki -f subtitles.srt -o synchronized_audio.wav

# With custom voice and language
kokorodoki -f subtitles.srt -l a -v af_heart -o output.wav
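To make the SRT format shown above concrete, here is a small parser that turns a file's content into (start, end, text) tuples with times in seconds. It is an illustrative sketch only and not KokoroDoki's own SRT handling; `to_seconds` and `parse_srt` are hypothetical names.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:00:07,500 to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[tuple[float, float, str]]:
    """Parse SRT text into (start, end, text) tuples, in file order."""
    entries = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks instead of failing
        start, end = (to_seconds(part) for part in lines[1].split("-->"))
        # Multi-line subtitles are joined into a single sentence.
        entries.append((start, end, " ".join(lines[2:])))
    return entries
```

With the example file above, the first entry ends at 3.0 seconds and the second starts at 4.0 seconds; a timed renderer would plausibly fill such gaps with silence so that each line begins at its timestamp.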

Core Dependencies

  • nltk - Sentence parsing
  • torch, numpy, librosa - Audio processing
  • sounddevice, soundfile - Audio playback
  • kokoro - The TTS model
  • tkinter, ttkbootstrap - GUI theming
  • rich - Fancy CLI output
  • socket - Daemon and client communication

🤝 Contribute

  1. Fork the repo
  2. Create a new branch
  3. Make your changes
  4. Open a pull request

Bug reports and feature requests are welcome on GitHub Issues.

📜 License

This project is licensed under the GNU General Public License v3.0 (GPLv3).

You are free to use, modify, and distribute this software under the terms of the GPLv3.
