KokoroDoki is a real-time Text-to-Speech application supporting multiple languages and voices. It runs locally on your laptop, using either the CPU or CUDA for GPU acceleration.
Powered by Kokoro-82M, an open-weight TTS model that delivers high-quality, natural-sounding speech, it rivals larger models while remaining lightweight and highly responsive.
Whether you need to listen to articles, generate audio files, or just want text spoken back in real time, KokoroDoki has you covered, featuring Console, GUI, CLI, and Daemon modes to suit any workflow.
- Multilingual Support: Supports English (US/UK), Spanish, French, Hindi, Italian, Brazilian Portuguese, Japanese, Mandarin Chinese.
- Diverse Voice Selection: Switch between a range of expressive voices.
- Interactive Playback Control: Pause, resume, skip, rewind, or change voice/language on the fly.
- Clipboard Monitoring (Daemon Mode): Automatically speak out any copied text.
- Export to WAV: Save generated audio for offline use or editing.
- SRT Subtitle Support: Generate timed audio from SRT subtitle files with perfect synchronization.
- Console Mode: An interactive terminal mode with commands to control playback, switch voices/languages, and more.
- GUI Mode: A GUI with support for multiple themes and sentence highlighting.
- Daemon: Runs as a background service. Accepts commands/text from client.py.
- CLI Modes: One-shot text-to-speech conversion from input text or file.
- Fast & Real-time: Designed for low-latency performance with quick response and smooth voice output.
- CPU & GPU: Supports CPU and CUDA-based acceleration.
Note: While CPU mode is supported, performance is significantly better with CUDA-enabled GPUs. Without CUDA, real-time responsiveness and speed may be reduced.
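As a rough illustration of the CPU/CUDA choice described in the note, the sketch below picks a device name by asking PyTorch whether a CUDA GPU is usable. The helper name `pick_device` and its `prefer_gpu` parameter are hypothetical and are not part of KokoroDoki; the project's own `--device` option may be implemented differently.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration; not KokoroDoki's actual code.
    """
    if not prefer_gpu:
        return "cpu"
    try:
        # Imported lazily so the sketch still runs where PyTorch is absent.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

Falling back to `"cpu"` keeps the program usable on machines without a GPU, at the cost of the slower synthesis mentioned in the note.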
Check out floatingtoolbar by markd89, a Python/Qt GUI alternative to hotkeys for Daemon Mode.
| Language | Voices |
|---|---|
| American English | 20 |
| British English | 8 |
| Spanish | 3 |
| French | 1 |
| Hindi | 4 |
| Italian | 2 |
| Brazilian Portuguese | 3 |
| Japanese | 5 |
| Mandarin Chinese | 8 |
Make sure you have the following installed (see below for a guide):
- Python: 3.12
- CUDA Toolkit: `cuda-toolkit`
- Audio & Voice Tools: `portaudio`, `espeak`
- Build Tools: `python3.12-dev`, `python3.12-virtualenv`
- Python interface to the Tcl/Tk GUI toolkit: `python3.12-tkinter`
- Clipboard Support (for Daemon Mode): `xclip` or `wl-clipboard` for Wayland
For Windows instructions, see windows.md. For Docker, see docker.md.
For Debian-based/Ubuntu systems:
sudo apt update && sudo apt install -y \
python3.12 python3.12-dev python3.12-venv python3.12-tk \
portaudio19-dev espeak wl-clipboard
For Fedora/RHEL/CentOS systems:
sudo dnf install -y \
python3.12 python3.12-devel python3-virtualenv python3.12-tkinter \
portaudio espeak wl-clipboard
Follow the official guide to install the CUDA Toolkit compatible with your GPU and OS.
# Clone the repository
cd ~/.local/bin
git clone https://github.com/eel-brah/kokorodoki kdoki
# Set up a virtual environment
mkdir -p ~/.venvs
cd ~/.venvs
python3.12 -m venv kdvenv
source kdvenv/bin/activate
# Install Python requirements
pip install -r ~/.local/bin/kdoki/requirements.txt
# Exit the virtual environment
deactivate
# Copy the wrapper
cp ~/.local/bin/kdoki/kokorodoki ~/.local/bin
chmod +x ~/.local/bin/kokorodoki
# Copy the wrapper for the client (Daemon mode)
cp ~/.local/bin/kdoki/doki ~/.local/bin
chmod +x ~/.local/bin/doki
# Integrate with systemd for better control (Daemon mode)
chmod +x generate_kokorodoki_service.sh
./generate_kokorodoki_service.sh
systemctl --user daemon-reload
systemctl --user enable kokorodoki.service
systemctl --user start kokorodoki.service
systemctl --user status kokorodoki.service
# To test it:
# Copy some text like "Integrate with systemd for better control"
# Then run:
doki
# Create custom shortcuts for a more convenient experience
# Send from clipboard
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py
# To trigger Ctrl+C before sending clipboard (No need to copy text manually)
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py -c
# Stop playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --stop
# Pause playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --pause
# Resume playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --resume
# Skip to next sentence
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --next
# Go back a sentence
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --back
# For Japanese
pip install pyopenjtalk
# For Mandarin Chinese
pip install ordered_set pypinyin jieba cn2an
# Launch with default mode (Console mode) and default settings
kokorodoki
# The first run may take some time as it downloads the AI model.
# OR, if you didn't follow the setup steps above:
# Use the path to your virtual environment and the path to where you cloned
/path/to/venv/bin/python3.12 /path/to/kokorodoki/src/main.py
# Show help
kokorodoki -h
# List available voices
kokorodoki --list-voices
# Choose your settings: Spanish, voice ef_dora, with 1.2x speed
kokorodoki -l e -v ef_dora -s 1.2
# Launch GUI mode
kokorodoki --gui
# Launch daemon mode
kokorodoki --daemon
# Copy some text to clipboard, then run:
doki
# Change voice
doki -v af_sky
# Copy text again and play with:
doki
# Generate and save audio to a file
kokorodoki -l b -v bf_lily -t "Generate audio file" -o output.wav
# Generate timed audio from SRT subtitle file (auto-detected)
kokorodoki -f subtitles.srt -o timed_audio.wav
- `--list-languages`: Show all supported languages.
- `--list-voices [LANG]`: Show all available voices. Optionally filter by language.
- `--themes`: Show available GUI themes.
- `--language`, `-l`: Set the initial language (e.g., `a` for American English).
- `--voice`, `-v`: Set the initial voice (e.g., `af_heart`).
- `--speed`, `-s`: Set the initial speed (range: 0.5 to 2.0).
- `--gui`, `-g`: Run in GUI mode.
- `--daemon`: Run in daemon mode.
- `--text`, `-t`: Supply a text string for CLI mode.
- `--file`, `-f`: Supply a text file or SRT subtitle file (SRT files detected automatically).
- `--output`, `-o`: Specify output `.wav` file path.
- `--all`: Test all voices for the selected language (only with `--text` or text files).
- `--device`: Set the computation device (`cuda` or `cpu`).
- `--port`: Set the port for daemon mode (default: `5561`).
- `--theme`: Set GUI theme (default: `darkly`).
- `--history-off`: Disable saving command history.
- `--verbose`, `-V`: Enable verbose output.
- `--ctrl_c_off`, `-c`: Disable Ctrl+C from stopping playback.
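To make the option semantics concrete, here is a minimal `argparse` sketch that declares a few of the flags listed above and enforces the documented 0.5 to 2.0 speed range. It is an illustration only, not KokoroDoki's actual parser: the program name `kokorodoki-sketch`, the helper names, and the default speed of 1.0 are assumptions, while the flag names and the default port come from the list above.

```python
import argparse


def speed_value(raw):
    """Convert a --speed argument and enforce the documented 0.5-2.0 range."""
    value = float(raw)  # a ValueError here is reported by argparse as invalid input
    if not 0.5 <= value <= 2.0:
        raise argparse.ArgumentTypeError("speed must be between 0.5 and 2.0")
    return value


def build_parser():
    """Build a parser for a subset of the documented options (illustrative)."""
    parser = argparse.ArgumentParser(prog="kokorodoki-sketch")
    parser.add_argument("--language", "-l", help="language code, e.g. 'a'")
    parser.add_argument("--voice", "-v", help="voice name, e.g. 'af_heart'")
    # The default of 1.0 is an assumption made for this sketch.
    parser.add_argument("--speed", "-s", type=speed_value, default=1.0)
    parser.add_argument("--device", choices=["cuda", "cpu"])
    parser.add_argument("--port", type=int, default=5561)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-l", "e", "-v", "ef_dora", "-s", "1.2"])
    print(args.language, args.voice, args.speed, args.port)
```

Validating the range in a `type` function means an out-of-range value is rejected at parse time with a usage message, before any model is loaded.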
Run an interactive terminal interface for real-time TTS, featuring playback control and input history.
kokorodoki
> Hello world! # Read this line
> !help # Show help
> !lang b # Switch to British English
> !voice bf_emma # Use a specific voice
> !speed 1.5 # Adjust speed
> !pause # Pause playback
> !resume # Resume
> !quit # Exit
Playback Control

| Command | Description | Example |
|---|---|---|
| `!stop`, `!s` | Stop current playback | `!stop` |
| `!pause`, `!p` | Pause playback | `!pause` |
| `!resume`, `!r` | Resume playback | `!resume` |
| `!next`, `!n` | Skip to next sentence | `!next` |
| `!back`, `!b` | Go to previous sentence | `!back` |
Audio Settings

| Command | Description | Example |
|---|---|---|
| `!lang <code>` | Change language | `!lang b` |
| `!voice <name>` | Change voice | `!voice af_bella` |
| `!speed <value>` | Set playback speed (0.5-2.0) | `!speed 1.5` |
Information

| Command | Description | Example |
|---|---|---|
| `!list_langs` | List available languages | `!list_langs` |
| `!list_voices` | List voices for the current language | `!list_voices` |
| `!list_all_voices` | List voices for all languages | `!list_all_voices` |
| `!status` | Show current settings | `!status` |
Interface

| Command | Description | Example |
|---|---|---|
| `!clear` | Clear the screen | `!clear` |
| `!clear_history` | Clear command history | `!clear_history` |
| `!verbose` | Toggle verbose mode | `!verbose` |
| `!ctrlc` | Change behavior of Ctrl+C | `!ctrlc` |
| `!help`, `!h` | Show this help message | `!help` |
| `!quit`, `!q` | Exit the program | `!quit` |
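The console distinguishes text to be read aloud from `!`-prefixed commands. A minimal sketch of that distinction might look like the following; the function `parse_line`, the tuple shapes it returns, and the handling of a lone `!` are assumptions for illustration and do not describe KokoroDoki's real implementation. The short aliases are the ones listed in the command tables.

```python
# Short aliases as documented in the console command tables.
ALIASES = {
    "s": "stop",
    "p": "pause",
    "r": "resume",
    "n": "next",
    "b": "back",
    "h": "help",
    "q": "quit",
}


def parse_line(line):
    """Classify one line of console input.

    Returns ("speak", text) for ordinary text, or
    ("command", name, args) for lines that start with "!".
    """
    stripped = line.strip()
    parts = stripped[1:].split() if stripped.startswith("!") else []
    if not parts:
        # Plain text (or a lone "!") is treated as something to read aloud.
        return ("speak", stripped)
    name, *args = parts
    return ("command", ALIASES.get(name, name), args)


if __name__ == "__main__":
    for sample in ["Hello world!", "!lang b", "!speed 1.5", "!q"]:
        print(repr(sample), "->", parse_line(sample))
```

Resolving aliases in one place keeps the dispatch code working with a single canonical name per command.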
Launch a graphical interface.
kokorodoki --gui
Themes:
- darkly
- cyborg
- solar
- vapor
Use --theme <number> to set it.
Runs in the background and receives text or commands via client.py.
kokorodoki --daemon
To send text, copy the desired text and run client.py or doki:
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py
or
doki
Pause and resume playback:
doki --pause
doki --resume
# Playback Controls
--stop Stop reading
--pause Pause reading
--resume Resume reading
--next Skip a sentence
--back Go back one sentence
--status Get current settings
--exit Stop reading
# Change settings
--language, -l Change language ('a' for American English)
--voice, -v Change voice
--speed, -s Change speed (range: 0.5-2.0)
# Info
--list-languages List available languages
--list-voices [LANG] List available voices, optionally filtered by language
# Config
--port If the kokorodoki daemon runs on a different port, use this option to specify it
--ctrl-c Trigger Ctrl+C before sending the clipboard
See the 'Integrate with systemd' section in the installation guide above for steps.
To improve your experience, it's recommended to create custom keyboard shortcuts for frequently used commands.
Below are some useful commands you can bind to shortcuts:
# Send from clipboard
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py
# Stop playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --stop
# Pause playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --pause
# Resume playback
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --resume
# Skip to next sentence
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --next
# Go back a sentence
~/.venvs/kdvenv/bin/python3.12 ~/.local/bin/kdoki/src/client.py --back
You can assign these to keyboard shortcuts using your desktop environment's keyboard settings (e.g., GNOME → Settings → Keyboard → Custom Shortcuts).
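If your desktop environment makes long commands awkward to bind, one option is a small launcher script that maps a short action name to the client invocations listed above. The sketch below assumes the install paths used in this guide; the names `client_argv`, `run_action`, and the `ACTIONS` table are hypothetical and not part of KokoroDoki.

```python
import os
import subprocess
import sys

# Paths follow the installation steps in this guide; adjust if yours differ.
PYTHON = os.path.expanduser("~/.venvs/kdvenv/bin/python3.12")
CLIENT = os.path.expanduser("~/.local/bin/kdoki/src/client.py")

# Action names are made up for this sketch; the flags are the documented ones.
ACTIONS = {
    "send": [],
    "stop": ["--stop"],
    "pause": ["--pause"],
    "resume": ["--resume"],
    "next": ["--next"],
    "back": ["--back"],
}


def client_argv(action):
    """Build the argument list for one client action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return [PYTHON, CLIENT, *ACTIONS[action]]


def run_action(action):
    """Run the client for the given action and return its exit code."""
    return subprocess.run(client_argv(action), check=False).returncode


if __name__ == "__main__":
    sys.exit(run_action(sys.argv[1] if len(sys.argv) > 1 else "send"))
```

A shortcut can then call the script with a single word such as `pause` instead of the full command line.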
Use the command-line interface for quick, one-off tasks:
# Speak text
python src/main.py --text "Hello, world!"
# Speak text from a file and save the audio to a file
python src/main.py --file input.txt --output out.wav
# Generate timed audio from SRT subtitle file (auto-detected)
python src/main.py --file subtitles.srt --output timed_audio.wav
# Speak using all available voices of a specific language
python src/main.py -l a --text "All voices" --all
KokoroDoki now supports SRT subtitle files for generating timed audio. This feature allows you to:
- Perfect Timing: Generate audio that matches subtitle timestamps exactly
- Professional Content: Create synchronized voiceovers for videos
- Accessibility: Convert subtitled content into audio format
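To show what timing information an SRT file carries, the sketch below parses entries into start time, end time, and text, converting `HH:MM:SS,mmm` timestamps to seconds. It illustrates the file format only; the function names are hypothetical, and KokoroDoki's own SRT handling may differ.

```python
import re

# An SRT timestamp looks like 00:00:04,000 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(stamp):
    """Convert an SRT timestamp string to seconds as a float."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    entries = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks instead of failing
        start, end = (to_seconds(part) for part in lines[1].split("-->"))
        entries.append((start, end, " ".join(lines[2:]).strip()))
    return entries


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:03,000\nHello and welcome.\n\n"
        "2\n00:00:04,000 --> 00:00:07,500\nSecond entry.\n"
    )
    for start, end, text in parse_srt(sample):
        print(f"{start:6.2f}s to {end:6.2f}s: {text}")
```

Each entry's start time indicates where its spoken audio should begin in the output, and the gap between one entry's end and the next entry's start corresponds to silence.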
SRT File Format Example:
1
00:00:00,000 --> 00:00:03,000
Hello and welcome to our demonstration.

2
00:00:04,000 --> 00:00:07,500
This is how SRT subtitles work with timed audio.

3
00:00:08,500 --> 00:00:12,000
Each subtitle entry will be spoken at the correct time.

Usage:
# Generate timed audio from SRT file (auto-detected)
kokorodoki -f subtitles.srt -o synchronized_audio.wav
# With custom voice and language
kokorodoki -f subtitles.srt -l a -v af_heart -o output.wav

- `nltk`: Sentence parsing
- `torch`, `numpy`, `librosa`: Audio processing
- `sounddevice`, `soundfile`: Audio playback
- `kokoro`: The TTS model
- `tkinter`, `ttkbootstrap`: GUI theming
- `rich`: Fancy CLI output
- `socket`: For Daemon and Client communication
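The daemon and the client talk over a local socket (the daemon's default port is 5561). The sketch below shows the general pattern with Python's standard `socket` module: a listener accepts one connection and reads UTF-8 text until the sender closes. The wire format here is an assumption made purely for illustration; KokoroDoki's actual protocol is not documented in this README, and all function names are hypothetical. The demo binds to an OS-chosen free port so it does not collide with a running daemon.

```python
import socket
import threading

HOST = "127.0.0.1"


def serve_once(server_sock, received):
    """Accept one connection, read until the client closes, store the text."""
    conn, _addr = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the client closed the connection
                break
            chunks.append(data)
    received.append(b"".join(chunks).decode("utf-8"))


def send_text(port, text):
    """Connect to the listener and send the text as UTF-8 bytes."""
    with socket.create_connection((HOST, port), timeout=5) as client:
        client.sendall(text.encode("utf-8"))


def demo(text):
    """Run a one-shot listener and client in the same process."""
    received = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind((HOST, 0))  # port 0: let the OS pick a free port
        server_sock.listen(1)
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server_sock, received))
        worker.start()
        send_text(port, text)
        worker.join(timeout=5)
    return received[0]


if __name__ == "__main__":
    print(demo("Integrate with systemd for better control"))
```

Closing the client connection marks the end of the message, which avoids the need for a length prefix in this simple one-message exchange.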
- Fork the repo
- Create a new branch
- Make your changes
- Open a pull request
Bug reports and feature requests are welcome on GitHub Issues.
This project is licensed under the GNU General Public License v3.0 (GPLv3).
You are free to use, modify, and distribute this software under the terms of the GPLv3.