ebook2audiobook: convert ebooks to audiobooks using dynamic AI models and voice cloning

CPU/GPU converter from eBooks to audiobooks with chapters and metadata, using Calibre, FFmpeg, XTTSv2, Fairseq, and more. Supports voice cloning and 1107+ languages!

New v2.0 Web GUI Interface!

Features

  • Converts eBooks to text format with Calibre.
  • Splits eBook into chapters for organized audio.
  • High-quality text-to-speech with Coqui XTTSv2 and Fairseq.
  • Optional voice cloning with your own voice file.
  • Supports 1107+ languages (English by default). See the Supported Languages list below.
  • Designed to run on 4 GB of RAM.

Supported Languages

  • Arabic (ara)
  • Chinese (zho)
  • Czech (ces)
  • Dutch (nld)
  • English (eng)
  • French (fra)
  • German (deu)
  • Hindi (hin)
  • Hungarian (hun)
  • Italian (ita)
  • Japanese (jpn)
  • Korean (kor)
  • Polish (pol)
  • Portuguese (por)
  • Russian (rus)
  • Spanish (spa)
  • Turkish (tur)
  • Vietnamese (vie)
  • **+ 1107 more languages via Fairseq**

Requirements

  • 4 GB RAM
  • Virtualization enabled if running on Windows (Docker only)

Installation Instructions

Clone repo

git clone https://github.com/DrewThomasson/ebook2audiobook.git

Specify the language code when running the script in headless mode.

Launching Gradio Web Interface

Run ebook2audiobook:

Linux/MacOS:

./ebook2audiobook.sh  # Run Launch script

Windows:

.\ebook2audiobook.cmd  # Run launch script

Open the Web App: Click the URL provided in the terminal to access the web app and convert eBooks.

For a public link: add --share to the command, like this: python app.py --share

  • For more parameters: use the --help flag, like this: python app.py --help
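As a sketch of how these launch flags combine (only --share and --help are documented here; the wrapper function itself is hypothetical, not part of the project):

```shell
#!/bin/sh
# Hypothetical wrapper: builds the app.py invocation, optionally adding --share.
# Only the --share flag is taken from this README; everything else is an assumption.
build_launch_cmd() {
    share="$1"   # "yes" to request a public Gradio link
    cmd="python app.py"
    if [ "$share" = "yes" ]; then
        cmd="$cmd --share"
    fi
    printf '%s\n' "$cmd"
}

build_launch_cmd yes   # prints: python app.py --share
```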

Basic Usage

Linux/MacOS:

./ebook2audiobook.sh  -- --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]

Windows:

.\ebook2audiobook.cmd  -- --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]
  • <path_to_ebook_file>: Path to your eBook file.
  • [path_to_voice_file]: Optional for voice cloning.
  • [language_code]: Optional ISO 639-3 (three-letter) language code (default: eng). ISO 639-1 two-letter codes are also supported.
  • For more parameters: use the --help flag, like this: python app.py --help
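The argument rules above can be sketched as a small helper that appends the optional flags only when values are given (the helper name and file names are placeholders, not part of the project):

```shell
#!/bin/sh
# Sketch: assemble the ebook2audiobook argument list. --voice and --language
# are optional per the bullets above; file names below are placeholders.
build_args() {
    ebook="$1"; voice="$2"; lang="$3"
    args="--ebook $ebook"
    if [ -n "$voice" ]; then args="$args --voice $voice"; fi
    if [ -n "$lang" ];  then args="$args --language $lang"; fi
    printf '%s\n' "$args"
}

build_args mybook.epub myvoice.wav eng
# prints: --ebook mybook.epub --voice myvoice.wav --language eng
```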

Custom XTTS Model Usage

Linux/MacOS:

./ebook2audiobook.sh  -- --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path> --custom_config <custom_config_path> --custom_vocab <custom_vocab_path>

Windows:

.\ebook2audiobook.cmd  -- --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path> --custom_config <custom_config_path> --custom_vocab <custom_vocab_path>
  • <ebook_file_path>: Path to your eBook file.
  • <target_voice_file_path>: Optional for voice cloning.
  • <language>: Optional language code.
  • <custom_model_path>: Path to model.pth.
  • <custom_config_path>: Path to config.json.
  • <custom_vocab_path>: Path to vocab.json.
  • For more parameters: use the --help flag, like this: python app.py --help
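Since a custom model needs all three files, a pre-flight check can be sketched like this (the function is ours, not part of the project; the file names follow the bullets above):

```shell
#!/bin/sh
# Sketch: verify that the three custom-model files exist before launching.
# Expected files per the bullets above: model.pth, config.json, vocab.json.
check_custom_model() {
    for f in "$1" "$2" "$3"; do
        [ -f "$f" ] || { echo "missing: $f"; return 1; }
    done
    echo "custom model files ok"
}
```

Run it with the three paths you would pass as --custom_model, --custom_config, and --custom_vocab; it reports the first missing file, if any.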

For a detailed guide with a list of all parameters, use:

Linux/MacOS:

./ebook2audiobook.sh  --help

Windows:

.\ebook2audiobook.cmd  --help

Using Docker

You can also use Docker to run the eBook to Audiobook converter. This method ensures consistency across different environments and simplifies setup.

Running the Docker Container

To run the Docker container and start the Gradio interface, use the following command:

Run with CPU only

docker run -it --rm -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobook python app.py

Run with GPU speedup (NVIDIA graphics cards only)

docker run -it --rm --gpus all -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobook python app.py
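The only difference between the CPU and GPU commands above is the --gpus all flag, which can be sketched as a small builder (the helper name is ours; image name and port match this README):

```shell
#!/bin/sh
# Sketch: build the docker run command shown above, toggling GPU support.
docker_cmd() {
    mode="$1"   # "gpu" or "cpu"
    gpu=""
    if [ "$mode" = "gpu" ]; then gpu="--gpus all "; fi
    printf '%s\n' "docker run -it --rm ${gpu}-p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobook python app.py"
}

docker_cmd cpu
docker_cmd gpu
```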

Building the Docker Container

You can build the Docker image with the command:

docker build --platform linux/amd64 -t athomasson2/ebook2audiobook .

This command will start the Gradio interface on port 7860 (localhost:7860).

  • For more options, like running Docker in headless mode or making the Gradio link public, add the --help parameter after app.py in the Docker launch command

Docker container file locations

All ebook2audiobook paths are under the base dir /home/user/app/. For example:

  • tmp = /home/user/app/tmp
  • audiobooks = /home/user/app/audiobooks
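That layout can be sketched as a tiny path helper (the function name is ours; the base dir is taken from this section):

```shell
#!/bin/sh
# Sketch: resolve a subdirectory under the container's base dir /home/user/app.
app_path() {
    printf '%s\n' "/home/user/app/$1"
}

app_path tmp          # prints: /home/user/app/tmp
app_path audiobooks   # prints: /home/user/app/audiobooks
```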

Docker headless guide

First, pull the latest image:

docker pull athomasson2/ebook2audiobook

Before running, create a directory named "input-folder" in your current directory. It will be mounted into the container, and it is where you put the input files for the Docker image to see.

mkdir input-folder && mkdir audiobooks

In the command below, replace YOUR_INPUT_FILE.TXT with the name of your input file:

docker run -it --rm \
    -v $(pwd)/input-folder:/home/user/app/input_folder \
    -v $(pwd)/audiobooks:/home/user/app/audiobooks \
    --platform linux/amd64 \
    athomasson2/ebook2audiobook \
    python app.py --headless --ebook /input_folder/YOUR_INPUT_FILE.TXT

And that should be it!

The output audiobooks will be found in the audiobooks folder, located in the local directory where you ran this Docker command.

To see the help output for this program's other parameters, run:

docker run -it --rm \
    --platform linux/amd64 \
    athomasson2/ebook2audiobook \
    python app.py --help

and that will print the help text:

Help command output

Docker Compose

This project uses Docker Compose to run locally. You can enable or disable GPU support by setting either *gpu-enabled or *gpu-disabled in docker-compose.yml.

Steps to Run

Clone the Repository (if you haven't already):

git clone https://github.com/DrewThomasson/ebook2audiobook.git
cd ebook2audiobook

Set GPU support (disabled by default): to enable it, modify docker-compose.yml and change *gpu-disabled to *gpu-enabled

Start the service:

docker-compose up -d

Access the service: The service will be available at http://localhost:7860.
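A minimal sketch of what such a docker-compose.yml could look like, assuming the image name from this README and YAML anchors named gpu-enabled/gpu-disabled; the repository's actual file may differ:

```yaml
# Hypothetical sketch only: the repository's real docker-compose.yml may differ.
x-gpu-disabled: &gpu-disabled {}          # no extra settings: CPU only
x-gpu-enabled: &gpu-enabled               # request all NVIDIA GPUs
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]

services:
  ebook2audiobook:
    image: athomasson2/ebook2audiobook
    ports:
      - "7860:7860"
    <<: *gpu-disabled   # change to *gpu-enabled for GPU support
```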

Renting a GPU

Don't have the hardware to run it, or want to rent a GPU?

You can duplicate the Hugging Face space and rent a GPU for around $0.40 an hour

Huggingface Space Demo

Or you can try using the Google Colab for free!

(Be aware it will time out after a while if you're not interacting with it.) Free Google Colab

Fine Tuned TTS models

You can easily fine-tune your own XTTS model with this repo: xtts-finetune-webui

If you want to rent a GPU easily, you can also duplicate this Hugging Face space: xtts-finetune-webui-space

There is also a space you can use to easily de-noise the training data: denoise-huggingface-space

Fine Tuned TTS Collection

To find our collection of already fine-tuned TTS models, visit this Hugging Face link. For an XTTS custom model, a reference audio clip of the voice is also needed.

Supported eBook Formats

.epub, .pdf, .mobi, .txt, .html, .rtf, .chm, .lit, .pdb, .fb2, .odt, .cbr, .cbz, .prc, .lrf, .pml, .snb, .cbc, .rb, .tcr

Best results: .epub or .mobi, for automatic chapter detection
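As an illustration, a file name can be checked against the extension list above with a small shell function (the function name is ours, not part of the project):

```shell
#!/bin/sh
# Sketch: check a filename's extension against the supported formats listed above.
is_supported() {
    case "${1##*.}" in
        epub|pdf|mobi|txt|html|rtf|chm|lit|pdb|fb2|odt|cbr|cbz|prc|lrf|pml|snb|cbc|rb|tcr)
            echo yes ;;
        *)  echo no ;;
    esac
}

is_supported mybook.epub   # prints: yes
is_supported notes.docx    # prints: no
```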

Output

Creates an .m4b file with metadata and chapters.

Example Output: