ebook2audiobook: convert ebooks to audiobooks using dynamic AI models and voice cloning
CPU/GPU converter from eBooks to audiobooks with chapters and metadata, using Calibre, ffmpeg, XTTSv2, Fairseq and more. Supports voice cloning and 1107 languages!
New v2.0 Web GUI Interface!
Features
- Converts eBooks to text format with Calibre.
- Splits eBook into chapters for organized audio.
- High-quality text-to-speech with Coqui XTTSv2 and Fairseq.
- Optional voice cloning with your own voice file.
- Supports 1107 languages (English by default); see the Supported Languages list below.
- Designed to run on 4GB RAM.
Supported Languages
- Arabic (ara)
- Chinese (zho)
- Czech (ces)
- Dutch (nld)
- English (eng)
- French (fra)
- German (deu)
- Hindi (hin)
- Hungarian (hun)
- Italian (ita)
- Japanese (jpn)
- Korean (kor)
- Polish (pol)
- Portuguese (por)
- Russian (rus)
- Spanish (spa)
- Turkish (tur)
- Vietnamese (vie)
- **+ 1107 languages via Fairseq**
Requirements
- 4GB RAM
- Virtualization enabled if running on Windows (Docker only)
Installation Instructions
Clone repo
git clone https://github.com/DrewThomasson/ebook2audiobook.git
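Then change into the cloned directory before running the launch scripts below (ebook2audiobook is the default folder name created by git clone):
cd ebook2audiobook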
Specify the language code when running the script in headless mode.
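For example, a hypothetical invocation specifying Spanish, following the Basic Usage syntax shown further below (book.epub is a placeholder file name):
./ebook2audiobook.sh -- --ebook book.epub --language spa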
Launching Gradio Web Interface
Run ebook2audiobook:
Linux/MacOS:
./ebook2audiobook.sh # Run Launch script
Windows:
.\ebook2audiobook.cmd # Run launch script
Open the Web App: Click the URL provided in the terminal (typically http://localhost:7860) to access the web app and convert eBooks.
For a public link: add --share to the command, like this: python app.py --share
- For more parameters: use the --help parameter, like this: python app.py --help
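If you launch through the shell script instead, the same flag can presumably be passed after the -- separator used in the examples below; a sketch under that assumption:
./ebook2audiobook.sh -- --share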
Basic Usage
Linux/MacOS:
./ebook2audiobook.sh -- --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]
Windows:
.\ebook2audiobook.cmd -- --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]
- <path_to_ebook_file>: Path to your eBook file.
- [path_to_voice_file]: Optional for voice cloning.
- [language_code]: Optional ISO 639-3 (three-letter) language code (default is eng); ISO 639-1 two-letter codes are also supported.
- For more parameters: use the --help parameter, like this: python app.py --help (see the example below).
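For instance, a sketch of a full invocation with hypothetical placeholder file names (my_book.epub and my_voice.wav stand in for your own files):
./ebook2audiobook.sh -- --ebook my_book.epub --voice my_voice.wav --language eng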
Custom XTTS Model Usage
Linux/MacOS:
./ebook2audiobook.sh -- --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path> --custom_config <custom_config_path> --custom_vocab <custom_vocab_path>
Windows:
.\ebook2audiobook.cmd -- --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path> --custom_config <custom_config_path> --custom_vocab <custom_vocab_path>
- <ebook_file_path>: Path to your eBook file.
- <target_voice_file_path>: Optional for voice cloning.
- <language>: Optional language code.
- <custom_model_path>: Path to model.pth.
- <custom_config_path>: Path to config.json.
- <custom_vocab_path>: Path to vocab.json.
- For more parameters: use the --help parameter, like this: python app.py --help (see the example below).
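For example, a sketch with hypothetical placeholder paths (my_book.epub, ref_voice.wav, model.pth, config.json, and vocab.json stand in for your own files):
./ebook2audiobook.sh -- --ebook my_book.epub --voice ref_voice.wav --language eng --custom_model model.pth --custom_config config.json --custom_vocab vocab.json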
For Detailed Guide with list of all Parameters to use
Linux/MacOS:
./ebook2audiobook.sh --help
Windows:
.\ebook2audiobook.cmd --help
Using Docker
You can also use Docker to run the eBook to Audiobook converter. This method ensures consistency across different environments and simplifies setup.
Running the Docker Container
To run the Docker container and start the Gradio interface, use the following command:
Run with CPU only
docker run -it --rm -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobook python app.py
Run with GPU Speedup (NVIDIA graphics cards only)
docker run -it --rm --gpus all -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobook python app.py
Building the Docker Container
You can build the Docker image with the following command:
docker build --platform linux/amd64 -t athomasson2/ebook2audiobook .
Running the container will start the Gradio interface on port 7860 (localhost:7860).
- For more options, such as running Docker in headless mode or making the Gradio link public, add the --help parameter after app.py in the Docker launch command.
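For example, to make the Gradio link public from Docker, the --share flag documented above can be appended after app.py; a sketch combining flags already shown in this README:
docker run -it --rm -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobook python app.py --share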
Docker container file locations
All ebook2audiobook containers use /home/user/app/ as the base directory. For example:
- tmp = /home/user/app/tmp
- audiobooks = /home/user/app/audiobooks
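To keep finished audiobooks on the host while using the Gradio interface, the run command above can be combined with the volume mount used in the headless guide below; a sketch:
docker run -it --rm -p 7860:7860 \
  -v $(pwd)/audiobooks:/home/user/app/audiobooks \
  --platform=linux/amd64 athomasson2/ebook2audiobook python app.py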
Docker headless guide
First, pull the latest image:
docker pull athomasson2/ebook2audiobook
Before running the container, create directories named "input-folder" and "audiobooks" in your current directory; these will be mounted into the container, and input-folder is where you put the files for the Docker image to see:
mkdir input-folder && mkdir audiobooks
In the command below, swap out YOUR_INPUT_FILE.TXT with the name of your input file:
docker run -it --rm \
-v $(pwd)/input-folder:/home/user/app/input_folder \
-v $(pwd)/audiobooks:/home/user/app/audiobooks \
--platform linux/amd64 \
athomasson2/ebook2audiobook \
python app.py --headless --ebook /home/user/app/input_folder/YOUR_INPUT_FILE.TXT
And that should be it!
The output audiobooks will be found in the audiobooks folder, which will also be located in the local directory where you ran this Docker command.
To see the help text for the other parameters this program has, run:
docker run -it --rm \
--platform linux/amd64 \
athomasson2/ebook2audiobook \
python app.py --help
which will print the full list of available parameters.
Docker Compose
This project uses Docker Compose to run locally. You can enable or disable GPU support by setting either *gpu-enabled or *gpu-disabled in docker-compose.yml
Steps to Run
Clone the Repository (if you haven't already):
git clone https://github.com/DrewThomasson/ebook2audiobook.git
cd ebook2audiobook
Set GPU support (disabled by default): to enable GPU support, modify docker-compose.yml and change *gpu-disabled to *gpu-enabled
Start the service:
docker-compose up -d
Access the service: The service will be available at http://localhost:7860.
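To follow the logs or stop the service, the standard Docker Compose commands apply:
docker-compose logs -f   # follow the service logs
docker-compose down      # stop and remove the service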
Renting a GPU
Don't have the hardware to run it, or want to rent a GPU?
You can duplicate the Hugging Face space and rent a GPU for around $0.40 an hour.
Or you can try using the Google Colab for free!
(Be aware it will time out after a while if you're not interacting with it.) Free Google Colab
Fine Tuned TTS models
You can easily fine-tune your own XTTS model with this repo: xtts-finetune-webui
If you want to rent a GPU easily, you can also duplicate this Hugging Face space: xtts-finetune-webui-space
There is also a space you can use to easily de-noise the training data: denoise-huggingface-space
Fine Tuned TTS Collection
To find our collection of already fine-tuned TTS models, visit this Hugging Face link. For an XTTS custom model, a reference audio clip of the voice will also be needed.
Supported eBook Formats
.epub, .pdf, .mobi, .txt, .html, .rtf, .chm, .lit, .pdb, .fb2, .odt, .cbr, .cbz, .prc, .lrf, .pml, .snb, .cbc, .rb, .tcr
Best results: .epub or .mobi for automatic chapter detection
Output
Creates an .m4b file with metadata and chapters.
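To verify the chapter markers and metadata in the generated file, ffmpeg's ffprobe can be used (output.m4b is a placeholder name):
ffprobe -i output.m4b -show_chapters -loglevel error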