I find all the ones like espeak, piper, and festival awful. The voices are OK-ish, but the intonation and pronunciation are very bad. Tortoise is OK, but it is slow and not suited to long texts. Paid services like Google, AWS, or ElevenLabs are miles ahead. There are a number of CUDA-based engines (listed in the comments of the post I linked) that you can supposedly use if you have an NVIDIA GPU available. I don’t, so they are not for me.
There are alternatives listed here too: https://discuss.tchncs.de/post/6215470