I don’t think it is possible to translate speech into another voice in real time without any delay, because sometimes the spoken sentence has to be fully transcribed before it can be translated correctly.
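
To illustrate where the delay comes from, here is a rough Python sketch, not how any real product works. It assumes the recognizer hands over words one at a time and that sentence ends show up as punctuation. `translate_when_complete` and the `translate` argument are made-up names, and `str.upper` just stands in for an actual translator.

```python
def translate_when_complete(words, translate):
    """Buffer incoming words and translate only once a sentence is finished.

    Yields (index_of_last_word_heard, translated_sentence) pairs, so the
    index shows how many words had to arrive before any output existed.
    """
    buffer = []
    for i, word in enumerate(words):
        buffer.append(word)
        # Assumption: a sentence ends at terminal punctuation.
        if word.endswith((".", "?", "!")):
            yield i, translate(" ".join(buffer))
            buffer = []
    # Flush whatever is left if the stream ends mid-sentence.
    if buffer:
        yield len(words) - 1, translate(" ".join(buffer))


# Hypothetical input stream; str.upper is a placeholder for a translator.
stream = "I did not see the man. He left.".split()
results = list(translate_when_complete(stream, str.upper))
```

Nothing comes out for the first sentence until its sixth word has been heard, so the listener is always at least one sentence behind the speaker.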
https://www.reddit.com/r/singularity/comments/12foqsp/what_is_the_best_current_real_time_voice/

