[feature request] enhance consistency of TTS #2132

@FNsi

Description

Currently, when given a long text input, Qwen3 TTS simply processes all of it in one pass, and the output turns empty or noisy past roughly the 2:30 mark. If we instead split the text into multiple short pieces ourselves, voice consistency is broken between the pieces.

I suggest adding a function that automatically splits the text into pieces that fit inside the ctx, and uses the last sentence or last few words of each piece as the voice clone reference when starting the next piece. That way voice consistency is maintained, computation speeds up, and long audio becomes easy to create.
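To make the idea concrete, here is a rough sketch of what I mean, not an actual implementation. Everything in it is an assumption: `split_into_chunks`, `synthesize_long`, the `max_chars` limit, and the `synthesize(text, ref_audio, ref_text)` callable are hypothetical names standing in for whatever the real TTS call looks like. For simplicity it passes the whole previous piece as the clone reference; trimming that to the last sentence or few words would need some text/audio alignment.

```python
import re
from typing import Any, Callable, List, Optional


def split_into_chunks(text: str, max_chars: int = 400) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after Latin sentence punctuation followed by whitespace,
    # or directly after CJK sentence punctuation (no whitespace needed).
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text.strip())
    sentences = [p.strip() for p in parts if p and p.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(
    text: str,
    synthesize: Callable[[str, Optional[Any], Optional[str]], Any],
    max_chars: int = 400,
) -> List[Any]:
    """Synthesize chunk by chunk, reusing the previous output as clone reference.

    `synthesize` is a hypothetical callable (text, ref_audio, ref_text) -> audio.
    """
    pieces: List[Any] = []
    ref_audio: Optional[Any] = None
    ref_text: Optional[str] = None
    for chunk in split_into_chunks(text, max_chars):
        audio = synthesize(chunk, ref_audio, ref_text)
        pieces.append(audio)
        # The audio just generated becomes the voice reference for the next chunk.
        ref_audio, ref_text = audio, chunk
    return pieces
```

The first piece has no reference (both are `None`), so it would use whatever default or user-supplied voice is configured. The returned pieces still need to be concatenated into one audio file.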
