Investigate support for streaming mode #6

@HennerM

Description

Previously reported in #2 (comment) by @aamir-s18

Streaming mode could be useful for very big models. It can improve the user experience in real-time use cases by emitting one token at a time instead of waiting for the full sequence.

Triton Inference Server supports streaming via decoupled models.

It needs to be investigated how CTranslate2 can be used to get decoded tokens one by one. Additionally, this might be trickier with beam search decoding, unless we are willing to always stream the current best hypothesis, which could flip previously emitted words.
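To make the beam search concern concrete, here is a minimal sketch (names are hypothetical, not part of any existing API) of how per-step best hypotheses could be turned into streamed deltas. It assumes the decoder can surface its current best hypothesis after every step, e.g. via a per-step callback or an incremental generation API, and shows that with beam search the delta may retract an already-sent prefix, whereas with greedy decoding the prefix only ever grows.

```python
from typing import Iterable, Iterator, List, Tuple

def stream_best_hypothesis(
    best_per_step: Iterable[List[str]],
) -> Iterator[Tuple[int, List[str]]]:
    """Turn a sequence of per-step best hypotheses into streamed deltas.

    Yields (keep, suffix): the first `keep` tokens already sent to the
    client are still valid; `suffix` replaces everything after them.
    With greedy decoding `keep` always equals the sent length; with
    beam search it can shrink, i.e. earlier words get retracted.
    """
    sent: List[str] = []
    for best in best_per_step:
        # Find the longest common prefix with what was already streamed.
        keep = 0
        while keep < len(sent) and keep < len(best) and sent[keep] == best[keep]:
            keep += 1
        yield keep, best[keep:]
        sent = list(best)

# Hypothetical per-step best hypotheses: at step 3 the beam flips the
# leading word, so the client must discard everything it has shown.
steps = [["The"], ["The", "cat"], ["A", "big", "cat"]]
print(list(stream_best_hypothesis(steps)))
```

A decoupled Triton model could send each `(keep, suffix)` pair as a partial response, leaving it to the client to rewrite its display when `keep` shrinks; alternatively we could stream only tokens that are stable across all beams, at the cost of latency.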
