Enabling the experimental Triton ONNX runtime in Vespa today #35649
tronikelis2 asked this question in Q&A
Hello, we saw that you are supporting the Triton service as an alternative to the embedded ONNX runtime. I understand it's not production-ready, but we would nonetheless like to try it out sooner and see whether it solves our need for GPU batching. As there are no docs for it yet, could we get some guidelines on how to enable/configure the Triton inference? 🙏 Thanks
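For context, the embedded ONNX runtime the question refers to is wired up through an `onnx-model` section in the schema. A minimal sketch of that documented setup follows; the schema name, field, model path, and tensor names are illustrative, not taken from this thread:

```
# Minimal sketch of the embedded ONNX runtime setup (illustrative names).
schema doc {
    document doc {
        # Document-side embedding fed to the model at rank time.
        field embedding type tensor<float>(x[384]) {
            indexing: attribute
        }
    }

    # Declare an ONNX model bundled in the application package.
    onnx-model ranker {
        file: models/ranker.onnx
        # Map the model's input tensor to the attribute above.
        input "input": attribute(embedding)
        # Name the model's output for use in ranking expressions.
        output "output": score
    }

    rank-profile ml inherits default {
        first-phase {
            # onnx() returns a tensor, so reduce it to a scalar score.
            expression: sum(onnx(ranker).score)
        }
    }
}
```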
Answered by kkraune on Jan 20, 2026
Hello, the Triton integration is not quite ready for production yet, so there are no docs.
4 replies
Good question! We are focusing on the Vespa Cloud deployment first, as this lets us easily observe the feature and make improvements and fixes; much of the work is about configuration and orchestration. Then we will look at making it available for Vespa Enterprise, but I'm sorry, I have no estimate for that yet.
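In the meantime, for anyone wanting to prototype the GPU-batching side independently of Vespa, dynamic batching in a standalone Triton Inference Server is configured per model in a `config.pbtxt`. The sketch below uses placeholder model and tensor names, shapes, and batch sizes, and is standard Triton configuration rather than anything Vespa-specific:

```
# config.pbtxt for an ONNX model served by standalone Triton (placeholder names).
name: "ranker"
backend: "onnxruntime"
max_batch_size: 32

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 384 ]   # per-request shape; the batch dimension is implicit
  }
]
output [
  {
    name: "score"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]

# Run one model instance on the GPU.
instance_group [
  { kind: KIND_GPU, count: 1 }
]

# Let Triton coalesce concurrent requests into larger GPU batches,
# waiting at most 100 microseconds to fill a preferred batch size.
dynamic_batching {
  preferred_batch_size: [ 8, 16, 32 ]
  max_queue_delay_microseconds: 100
}
```

The `dynamic_batching` block is the Triton mechanism that addresses the GPU-batching need raised in the question; how the Vespa integration will expose it remains undocumented for now.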