Enabling the experimental Triton ONNX runtime in Vespa today #35649
tronikelis2 asked this question in Q&A
Hello, we saw that you are supporting the Triton service as an alternative to the embedded ONNX runtime. I understand it's not production-ready, but we would nonetheless like to try it out sooner and see whether it solves our need for GPU batching. As there are no docs for it yet, could we get some guidelines on how to enable/configure the Triton inference? 🙏 Thanks
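For context, the embedded ONNX runtime the question refers to is wired up through an `onnx-model` section in the schema. A minimal sketch of that documented setup follows; the schema name, field, model path, and tensor names are illustrative, not taken from this thread:

```
# Minimal sketch of the embedded ONNX runtime setup (illustrative names).
schema doc {
    document doc {
        # Document-side embedding fed to the model at rank time.
        field embedding type tensor<float>(x[384]) {
            indexing: attribute
        }
    }

    # Declare an ONNX model bundled in the application package.
    onnx-model ranker {
        file: models/ranker.onnx
        # Map the model's input tensor to the attribute above.
        input "input": attribute(embedding)
        # Name the model's output for use in ranking expressions.
        output "output": score
    }

    rank-profile ml inherits default {
        first-phase {
            # onnx() returns a tensor, so reduce it to a scalar score.
            expression: sum(onnx(ranker).score)
        }
    }
}
```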
Answered by kkraune on Jan 20, 2026
Hello, the Triton integration is not quite ready for production yet, so there are no docs.
4 replies
Good question! We are focusing on the Vespa Cloud deployment first, as this lets us easily observe the feature and make improvements and fixes; much of the work is about configuration and orchestration. Then we will look at making it available for Vespa Enterprise, but I'm sorry, I have no estimate for that yet.
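In the meantime, for anyone wanting to prototype the GPU-batching side independently of Vespa, dynamic batching in a standalone Triton Inference Server is configured per model in a `config.pbtxt`. The sketch below uses placeholder model and tensor names, shapes, and batch sizes, and is standard Triton configuration rather than anything Vespa-specific:

```
# config.pbtxt for an ONNX model served by standalone Triton (placeholder names).
name: "ranker"
backend: "onnxruntime"
max_batch_size: 32

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 384 ]   # per-request shape; the batch dimension is implicit
  }
]
output [
  {
    name: "score"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]

# Run one model instance on the GPU.
instance_group [
  { kind: KIND_GPU, count: 1 }
]

# Let Triton coalesce concurrent requests into larger GPU batches,
# waiting at most 100 microseconds to fill a preferred batch size.
dynamic_batching {
  preferred_batch_size: [ 8, 16, 32 ]
  max_queue_delay_microseconds: 100
}
```

The `dynamic_batching` block is the Triton mechanism that addresses the GPU-batching need raised in the question; how the Vespa integration will expose it remains undocumented for now.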