
feat!: add toolset versioning #639

Open
Cali0707 wants to merge 5 commits into containers:main from Cali0707:toolset-versioning

@Cali0707
Collaborator

This PR adds toolset versioning and versioning guidelines to the server.

A couple of key points for reviewers:

  1. Based on the current guidelines, no toolsets are "stable". The most mature is "core", which is "beta". As such, the default version has been set to "beta".
  2. The helm toolset has no evals, so it is considered "alpha". As such, it is not included by default in tool lists (this is a breaking change).

Signed-off-by: Calum Murray <cmurray@redhat.com>
This is a breaking change as the helm toolset does not meet the
requirements to be a "beta" toolset, so it was reverted to "alpha".
This means that helm is now disabled by default.

Signed-off-by: Calum Murray <cmurray@redhat.com>
The general idea for these versions is:
- "alpha": the toolset is not guaranteed to work well
- "beta": the toolset is not guaranteed to work well, but we are evaluating how well it works
- "stable": the toolset works well, and we are evaluating how well it works to avoid regressions
Collaborator


stable: because of "evaluating", this doesn't read as all that stable. That sentence is a bit vague.

### Beta

For a tool/prompt/toolset to enter into "beta", we require that there are eval scenarios. For a toolset to enter "beta", there must be scenarios
exercising all of the tools and prompts in the toolset. For individual tools and prompts to enter "beta", we only require an eval scenario
Collaborator


Perhaps it would be good to eventually point to a concrete example for that. But generally I like the requirement of having some sort of eval (with our toolkit).


### GA/Stable

For a tool/prompt/toolset to enter into "stable", we require that 95% or more of the eval scenarios are passing. The requirements on the number of eval scenarios are the same as for "beta".
Collaborator


Where does the 95% come from?

Collaborator Author


I think @mrunalp was mentioning this as a threshold
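The "beta" and "stable" requirements quoted above can be read as a single classification step. A minimal Go sketch, assuming a hypothetical `ScenarioResult` record (the eval toolkit's real output format is not shown in this PR), an integer-backed `Version` where higher means more mature, and that every result passed in belongs to the toolset being classified:

```go
package main

import "fmt"

// Version is a hypothetical ordered maturity level (alpha < beta < GA).
type Version int

const (
	VersionAlpha Version = iota
	VersionBeta
	VersionGA
)

// stablePassThreshold mirrors the 95% figure proposed in the guidelines.
const stablePassThreshold = 0.95

// ScenarioResult is a hypothetical record of one eval scenario run.
type ScenarioResult struct {
	Exercises []string // names of the tools/prompts the scenario exercises
	Passed    bool
}

// ClassifyToolset derives a maturity level for a toolset whose tools and
// prompts are named in items:
//   - alpha: there are no scenarios, or some tool/prompt is never exercised
//   - beta:  every tool and prompt is exercised by at least one scenario
//   - GA:    beta coverage, and at least 95% of the scenarios pass
func ClassifyToolset(items []string, results []ScenarioResult) Version {
	if len(results) == 0 {
		return VersionAlpha
	}
	covered := make(map[string]bool)
	passed := 0
	for _, r := range results {
		for _, name := range r.Exercises {
			covered[name] = true
		}
		if r.Passed {
			passed++
		}
	}
	for _, name := range items {
		if !covered[name] {
			return VersionAlpha
		}
	}
	if float64(passed)/float64(len(results)) >= stablePassThreshold {
		return VersionGA
	}
	return VersionBeta
}

func main() {
	items := []string{"tool_a", "tool_b"} // hypothetical tool names
	results := []ScenarioResult{
		{Exercises: []string{"tool_a"}, Passed: true},
		{Exercises: []string{"tool_b"}, Passed: false},
	}
	fmt.Println(ClassifyToolset(items, results))     // 1 (beta: full coverage, 50% passing)
	fmt.Println(ClassifyToolset(items, results[:1])) // 0 (alpha: tool_b is never exercised)
}
```

This only covers toolset-level classification; the looser rule for individual tools and prompts (a single eval scenario) would need a separate check.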

```toml
default_toolset_version = "beta"

toolsets = [ "core", "config", "helm:alpha" ]
```
Collaborator


I like this schema

```go
ServerInstructions string `toml:"server_instructions,omitempty"`

// Which toolset version to enable (any tools/toolsets below this will be disabled)
DefaultToolsetVersion api.Version `toml:"default_toolset_version"`
```
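The comment on `DefaultToolsetVersion` describes a threshold filter. A minimal sketch, assuming a hypothetical `Tool` struct and an integer-backed `Version` where higher means more mature (the PR's `api.Version` may be represented differently):

```go
package main

import "fmt"

// Version is a hypothetical ordered maturity level (alpha < beta < GA).
type Version int

const (
	VersionAlpha Version = iota
	VersionBeta
	VersionGA
)

// Tool is a hypothetical stand-in for a server tool tagged with a maturity level.
type Tool struct {
	Name    string
	Version Version
}

// EnabledTools keeps only the tools at or above threshold; anything below
// it is disabled, as the DefaultToolsetVersion comment describes.
func EnabledTools(tools []Tool, threshold Version) []Tool {
	var enabled []Tool
	for _, t := range tools {
		if t.Version >= threshold {
			enabled = append(enabled, t)
		}
	}
	return enabled
}

func main() {
	tools := []Tool{
		{Name: "tool_a", Version: VersionGA},
		{Name: "tool_b", Version: VersionBeta},
		{Name: "tool_c", Version: VersionAlpha},
	}
	// With the PR's default threshold of "beta", the alpha tool is dropped.
	for _, t := range EnabledTools(tools, VersionBeta) {
		fmt.Println(t.Name) // prints tool_a, then tool_b
	}
}
```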
Collaborator


So, at the "server level", we say stable, hence nothing in beta (for instance) is enabled.

What about explicit enablement, given a "global default"?

E.g., something like `toolsets = [ "core", "config", "helm:alpha" ]` would then "win", right?

Collaborator Author


IMO there are two parts to configuring everything:

  1. Which toolsets you want. In my mind, this doesn't necessarily align with how mature the toolsets are, but rather with which domains you want to interact with
  2. What level of maturity of tools you want to use

So, if you set "core", "config", and "helm:alpha" in the current setup, what would happen is:

  1. The core and config toolsets would both pick up the default version of "stable". As they are both in "beta" currently, no tools would be selected.
  2. The helm toolset would use the overridden "alpha" version, and since it is in alpha, all of its tools would be available.

I wasn't 100% convinced that the way I wrote it is the most intuitive; I just want to capture somewhere that there are two key, separate ideas in the config (which toolsets/domains, and which versions you are okay with). My main thought is that enabling "stable" or "beta" should not enable all the toolsets with that version.
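The two separate ideas described above (which toolsets/domains, and which maturity level) could be resolved from the config roughly as follows. This is a sketch that assumes the `name:version` suffix syntax from the TOML example; `ResolveToolsets` and `ToolsetSelection` are hypothetical names, not the PR's code:

```go
package main

import (
	"fmt"
	"strings"
)

// Version is a hypothetical ordered maturity level (alpha < beta < GA).
type Version int

const (
	VersionAlpha Version = iota
	VersionBeta
	VersionGA
)

// parseVersion maps a version suffix to a Version; ok is false if unknown.
func parseVersion(s string) (Version, bool) {
	switch s {
	case "alpha":
		return VersionAlpha, true
	case "beta":
		return VersionBeta, true
	case "ga", "stable":
		return VersionGA, true
	}
	return 0, false
}

// ToolsetSelection is a hypothetical resolved entry from the toolsets list.
type ToolsetSelection struct {
	Name      string
	Threshold Version // minimum maturity a tool needs to be enabled
}

// ResolveToolsets splits entries such as "helm:alpha" into a toolset name and
// a per-toolset threshold. Entries without a suffix use defaultVersion, so
// choosing a domain and choosing a maturity level stay separate decisions.
func ResolveToolsets(entries []string, defaultVersion Version) ([]ToolsetSelection, error) {
	out := make([]ToolsetSelection, 0, len(entries))
	for _, entry := range entries {
		name, suffix, hasSuffix := strings.Cut(entry, ":")
		threshold := defaultVersion
		if hasSuffix {
			v, ok := parseVersion(suffix)
			if !ok {
				return nil, fmt.Errorf("toolset %q: unknown version %q", name, suffix)
			}
			threshold = v
		}
		out = append(out, ToolsetSelection{Name: name, Threshold: threshold})
	}
	return out, nil
}

func main() {
	// The example from the review thread, with a default of "stable".
	selections, err := ResolveToolsets([]string{"core", "config", "helm:alpha"}, VersionGA)
	if err != nil {
		panic(err)
	}
	for _, s := range selections {
		fmt.Printf("%s -> %d\n", s.Name, s.Threshold)
	}
	// core -> 2, config -> 2, helm -> 0
}
```

With a threshold of 2 (stable), the beta `core` and `config` toolsets would contribute no tools, while `helm` at 0 (alpha) would contribute all of its tools, matching the two-step walkthrough in the reply.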

```go
GetPrompts() []ServerPrompt
// GetVersion returns the version of the toolset.
// This version can be overridden by specific tools/prompts (e.g. a toolset may be beta, but have an alpha tool).
GetVersion() Version
```
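A sketch of the override rule in the `GetVersion` doc comment, assuming hypothetical `Toolset` and `ServerTool` types in which a nil per-tool version means "inherit from the toolset" (the PR's actual types may model this differently):

```go
package main

import "fmt"

// Version is a hypothetical ordered maturity level (alpha < beta < GA).
type Version int

const (
	VersionAlpha Version = iota
	VersionBeta
	VersionGA
)

// ServerTool is a hypothetical tool that may override its toolset's version.
// A nil Version means "inherit from the toolset".
type ServerTool struct {
	Name    string
	Version *Version
}

// Toolset is a hypothetical toolset whose Version applies to its tools by default.
type Toolset struct {
	Version Version
	Tools   []ServerTool
}

// EffectiveVersion returns the tool's own version if set, else the toolset's.
func (ts Toolset) EffectiveVersion(t ServerTool) Version {
	if t.Version != nil {
		return *t.Version
	}
	return ts.Version
}

func main() {
	alpha := VersionAlpha
	ts := Toolset{
		Version: VersionBeta,
		Tools: []ServerTool{
			{Name: "tool_a"},                  // inherits beta
			{Name: "tool_b", Version: &alpha}, // overridden to alpha
		},
	}
	for _, t := range ts.Tools {
		fmt.Println(t.Name, ts.EffectiveVersion(t)) // tool_a 1, then tool_b 0
	}
}
```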
Collaborator


For downstream implementations, would we then just set those?

```
"name": "kiali_workload_logs"
}
]
[]
```
Collaborator


remove?

Collaborator Author


Oh, I think this is a mistake: because this toolset is no longer enabled by default, it has no tools, so the JSON snapshot update removed these.

Will switch it so these are enabled in the tests.

@manusa
Member

manusa commented Jan 13, 2026

Given what we discussed internally yesterday and this proposal, I'm not so sure that versioning is exactly what we want.
Also, I don't think this fits exactly the upstream purposes.

This is how I see it:

  • I don't think that 3 levels (alpha, beta, stable) are really necessary but I might be missing something (e.g. I don't see a need to differentiate between beta and alpha). (See next point)
  • Instead of providing versioning, I believe that for the downstream story, it should be something more like certified or supported.
    This should cover the productization story where the MCP server is provided with a given set of tools that are supported for customers.
    If certified misses something, then I guess versioning is fine.
  • Upstream (this repo), SHOULD provide the required infrastructure to declare a toolset as certified/versioned.
    However, I think it SHOULD NOT take on the responsibility of declaring a toolset certified (or GA), since it doesn't really matter for the upstream world and its licensing ("AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND).
  • For maintenance reasons and also to facilitate the creation of the certified infrastructure upstream and being able to leverage it downstream, I believe that a slice of certified toolsets should be declared somewhere (maybe the toolsets package) upstream with the _overrides.go pattern to be able to redefine it downstream.
    This assumes that entire toolsets should be certified, as opposed to the current approach, which provides granularity for individual tools or prompts (I don't really see the case for that, but I might be missing some points here too).
  • I like the fact that certified toolsets (or a given version threshold in case we go that route) can be enforced while starting the server.
    But this should really account for their intersections and edge cases (enabled toolsets, enabled/disabled tools, readonly, etc.). Maybe warnings or exceptions should be printed with informative messages (e.g. when a server is started with settings that effectively provide no toolsets).

Signed-off-by: Calum Murray <cmurray@redhat.com>
@matzew
Collaborator

matzew commented Jan 13, 2026

> for the downstream story, it should be something more like certified or supported

I think for that (downstream) certified/supported versus NOT certified is good.

For upstream (here), I do see the reasoning behind letting things come in at a more flexible stage (e.g. alpha, beta, GA), which allows defining levels of robustness there. For example, some CNCF projects had similar stages of "feature promotion", showing how robust a (new) feature is. That would still fit with the terms of the license.

@manusa I guess you favor a lower (or different) bar for adding toolsets, like "if it passes X% of evals it can come in"? Should they all be enabled by default? (Which of course can be changed downstream too.)

@Cali0707
Collaborator Author

From discussions today the idea (as I understand it) is:

  1. Toolsets will either be "certified" or not (possibly under some other name, like "passing evals")
  2. We will have some slice of the certified toolsets
  3. There will be an eval requirement to make a toolset certified - if the toolset falls below the bar, we remove it from certified
  4. There will be some flag (default false) to enable only certified toolsets
  5. We will warn if a toolset is disabled by the flag

The idea for new toolsets, then, is that they will not be certified until they get evals. For existing toolsets, we can add a `toolsetname_experimental` toolset for adding new tools or making large changes to existing ones.
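The certified-only flow summarized in the list above could look roughly like the following. This is a sketch under assumptions: the `certifiedToolsets` slice contents are placeholders, and `FilterCertified` and its `warn` callback are hypothetical names. It also emits the "no toolsets enabled" warning suggested earlier in the thread:

```go
package main

import (
	"fmt"
	"log"
)

// certifiedToolsets is a hypothetical upstream list with placeholder contents;
// a downstream build could redefine it (e.g. via the _overrides.go pattern
// mentioned earlier in the thread).
var certifiedToolsets = []string{"core", "config"}

// FilterCertified returns the requested toolsets that stay enabled.
// When certifiedOnly is false (the proposed default), nothing is filtered.
// Otherwise, each dropped toolset triggers a warning, and one more warning
// is emitted if no toolsets remain at all.
func FilterCertified(requested []string, certifiedOnly bool, warn func(msg string)) []string {
	if !certifiedOnly {
		return requested
	}
	certified := make(map[string]bool, len(certifiedToolsets))
	for _, name := range certifiedToolsets {
		certified[name] = true
	}
	var enabled []string
	for _, name := range requested {
		if certified[name] {
			enabled = append(enabled, name)
		} else {
			warn(fmt.Sprintf("toolset %q is not certified and has been disabled", name))
		}
	}
	if len(enabled) == 0 {
		warn("no toolsets are enabled with the current settings")
	}
	return enabled
}

func main() {
	warn := func(msg string) { log.Println("warning:", msg) }
	enabled := FilterCertified([]string{"core", "config", "helm"}, true, warn)
	fmt.Println(enabled) // [core config]
}
```

Passing `warn` as a callback keeps the filter independent of any particular logger.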

### Beta

For a tool/prompt/toolset to enter into "beta", we require that there are eval scenarios. For a toolset to enter "beta", there must be scenarios
exercising all of the tools and prompts in the toolset. For individual tools and prompts to enter "beta", we only require an eval scenario
Collaborator


typo: excercising -> exercising

```go
case "beta":
	tmp = VersionBeta
case "ga", "", "stable":
	tmp = VersionGA
```
Collaborator


Minor issue:
The empty string "" is grouped with "ga" and "stable", so if a user has a config like:

`default_toolset_version = ""`

it silently becomes `VersionGA` instead of the default, `VersionBeta`.
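One way to address this is to handle the empty string explicitly and return the configured default. A minimal sketch, assuming `VersionBeta` remains the default and that unknown values should be rejected with an error (a choice not made in the PR); `ParseVersion` and `defaultToolsetVersion` are hypothetical names:

```go
package main

import "fmt"

// Version is a hypothetical ordered maturity level (alpha < beta < GA).
type Version int

const (
	VersionAlpha Version = iota
	VersionBeta
	VersionGA
)

// defaultToolsetVersion mirrors the PR's stated default of "beta".
const defaultToolsetVersion = VersionBeta

// ParseVersion treats "" as "not set" and returns the default, instead of
// grouping it with "ga"/"stable". Unknown values are rejected (an assumption).
func ParseVersion(s string) (Version, error) {
	switch s {
	case "":
		return defaultToolsetVersion, nil
	case "alpha":
		return VersionAlpha, nil
	case "beta":
		return VersionBeta, nil
	case "ga", "stable":
		return VersionGA, nil
	default:
		return 0, fmt.Errorf("unknown toolset version %q", s)
	}
}

func main() {
	for _, in := range []string{"", "stable", "gamma"} {
		v, err := ParseVersion(in)
		fmt.Printf("%q -> %d (err: %v)\n", in, v, err)
	}
}
```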
