Carve out standalone LSP (codename Oak) from Ark #1117

@lionel-

We'd like to extract the Ark LSP as a standalone product called Oak. This LSP:

  • Will be easily usable outside Positron
  • Will have a stronger model for scopes and control/data-flow analysis (accurate renames and goto-definition, unused and undefined diagnostics, etc.)
  • Will be our foundation for further work on type checking

We're currently planning to model the analysis engine after Astral's ty SemanticIndex. This data structure will encode scopes and usage/definition sites. Type information will later be attached to symbols encoded in the index, starting with whether the symbol is a function definition, e.g. from a top-level sym <- function() {} assignment. This does not require any type annotation and will support lints such as "cannot subset object of type closure".
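To make the shape of such an index more concrete, here is a minimal sketch in Rust. It is not ty's `SemanticIndex` and not Ark code: the names `SemanticIndex`, `Scope`, `Symbol`, and `SymbolKind` are hypothetical, and sites are plain byte offsets. It also ignores much of R's scoping behaviour (`<<-`, `assign()`, and so on) and only shows scopes, definition/use sites, and a "is this a function" flag attached to a symbol.

```rust
use std::collections::HashMap;

/// Hypothetical: what is known about a symbol without any type annotation.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub enum SymbolKind {
    #[default]
    Unknown,
    /// E.g. bound by a top-level `sym <- function() {}` assignment.
    Function,
}

/// Hypothetical symbol entry: byte offsets of its definition and use sites.
#[derive(Debug, Default)]
pub struct Symbol {
    pub definitions: Vec<usize>,
    pub uses: Vec<usize>,
    pub kind: SymbolKind,
}

/// One lexical scope; `parent` is an index into `SemanticIndex::scopes`.
#[derive(Debug, Default)]
pub struct Scope {
    pub parent: Option<usize>,
    pub symbols: HashMap<String, Symbol>,
}

#[derive(Debug)]
pub struct SemanticIndex {
    pub scopes: Vec<Scope>,
}

impl SemanticIndex {
    /// Scope 0 is the file-level scope.
    pub fn new() -> Self {
        Self { scopes: vec![Scope::default()] }
    }

    /// Open a child scope (e.g. a function body) and return its index.
    pub fn push_scope(&mut self, parent: usize) -> usize {
        self.scopes.push(Scope { parent: Some(parent), symbols: HashMap::new() });
        self.scopes.len() - 1
    }

    /// Record a definition site and the kind of value bound there.
    pub fn define(&mut self, scope: usize, name: &str, offset: usize, kind: SymbolKind) {
        let symbol = self.scopes[scope].symbols.entry(name.to_string()).or_default();
        symbol.definitions.push(offset);
        symbol.kind = kind;
    }

    /// Find the nearest enclosing scope that binds `name`.
    pub fn resolve(&self, mut scope: usize, name: &str) -> Option<usize> {
        loop {
            if self.scopes[scope].symbols.contains_key(name) {
                return Some(scope);
            }
            scope = self.scopes[scope].parent?;
        }
    }

    /// Record a use site on the resolved symbol; `None` means "undefined".
    pub fn record_use(&mut self, scope: usize, name: &str, offset: usize) -> Option<usize> {
        let found = self.resolve(scope, name)?;
        self.scopes[found].symbols.get_mut(name)?.uses.push(offset);
        Some(found)
    }
}
```

In this sketch, a symbol with definitions but no uses is a candidate for an "unused" diagnostic, and a use that resolves to `None` is a candidate for an "undefined" one.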

As soon as we add function type information for diagnostics, we'll need to resolve whether symbols from other packages are functions, and we'll need to do this at startup without any user interaction (a different situation from e.g. querying completions for foo:::*). In Ark, where we share the R session with the user, we can't silently load package namespaces in the background. That would be confusing even for advanced users, especially if one of the namespaces fails to load.

One option is to fetch R sources from CRAN for static analysis (posit-dev/positron#2286). This would give us full source context including comments and file hierarchy, which is ideal for navigating in context with features like goto-definition. The main downside of this approach is that it only works for packages that can be found on CRAN. Until R installs packages with sources by default, we'll need a more robust approach.

Splitting Oak and Ark

We initially envisioned that Ark would still embed the full LSP in its own process, for several reasons:

  • We want to limit the proliferation of supporting processes in Positron instances.
  • Even with a standalone LSP, Ark still needs its own LSP to provide e.g. dynamic completions for data frames defined in the global environment. It was unclear how to coordinate the static and dynamic LSP sources if they lived independently.

However, the landscape has changed and Positron now supports exactly that setup on the Python side. positron-language-server (https://github.com/posit-dev/positron/blob/038194ce/extensions/positron-python/python_files/posit/positron/positron_lsp.py#L6-L20) provides runtime code assistance, and the user can either use the builtin LSP for general static code assistance or use any LSP provider of their choice (bring your own LSP). Positron then knows how to deduplicate the results of both LSPs and prioritise those from the dynamic LSP: https://github.com/posit-dev/positron/blob/038194ce/src/vs/editor/contrib/suggest/browser/suggest.ts#L334-L362.
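The idea of deduplicating with dynamic results taking priority can be illustrated with a small Rust sketch. This is not Positron's implementation (which lives in the TypeScript file linked above); the `Completion` and `Source` types are hypothetical, and deduplicating by label alone is an assumption made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical: which LSP produced a completion item.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Static,
    Dynamic,
}

/// Hypothetical, stripped-down completion item.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Completion {
    pub label: String,
    pub source: Source,
}

/// Merge both result sets, keeping the dynamic item whenever both
/// servers offer the same label (assumption: the label is the identity).
pub fn merge_completions(
    from_dynamic: Vec<Completion>,
    from_static: Vec<Completion>,
) -> Vec<Completion> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();

    // Dynamic items come first, so they claim their labels before
    // any static item with the same label is considered.
    for item in from_dynamic.into_iter().chain(from_static) {
        if seen.insert(item.label.clone()) {
            merged.push(item);
        }
    }
    merged
}
```

With this ordering, a data frame defined in the global environment would shadow a static suggestion of the same name, while static-only items still come through.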

That setup is ideal for our use case of a standalone Oak LSP separate from the Ark LSP.

The joys of a standalone Oak

If we do split Oak from Ark, then there is nothing stopping Oak from launching an R session in its own process and loading packages there.

  • We no longer need the source files and can load R AST from installed packages instead. The mapping is not fully one-to-one, but that's a good enough approximation especially to get us started.

    One thing that's slightly fuzzy is how to reconcile the R AST with the Rowan one. We can probably walk the R AST and generate the latter, like we did with tree-sitter. This will fail in some cases because of literal objects that are not typically parseable. The initial implementation can completely sidestep this issue, but longer term we don't want multiple analysis paths for each AST. In general, pushing for installing packages with sources by default (or at least srcrefs) would still be important for the longer term.

  • Oak can be in charge of serving R documentation to the frontend. Ark would still serve development R documentation managed by pkgload. That's a much more hygienic separation of tasks. On the Ark side, we'd require R to be idle to avoid running R code at interrupt time (see Remove usage of r_task() #691 and Remove usage of r_task() in the LSP #1095). On the Oak side, we're only querying R for short-running computations and can service help requests at any time.

  • Similar wins for other requests such as "is this statement complete". These would be provided by Oak rather than Ark. Anything that requires access to R at any time would be serviced by Oak.

Managing Oak instances for multiple versions of R

In non-Positron contexts, Oak will just run the R on the path. In Positron though, we'll need Oak to match the version of R running in Ark. This is complicated by multi-session support.

I think we can arrange that by launching as many Oak instances as there are distinct versions of R running across the Ark instances. If all Ark instances run the same R version, we'd have only a single Oak server working for all of them. If there are two R versions running, we'd launch two instances of Oak with corresponding versions of R.

Only one Oak session is active at any time. When the user switches from one version of R to another, we'd switch the active Oak session to the matching version. This is similar, though not identical, to how we are currently switching LSP sessions living in Ark.

Complication: renv. We'll need one instance of Oak per renv.
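The bookkeeping described above could be sketched as a registry keyed by R version and renv project. All names here (`OakKey`, `OakRegistry`, `activate`) are hypothetical, and a plain integer id stands in for whatever handle the frontend would actually hold on an Oak process.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical identity of an Oak instance: one per R version and renv project.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct OakKey {
    pub r_version: String,
    pub renv_project: Option<PathBuf>,
}

/// Hypothetical registry; ids stand in for real process handles.
#[derive(Debug, Default)]
pub struct OakRegistry {
    instances: HashMap<OakKey, usize>,
    active: Option<usize>,
    next_id: usize,
}

impl OakRegistry {
    /// Make the instance matching `key` the active one, launching it
    /// (here: allocating an id) only if no matching instance exists yet.
    pub fn activate(&mut self, key: OakKey) -> usize {
        let next_id = &mut self.next_id;
        let id = *self.instances.entry(key).or_insert_with(|| {
            let id = *next_id;
            *next_id += 1;
            id
        });
        self.active = Some(id);
        id
    }

    /// The single instance currently serving requests, if any.
    pub fn active(&self) -> Option<usize> {
        self.active
    }

    /// Number of Oak instances launched so far.
    pub fn instance_count(&self) -> usize {
        self.instances.len()
    }
}
```

Two Ark sessions on the same R version and renv project map to the same key and therefore share one instance; switching sessions only changes which instance is active.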

Timeline

  1. Build a standalone oak binary. Most of the code is shared with Ark. No dependency on Amalthea. Needs a bit of research and experimentation regarding how the frontend API is implemented (do we still need ReadConsole?), and some adjustments to things like r_task(), which will no longer run at interrupt time but will instead send a request to the R thread via a channel.

    From Remove usage of r_task() in the LSP #1095, the only thing that needs a resolution is search path completions. We should instead detect the library() calls in effect at the cursor position and use package export information to feed the completions, just like we already do for diagnostics. The other R-backed features can remain as is for the time being.

    Deliverable: our LSP is now usable in other IDEs! This is not negligible, since it can be used by agents too (https://christophertkenny.com/posts/2026-03-08-r-lsp-claude/).

  2. Implement Oak + Ark support in the frontend and split dynamic and static handlers across Oak and Ark. At the same time, we allow users to "bring their own LSP", e.g. languageserver. That's the deliverable.

  3. We bring in a minimal semantic index with scopes and a symbol table. The symbols embed a notion of "is used" and "is bound". From there we can start porting some of the existing Ark features like completions, outline, workspace symbols, and go to definition to use the new infrastructure. We can also implement a rough version of rename, which would be the main deliverable.

  4. We add the usage-definition map. Deliverables: Precise goto-def, rename, and find-refs.

  5. We add type information about functions. Deliverable: diagnostics like "object of type closure is not subsettable", linting for partial argument matching, etc.
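For step 1, the change to r_task() (sending a request to the R thread over a channel rather than running at interrupt time) could look roughly like the sketch below. It uses only the Rust standard library. The `RThread` type and this `r_task` signature are hypothetical and do not mirror Ark's current API, and the spawned thread is a stand-in for the thread that would own the R session.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// A unit of work to run on the thread that owns R.
type Task = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical handle used by LSP handlers to reach the R thread.
pub struct RThread {
    tasks: Sender<Task>,
}

impl RThread {
    /// Spawn the thread standing in for the one that owns the R session.
    pub fn spawn() -> (Self, JoinHandle<()>) {
        let (tasks, inbox) = mpsc::channel::<Task>();
        let handle = thread::spawn(move || {
            // Tasks run one at a time, in arrival order. The loop ends
            // once every sender has been dropped.
            for task in inbox {
                task();
            }
        });
        (Self { tasks }, handle)
    }

    /// Send a closure to the R thread and block until its result is back.
    pub fn r_task<T, F>(&self, f: F) -> T
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (reply, result) = mpsc::channel();
        let task: Task = Box::new(move || {
            // The caller may have gone away; ignore a failed reply.
            let _ = reply.send(f());
        });
        self.tasks.send(task).expect("R thread has shut down");
        result.recv().expect("R thread dropped the task")
    }
}
```

Because Oak would only query R for short-running computations, blocking the caller on the reply channel is a plausible starting point; an asynchronous variant would return the receiving end instead of waiting on it.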
