TensorRT-RTX EP: Enabling CUDA Graph seems ineffective when running a session from a precompiled engine #27329

### Describe the issue

No CUDA Graph launch occurs when a TensorRT-RTX EP session runs from a precompiled (AOT-built) TensorRT-RTX engine, even with CUDA Graph enabled. This issue may be related to #26929.

In the following snippet from `NvExecutionProvider::CreateNodeComputeInfoFromGraph`, the CUDA Graph strategy is explicitly set to `kWHOLE_GRAPH_CAPTURE` when CUDA Graph is enabled (`cuda_graph_enable_`) and the TensorRT-RTX version is 1.3 or later. This override is needed because, according to the [TensorRT-RTX API documentation](https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/_static/cpp-api/classnvinfer1_1_1_i_runtime_config.html#a0785a3e75ae5b2e0e5ffa59f518aed57), the default strategy is `kDISABLED`.

https://github.com/microsoft/onnxruntime/blob/a3749f13536bd25e89ba8f8a5ae21e88b9337057/onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider.cc#L2951-L2962

It seems that this CUDA Graph strategy override does not apply when the session is run from a precompiled engine.

### To reproduce

1. Unzip [repro.tar.gz](https://github.com/user-attachments/files/25264345/repro.tar.gz)

```bash
tar -xzf repro.tar.gz
cd repro/
```

2. _(Expected behavior)_ When running `repro.py` without a precompiled TensorRT-RTX engine (the script default), one `cudaGraphLaunch` call occurs per run, i.e. `--num-runs` calls in total:

```bash
python repro.py --num-runs 5
```

You may profile these calls by using NVIDIA Nsight Systems:

<img width="1084" height="717" alt="Image" src="https://github.com/user-attachments/assets/96f01226-32c4-49fb-b751-822723de8794" /><br>

3. _(Issue)_ When running `repro.py` with the `--use-precompiled-engine` option, CUDA kernels are launched directly, with no `cudaGraphLaunch` calls. This may add CPU overhead, since each CUDA kernel is launched individually:

```bash
python repro.py --use-precompiled-engine --num-runs 5
```

Here is the corresponding NVIDIA Nsight Systems profile:

<img width="1084" height="717" alt="Image" src="https://github.com/user-attachments/assets/d4004686-c2d2-4039-bac4-8aa88093adff" />
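
The difference between the two runs can also be checked without reading the timeline by eye. The following is a minimal sketch, not part of `repro.tar.gz`. It assumes each run was profiled with something like `nsys profile --trace=cuda -o jit python repro.py --num-runs 5` and summarized with `nsys stats --report cuda_api_sum --format csv jit.nsys-rep > jit.csv`. The report name and the `Num Calls` / `Name` column headers are assumptions based on recent Nsight Systems releases and may differ in other versions. The helper name `count_api_calls` is hypothetical.

```python
import csv
import io


def count_api_calls(cuda_api_sum_csv: str, api_name: str = "cudaGraphLaunch") -> int:
    """Return the total "Num Calls" for api_name in a cuda_api_sum CSV dump.

    Assumes the CSV has "Num Calls" and "Name" columns (an assumption about
    the Nsight Systems report layout, not something stated in this issue).
    """
    lines = cuda_api_sum_csv.splitlines()
    # nsys may print status lines before the table, so start at the header row.
    start = next(
        (i for i, line in enumerate(lines) if "Num Calls" in line and "Name" in line),
        None,
    )
    if start is None:
        raise ValueError("no cuda_api_sum header row found")

    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    total = 0
    for row in reader:
        name = row.get("Name") or ""
        # Nsight Systems may append a version suffix, e.g. cudaGraphLaunch_v10000.
        if name == api_name or name.startswith(api_name + "_v"):
            total += int(row["Num Calls"])
    return total
```

Under these assumptions, `count_api_calls(open("jit.csv").read())` should equal the `--num-runs` value for the default run, while the same call on the summary of the `--use-precompiled-engine` run would return 0 if the behavior reported here is reproduced.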



### Urgency

_No response_

### Platform

Linux

### OS Version

Rocky Linux 8.10

### ONNX Runtime Installation

Built from Source

### ONNX Runtime Version or Commit ID

1.24.1

### ONNX Runtime API

C++

### Architecture

X64

### Execution Provider

Other / Unknown

### Execution Provider Library Version

TensorRT-RTX-1.3.0.35

Snippet referenced in "Describe the issue" (`nv_execution_provider.cc`, lines 2951-2962):

	trt_runtime_config = std::unique_ptr<nvinfer1::IRuntimeConfig>(trt_engine->createRuntimeConfig());
	if (trt_runtime_config && cuda_graph_enable_) {
	  trt_runtime_config->setDynamicShapesKernelSpecializationStrategy(nvinfer1::DynamicShapesKernelSpecializationStrategy::kEAGER);
	#if TRT_MAJOR_RTX > 1 || (TRT_MAJOR_RTX == 1 && TRT_MINOR_RTX >= 3)
	  auto cuda_strategy_flag = trt_runtime_config->setCudaGraphStrategy(nvinfer1::CudaGraphStrategy::kWHOLE_GRAPH_CAPTURE);
	  LOGS_DEFAULT(INFO) << "[NvTensorRTRTX EP] CUDA graph strategy with RTX Graph capture enabled : " << cuda_strategy_flag;
	#else
	  LOGS_DEFAULT(WARNING) << "[NvTensorRTRTX EP] CUDA graph is enabled but RTX Graph capture is not available. "
	                        << "The current TRT RTX version does not support RTX Graph. "
	                        << "Please upgrade to TRT RTX >= 1.3 to use RTX Graph capture feature for optimal CUDA graph performance.";
	#endif
	}
