✨[Feature] Python Runtime Rework to Unify Both Runtimes #4047

@cehongwang

Description

Unifying TorchTensorRTModule and PythonTorchTensorRTModule

TL;DR

Unify TorchTensorRTModule (C++ runtime) and PythonTorchTensorRTModule (Python runtime) into a single runtime module with shared logic. The unified module will dispatch to either a Python or C++ execute_engine implementation based on the build and a runtime flag. This reduces maintenance burden, eliminates behavioral inconsistencies, preserves backward compatibility, and enables serialization and re-export for the Python runtime.

Goal(s)

Currently, we have two standalone runtime modules: TorchTensorRTModule for the C++ runtime and PythonTorchTensorRTModule for the Python runtime. These two modules are implemented differently and do not share the same logic, which increases the maintenance burden and causes inconsistencies in the Torch-TensorRT runtime.

To address this inconsistency, we will unify TorchTensorRTModule and PythonTorchTensorRTModule into a single runtime module that calls an execute_engine op implemented in either Python or C++, depending on the build the user chooses. After this change, the Python runtime also becomes serializable.

Proposed APIs / UX

If a user selects a Python-only build, the execute_engine op is implemented only in Python. In a full build, users can control whether the Python or the C++ implementation of execute_engine is used via a context manager, as illustrated in the example workflow below.

Example Workflow

The user-facing API stays the same for backward compatibility, as sketched below.
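
A hedged sketch of what the unchanged workflow could look like under a full build. `torch_tensorrt.runtime.set_runtime` is an illustrative name for the proposed context manager, not a finalized API; the `torch_tensorrt.compile` and `torch_tensorrt.save` entry points shown are existing ones.

```python
import torch
import torch_tensorrt

# Placeholder model; any TorchScript/Dynamo-compilable module works here.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3)).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224).cuda()]

# Compilation is unchanged; the unified module decides at call time
# which execute_engine implementation to dispatch to.
trt_module = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)

out = trt_module(*inputs)  # full build: C++ execute_engine by default

# Hypothetical context manager (name illustrative) forcing the Python
# implementation of execute_engine inside this block only.
with torch_tensorrt.runtime.set_runtime("python"):
    out = trt_module(*inputs)

# With the unified module, the Python runtime is serializable as well.
torch_tensorrt.save(trt_module, "trt_model.ep", inputs=inputs)
```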

Design & Implementation

  1. Implement a Python Engine class (a wrapper around the Python TensorRT engine) similar to the C++ TRTEngine.h (see the sketch after this list).

    • Make sure the serialization format is the same across both runtimes.
  2. Implement an execute_engine op that is the counterpart of the C++ execute_engine op, supporting the features PythonTorchTensorRTModule already supports:

    • output allocator
    • CUDA graphs
    • unowned tensors
    • profiling
  3. Implement the execute_engine op registration logic, depending on the user's selection and the Torch-TensorRT build (see the sketch after this list).

    • The controlling flag will no longer be a compilation/runtime setting; it becomes a system-wide setting.
    • Add a global Torch-TensorRT state that controls which runtime op is used.
    • Default to C++ in a full build and to Python in a Python-only build.
    • Use a context manager to let users control the global state.
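
A minimal sketch of how items 1 and 3 could fit together, under assumed names: `PyTRTEngine`, `set_runtime`, the `torch_tensorrt._C` probe, and the `_execute_engine_*` stubs are all illustrative stand-ins, not the actual implementation.

```python
# Illustrative sketch only -- all names here are hypothetical.
import contextlib
import threading


def _cpp_runtime_available() -> bool:
    # In a full build the C++ runtime extension is importable; in a
    # Python-only build the import fails and we fall back to Python.
    try:
        import torch_tensorrt._C  # noqa: F401  (assumed extension name)
        return True
    except ImportError:
        return False


class _RuntimeState(threading.local):
    """Global (thread-local) state selecting the execute_engine implementation.

    Defaults to C++ when a full build is present, otherwise Python.
    """

    def __init__(self) -> None:
        self.use_cpp = _cpp_runtime_available()


_STATE = _RuntimeState()


@contextlib.contextmanager
def set_runtime(backend: str):
    """Context manager letting users override the system-wide runtime choice."""
    if backend == "cpp" and not _cpp_runtime_available():
        raise RuntimeError("C++ runtime requested but this is a Python-only build")
    previous = _STATE.use_cpp
    _STATE.use_cpp = backend == "cpp"
    try:
        yield
    finally:
        _STATE.use_cpp = previous


def _execute_engine_python(engine, inputs):
    # Placeholder for the pure-Python op from item 2 (output allocator,
    # CUDA graphs, unowned tensors, and profiling would live here).
    raise NotImplementedError


def _execute_engine_cpp(engine, inputs):
    # Placeholder for the bound C++ op (e.g. torch.ops.tensorrt.execute_engine).
    raise NotImplementedError


def execute_engine(engine, inputs):
    """Dispatch to the C++ or Python execute_engine op based on global state."""
    if _STATE.use_cpp:
        return _execute_engine_cpp(engine, inputs)
    return _execute_engine_python(engine, inputs)


class PyTRTEngine:
    """Sketch of the Python Engine class from item 1, mirroring the C++
    TRTEngine: it keeps the serialized engine bytes so that pickling and
    re-export see the same format as the C++ runtime."""

    def __init__(self, serialized_engine: bytes, name: str = ""):
        self.name = name
        self.serialized_engine = serialized_engine

    def __getstate__(self):
        # Serialize only bytes + metadata; live TensorRT objects are
        # rebuilt lazily after deserialization.
        return {"name": self.name, "engine": self.serialized_engine}

    def __setstate__(self, state):
        self.name = state["name"]
        self.serialized_engine = state["engine"]
```

Keeping the flag in a thread-local global rather than a per-module compile setting matches the "system setting" framing in item 3 while keeping the context manager safe to nest.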

Implementation Phases

MVP (<TARGET RELEASE VERSION>)

    • Merge both runtimes: delete PythonTorchTensorRTModule and reimplement the execute_engine op
    • Build the registration logic
    • Enable serialization and re-export

Extension Phase 1 (<TARGET RELEASE VERSION>)

Benchmark both runtimes and make sure they have the same performance

Extension Phase 2 (<TARGET RELEASE VERSION>)
