Commit 29f08db

Part-2 recent refactor changes
Signed-off-by: Will Guo <willg@nvidia.com>
1 parent 345a3dc commit 29f08db

4 files changed: +366 -2870 lines changed
Lines changed: 2 additions & 51 deletions
@@ -1,4 +1,4 @@
-# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -19,23 +19,6 @@
 in ONNX computation graphs to minimize TensorRT inference latency. It uses pattern-based
 region analysis to efficiently explore and optimize Q/DQ insertion strategies.
 
-**Key Features:**
-
-- **Automated Region Discovery**: Hierarchical decomposition of computation graphs into
-LEAF and COMPOSITE regions with automatic pattern identification
-
-- **Pattern-Based Optimization**: Groups structurally-similar regions and optimizes them
-together, making the process efficient and consistent
-
-- **TensorRT Performance Measurement**: Direct integration with TensorRT Python API for
-accurate latency profiling of each Q/DQ configuration
-
-- **State Management**: Checkpoint/resume capability for long-running optimizations with
-incremental state saving after each region
-
-- **Pattern Cache**: Warm-start optimization using learned schemes from previous runs,
-enabling transfer learning across models
-
 **Core Components:**
 
 Autotuner Classes:
@@ -64,26 +47,6 @@
 - TensorRTPyBenchmark: Benchmark using TensorRT Python API (recommended)
 - TrtExecBenchmark: Benchmark using trtexec command-line tool (legacy)
 
-**Quick Start:**
-
->>> from modelopt.onnx.quantization.autotune import QDQAutotuner, Config
->>> import onnx
->>> # Load model and initialize autotuner
->>> model = onnx.load("model.onnx")
->>> autotuner = QDQAutotuner(model)
->>> # Configure autotuning parameters
->>> config = Config(default_quant_type="int8")
->>> autotuner.initialize(config)
->>> # Generate and test Q/DQ schemes
->>> # (see workflows.region_pattern_autotuning_workflow for complete example)
-
-**Command-Line Interface:**
-
-The package can be run directly as a module:
-
-$ python -m modelopt.onnx.quantization.autotune --model model.onnx --output ./output
-$ python -m modelopt.onnx.quantization.autotune --model model.onnx --quant-type fp8
-
 **See Also:**
 
 - workflows.region_pattern_autotuning_workflow: Complete end-to-end optimization
@@ -101,44 +64,32 @@
     PatternCache,
     PatternSchemes,
     Region,
-    RegionError,
     RegionType,
 )
-
-# Insertion points (from dedicated module)
 from .insertion_points import (
     ChildRegionInputInsertionPoint,
     NodeInputInsertionPoint,
     RegionOutputInsertionPoint,
     ResolvedInsertionPoint,
 )
-
-# Pattern analysis
 from .region_pattern import RegionPattern
-
-# Region search
 from .region_search import CombinedRegionSearch
 
-# Public API
 __all__ = [
-    # Exceptions
     "AutotunerError",
     "AutotunerNotInitializedError",
     "ChildRegionInputInsertionPoint",
     "CombinedRegionSearch",
-    # Configuration and state
     "Config",
-    # Q/DQ insertion
     "InsertionScheme",
     "InvalidSchemeError",
     "NodeInputInsertionPoint",
-    "ResolvedInsertionPoint",
     "PatternCache",
     "PatternSchemes",
-    # Region classes
     "Region",
     "RegionError",
     "RegionOutputInsertionPoint",
     "RegionPattern",
     "RegionType",
+    "ResolvedInsertionPoint",
 ]
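
In the last hunk the `RegionError` import line is removed while `"RegionError"` remains in `__all__`. The visible context does not show whether the name is still imported elsewhere in the file, so this may be perfectly fine. A minimal sketch of how such a mismatch could be checked is below; `missing_exports` and the `demo_autotune` stand-in module are hypothetical names used only for illustration and are not part of the package.

```python
import types


def missing_exports(module: types.ModuleType) -> list[str]:
    """Return names listed in ``module.__all__`` that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Stand-in module for illustration only; it is NOT the real autotune package.
demo = types.ModuleType("demo_autotune")
demo.Region = type("Region", (), {})
demo.RegionType = type("RegionType", (), {})
# "RegionError" is exported here but never defined, mimicking a dropped import.
demo.__all__ = ["Region", "RegionError", "RegionType"]

print(missing_exports(demo))
```

Running the same helper against the imported package in a test would return an empty list when every exported name resolves.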
