Merged
6 changes: 3 additions & 3 deletions website/blog/2025-10-24-dynamo-on-aks/index.md
@@ -82,7 +82,7 @@ without constraining the compute-heavy prefill phase, improving overall
 resource utilization and performance.
 
 Dynamo includes an
-[SLA-based Planner](https://github.com/ai-dynamo/dynamo/blob/main/docs/planner/sla_planner_quickstart.md)
+[SLA-based Planner](https://docs.nvidia.com/dynamo/v-0-9-0/components/planner)
 that proactively manages GPU scaling for prefill/decode (PD) disaggregated
 inference. Using pre-deployment profiling, it evaluates how model parallelism
 and batching affect performance, recommending configurations that meet
@@ -92,7 +92,7 @@ time-series models, dynamically adjusting PD worker counts based on predicted
 demand and real-time metrics.
 
 The Dynamo
-[LLM-aware Router](https://github.com/ai-dynamo/dynamo/tree/main/docs/router)
+[LLM-aware Router](https://docs.nvidia.com/dynamo/v-0-9-0/components/router)
 manages the key-value (KV) cache across large GPU clusters by hashing requests
 and tracking cache locations. It calculates overlap scores between incoming
 requests and cached KV blocks, routing requests to GPUs that maximize cache
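The overlap-score routing described in the context lines above can be sketched as follows. This is an illustrative toy, not Dynamo's actual implementation: block size, hashing scheme, and the worker/cache data structures are all assumptions made for the example.

```python
# Illustrative sketch (not Dynamo's real router): requests are split into
# fixed-size token blocks, each block is hashed with a chained hash so a
# block's identity depends on its whole prefix, and a worker's overlap
# score is the number of contiguous leading blocks it already has cached.
import hashlib

BLOCK_SIZE = 4  # tokens per KV block (hypothetical value)

def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block of the token sequence, chaining hashes
    so matches correspond to shared prefixes."""
    hashes, prev = [], b""
    for i in range(0, len(tokens) - len(tokens) % block_size, block_size):
        h = hashlib.sha256(prev + repr(tokens[i:i + block_size]).encode()).digest()
        hashes.append(h)
        prev = h
    return hashes

def overlap_score(request_hashes, cached_blocks):
    """Count contiguous leading request blocks already cached on a worker."""
    score = 0
    for h in request_hashes:
        if h not in cached_blocks:
            break
        score += 1
    return score

def route(tokens, workers):
    """Pick the worker whose cached KV blocks overlap the request the most."""
    req = block_hashes(tokens)
    return max(workers, key=lambda name: overlap_score(req, workers[name]))

# Example: "gpu-0" already holds the first two blocks of this prompt,
# so it wins with overlap score 2 against an empty cache on "gpu-1".
prompt = list(range(12))
workers = {"gpu-0": set(block_hashes(prompt)[:2]), "gpu-1": set()}
print(route(prompt, workers))  # prints "gpu-0"
```

A real router would also weigh current load against cache affinity; the sketch shows only the overlap-scoring half of that trade-off.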
@@ -285,7 +285,7 @@ This post focused on the foundational serving stack. In upcoming
 blogs, we will build on this foundation and explore more of Dynamo's
 advanced features, such as
 [Disaggregated Serving](https://github.com/ai-dynamo/dynamo/blob/9defc01b9b9c51a4a21abbb02907a4f1d5d2a2d2/examples/basics/disaggregated_serving/README.md#L4)
-and [SLA-based Planner](https://github.com/ai-dynamo/dynamo/blob/main/docs/planner/sla_planner_quickstart.md).
+and [SLA-based Planner](https://docs.nvidia.com/dynamo/v-0-9-0/components/planner).
 We'll demonstrate how these features allow for even greater efficiency, moving
 from a static, holistic deployment to a flexible, phase-splitted architecture.
 Moving forward, we also plan to extend our testing to include larger
8 changes: 3 additions & 5 deletions website/blog/2026-01-22-dynamo-on-aks-part-2/index.md
@@ -22,9 +22,9 @@ Today, we're shifting focus from raw throughput to **developer velocity** and
 **operational efficiency**.
 
 We will explore how the
-[**Dynamo Planner**](https://github.com/ai-dynamo/dynamo/blob/main/docs/planner/sla_planner.md)
+[**Dynamo Planner**](https://docs.nvidia.com/dynamo/v-0-9-0/components/planner)
 and
-[**Dynamo Profiler**](https://github.com/ai-dynamo/dynamo/tree/main/benchmarks/profiler)
+[**Dynamo Profiler**](https://docs.nvidia.com/dynamo/v-0-9-0/components/profiler)
 remove the guesswork from performance tuning on AKS.
 
 <!-- truncate -->
@@ -74,8 +74,6 @@ prefill GPUs and idle decode GPUs).
 NVIDIA Dynamo addresses these gaps through two integrated components:
 the **Planner Profiler** and the **SLO-based Planner**.
 
----
-
 ### Let’s see it through an example application scenario
 
 Consider a mission-critical AI workload running on AKS: an airline’s
@@ -115,7 +113,7 @@ The Dynamo Planner profiler is your pre-deployment simulation engine.
 Instead of burning GPU hours testing every possible configuration, you
 define your requirements in a **DynamoGraphDeploymentRequest (DGDR)**
 manifest. The profiler then executes an automated
-["sweep"](https://github.com/ai-dynamo/dynamo/blob/main/docs/benchmarks/sla_driven_profiling.md#profiling-method)
+["sweep"](https://github.com/ai-dynamo/dynamo/blob/release/0.8.1/docs/benchmarks/sla_driven_profiling.md)
 of the search space:
 
 * **Parallelization Mapping**: It tests different TP sizes for both prefill
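The "sweep" the last hunk links to can be pictured as a simple search over parallelism configurations: benchmark each candidate, discard those that miss the latency targets, and keep the cheapest survivor. The sketch below is hypothetical; the SLO values, the `benchmark` stub, and its latency numbers are invented for illustration and are not Dynamo profiler output.

```python
# Hypothetical sketch of an SLA-driven sweep: enumerate (prefill TP,
# decode TP) pairs, "benchmark" each one (stubbed with made-up numbers),
# and return the feasible config that uses the fewest GPUs.
from itertools import product

SLO = {"ttft_ms": 300, "itl_ms": 50}  # example latency targets

def benchmark(tp_prefill, tp_decode):
    """Stand-in for a real benchmark run: pretend time-to-first-token
    and inter-token latency shrink linearly with tensor parallelism."""
    return {
        "ttft_ms": 800 / tp_prefill,
        "itl_ms": 120 / tp_decode,
        "gpus": tp_prefill + tp_decode,
    }

def sweep(tp_sizes=(1, 2, 4, 8)):
    """Return the cheapest (tp_prefill, tp_decode) meeting both SLOs."""
    feasible = []
    for tp_p, tp_d in product(tp_sizes, repeat=2):
        m = benchmark(tp_p, tp_d)
        if m["ttft_ms"] <= SLO["ttft_ms"] and m["itl_ms"] <= SLO["itl_ms"]:
            feasible.append((m["gpus"], tp_p, tp_d))
    return min(feasible)[1:] if feasible else None

print(sweep())  # with the stub numbers above, (4, 4) is the cheapest fit
```

The real profiler measures live deployments and also sweeps batching parameters, but the select-cheapest-feasible logic is the same shape.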