Merged
6 changes: 3 additions & 3 deletions website/blog/2025-10-24-dynamo-on-aks/index.md
@@ -82,7 +82,7 @@ without constraining the compute-heavy prefill phase, improving overall
 resource utilization and performance.
 
 Dynamo includes an
-[SLA-based Planner](https://github.com/ai-dynamo/dynamo/blob/main/docs/planner/sla_planner_quickstart.md)
+[SLA-based Planner](https://docs.nvidia.com/dynamo/v-0-9-0/components/planner)
 that proactively manages GPU scaling for prefill/decode (PD) disaggregated
 inference. Using pre-deployment profiling, it evaluates how model parallelism
 and batching affect performance, recommending configurations that meet
@@ -92,7 +92,7 @@ time-series models, dynamically adjusting PD worker counts based on predicted
 demand and real-time metrics.
 
 The Dynamo
-[LLM-aware Router](https://github.com/ai-dynamo/dynamo/tree/main/docs/router)
+[LLM-aware Router](https://docs.nvidia.com/dynamo/v-0-9-0/components/router)
 manages the key-value (KV) cache across large GPU clusters by hashing requests
 and tracking cache locations. It calculates overlap scores between incoming
 requests and cached KV blocks, routing requests to GPUs that maximize cache
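The overlap-score routing described in the context lines above can be sketched as follows. This is an illustrative toy, not Dynamo's actual implementation: block size, hashing scheme, and the worker/cache data structures are all assumptions made for the example.

```python
# Illustrative sketch (not Dynamo's real router): requests are split into
# fixed-size token blocks, each block is hashed with a chained hash so a
# block's identity depends on its whole prefix, and a worker's overlap
# score is the number of contiguous leading blocks it already has cached.
import hashlib

BLOCK_SIZE = 4  # tokens per KV block (hypothetical value)

def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block of the token sequence, chaining hashes
    so matches correspond to shared prefixes."""
    hashes, prev = [], b""
    for i in range(0, len(tokens) - len(tokens) % block_size, block_size):
        h = hashlib.sha256(prev + repr(tokens[i:i + block_size]).encode()).digest()
        hashes.append(h)
        prev = h
    return hashes

def overlap_score(request_hashes, cached_blocks):
    """Count contiguous leading request blocks already cached on a worker."""
    score = 0
    for h in request_hashes:
        if h not in cached_blocks:
            break
        score += 1
    return score

def route(tokens, workers):
    """Pick the worker whose cached KV blocks overlap the request the most."""
    req = block_hashes(tokens)
    return max(workers, key=lambda name: overlap_score(req, workers[name]))

# Example: "gpu-0" already holds the first two blocks of this prompt,
# so it wins with overlap score 2 against an empty cache on "gpu-1".
prompt = list(range(12))
workers = {"gpu-0": set(block_hashes(prompt)[:2]), "gpu-1": set()}
print(route(prompt, workers))  # prints "gpu-0"
```

A real router would also weigh current load against cache affinity; the sketch shows only the overlap-scoring half of that trade-off.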
@@ -285,7 +285,7 @@ This post focused on the foundational serving stack. In upcoming
 blogs, we will build on this foundation and explore more of Dynamo's
 advanced features, such as
 [Disaggregated Serving](https://github.com/ai-dynamo/dynamo/blob/9defc01b9b9c51a4a21abbb02907a4f1d5d2a2d2/examples/basics/disaggregated_serving/README.md#L4)
-and [SLA-based Planner](https://github.com/ai-dynamo/dynamo/blob/main/docs/planner/sla_planner_quickstart.md).
+and [SLA-based Planner](https://docs.nvidia.com/dynamo/v-0-9-0/components/planner).
 We'll demonstrate how these features allow for even greater efficiency, moving
 from a static, holistic deployment to a flexible, phase-splitted architecture.
 Moving forward, we also plan to extend our testing to include larger
8 changes: 3 additions & 5 deletions website/blog/2026-01-22-dynamo-on-aks-part-2/index.md
@@ -22,9 +22,9 @@ Today, we're shifting focus from raw throughput to **developer velocity** and
 **operational efficiency**.
 
 We will explore how the
-[**Dynamo Planner**](https://github.com/ai-dynamo/dynamo/blob/main/docs/planner/sla_planner.md)
+[**Dynamo Planner**](https://docs.nvidia.com/dynamo/v-0-9-0/components/planner)
 and
-[**Dynamo Profiler**](https://github.com/ai-dynamo/dynamo/tree/main/benchmarks/profiler)
+[**Dynamo Profiler**](https://docs.nvidia.com/dynamo/v-0-9-0/components/profiler)
 remove the guesswork from performance tuning on AKS.
 
 <!-- truncate -->
@@ -74,8 +74,6 @@ prefill GPUs and idle decode GPUs).
 NVIDIA Dynamo addresses these gaps through two integrated components:
 the **Planner Profiler** and the **SLO-based Planner**.
 
----
-
 ### Let’s see it through an example application scenario
 
 Consider a mission-critical AI workload running on AKS: an airline’s
@@ -115,7 +113,7 @@ The Dynamo Planner profiler is your pre-deployment simulation engine.
 Instead of burning GPU hours testing every possible configuration, you
 define your requirements in a **DynamoGraphDeploymentRequest (DGDR)**
 manifest. The profiler then executes an automated
-["sweep"](https://github.com/ai-dynamo/dynamo/blob/main/docs/benchmarks/sla_driven_profiling.md#profiling-method)
+["sweep"](https://github.com/ai-dynamo/dynamo/blob/release/0.8.1/docs/benchmarks/sla_driven_profiling.md)
 of the search space:
 
 * **Parallelization Mapping**: It tests different TP sizes for both prefill
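The "sweep" the last hunk links to can be pictured as a simple search over parallelism configurations: benchmark each candidate, discard those that miss the latency targets, and keep the cheapest survivor. The sketch below is hypothetical; the SLO values, the `benchmark` stub, and its latency numbers are invented for illustration and are not Dynamo profiler output.

```python
# Hypothetical sketch of an SLA-driven sweep: enumerate (prefill TP,
# decode TP) pairs, "benchmark" each one (stubbed with made-up numbers),
# and return the feasible config that uses the fewest GPUs.
from itertools import product

SLO = {"ttft_ms": 300, "itl_ms": 50}  # example latency targets

def benchmark(tp_prefill, tp_decode):
    """Stand-in for a real benchmark run: pretend time-to-first-token
    and inter-token latency shrink linearly with tensor parallelism."""
    return {
        "ttft_ms": 800 / tp_prefill,
        "itl_ms": 120 / tp_decode,
        "gpus": tp_prefill + tp_decode,
    }

def sweep(tp_sizes=(1, 2, 4, 8)):
    """Return the cheapest (tp_prefill, tp_decode) meeting both SLOs."""
    feasible = []
    for tp_p, tp_d in product(tp_sizes, repeat=2):
        m = benchmark(tp_p, tp_d)
        if m["ttft_ms"] <= SLO["ttft_ms"] and m["itl_ms"] <= SLO["itl_ms"]:
            feasible.append((m["gpus"], tp_p, tp_d))
    return min(feasible)[1:] if feasible else None

print(sweep())  # with the stub numbers above, (4, 4) is the cheapest fit
```

The real profiler measures live deployments and also sweeps batching parameters, but the select-cheapest-feasible logic is the same shape.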