[GSoC 2026] Queries and Discussion for Project # 21: Optimize a VLA model for embodied intelligence #34553

ktk-07 · 2026-03-07T15:05:33Z

My name is Kyaw Tun Khine from Nanyang Technological University, and I am currently drafting a proposal for the GSoC project “Optimize a VLA model for embodied intelligence.”

I would like to clarify a few details so that my proposal aligns better with the project goals.

1. Target VLA model

The project description mentions optimizing Vision-Language-Action (VLA) models such as GR00T and π0.5.

Will the project focus on optimizing one specific VLA model, or should the optimizations aim to generalize across multiple VLA architectures?

If a specific model has already been decided, could you please share its expected architecture (e.g., transformer-based, diffusion-based, etc.)?

Understanding the architecture would help me decide which parts of the model to profile, and whether the main bottlenecks are likely to come from attention layers, vision encoders, diffusion iterations, or memory bandwidth limitations on iGPUs.
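As a rough illustration of the profiling step described above, the sketch below groups per-layer timings by pipeline stage to show where time is spent. The layer names and timings are made up for illustration, and the `stage/...` naming scheme is an assumption; real records would come from whatever profiler is used (for example, per-layer performance counters), and real layer names may not split this cleanly.

```python
from collections import defaultdict

# Hypothetical per-layer timings: (layer name, time in microseconds).
# These values are invented for illustration, not measured.
LAYER_TIMES_US = [
    ("vision_encoder/block0/attn", 900.0),
    ("vision_encoder/block0/mlp", 600.0),
    ("llm/layer0/attn", 1500.0),
    ("llm/layer0/mlp", 1200.0),
    ("action_expert/step0/attn", 400.0),
    ("action_expert/step1/attn", 400.0),
]


def time_by_stage(records):
    """Sum layer times by the first path component of the layer name."""
    totals = defaultdict(float)
    for name, micros in records:
        stage = name.split("/", 1)[0]
        totals[stage] += micros
    return dict(totals)


def ranked_shares(totals):
    """Return (stage, share of total time) pairs, largest share first."""
    grand_total = sum(totals.values())
    return sorted(
        ((stage, t / grand_total) for stage, t in totals.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    for stage, share in ranked_shares(time_by_stage(LAYER_TIMES_US)):
        print(f"{stage}: {share:.1%}")
```

A ranking like this would only suggest where to look first; whether a stage is compute-bound or bandwidth-bound still needs a closer look at the individual kernels.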

2. Optimization scope

Since the description mentions optimizing the full inference pipeline, should the work include multiple layers of optimization, such as:

graph-level optimizations (graph transformations)
kernel-level optimizations
memory-level optimizations
runtime scheduling optimizations

Or is the expectation to focus mainly on kernel-level optimizations?

3. Target hardware

Will the project target a specific Intel GPU architecture? Different devices have different compute capabilities, so kernel and memory optimizations may depend on the specific architecture.

Also, will contributors have remote access to the target hardware during the project?

4. Baseline and benchmarking

Are there existing OpenVINO pipelines or benchmarks for VLA models that we should use as the baseline for evaluation? Or should comparisons be made against other deep learning frameworks such as PyTorch implementations of the same model?

5. Evaluation metrics

What metrics would define success for this project?

For example:

inference latency
throughput / actions per second
GPU memory usage
GPU utilization
MFU (Model FLOP Utilization)
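To make the candidate metrics above concrete, here is a minimal sketch of how latency, actions per second, and MFU could be computed. It assumes a zero-argument callable `infer` that runs one full inference and yields one action (a model that emits a chunk of actions per call would need a different throughput formula). The helper names `benchmark` and `mfu` are hypothetical, and the FLOP count and peak throughput have to be supplied by the user.

```python
import statistics
import time


def benchmark(infer, warmup=3, iterations=20):
    """Time a zero-argument callable and return latency stats in milliseconds."""
    for _ in range(warmup):
        infer()  # warm-up runs are excluded from the measurements
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        infer()
        samples_ms.append((time.perf_counter() - start) * 1e3)
    samples_ms.sort()
    median_ms = statistics.median(samples_ms)
    # Approximate 95th percentile by index into the sorted samples.
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "median_ms": median_ms,
        "p95_ms": samples_ms[p95_index],
        # Assumes one action per inference call.
        "actions_per_second": 1e3 / median_ms,
    }


def mfu(flops_per_inference, latency_ms, peak_flops_per_second):
    """Model FLOP Utilization: achieved FLOP/s divided by peak FLOP/s."""
    achieved = flops_per_inference / (latency_ms / 1e3)
    return achieved / peak_flops_per_second


if __name__ == "__main__":
    # Stand-in workload; replace with a real inference call.
    stats = benchmark(lambda: sum(range(100_000)))
    print(stats)
    # Hypothetical numbers: 2 TFLOP per inference, 40 TFLOP/s peak.
    print(f"MFU: {mfu(2e12, stats['median_ms'], 40e12):.2%}")
```

Reporting the median alongside a tail percentile is one reasonable choice for a control loop, where occasional slow inferences matter as much as the typical case.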

6. Implementation scope

For the optimizations in this project, should the work primarily involve contributing improvements directly inside OpenVINO or oneDNN (e.g., implementing graph transformations, fusion patterns, or optimized GPU kernels)?

Or is the expectation to build a separate optimized inference pipeline or reference implementation in another repository that runs VLA models efficiently on Intel GPUs using OpenVINO and oneDNN?

7. Model conversion

Since most VLA models are implemented in frameworks such as PyTorch, JAX, or TensorFlow, should we assume the target models are already supported by OpenVINO?

Or will part of the project involve model conversion and operator support, such as adding frontend extensions or custom operators?

Thank you very much for your time and guidance.
I appreciate any clarification that would help me structure the proposal more effectively.
Hope to hear from you soon!

baihe-liu · 2026-03-09T05:48:33Z

@ktk-07
Hi, thank you for reaching out.
I will address your questions within the scope of the current project concept. I hope this helps.

1. Target VLA model
We are considering unitreerobotics/unifolm-vla as the target model; its architecture is similar to pi and GR00T. In general, we want to target VLA models structured as a VLM plus a diffusion/flow-matching action expert.
2. Optimization scope
We do not restrict the type of optimization. In general, we would first run profiling to analyze the performance bottlenecks; any bottleneck identified there that can be optimized is in scope.
3. Target hardware
Our main target platforms are the Panther Lake and Arrow Lake iGPUs.
The first stage, model conversion, is not hardware-specific. In the mid-to-late stages, we will continue to monitor hardware resources and aim to provide additional support.
4. Baseline and benchmarking
Some models may not have an existing OpenVINO baseline for comparison. In that case, you can compare your optimized result with the original version.
5. Evaluation metrics
We mainly focus on inference latency. Depending on the model, the optimization target can be roughly estimated based on theoretical compute capability.
6. Implementation scope
If the optimization is generic, it can be contributed to OpenVINO or oneDNN. If it is model-specific, we will treat it as a separate reference project.
7. Model conversion
Model conversion will also be part of this project. OpenVINO currently offers limited specific support for VLA models. However, since most VLA models share structural characteristics with models from other domains, the conversion process should be analogous. If model conversion fails, you can either modify the source code to bypass the missing operations or implement those operations, as long as the output results are not affected.
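One way to read the "Evaluation metrics" answer above is as a roofline-style estimate: a stage cannot finish faster than its compute time or its memory-traffic time, whichever is larger. The sketch below applies that to a VLM pass followed by several denoising steps of the action expert. This is an illustrative assumption, not the mentors' stated method; the function names and all numbers are hypothetical.

```python
def roofline_seconds(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Lower bound on runtime: the slower of compute time and memory time."""
    return max(flops / peak_flops, bytes_moved / peak_bandwidth)


def vla_latency_floor(vlm, expert, denoise_steps, peak_flops, peak_bandwidth):
    """Lower bound for one VLM pass plus `denoise_steps` action-expert passes.

    `vlm` and `expert` are dicts with "flops" and "bytes" for a single pass.
    """
    vlm_s = roofline_seconds(vlm["flops"], vlm["bytes"], peak_flops, peak_bandwidth)
    step_s = roofline_seconds(
        expert["flops"], expert["bytes"], peak_flops, peak_bandwidth
    )
    return vlm_s + denoise_steps * step_s


if __name__ == "__main__":
    # Hypothetical device: 10 TFLOP/s peak compute, 100 GB/s memory bandwidth.
    peak_flops, peak_bandwidth = 10e12, 100e9
    # Hypothetical per-pass costs; real values depend on the model and precision.
    vlm = {"flops": 4e12, "bytes": 4e9}
    expert = {"flops": 0.1e12, "bytes": 2e9}
    floor_s = vla_latency_floor(vlm, expert, 10, peak_flops, peak_bandwidth)
    print(f"latency floor: {floor_s * 1e3:.0f} ms")
```

With these made-up numbers the VLM pass is compute-bound while each action-expert step is bandwidth-bound, which is the kind of split that would point to different optimizations for the two parts.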

Thank you again for the detailed discussion. Please let me know if you have any further questions.

1 reply

ktk-07 Mar 9, 2026
Author

Thank you for your response, @baihe-liu. I appreciate the detailed clarifications. I will take these points into account while drafting my proposal.

If I have any additional questions while preparing the proposal, I will reach out again. Thank you for your guidance!!

ktk-07 · 2026-03-20T12:01:34Z

Hi @baihe-liu and @nacui-intel,

I have started exploring the model architecture, pipeline, and OpenVINO conversion flow, and I have a few more questions.

Are there any recommended prerequisites you would like me to complete and include in the proposal?

I wanted to check whether there are any recommended prerequisite tasks or specific areas you would like applicants to focus on (e.g., model conversion, profiling, or kernel-level investigation) to better align with the project expectations.
Should I perhaps dig into understanding or contributing to oneDNN and other OpenVINO open-source repositories?

Some PRs I have submitted

So far, I have contributed PRs to OpenVINO related to operator support and conformance testing, although they are not yet merged:

Add support for aten::poisson

First, add the RandomPoisson operation and tests in this PR: [REF][CORE] Add RandomPoisson Operation and tests #34800
Then, add support for aten::poisson using the RandomPoisson op

Add support for aten::values: [PT FE]: add support for aten::values #34445
Fix Interpolate test failures in op conformance: [OP CONFORMANCE][TEMPLATE] Fix Interpolate test fails in op conformance Fixes #23553 #34587

Through these, I gained experience with frontend translation, operator validation, and debugging model conversion issues, which I believe will be relevant when working on VLA model deployment (e.g., unifolm-vla).

Additional question on the technical depth of the proposal

I am also drafting the proposal and would appreciate guidance on the expected level of technical depth.
I looked at the example proposals here (example proposal 1 and example proposal 2), and they seem fairly general.
Is there anything in particular you would like to see in the proposal, or any area it should cover in depth?

For example, would you recommend including:

preliminary experiments (e.g., partial model conversion or profiling),
or focusing more on a well-structured optimization plan and methodology?

Thanks in advance for your guidance!

0 replies
