[GSoC 2026] Queries and Discussion for Project # 21: Optimize a VLA model for embodied intelligence #34553
Replies: 2 comments 1 reply
@ktk-07
Thank you again for the detailed discussion. Please let me know if there are any further questions.
Hi @baihe-liu and @nacui-intel,
I have started exploring the model architecture, the pipeline, and the OpenVINO conversion flow, and I have a few more questions.
Is there any recommended prerequisite you would like me to complete and include in the proposal?
I wanted to check whether there are any recommended prerequisite tasks or specific areas you would like applicants to focus on (e.g., model conversion, profiling, or kernel-level investigation) to better align with the project expectations.
Some PRs I have submitted
So far, I have contributed to OpenVINO with PRs related to operator support and conformance testing, although they are not yet merged:
Through these, I gained experience with frontend translation, operator validation, and debugging model conversion issues, which I believe will be relevant when working on VLA model deployment (e.g., unifolm-vla).
Additional question on the technical depth of the proposal
I am drafting the proposal and would appreciate any guidance on the expected level of technical depth. For example, would you recommend including:
Thanks in advance for your guidance!
Hi @nacui-intel @baihe-liu,
My name is Kyaw Tun Khine from Nanyang Technological University, and I am currently drafting a proposal for the GSoC project “Optimize a VLA model for embodied intelligence.”
I would like to clarify a few details so that my proposal aligns better with the project goals.
1. Target VLA model
The project description mentions optimizing Vision-Language-Action (VLA) models such as GR00T and π0.5.
Will the project focus on optimizing one specific VLA model, or should the optimizations aim to generalize across multiple VLA architectures?
If a specific model has already been decided, could you please share its expected architecture (e.g., transformer-based, diffusion-based, etc.)?
Understanding the architecture would help me determine which parts of the model to profile and whether the main bottlenecks are likely to come from areas such as attention layers, vision encoders, diffusion iterations, or memory bandwidth limitations on iGPUs.
2. Optimization scope
Since the description mentions optimizing the full inference pipeline, should the work include multiple layers of optimization, such as:
Or is the expectation to focus mainly on kernel-level optimizations?
3. Target hardware
Will the project target a specific Intel GPU architecture? Different devices have different compute capabilities, so kernel and memory optimizations may depend on the specific architecture.
Also, will contributors have remote access to the target hardware during the project?
4. Baseline and benchmarking
Are there existing OpenVINO pipelines or benchmarks for VLA models that we should use as the baseline for evaluation? Or should comparisons be made against other deep learning frameworks such as PyTorch implementations of the same model?
5. Evaluation metrics
What metrics would define success for this project?
For example:
6. Implementation scope
For the optimizations in this project, should the work primarily involve contributing improvements directly inside OpenVINO or oneDNN (e.g., implementing graph transformations, fusion patterns, or optimized GPU kernels)?
Or is the expectation to build a separate optimized inference pipeline or reference implementation in another repository that runs VLA models efficiently on Intel GPUs using OpenVINO and oneDNN?
7. Model conversion
Since most VLA models are implemented in frameworks such as PyTorch, JAX, or TensorFlow, should we assume the target models are already supported by OpenVINO?
Or will part of the project involve model conversion and operator support, such as adding frontend extensions or custom operators?
Thank you very much for your time and guidance.
I appreciate any clarification that would help me structure the proposal more effectively.
Hope to hear from you soon!