GSoC 2026: Interested in Project 1 - Build a GUI Agent with local LLM/VLM and OpenVINO #34239
Replies: 4 comments 2 replies
-
Hi @KarSri7694 Thanks for your interest. We can offer a remote device with 32 GB RAM (18 GB vRAM) for your development. Model selection is entirely up to you; we will only evaluate the user experience of the final results.
-
Hi Ethan (@openvino-dev-samples) and Zhuo (@zhuo-yoyowz), I've just sent you an email with the draft of my GSoC proposal for the project. Whenever you have time, I would greatly appreciate any feedback before I submit the final proposal on the GSoC website. Thanks for your time!
-
Hey Ethan (@openvino-dev-samples) and Zhuo (@zhuo-yoyowz), could you please take a look at my question in #34555? I have written code that implements get_state and set_state even when the KV cache is quantized to q8_0. My question concerns the current implementation:

```cpp
void VariableStateIndirectKVCacheCompressed::set_state(const ov::SoPtr<ov::ITensor>& state) {
    OPENVINO_THROW("[GPU] set_state API is supported only when KV-cache compression is disabled");
}

ov::SoPtr<ov::ITensor> VariableStateIndirectKVCacheCompressed::get_state() const {
    OPENVINO_THROW("[GPU] get_state API is supported only when KV-cache compression is disabled");
}
```

Is the OPENVINO_THROW intentional and expected behaviour, or is it simply unimplemented? Also, my code currently only works when the KV cache is quantized to INT8; should I also implement this for 4-bit KV-cache quantization?
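For context on what a compressed-state get/set has to do, here is a minimal Python sketch of the idea behind q8_0-style per-block symmetric quantization. This is purely illustrative (the block size, scale handling, and the `CompressedKVState` class are my own stand-ins, not OpenVINO's GPU-plugin code): set_state re-quantizes an incoming fp32 tensor into the compressed layout, and get_state dequantizes it back, lossily.

```python
import numpy as np

BLOCK = 32  # q8_0 quantizes values in blocks of 32, one fp32 scale per block


def quantize_q8_0(x: np.ndarray):
    """Symmetric per-block int8 quantization (illustrative, not a real kernel)."""
    blocks = x.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float32)


def dequantize_q8_0(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Invert the quantization; error is bounded by half a block scale."""
    return (q.astype(np.float32) * scales).reshape(-1)


class CompressedKVState:
    """Toy stand-in for a variable state whose KV cache is stored as q8_0."""

    def __init__(self, size: int):
        self._q, self._scales = quantize_q8_0(np.zeros(size, dtype=np.float32))

    def set_state(self, tensor: np.ndarray) -> None:
        # Re-quantize the incoming fp32 state into the compressed layout.
        self._q, self._scales = quantize_q8_0(tensor.astype(np.float32))

    def get_state(self) -> np.ndarray:
        # Decompress back to fp32; values are approximate, not bit-exact.
        return dequantize_q8_0(self._q, self._scales)
```

The round trip is lossy, which is one reason a plugin might deliberately refuse get_state/set_state under compression rather than silently return approximate values.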
-
Hi @openvino-dev-samples and @zhuo-yoyowz I have submitted my final proposal through the official GSoC portal. Looking forward to working under your guidance in the coming summer.
-
Hi, I am Kartikeya Srivastava, a second-year B.Tech undergraduate in Computer Science and Engineering. I am highly interested in Project 1: Build a GUI Agent with local LLM/VLM and OpenVINO.
I already have PRs submitted as a prerequisite:
- numpy.diagonal operation for the Keras OpenVINO backend (Merged): Numpy.diagonal PR
- numpy.flip operation for the Keras OpenVINO backend (Merged): Numpy.flip PR

I have been building a local OS agent to understand the constraints of Project 1. I successfully engineered a multi-step agentic loop that can autonomously navigate the Windows GUI (demo attached).
Prompt given in demo: open a notepad and type in it; "The text is written by Ambient AI's vision agent"
Note on the demo video: to achieve the reasoning required for these multi-step actions, the prototype relies on Qwen-3-VL-4B. Because the OpenVINO openvino_genai pipeline does not yet natively support Qwen-3-VL, this specific demo was temporarily routed through llama.cpp on CUDA to validate the agentic orchestration logic.
demo_video.mp4
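The agentic loop described above can be sketched roughly as follows. This is a generic observe-reason-act skeleton, not the prototype's actual code; `model_step`, `execute_action`, and `observe` are hypothetical callables standing in for the VLM call, the GUI automation layer (e.g. something like pyautogui), and the screenshot capture:

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # (observation, action) pairs
    done: bool = False


def run_agent(goal, model_step, execute_action, observe, max_steps=50):
    """Generic observe -> reason -> act loop for a GUI agent.

    model_step(goal, observation, history) returns an action dict such as
    {"type": "click", "x": 10, "y": 20} or {"type": "finish"}.
    """
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        obs = observe()  # e.g. a screenshot of the current desktop
        action = model_step(goal, obs, state.history)
        state.history.append((obs, action))
        if action.get("type") == "finish":
            state.done = True
            break
        execute_action(action)  # e.g. dispatch to the OS automation layer
    return state
```

The interesting engineering for long-horizon tasks is inside `model_step`: deciding how much of `history` to feed back to the VLM per step, since the KV cache and context window are what blow up around ~50 steps.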
I have one question regarding the scope of this project. My primary GSoC objective would be to port this exact agentic loop natively to OpenVINO and to enhance the model's state management so it can successfully perform long-horizon tasks of around ~50 steps. When optimizing the VLM for typical AI PCs (e.g., Intel Core Ultra NPUs or Iris Xe iGPUs), what is the strict memory budget we are targeting? Should I optimize a 4B-parameter model for maximum performance while maintaining respectable accuracy, quantizing the weights to INT8 or INT4 and quantizing the KV cache to reduce memory requirements even further, or do we have the memory budget to use larger, more capable models?
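To make the budget question concrete, here is a back-of-the-envelope memory estimate. The model geometry below (36 layers, 8 KV heads of dim 128 under GQA) is a hypothetical configuration for a ~4B model, not Qwen-3-VL-4B's real config, and the formulas ignore quantization scale overhead, activations, and the vision encoder:

```python
def model_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight memory, ignoring per-group scale/zero-point overhead."""
    return n_params * bits_per_weight / 8 / 2**30


def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, bits, batch=1):
    """KV cache = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bits / 8 / 2**30


# Hypothetical GQA geometry for a ~4B model (illustrative, not a real config):
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128

for wbits, kvbits in [(16, 16), (8, 8), (4, 8)]:
    total = model_memory_gib(4e9, wbits) + kv_cache_gib(
        LAYERS, KV_HEADS, HEAD_DIM, seq_len=32_768, bits=kvbits
    )
    print(f"weights INT{wbits} + KV INT{kvbits} @ 32k ctx: {total:.1f} GiB")
```

Under these assumptions, INT8 weights plus an INT8 KV cache keep a 4B model comfortably inside an 18 GB vRAM envelope even at long context, which is why the INT4-vs-larger-model trade-off above hinges on the target budget.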
cc: Ethan Yang (@openvino-dev-samples), Zhuo Wu (@zhuo-yoyowz)