Intel is following AMD in adding a crucial feature to Core Ultra, and it matters most if you're running local AI

For some time, AMD has offered a feature on its APUs (Accelerated Processing Units) that catches the attention of both gamers and local AI users: Variable Graphics Memory. Now, Intel has adopted a comparable feature for its Core Ultra processors as well.

According to Bob Duffy from Intel, as reported by VideoCardz, the new Shared GPU Memory Override feature is included in the most recent update of the Arc drivers.

So, what is it?

In simple terms, just as on AMD's latest APUs, you can now allocate a specific portion of your system RAM to the GPU. This can enhance the gaming experience, but it's particularly beneficial when running local large language models (LLMs) on your machine.

Currently, Ollama doesn't support Intel's integrated GPUs, but tools such as LM Studio do, which means larger models like gpt-oss:20b can run on the GPU rather than the CPU for faster processing.

These models can run without explicitly assigning a larger share of memory to the GPU, but there are advantages to doing so. It's worth noting that Intel's Core Ultra processors don't support true Unified Memory like you'll find in an Apple Mac or AMD's latest Strix Halo chips. Shared memory may look similar, but it isn't the same: with genuine Unified Memory, the CPU and GPU draw from a single pool, so a feature like this would be unnecessary.

In my limited experiments with an AMD Ryzen AI 9 HX 370, which also lacks Unified Memory, significantly increasing the GPU's allotted memory noticeably improved performance.

With gpt-oss:20b, I've seen a significant boost in performance (roughly 5 tokens per second higher) when the model fits entirely within the GPU's allocated memory with a 4k context window, compared to when it spills into general system memory.
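If you want to sanity-check numbers like this yourself, one quick approach is to time a request against LM Studio's local OpenAI-compatible server and divide completion tokens by wall-clock time. Below is a minimal sketch, assuming the server is running on its default port (1234); the model identifier is a placeholder you'd replace with whatever your installation reports:

```python
# Rough tokens-per-second measurement against a local
# OpenAI-compatible endpoint (LM Studio's default is port 1234).
# Note: this times the whole request, so prompt processing is
# included in the figure, not just token generation.
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"
payload = {
    "model": "openai/gpt-oss-20b",  # placeholder; use your local model ID
    "messages": [
        {"role": "user", "content": "Explain GPU memory in one paragraph."}
    ],
    "max_tokens": 256,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600)
resp.raise_for_status()
elapsed = time.time() - start

tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s = {tokens / elapsed:.1f} tok/s")
```

Run it once with the model fully inside the GPU allocation and once without, and the difference should show up directly in the tok/s figure.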

You can still use the GPU for compute while the model sits in regular system RAM, but performance will be slower. The optimal setup is to allocate enough dedicated GPU memory to hold the entire model, which delivers the best overall performance.
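In tools built on llama.cpp, this same idea is expressed as layer offloading: you choose how many of the model's layers live in GPU memory. Here's a minimal sketch using the llama-cpp-python bindings, assuming a GPU-enabled build and a GGUF model file on disk (the path is a placeholder):

```python
# Sketch: controlling GPU offload with llama-cpp-python.
# Requires a GPU-enabled build (e.g. Vulkan or SYCL for Arc iGPUs).
from llama_cpp import Llama

llm = Llama(
    model_path="models/gpt-oss-20b.gguf",  # placeholder path
    n_gpu_layers=-1,  # -1 offloads every layer; this only succeeds
                      # when the GPU allocation is large enough
    n_ctx=4096,       # the 4k context window mentioned above
)

out = llm("Q: What is Variable Graphics Memory? A:", max_tokens=128)
print(out["choices"][0]["text"])
```

If a full offload doesn't fit, dropping `n_gpu_layers` to a smaller number keeps part of the model on the CPU, at the cost of speed.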

Intel is now letting Core Ultra users adjust the GPU memory reservation via a straightforward slider in the Intel Graphics Software app. However, it's not entirely clear whether this applies to all Core Ultra chips or just Core Ultra Series 2.

When working with a larger model like gpt-oss:20b, I split my machine's 32GB of memory down the middle: 16GB allocated to the GPU, with the remaining 16GB reserved for everything else. That's enough for the entire model to fit within the GPU's allocation while leaving the rest of the system plenty to work with.
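The budgeting behind that split is simple arithmetic: the model's weights, plus the KV cache for your context window, plus some runtime overhead, has to fit inside the allocation. A rough sketch follows; the ~12GB weights figure for gpt-oss:20b and the other numbers are assumptions for illustration, not measured values:

```python
# Back-of-the-envelope check: does a model fit in the GPU allocation?
# All figures below are rough assumptions, not measured values.

def fits_in_allocation(weights_gb: float, kv_cache_gb: float,
                       overhead_gb: float, allocation_gb: float) -> bool:
    """True if weights + KV cache + runtime overhead fit in the allocation."""
    return weights_gb + kv_cache_gb + overhead_gb <= allocation_gb

weights = 12.0   # quantized gpt-oss:20b weights, roughly (assumption)
kv_cache = 0.5   # KV cache at a 4k context, roughly (assumption)
overhead = 1.0   # compute buffers and driver overhead (assumption)

for alloc_gb in (8, 12, 16):
    verdict = ("fits" if fits_in_allocation(weights, kv_cache, overhead, alloc_gb)
               else "does not fit")
    print(f"{alloc_gb}GB allocation: {verdict}")
```

On those assumed numbers, only the 16GB allocation leaves room for the whole model, which lines up with the 16GB/16GB split above.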

To get the best performance from an LLM, it's far more effective to use a GPU when one is available rather than relying on the CPU alone. Even an integrated GPU can deliver better results than CPU-only inference in this case.

Of course, everything here is relative. If your system has only 16GB of total memory, you can't simply dedicate all of it to running an LLM, because Windows and everything else on the machine need memory too. Reserving at least 8GB for the rest of the system is a sensible floor.

To use the new Shared GPU Memory Override feature, make sure you're running the most recent Intel graphics drivers. Note that it only applies to PCs running on integrated Arc graphics; systems with a dedicated GPU and its own VRAM don't need it and will generally perform better in any scenario.

If you're running local large language models on a Core Ultra system, this new feature could give your AI workloads a welcome boost in performance.
