Every mini PC announced since the second half of 2024 ships with a number on its spec sheet that did not exist on the same shelf two years ago: TOPS. Intel says “up to 48 TOPS” on Lunar Lake. AMD says “up to 50 TOPS” on the XDNA 2 NPU inside Ryzen AI 300. Qualcomm says “45 TOPS” on Snapdragon X Elite’s Hexagon. Apple says “38 TOPS” on the M4’s Neural Engine. All of these are peak INT8 figures, so they are nominally comparable, and all of them are heavily marketing-shaped — because the actual question a buyer of an “AI mini PC” should be asking is not how many TOPS the NPU claims, but what those TOPS actually accelerate today. As of April 2026, the answer is narrower than the box art implies.
The marketing layer: TOPS and the Copilot+ bar
The TOPS number on a mini-PC datasheet is, almost without exception, an NPU-only figure. Microsoft’s Copilot+ PC programme sets the floor at 40 NPU TOPS, which determines whether a machine is allowed to ship with on-device features such as Recall, Cocreator, Live Captions with translation, and Windows Studio Effects. AMD’s Ryzen AI 300 series clears the bar at 50 TOPS via XDNA 2; Qualcomm’s Snapdragon X Elite clears it at 45; Intel’s Lunar Lake reaches it at 48 (AnandTech architecture deep-dive). The earlier Core Ultra 100 series (“Meteor Lake”) and the desktop-derived Core Ultra 200S (“Arrow Lake”), with NPUs in the 11-13 TOPS range, miss the cut and are excluded from Copilot+ branding even though many of those chips ship inside boxes still marketed as “AI mini PCs.”
The most awkward case is Apple. The M4’s Neural Engine is rated at 38 TOPS, which is below Microsoft’s threshold — yet in real-world workloads the M4 routinely matches or beats machines with nominally faster x86 NPUs (NotebookCheck Mac mini M4 Pro review). The “bar” rewards a number, not a result.
What NPUs actually accelerate today
In April 2026, the workloads that genuinely run on an NPU on a Windows or Mac mini PC are a short list:
- Windows Studio Effects — camera background blur, eye-contact correction, voice clarity, framing.
- Live Captions / on-device translation in Windows 11 24H2 and later.
- App-specific AI features, such as the on-device pieces of Adobe’s Sensei stack and a handful of Microsoft 365 features, where the vendor has explicitly shipped an NPU code path (DirectML, ONNX Runtime QNN, OpenVINO, or Core ML).
- Some Whisper / small ASR implementations, where the model is small enough to fit comfortably in the NPU’s tile memory and the runtime has a quantised graph ready.
That is broadly the full list of things the average user can verify is actually hitting the NPU on a stock 2026 mini PC.
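Verifying that is less mysterious than it sounds. Here is a minimal sketch of what a vendor-shipped NPU code path looks like in practice, using ONNX Runtime’s QNN execution provider (the Hexagon path named in the list above); the model file is hypothetical, and the provider list encodes the fallback behaviour that decides whether the NPU is actually used.

```python
# A sketch, not a definitive implementation: ONNX Runtime with the QNN
# execution provider, the kind of NPU code path vendors ship today.
# "whisper_small_int8.onnx" is a hypothetical pre-quantised graph.
import onnxruntime as ort

session = ort.InferenceSession(
    "whisper_small_int8.onnx",
    providers=[
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),  # Hexagon NPU (HTP)
        "CPUExecutionProvider",  # silent fallback when the NPU rejects the graph
    ],
)
# If this prints only CPUExecutionProvider, the "AI feature" is not on the NPU.
print(session.get_providers())
```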
What NPUs (mostly) do not accelerate
The popular local-LLM stack — llama.cpp, Ollama, LM Studio, KoboldCpp, text-generation-webui — runs on CPU and iGPU paths in 2026, not on the NPU. There is real movement here: llama.cpp merged a QNN backend targeting Snapdragon’s Hexagon NPU in late 2024, and DirectML now exposes an NPU execution provider on Windows. But for Intel and AMD silicon specifically, mainstream LLM tooling still defaults to Vulkan/iGPU or AVX2/AVX-512 on the CPU, because the NPU software stacks (OpenVINO GenAI on Intel, Ryzen AI Software / XDNA on AMD) are either narrow in model coverage or limited to specific quantisation formats.
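To make that concrete: this is the default path as seen from the llama-cpp-python bindings, sketched with a hypothetical GGUF filename. The only offload knob is n_gpu_layers, which targets whatever GPU backend the wheel was compiled with (Vulkan, SYCL, Metal); nothing here routes layers to an Intel or AMD NPU.

```python
# Sketch of the mainstream local-LLM path (llama-cpp-python bindings).
# The GGUF file name is hypothetical; any local quantised model works.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-7b-instruct-q4_k_m.gguf",
    n_gpu_layers=-1,  # offload all layers to the compiled GPU backend (the iGPU)
    n_ctx=4096,
)
out = llm("Summarise what an NPU accelerates on a 2026 mini PC.", max_tokens=64)
print(out["choices"][0]["text"])
```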
Image generation tells the same story. Stable Diffusion XL and Flux, when run on a 2026 mini PC, execute on the iGPU (Intel Arc, Radeon 880M / 890M / 8060S) or, increasingly often, on a discrete eGPU. NPU-accelerated SDXL exists in vendor demos and in Olive-optimised ONNX builds, but it is not what users actually run when they install Automatic1111, ComfyUI, or InvokeAI.
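For completeness, the ONNX route does exist and looks roughly like the sketch below, using Hugging Face’s optimum.onnxruntime wrapper with the DirectML provider. The repo id is an assumption (any ONNX-exported SDXL would do), and note that DmlExecutionProvider places the graph on the GPU, not the NPU, which is exactly the point.

```python
# Sketch of the ONNX/DirectML route for SDXL on Windows. The repo id is
# hypothetical and assumed to hold an ONNX-exported SDXL base model;
# DmlExecutionProvider runs on the GPU/iGPU, not the NPU.
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

pipe = ORTStableDiffusionXLPipeline.from_pretrained(
    "some-org/sdxl-base-onnx",        # hypothetical ONNX export
    provider="DmlExecutionProvider",  # DirectML: GPU, not NPU
)
image = pipe(prompt="a mini PC on a desk, product photo").images[0]
image.save("sdxl_dml.png")
```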
The reality stack, by platform
Apple Silicon (Mac mini M4 / M4 Pro). The Neural Engine, the Metal GPU, and the AMX matrix units are coordinated through MLX and Core ML. For a buyer who wants to run a 7B-13B LLM today with the least friction, a Mac mini M4 Pro with 48-64 GB of unified memory is, in our reading, the best mainstream stack on the market — high memory bandwidth, mature tooling, and a single runtime story.
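A sketch of what that single runtime story looks like through the mlx-lm package; the Hugging Face repo id is an assumption (one of the mlx-community 4-bit conversions), and the weights download on first use straight into unified memory.

```python
# Sketch: a 4-bit 7B model via mlx-lm on Apple Silicon unified memory.
# The repo id is an assumed mlx-community conversion; swap in any MLX model.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
text = generate(model, tokenizer,
                prompt="Why does unified memory help local LLMs?",
                max_tokens=128)
print(text)
```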
AMD Ryzen AI / Ryzen AI Max+ (“Strix Halo”). The workhorse for LLMs on these chips is the Radeon iGPU (the 880M / 890M on Strix Point, the 8060S on Strix Halo) under ROCm or Vulkan, not the XDNA 2 NPU. The NPU is impressive on paper and lights up for Studio Effects and a small set of vendor demos, but for general LLM serving in April 2026 it is mostly idle. Where Strix Halo wins is unified memory: Phoronix and ServeTheHome have both shown that a 128 GB Strix Halo box (e.g. the GMKtec EVO-X2) runs models that no NPU-headlined mid-tier mini PC can load at all.
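The capacity argument fits in a back-of-envelope sketch, assuming roughly 4.5 bits per weight for a Q4-class quantisation and a ~20% allowance for KV cache and runtime overhead (both coarse assumptions, not measured figures):

```python
# Rough capacity check: does a Q4-quantised model of a given parameter
# count fit in a given amount of unified memory?
def fits(params_billion: float, ram_gb: float,
         bits_per_weight: float = 4.5, overhead: float = 1.2) -> bool:
    weight_gb = params_billion * bits_per_weight / 8  # 1B params ≈ 0.56 GB
    return weight_gb * overhead <= ram_gb

for p in (7, 13, 70, 120):
    print(f"{p:>4}B Q4: 32 GB -> {fits(p, 32)}, 128 GB -> {fits(p, 128)}")
# 70B- and 120B-class models clear 128 GB comfortably and fail on 32 GB,
# which is the Strix Halo pitch in one loop.
```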
Intel Core Ultra (Lunar Lake / Arrow Lake-H). OpenVINO has real LLM support, but Ollama, LM Studio, and llama.cpp all prefer the Arc iGPU via SYCL/Vulkan or the CPU. On an ASUS ROG NUC 14 Performance, the NPU contributes to Studio Effects and a handful of built-in Windows features; the LLM you actually load runs on the Arc graphics tile.
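Intel’s own NPU path, for comparison, goes through OpenVINO GenAI. A sketch, with a hypothetical model directory assumed to hold an OpenVINO-exported, INT4-quantised model; the device string is the whole story here.

```python
# Sketch of Intel's OpenVINO GenAI path. "./qwen2.5-7b-int4-ov" is a
# hypothetical directory holding an OpenVINO-exported INT4 model.
import openvino_genai as ov_genai

# "GPU" targets the Arc tile and generally works; swapping in "NPU" only
# succeeds for the narrow set of models the NPU plugin supports.
pipe = ov_genai.LLMPipeline("./qwen2.5-7b-int4-ov", "GPU")
print(pipe.generate("One sentence on Lunar Lake's NPU:", max_new_tokens=48))
```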
Snapdragon X Elite (Hexagon NPU). This is the one platform where there is a credible, merged path from a mainstream LLM runtime to the NPU — the llama.cpp QNN backend. The catch is market share: Snapdragon X mini PCs are a thin slice of the 2026 lineup, and Windows-on-ARM still has app-compatibility caveats that Intel and AMD machines do not.
What actually matters for a 2026 mini-PC buyer
If the use case is local LLMs, the honest priority order in April 2026 is memory capacity, then memory bandwidth, then iGPU class, and only then NPU TOPS. A few concrete examples, with a back-of-envelope throughput sketch after the list:
- An Apple Mac mini M4 Pro with 64 GB outperforms most “50 TOPS Copilot+” mini PCs for actual LLM tokens-per-second, despite a lower NPU rating.
- A GMKtec EVO-X2 with 128 GB of unified memory runs model sizes (see the capacity sketch above) that simply do not load on a 32 GB box, regardless of TOPS marketing.
- A Beelink SER8 or Geekom A8 Max with 64 GB DDR5 and a ~16 TOPS NPU is genuinely sufficient for 7B-class LLMs at usable speeds — the iGPU and the RAM are doing the work.
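The throughput sketch promised above: single-stream LLM decode is memory-bandwidth-bound, so a crude ceiling on tokens per second is bandwidth divided by the bytes read per token, roughly the quantised model size. The bandwidth figures are approximate public specs and the model size is illustrative.

```python
# Crude decode ceiling: tokens/s ≈ memory bandwidth / quantised model size.
# Ignores compute, caching, and prompt processing; bandwidths are approximate.
def decode_tps_ceiling(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb

systems = {
    "Mac mini M4 Pro (~273 GB/s)": 273,
    "Strix Halo (~256 GB/s)": 256,
    "dual-channel DDR5-5600 (~90 GB/s)": 90,
}
for name, bw in systems.items():
    tps = decode_tps_ceiling(bw, 4.0)  # ~4 GB for a 7B Q4 model
    print(f"{name}: ~{tps:.0f} tok/s ceiling on a 7B Q4 model")
# Note that NPU TOPS never enters this estimate: for decode, bandwidth
# is the binding constraint.
```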
This will shift. AMD’s XDNA 2 software is improving with every release, Microsoft is pushing DirectML NPU paths into more frameworks, and the Hexagon llama.cpp backend is now real code, not a slide. By 2027 the picture for NPU-accelerated LLMs will be meaningfully better.
But that is 2027. In April 2026, the TOPS number on the front of the box is not the number that determines whether your local LLM is fast. The model fits in your RAM, or it does not. The iGPU has a working Vulkan or ROCm path, or it does not. The NPU mostly does video calls and captions. A buyer who reads the spec sheet in that order — RAM, bandwidth, iGPU, then NPU — will end up with a faster mini PC for local AI than a buyer who chases the largest TOPS figure on the shelf.