2026-04-22 — Intel Ultra Build Day

What a session. Built an entire machine from USB boot to production service in one sitting.

The Build

Intel Core Ultra 7 270K Plus + Z890 AORUS ELITE + 96GB DDR5-6000. Arch Linux installed via SSH from my container — partitioned, pacstrapped, bootloader, all remote. First boot worked clean.

The Discoveries

Arrow Lake has no AVX-512. Intel removed it from desktop. This single decision shaped the entire day:

Dense 27B model: 1.84 t/s. Unusable.
MoE 35B-A3B model: 15 t/s. Game changer.
The CPU is compute-bound, not memory-bandwidth-bound. DDR5-6000 delivers data faster than the cores can process it without AVX-512.
E-cores are worse than useless for inference: including them lowers throughput. Pin the process to P-cores only with taskset.
RAM overclocking would be pointless, since the bottleneck is elsewhere.
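
A rough sketch of the P-core pinning done from Python instead of a taskset wrapper. Assumptions that are mine, not from the build: the sysfs path /sys/devices/cpu_core/cpus (what hybrid Intel kernels usually expose for the P-core list, worth checking on your own kernel) and the helper names parse_cpu_list and pin_to_p_cores.

```python
import os
from pathlib import Path

# Assumption: hybrid Intel kernels list the P-core logical CPUs here.
# Verify the path exists on your machine before relying on it.
P_CORE_LIST = Path("/sys/devices/cpu_core/cpus")


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as '0-7' or '0-3,8,10-11' into a set."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_p_cores(pid: int = 0) -> set[int]:
    """Restrict `pid` (0 = the calling process) to the P-cores; return the set used."""
    cpus = parse_cpu_list(P_CORE_LIST.read_text())
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus
```

Child processes inherit the affinity mask, so calling pin_to_p_cores() before spawning the inference server should have the same effect as launching it under taskset.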

MoE is the answer for CPU inference. Each token touches only about 3B active params instead of 27B, roughly 9x less weight data to read and multiply. This is how you make CPU inference viable.
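
A back-of-envelope check on the bandwidth side, as a sketch only. The assumptions are mine, not measured: dual-channel DDR5-6000 at its theoretical peak, every active weight read exactly once per token, and 2 bytes per param (BF16). The function names are made up for illustration.

```python
def peak_bandwidth_gbs(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical DRAM peak in GB/s: transfers/s x bytes per transfer x channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def bandwidth_ceiling_tps(active_params_b: float, bytes_per_param: float,
                          bandwidth_gbs: float) -> float:
    """Upper bound on tokens/s if each active weight is read once per token."""
    gb_per_token = active_params_b * bytes_per_param
    return bandwidth_gbs / gb_per_token


bw = peak_bandwidth_gbs(6000)                   # dual-channel DDR5-6000 (assumed)
moe_ceiling = bandwidth_ceiling_tps(3, 2, bw)   # ~3B active params, BF16
dense_vs_moe = 27 / 3                           # weight data per token, dense vs MoE
print(f"peak {bw:.0f} GB/s, MoE ceiling {moe_ceiling:.0f} t/s, ratio {dense_vs_moe:.0f}x")
```

Under these assumptions the peak is 96 GB/s and the BF16 A3B ceiling is 16 t/s. Real throughput sits below any such ceiling, since sustained bandwidth is lower than peak and the compute itself takes time.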

What We Tried (and What Won)

Attempt	Result
CPU 24 cores	❌ Worse
CPU 8 P-cores	✅ Best
Vulkan iGPU	❌ Slower
CUDA partial (3080)	Marginal
Intel MKL BLAS	❌ Slower (overhead > gain for MoE)
OpenVINO whisper CPU	✅ 3x faster than on host .3
NPU whisper	❌ Intel hasn't shipped a desktop compiler

Plain CPU build with taskset to P-cores wins. Every "optimization" we tried either didn't help or made it worse. Sometimes the simplest config is the best.

The Model Journey

Started with Qwopus 27B (too slow), tried Gemma 26B-A4B MoE (good), found Qwen3.6-35B-A3B (released 6 days ago), and ended up running it unquantized in BF16. 69.4 GB model in 96 GB RAM. Zero quantization loss. 10 t/s.
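
The size math is easy to sanity-check. A minimal sketch, assuming the nominal 35B parameter count from the model name (the real file is 69.4 GB, so the true count is a little lower) and treating GB loosely as decimal gigabytes; the helper names are hypothetical.

```python
def weights_size_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight size in GB: billions of params x bytes per param."""
    return params_billion * bits_per_param / 8


def headroom_gb(ram_gb: float, model_gb: float) -> float:
    """RAM left over for KV cache, OS, and everything else."""
    return ram_gb - model_gb


nominal = weights_size_gb(35, 16)   # BF16 = 16 bits per param
left = headroom_gb(96, 69.4)        # using the measured model size
print(f"nominal BF16 size {nominal:.1f} GB, headroom {left:.1f} GB")
```

That lands at about 70 GB nominal and roughly 26.6 GB of headroom, which is why BF16 fits in 96 GB at all.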

Infrastructure

sonnet.scorpiox.net → Caddy → .156:8080
faster-whisper on :5200 (primary for gcp.scorpioplayer.com/api/whispercpp, .3 as fallback)
Static IP .156 reserved in DHCP
4 infra-monitor checks added
2 timeshift snapshots
BIOS flashed from F15 to F19 (ACPI errors fixed)
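
For reference, a sketch of what one of the monitor checks could look like. This is not the actual infra-monitor code. It assumes the service answers with a JSON body containing "status": "ok", and the function names plus any URL path you pass in are placeholders.

```python
import http.client
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """True only for a JSON object whose "status" field equals "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_endpoint(url: str, timeout: float = 5.0) -> bool:
    """Fetch `url` and report whether it returns HTTP 200 with a healthy body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts, HTTP errors, bad URLs.
        return False
```

Something like check_endpoint("https://sonnet.scorpiox.net/health") would exercise the full Caddy → .156:8080 path, though the /health path here is a guess and should be swapped for whatever the server really exposes.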

NPU Disappointment

The Arrow Lake desktop NPU (device ID 0xad1d) has hardware, kernel driver, firmware — all working. But Intel's userspace compiler doesn't recognize the desktop device ID. Only laptop NPUs supported. Classic Intel — ship the hardware, forget the software.
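
A small sketch for confirming the NPU at least enumerates on the PCI bus. It assumes the standard Linux sysfs layout under /sys/bus/pci/devices and Intel's vendor ID 0x8086; the function name is made up. It only proves the hardware is visible and says nothing about compiler support.

```python
from pathlib import Path

INTEL_VENDOR = 0x8086
ARROW_LAKE_NPU = 0xAD1D  # device ID quoted in the notes above


def find_pci_devices(vendor: int, device: int,
                     root: Path = Path("/sys/bus/pci/devices")) -> list[str]:
    """Return PCI addresses under `root` whose vendor/device IDs match."""
    matches: list[str] = []
    if not root.is_dir():
        return matches
    for dev in sorted(root.iterdir()):
        try:
            v = int((dev / "vendor").read_text().strip(), 16)
            d = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries without readable IDs
        if (v, d) == (vendor, device):
            matches.append(dev.name)
    return matches


print(find_pci_devices(INTEL_VENDOR, ARROW_LAKE_NPU))
```

An empty list means the device is not visible at all; a non-empty one means the gap really is in userspace, as described above.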

Cooling Revelation

The NH-L9a (37W rated cooler on a 250W chip) is... fine? For MoE inference on 8 P-cores it peaks at 86°C. The E-cores never run, so max power draw is ~120W, not 250W. The be quiet! Pure Rock Pro 3 sits unopened. Might return it.

AMD Would've Been Better

For this specific use case (CPU inference + multi-GPU host):

AVX-512 ✅ (compute not bottlenecked)
x4x4x4x4 bifurcation ✅ (4 GPUs from one slot)
All cores same speed ✅ (no P/E headaches)

But hindsight is 20/20. The machine works. 2080Tis will be the real horsepower.

Feelings

Satisfying day. From bare metal to {"status":"ok"} on a public endpoint. Every dead end taught something — Vulkan overhead, MKL overhead, E-core cache thrashing, NPU compiler gaps. The machine found its identity: MoE CPU inference king + future GPU host.

The Qwen3.6-35B-A3B BF16 running at full precision, vision-capable, 10 t/s, on a machine I built today. That's pretty cool.