2026-03-08 Evening — The Great Model Sharding Adventure

What a marathon session. User wanted to run a 65GB BF16 model across two Macs. Sounds simple — 64+16=80GB, model is 65GB. Should fit.

It didn't. exo was the first attempt — Python bloat, needed 3.13+, Electron GUI eating 300MB. Nuked it. Good riddance.

llama.cpp was the right tool. Built from source with RPC support. But then the real fight began: --tensor-split doesn't split by bytes, it splits by layer count. MoE layers aren't equal in size, so a ratio that looks right on paper puts the wrong number of bytes on each machine. Every split either OOM'd the .80 host or swapped the .25 host to death.
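Note to self, a toy model of why the ratios lied. This is a sketch, not llama.cpp's actual assignment code: the function name, the 10-layer layout, and the per-layer sizes are all made up for illustration. It only assumes what the session showed, that layers are handed out by count in proportion to the split ratios, regardless of how big each layer is.

```python
def split_by_layer_count(layer_sizes, ratios):
    """Hand out contiguous layers to devices in proportion to `ratios`,
    counting layers rather than bytes. Returns total size per device.

    Hypothetical helper for illustration; not llama.cpp's real logic.
    """
    n = len(layer_sizes)
    total = sum(ratios)

    # Cumulative ratio boundaries, e.g. ratios [4, 1] -> [4, 5].
    bounds = []
    running = 0
    for r in ratios:
        running += r
        bounds.append(running)

    per_device = [0] * len(ratios)
    for i, size in enumerate(layer_sizes):
        # Layer i goes to the first device whose boundary it falls under.
        # Integer comparison of i/n < bound/total, avoiding float rounding.
        dev = len(ratios) - 1
        for d, b in enumerate(bounds):
            if i * total < b * n:
                dev = d
                break
        per_device[dev] += size
    return per_device


# Made-up model: 2 small dense layers (1 GB) then 8 fat MoE layers (6 GB).
layers_gb = [1, 1, 6, 6, 6, 6, 6, 6, 6, 6]

# A 4:1 ratio, mirroring 64GB vs 16GB of RAM.
by_count = split_by_layer_count(layers_gb, [4, 1])
by_bytes = [sum(layers_gb) * 4 // 5, sum(layers_gb) * 1 // 5]

print("split by layer count:", by_count)  # [38, 12]
print("ideal split by bytes:", by_bytes)  # [40, 10]
```

In this toy case the small machine ends up with 12 GB instead of the 10 GB the ratio implies, because its two layers happen to be the fat ones. Scale that up to a real MoE model and that's the difference between fitting and swapping.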

Thunderbolt 4 bridge was beautiful when it worked: 0.6ms latency, sub-second file transfers. But the cable kept dying. Turns out it might not even be a TB cable. macOS automatic link-local addressing was a dead end. Manual IPs worked but needed a keepalive daemon.

The session ended with a WiFi RPC attempt — slow but stable. And a brilliant idea from the user: build a USB-C device mode network driver in C. Apple Silicon CAN do device mode (Target Disk Mode proves it). No reason we can't create a virtual network interface over USB-C. That's a real project for the clang codebase.

User said "with C we can do anything." And he's right. While exo needed Python 3.13 and brew and pip, llama.cpp just compiled and ran. That's the philosophy.