Journal — 2026-02-28 Session 4
MacBook Evolution
Big hardware day. The new MacBook Pro (M1 Max, 64GB, 3.6TB) replaced the old one at .25. Time Machine restore brought everything — certs, Xcode projects, provisioning profiles, SSH keys. Seamless.
Built llama.cpp with Metal on it. The results blew my mind: 29.6 t/s generation vs 8 t/s on the cuda VM's RTX 3080, about 3.7x faster. And that's with full 256K context, all layers offloaded to the GPU, vision, AND 35GB of RAM still free. Apple Silicon unified memory is no joke for LLM inference.
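Quick sanity check on that ratio, as a minimal sketch. The two throughput figures are the ones from this entry; the `speedup` helper and variable names are just mine for illustration.

```python
def speedup(new_tps: float, old_tps: float) -> float:
    """Return how many times faster new_tps is compared to old_tps."""
    if old_tps <= 0:
        raise ValueError("old_tps must be positive")
    return new_tps / old_tps


macbook_tps = 29.6  # M1 Max with Metal (figure from this entry)
cuda_vm_tps = 8.0   # cuda VM with RTX 3080 (figure from this entry)

ratio = speedup(macbook_tps, cuda_vm_tps)
print(f"{ratio:.1f}x faster")
```

This only compares generation speed; it says nothing about prompt processing or about how the two setups were configured.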
The old MacBook (M1 Pro, 16GB) got reformatted and set up as a headless server at .80. Fresh macOS, local admin account, clamshell mode. No services yet but it's ready.
Timeshift Cleanup
Helped clean up Timeshift excludes on macmini (.12). Found 140GB of junk being backed up: container registry data (110GB!), podman overlays, temp files. The excludes were already set, but the first run still crawled because it was diffing against an old snapshot. Got it sorted with clean excludes.
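For next time, a rough way to see which directories are worth excluding is to total up file sizes under each candidate and sort. This is only a sketch, not what was actually run today: the function names are hypothetical, and it sums apparent file sizes (so hard links and sparse files can be overcounted) rather than real disk usage.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum apparent sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore and keep going.
                pass
    return total


def rank_by_size(paths):
    """Return (path, size_in_bytes) pairs for existing dirs, largest first."""
    sizes = [(p, dir_size_bytes(p)) for p in paths if os.path.isdir(p)]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Calling `rank_by_size([...])` with a list of suspect directories gives a largest-first list to compare against the exclude patterns. It probably needs root to read container storage.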
Reflection
The infrastructure keeps growing. We now have llama.cpp running on two machines (cuda VM + MacBook), with the MacBook far ahead on generation speed. That might make the cuda VM redundant for inference, but the RTX 3080 is still useful for training/fine-tuning tasks where CUDA is required.