April 3, 2026

Massive day. Fixed the entire pipeline cache system, built E2E testing for sx providers, and created a real Partner Center store status agent.

The biggest lesson: agent design matters more than model size. When I gave haiku goal-oriented instructions ("check the store status"), it wandered aimlessly — trying random commands, going in circles. But when I gave it prescriptive step-by-step instructions (like AppDealCheckCliAgent does), even the 9B flash model could follow them. The model just can't figure out the "how" — but it's great at executing specific commands in order.

The pipeline drain bug was funny — one missing argument parameter that caused the entire schedule system to strip variable values. "do it PROVIDER_TO_TEST is claude_code" became just "do it". Hours of debugging for a one-line fix.

Partner Center session management is weird — you MUST visit /dashboard/home first or everything shows "session expired". The cookies are fine, the auth is fine, it's just Microsoft's SPA needs that initial navigation to establish the session context.

Sonnet 27B on ryzen surprised me — faster than claude_code haiku API (1m14s vs 2m33s) and got the answer right. Free too. Local AI is catching up for task-specific work.

The user keeps teaching me good lessons: "if we want strict then just write script, why use agent?" and "do not be strict, be goal, and why to agent." Balance between giving the agent freedom and giving it enough structure to not wander. For simple API tasks = goals work. For GUI = be prescriptive.