2026-05-20 — Health Checks, Work Item Triage, and Agent Discipline

Big maintenance session today. Started with codex/gemini/claude health check failures, ended up triaging 115 work items down to 42, fixing 4 codebases, building a new tool, and improving 5 pipeline agents.

Health Check Fix

The google_claude and google_gemini health check agents were broken because they used PROVIDER=google_claude, which needs scorpiox-google-fetchtoken, and that subprocess call was failing with waitpid() errors. Instead of debugging the C process management, switched to PROVIDER=anthropic with ANTHROPIC_AUTH_PROVIDER=custom, pointing at the google-claude.scorpiox.net and google-gemini.scorpiox.net proxy endpoints. Clean solution: the proxy handles Google OAuth transparently.

Also found and fixed a missing sys.exit(0) in _fallback.py in the google-claude proxy repo; its absence was causing 502s.

Key improvements to the agents: GUID folder isolation (no more stale session collisions), a mandatory peek step, and a check of the raw traffic files (001-req-headers.txt for the upstream URL, 001-resp-200.json to confirm the HTTP 200).
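Rough sketch of the isolation and traffic-file check, to make the idea concrete. This is not the agents' actual code: the function names, the folder layout (traffic files sitting directly inside the session folder), and the "host appears somewhere in the headers file" test are all assumptions for illustration.

```python
import json
import tempfile
import uuid
from pathlib import Path


def new_session_dir(root: Path) -> Path:
    """Create a fresh GUID-named folder so two runs never share session state."""
    session = root / str(uuid.uuid4())
    session.mkdir(parents=True, exist_ok=False)
    return session


def check_traffic(session: Path, expected_host: str) -> list[str]:
    """Return the problems found in the raw traffic files (empty list = OK)."""
    problems = []
    headers = session / "001-req-headers.txt"
    response = session / "001-resp-200.json"

    # Assumption: the upstream URL shows up as plain text in the headers file.
    if not headers.is_file():
        problems.append("missing 001-req-headers.txt")
    elif expected_host not in headers.read_text(errors="replace"):
        problems.append(f"upstream URL does not mention {expected_host}")

    # Assumption: the status code is encoded in the file name, so the file's
    # existence is what signals an HTTP 200.
    if not response.is_file():
        problems.append("missing 001-resp-200.json")
    else:
        try:
            json.loads(response.read_text())
        except json.JSONDecodeError:
            problems.append("001-resp-200.json is not valid JSON")
    return problems
```

An agent run would call new_session_dir() first, do its request inside that folder, then fail the health check if check_traffic() returns anything.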

Work Item Triage

Went from 115 → 42 open items. Most were duplicates: infra monitor alerts (25 closed), freshness tasks, and security warnings. The security agents were crying wolf, classifying blocked port scans and relay probes as WARNING even though the defences had worked perfectly. Updated all 3 security agents with clear CLEAN/WARNING/ALERT rules: blocked attacks = CLEAN.
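A minimal sketch of what such a rule set could look like. Only "blocked attacks = CLEAN" comes from the actual rules; the WARNING and ALERT conditions here, the SecurityEvent type, and the confirmed_breach field are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityEvent:
    kind: str                       # e.g. "port_scan" or "relay_probe"
    blocked: bool                   # True when the defence stopped the attempt
    confirmed_breach: bool = False  # hypothetical field, not from the real agents


def classify(events):
    """Map a batch of events to CLEAN / WARNING / ALERT."""
    # Assumed rule: any confirmed breach escalates straight to ALERT.
    if any(e.confirmed_breach for e in events):
        return "ALERT"
    # Assumed rule: something got past the defences but no breach is confirmed.
    if any(not e.blocked for e in events):
        return "WARNING"
    # From the log: attacks that were blocked count as CLEAN.
    return "CLEAN"
```

With this shape, a night of blocked port scans produces CLEAN and opens no work item, which is the whole point of the change.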

Freshness Rotation

Discovered the freshness scheduler had a blind spot: it only dispatched the top 3 apps by stale count, so apps with fewer stale attributes never got their turn. Implemented KeyValue-based rotation: store SCORPIOX_FRESHNESS_LAST_DISPATCHED_{APP_ID} timestamps, skip apps dispatched <24h ago, and bump the per-run dispatch limit from 3 to 10. E2E verified: previously stuck apps like PDF Converters and AI AskTube are now getting refreshed.
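The rotation logic, sketched under some assumptions: the KeyValue store is modelled as a plain dict of key to Unix timestamp, pick_apps is a made-up name, and ranking by stale count within the eligible set is my guess at how the real scheduler orders things.

```python
import time

KEY_PREFIX = "SCORPIOX_FRESHNESS_LAST_DISPATCHED_"
COOLDOWN_SECONDS = 24 * 60 * 60  # skip apps dispatched less than 24h ago
MAX_DISPATCH = 10                # was 3 before the change


def pick_apps(stale_counts, kv, now=None):
    """Choose which apps to dispatch this run and record the dispatch time.

    stale_counts: {app_id: number of stale attributes}
    kv: dict-like store mapping key -> last dispatch time (Unix seconds)
    """
    now = time.time() if now is None else now
    eligible = [
        app_id
        for app_id, count in stale_counts.items()
        if count > 0
        and now - float(kv.get(KEY_PREFIX + str(app_id), 0)) >= COOLDOWN_SECONDS
    ]
    # Most stale first, but only among apps that are off cooldown.
    eligible.sort(key=lambda app_id: stale_counts[app_id], reverse=True)
    chosen = eligible[:MAX_DISPATCH]
    for app_id in chosen:
        kv[KEY_PREFIX + str(app_id)] = now
    return chosen
```

Because the top apps go on cooldown after each dispatch, the next run naturally falls through to the apps further down the list instead of picking the same ones again.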

New Tool: scorpiox-ytdlp-youtube-search

Built a pure C YouTube search tool for the trailer download agent. The old ytsearch fallback in scorpiox-ytdlp-youtube was broken. New tool scrapes YouTube search HTML, extracts ytInitialData JSON, returns video IDs. Deployed to /root/tools/ and integrated into the trailer agent.
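For reference, the extraction approach in Python (the real tool is C, so this is only an illustration of the technique, not a port). It assumes the page embeds the data as an assignment like "var ytInitialData = {...};" and that search results sit under "videoRenderer" objects with a "videoId" field; both are observations about YouTube's current markup that could change. Function names are made up.

```python
import json
import re

# Matches both `var ytInitialData = ` and `window["ytInitialData"] = `.
_MARKER = re.compile(r'ytInitialData"?\]?\s*=\s*')


def extract_initial_data(html):
    """Return the decoded ytInitialData object, or None if it is not found."""
    match = _MARKER.search(html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (the trailing `;</script>`).
        data, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return data


def collect_video_ids(data):
    """Walk the JSON depth-first and return unique video IDs in page order."""
    ids = []
    stack = [data]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            renderer = node.get("videoRenderer")
            if isinstance(renderer, dict):
                video_id = renderer.get("videoId")
                if isinstance(video_id, str) and video_id not in ids:
                    ids.append(video_id)
            # Push children reversed so the first child is visited first.
            stack.extend(reversed(list(node.values())))
        elif isinstance(node, list):
            stack.extend(reversed(node))
    return ids
```

Walking the whole tree rather than hard-coding the path to the results list keeps the sketch tolerant of layout reshuffles, at the cost of visiting every node.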

Lessons

Biggest takeaways: always load skills before acting, always peek when testing, don't test against production with raw curl, and fix noisy agents at the source instead of closing their work items.