Problem
Autonomous coding agents are impressive when they work and frustrating when they don’t. A single-model agent has to reason, plan, execute, self-check, and stay under rate limits — all in one context. When it breaks, it breaks opaquely.
I wanted a desktop app that treats agent work the way a real dev team does: different roles for different costs, work surfaced like a sprint board, and a persistent session that survives restarts.
What I built
An Electron app with a React + Zustand UI, a Claude-CLI-driven orchestrator in the main process, and a CLI for headless runs. A REST API surface on top for external monitoring.
- Tiered specialists. The orchestrator hands each task to the right model for the cost — Haiku for the cheap stuff, Sonnet for the middle, Opus for the hard calls. Separate validator / verifier / committer agents mean the model doing the work isn’t the model grading it.
- Persistent orchestrator session. The first task in a run starts a new Claude session; every subsequent task continues it. Session state survives app restarts and rate-limit waits, so a long run picks back up instead of re-priming context from scratch.
- Live execution view. Streaming JSON output is parsed as it arrives and rendered as a living task card — status tags, cost tracking per call, rate-limit countdowns, errors inline.
- Sprint board, standup, retrospective. The TODO list becomes a sprint; finished tasks become standup notes; a retrospective view summarizes what the agent did and what it couldn’t.
Interesting bits
- Stream processing is the whole UX. The visible difference between “good agent” and “bad agent” isn’t model quality — it’s whether the user can see what it’s doing. JSON streaming mode + incremental parsing turns a wall of spinning text into a progress indicator that actually indicates progress.
- Rate-limit handling as a first-class state. When the API says “wait until T,” the app schedules a resume instead of exiting. Tasks in flight are checkpointed; the UI shows a countdown instead of an error.
- Specialization isn’t just cost optimization. A validator that wasn’t in the room when the code was written gives genuinely different feedback than a self-critique pass. Cheap enough to run on every task.
- The NieR theme isn’t decoration. It’s the reason I kept building it long enough for it to become useful. Portfolios are allowed to reward their author.