2B Task Orchestrator — Juozas Žilys

Problem

Autonomous coding agents are impressive when they work and frustrating when they don’t. A single-model agent has to reason, plan, execute, self-check, and stay under rate limits — all in one context. When it breaks, it breaks opaquely.

I wanted a desktop app that treats agent work the way a real dev team does: different roles for different costs, work surfaced like a sprint board, and a persistent session that survives restarts.

What I built

An Electron app with a React + Zustand UI, a Claude-CLI-driven orchestrator in the main process, and a CLI for headless runs. A REST API surface on top for external monitoring.

Tiered specialists. The orchestrator hands each task to the right model for the cost — Haiku for the cheap stuff, Sonnet for the middle, Opus for the hard calls. Separate validator / verifier / committer agents mean the model doing the work isn’t the model grading it.
Persistent orchestrator session. The first task in a run starts a new Claude session; every subsequent task continues it. Session state survives app restarts and rate-limit waits, so a long run picks back up instead of re-priming context from scratch.
Live execution view. Streaming JSON output is parsed as it arrives and rendered as a living task card — status tags, cost tracking per call, rate-limit countdowns, errors inline.
Sprint board, standup, retrospective. The TODO list becomes a sprint; finished tasks become standup notes; a retrospective view summarizes what the agent did and what it couldn’t.

Interesting bits

Stream processing is the whole UX. The visible difference between “good agent” and “bad agent” isn’t model quality — it’s whether the user can see what it’s doing. JSON streaming mode + incremental parsing turns a wall of spinning text into a progress indicator that actually indicates progress.
Rate-limit handling as a first-class state. When the API says “wait until T,” the app schedules a resume instead of exiting. Tasks in flight are checkpointed; the UI shows a countdown instead of an error.
Specialization isn’t just cost optimization. A validator that wasn’t in the room when the code was written gives genuinely different feedback than a self-critique pass. Cheap enough to run on every task.
The NieR theme isn’t decoration. It’s the reason I kept building it long enough for it to become useful. Portfolios are allowed to reward their author.