Juozas Žilys

2B Task Orchestrator

A desktop app that runs a multi-agent LLM orchestrator on a real codebase — tiered specialists, persistent sessions, live execution view.

Role: Solo build
Period: 2024 – present
Repo: github

Electron 35 · React 18 · Zustand · TypeScript · Tailwind · Framer Motion · Claude CLI

Problem

Autonomous coding agents are impressive when they work and frustrating when they don’t. A single-model agent has to reason, plan, execute, self-check, and stay under rate limits — all in one context. When it breaks, it breaks opaquely.

I wanted a desktop app that treats agent work the way a real dev team does: different roles for different costs, work surfaced like a sprint board, and a persistent session that survives restarts.

What I built

An Electron app with a React + Zustand UI, a Claude-CLI-driven orchestrator in the main process, and a CLI for headless runs, plus a REST API on top for external monitoring.

  • Tiered specialists. The orchestrator matches each task to a model tier by difficulty: Haiku for the cheap stuff, Sonnet for the middle, Opus for the hard calls. Separate validator / verifier / committer agents mean the model doing the work isn't the model grading it.
  • Persistent orchestrator session. The first task in a run starts a new Claude session; every subsequent task continues it. Session state survives app restarts and rate-limit waits, so a long run picks back up instead of re-priming context from scratch.
  • Live execution view. The CLI's streaming JSON output is parsed as it arrives and rendered as a live task card: status tags, per-call cost tracking, rate-limit countdowns, and inline errors.
  • Sprint board, standup, retrospective. The TODO list becomes a sprint; finished tasks become standup notes; a retrospective view summarizes what the agent did and what it couldn’t.
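The routing and session rules in the list above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the app's code: the `Task`, `Orchestrator`, and `SessionStore` names, the complexity-to-tier table, and the validator's tier are all hypothetical, and the actual CLI invocation is left out. It shows the two ideas that matter: the tier follows the task's difficulty, and work calls resume one stored session while the validator always starts fresh.

```typescript
// Minimal sketch: tier routing + session continuation.
// All names here are hypothetical; the tier table and the validator's
// tier are assumptions, not the app's real routing rules.

type Tier = "haiku" | "sonnet" | "opus";
type Role = "worker" | "validator";
type Complexity = "low" | "medium" | "high";

interface Task {
  id: string;
  prompt: string;
  complexity: Complexity; // assumed to be assigned by an earlier planning step
}

// What the orchestrator would hand to the CLI layer for one call.
interface AgentCall {
  taskId: string;
  tier: Tier;
  role: Role;
  prompt: string;
  resumeSessionId: string | null; // null = start a fresh session
}

// Assumed routing table: cheap tier for easy work, expensive tier for hard calls.
const TIER_BY_COMPLEXITY: Record<Complexity, Tier> = {
  low: "haiku",
  medium: "sonnet",
  high: "opus",
};

// Persistence is abstracted so a real app could back it with a file on disk.
interface SessionStore {
  load(): string | null;
  save(sessionId: string): void;
}

class MemorySessionStore implements SessionStore {
  private sessionId: string | null = null;
  load(): string | null {
    return this.sessionId;
  }
  save(sessionId: string): void {
    this.sessionId = sessionId;
  }
}

class Orchestrator {
  private readonly store: SessionStore;

  constructor(store: SessionStore) {
    this.store = store;
  }

  // Work calls share one orchestrator session: the first call has no id to
  // resume; every later call (even after a restart) resumes the stored one.
  planWork(task: Task): AgentCall {
    return {
      taskId: task.id,
      tier: TIER_BY_COMPLEXITY[task.complexity],
      role: "worker",
      prompt: task.prompt,
      resumeSessionId: this.store.load(),
    };
  }

  // Called with the session id reported by the first response of a run;
  // later ids are ignored so the run keeps a single session.
  recordSession(sessionId: string): void {
    if (this.store.load() === null) {
      this.store.save(sessionId);
    }
  }

  // The validator never resumes the worker's session, so it judges the
  // result without the context that produced it. Tier choice is assumed.
  planValidation(task: Task, diff: string): AgentCall {
    return {
      taskId: task.id,
      tier: "haiku",
      role: "validator",
      prompt: `Review this change for task "${task.prompt}":\n${diff}`,
      resumeSessionId: null,
    };
  }
}
```

Backing `SessionStore` with a file in the app's user-data directory instead of memory is what would let a run survive a restart; starting a new run would mean clearing it.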

Interesting bits

  • Stream processing is the whole UX. The visible difference between “good agent” and “bad agent” isn’t model quality — it’s whether the user can see what it’s doing. JSON streaming mode + incremental parsing turns a wall of spinning text into a progress indicator that actually indicates progress.
  • Rate-limit handling as a first-class state. When the API says “wait until T,” the app schedules a resume instead of exiting. Tasks in flight are checkpointed; the UI shows a countdown instead of an error.
  • Specialization isn't just cost optimization. A validator that wasn't in the room when the code was written gives genuinely different feedback than a self-critique pass, and it's cheap enough to run on every task.
  • The NieR-inspired theme isn't decoration. It's the reason I kept building the app long enough for it to become useful. Portfolios are allowed to reward their author.
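The streaming and rate-limit points can be illustrated with a minimal sketch, assuming the CLI emits one JSON object per line. The approach is to buffer partial chunks, parse only complete lines, and fold each event into a task card, treating a rate limit as a `waiting` state rather than an error. The `StreamEvent` shapes, `LineBuffer`, `applyEvent`, `secondsUntilResume`, and the millisecond `resetAt` field are hypothetical stand-ins, not the Claude CLI's real output format.

```typescript
// Minimal sketch: incremental NDJSON parsing + task-card state.
// Event shapes are hypothetical, not the Claude CLI's actual schema;
// resetAt is assumed to be an epoch timestamp in milliseconds.

type StreamEvent =
  | { type: "status"; text: string }
  | { type: "cost"; usd: number }
  | { type: "rate_limit"; resetAt: number }
  | { type: "error"; message: string };

// Chunks from a child process can split a JSON line anywhere, so keep the
// trailing partial line and only parse lines that are complete.
class LineBuffer {
  private pending = "";

  push(chunk: string): StreamEvent[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    this.pending = lines.pop() ?? ""; // last piece is an incomplete line (or "")
    const events: StreamEvent[] = [];
    for (const line of lines) {
      const text = line.trim();
      if (text === "") continue;
      try {
        const parsed: unknown = JSON.parse(text);
        if (typeof parsed === "object" && parsed !== null && "type" in parsed) {
          events.push(parsed as StreamEvent); // shallow check only
        }
      } catch {
        // A malformed line is skipped rather than killing the stream.
      }
    }
    return events;
  }
}

interface TaskCard {
  status: "running" | "waiting";
  lastMessage: string;
  totalCostUsd: number;
  resumeAt: number | null;
  errors: string[];
}

const emptyCard = (): TaskCard => ({
  status: "running",
  lastMessage: "",
  totalCostUsd: 0,
  resumeAt: null,
  errors: [],
});

// Pure reducer, so it could sit behind a Zustand store action.
function applyEvent(card: TaskCard, ev: StreamEvent): TaskCard {
  switch (ev.type) {
    case "status":
      // Any new status line means the run is moving again.
      return { ...card, status: "running", lastMessage: ev.text, resumeAt: null };
    case "cost":
      return { ...card, totalCostUsd: card.totalCostUsd + ev.usd };
    case "rate_limit":
      // A rate limit is a state, not a failure: remember when to resume.
      return { ...card, status: "waiting", resumeAt: ev.resetAt };
    case "error":
      // Errors are shown inline on the card instead of ending the run.
      return { ...card, errors: [...card.errors, ev.message] };
  }
}

// Drives the countdown label; 0 means it is time to resume.
function secondsUntilResume(card: TaskCard, nowMs: number): number {
  if (card.resumeAt === null) return 0;
  return Math.max(0, Math.ceil((card.resumeAt - nowMs) / 1000));
}
```

In an app, the resume itself could be scheduled with a `setTimeout` of `card.resumeAt - Date.now()`, while the countdown label re-renders from `secondsUntilResume`.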