How I turned Pi.dev from a coding assistant into an operator stack


In the last two posts, I covered the first two parts of my Pi.dev setup:

  1. How I run Pi.dev safely
  2. The three things that made Pi.dev useful for me

That gets you a long way.

You have Pi installed somewhere reasonably safe. You have skills so it does not just improvise every task. You have memory so you do not need to re-explain your whole life every session. You have browser automation so it can actually open a website and see what happened.

Already much better than using a chatbot in a browser tab.

But there is another step, and this is where my setup starts to look a bit different from the beginner path.

At some point, one agent in one thread becomes the bottleneck.

I do not mean that in a dramatic “replace your team with agents” way. I mean something much more boring. You ask Pi to look at analytics, inspect a repo, check the docs, review a plan, and test a page in the browser. It can do those things, but if everything happens in one conversation, you are still waiting in line.

That is when it starts to make sense to think less like “one assistant” and more like an operator stack.

the beginner stack is still good

In the last post I mentioned pi-supergsd as a good beginner option.

I still think it is.

It gives you a Pi-native way to use Superpowers-style skills, fresh-context review, and visible task branching. It is simple enough that a new user can install it and not feel like they accidentally launched a swarm of invisible agents.

That simplicity is a feature.

It is also not my full setup.

My day-to-day use of Pi is not just “please edit this repo.” I use it across a portfolio of projects, product ideas, GTM experiments, analytics checks, small deployments, and weird half-formed things I am trying to validate before they become real products.

For that, I need a bit more structure.

what is missing after the beginner setup

The beginner setup mostly gives you one better-behaved agent.

What I started needing was:

  • async or background agents
  • parallel agents
  • agent roles
  • intercom between sessions
  • chains and review loops
  • multiple LLMs checking each other

It sounds like a lot, but the idea is simple.

Instead of making one agent hold every thought, you split the work into smaller jobs. One job might be “inspect this repo and tell me what it is.” Another might be “review this plan and find the holes.” Another might be “open the deployed site and tell me what a new user sees.”

Then the main session stays in charge.

The last part matters. I do not want a bunch of agents running around making product decisions for me. I want them doing bounded work and reporting back.

async work: stop making the main thread wait

Some work should not block the main conversation.

If I ask Pi to inspect analytics, review a repo, crawl a site, or run a longer audit, I do not always want to sit there watching every step. Sometimes I want to keep thinking about the product while that work runs in the background.

This is useful for things like:

  • checking whether a deployed app is obviously broken
  • running a quality or friction review
  • inspecting PostHog analytics
  • summarizing a repo
  • reviewing a long diff
  • preparing a short report from existing docs

None of this replaces judgment. It just stops the main conversation from getting stuck behind every side quest.
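If you are curious what "background" means mechanically, here is a minimal Python sketch using asyncio. It is not how Pi implements this: `run_subagent` is a hypothetical stand-in that sleeps and returns a string, where a real setup would hand the task to a fresh agent session.

```python
import asyncio


async def run_subagent(task: str, seconds: float = 0.1) -> str:
    """Hypothetical stand-in for a background agent; a real one would call a model."""
    await asyncio.sleep(seconds)  # simulate slow work such as an audit or a crawl
    return f"report: {task}"


async def main() -> list[str]:
    # Start the side quest in the background instead of awaiting it right away.
    audit = asyncio.create_task(run_subagent("friction review of the deployed app"))

    log = []
    # The main session keeps working while the audit runs.
    log.append("main: thinking about the pricing page")
    await asyncio.sleep(0)  # yield once so the background task can start

    # Collect the report only when the main session is ready to read it.
    log.append(await audit)
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

The main session decides when to read the report; the background work never blocks it in the meantime.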

If you are a non-technical builder, your job is not to watch logs scroll by. Your job is to decide what matters next.

parallel agents: only when the tasks are actually separate

Parallel agents are useful when the tasks do not depend on each other.

For example, if I am checking whether 21pins is ready for another public push, I might split the first pass like this:

  • one agent checks the public site, docs, and obvious onboarding friction
  • another agent searches memory and project notes for known blockers

Those are separate jobs. Running them one after another is wasted time.

This is easy to overdo.

If two agents are editing the same files, or if one agent needs the answer from another before it can continue, parallel work can make a mess. So I mostly use it for research, review, inspection, and other read-only work.

The lazy rule is: if the agents might step on each other, do not run them in parallel.
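The lazy rule can be written down as a check on which files each task might edit. This is a hypothetical sketch, not a Pi feature: the `Task` type, its `writes` field, and `run_subagent` are made up for illustration, and a task with no writes counts as read-only.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Files the task may edit. An empty set means the task is read-only.
    writes: set[str] = field(default_factory=set)


def can_run_in_parallel(tasks: list[Task]) -> bool:
    """Return False if any two tasks might edit the same file."""
    seen: set[str] = set()
    for task in tasks:
        if task.writes & seen:
            return False
        seen |= task.writes
    return True


async def run_subagent(task: Task) -> str:
    """Hypothetical stand-in for handing a task to a fresh agent session."""
    await asyncio.sleep(0.05)
    return f"{task.name}: done"


async def run_batch(tasks: list[Task]) -> list[str]:
    if can_run_in_parallel(tasks):
        # Independent tasks: run them at the same time.
        return list(await asyncio.gather(*(run_subagent(t) for t in tasks)))
    # Possible collision: run them one after another instead.
    return [await run_subagent(t) for t in tasks]


if __name__ == "__main__":
    first_pass = [
        Task("check public site, docs, and onboarding"),
        Task("search memory and notes for known blockers"),
    ]
    print(asyncio.run(run_batch(first_pass)))
```

This only covers file collisions. Tasks that need each other's answers are not parallel work at all; they belong in a sequence.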

roles: stop asking one agent to be everything

This is probably the most important upgrade.

A generic agent is fine for simple work. But for bigger work, roles help a lot.

In my setup I can use roles like:

  • scout: inspect a repo or folder and return compressed context
  • researcher: investigate a topic and synthesize what matters
  • planner: turn messy context into a plan
  • reviewer: review code, a plan, or a proposed direction
  • worker: execute a defined task
  • oracle: sanity-check a decision against broader context

You do not need those exact names.

The point is that a scout should not rewrite your app. A reviewer should not quietly implement their own preferred solution. A worker should not invent the strategy if the plan is already approved.

Boundaries make the process easier to trust.
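One way to make those boundaries concrete is a table of what each role is allowed to do. The role names come from the list above; the action categories and the `check` helper are assumptions for illustration, not Pi's configuration format.

```python
from enum import Enum, auto


class Action(Enum):
    READ = auto()
    PLAN = auto()
    REVIEW = auto()
    EDIT = auto()


# Hypothetical role table: which actions each role may take.
ROLE_PERMISSIONS: dict[str, set[Action]] = {
    "scout": {Action.READ},
    "researcher": {Action.READ},
    "planner": {Action.READ, Action.PLAN},
    "reviewer": {Action.READ, Action.REVIEW},  # reviews, but does not implement
    "worker": {Action.READ, Action.EDIT},      # executes, but does not invent strategy
    "oracle": {Action.READ, Action.REVIEW},
}


def check(role: str, action: Action) -> None:
    """Raise PermissionError if a role tries to step outside its boundary."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    if action not in allowed:
        raise PermissionError(f"{role} may not {action.name.lower()}")


if __name__ == "__main__":
    check("worker", Action.EDIT)  # fine
    try:
        check("scout", Action.EDIT)
    except PermissionError as err:
        print(err)
```

Unknown roles get an empty permission set, so anything not listed is denied by default.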

For non-technical builders, this matters because you may not be able to inspect every line of code, but you can still manage the shape of the work. You can ask for a scout report. You can ask for a review. You can ask for the worker to stay inside the plan.

That is much easier than asking one all-purpose agent to “just handle it.”

I do not want to trust one model

Another part of this is that Pi lets me use different LLMs for different jobs.

I do not want one model making every call.

Models hallucinate. They miss things. They get strangely confident. They also have different strengths. One model might be better at reading a messy repo. Another might be better at strategy. Another might be cheap enough to use for boring inspection work.

So I use them against each other.

One model can draft a plan. Another can review it. A smaller model can inspect files. A stronger model can sanity-check a decision. Browser automation checks what actually happened. Memory checks what we already decided.
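Here is a toy version of "one model drafts, another reviews" in Python. It is a simple draft-and-critique loop, not the full AMoE approach, and the two "models" are plain functions standing in for whatever LLM calls your setup makes. The convention that the reviewer starts its verdict with `APPROVE` is an assumption.

```python
from typing import Callable

# A "model" is any function from prompt text to response text.
Model = Callable[[str], str]


def draft_and_review(task: str, drafter: Model, reviewer: Model, max_rounds: int = 3) -> tuple[str, bool]:
    """Draft with one model, critique with another, and stop when the reviewer approves."""
    draft = drafter(task)
    for _ in range(max_rounds):
        verdict = reviewer(f"Review this plan and find the holes:\n{draft}")
        if verdict.strip().upper().startswith("APPROVE"):
            return draft, True
        # Feed the objections back so the drafter has to answer them.
        draft = drafter(f"{task}\nReviewer objections:\n{verdict}")
    # Never survived review: hand it to a human instead of shipping it.
    return draft, False


# Toy models for illustration only.
def toy_drafter(prompt: str) -> str:
    return "plan v2 (with rollback)" if "objections" in prompt else "plan v1"


def toy_reviewer(prompt: str) -> str:
    return "APPROVE" if "rollback" in prompt else "Missing a rollback step."


if __name__ == "__main__":
    print(draft_and_review("launch the docs update", toy_drafter, toy_reviewer))
```

The second return value matters: a plan that never gets approved is flagged rather than silently accepted.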

The goal is not “more AI.”

The goal is to avoid trusting one model too much.

I have been working on this idea in different forms for a couple of years: adversarial mixtures of experts, or AMoE. The short version is simple enough: do not ask one model to be right. Put models in tension, give them roles, and force the final answer to survive disagreement.

I am not going to unpack the whole thing here yet. But that idea shapes how I use Pi.

intercom: let sessions talk to each other

Once you have more than one session doing useful work, you need some way for them to coordinate.

Intercom is local session-to-session messaging.

One agent can send an update back to the main session. Another can ask for a decision. A finished review can show up as a message instead of being buried in some transcript you forgot to check.
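Mechanically, this kind of intercom is a message bus. The sketch below fakes it with an in-process `asyncio.Queue`; real sessions would be separate processes, and the `Message` shape with its `update`/`question`/`report` kinds is an assumption, not Pi's message format.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    kind: str  # assumed kinds: "update", "question", or "report"
    body: str


async def reviewer(inbox: asyncio.Queue) -> None:
    # A subagent posts progress and its final report instead of leaving them in a transcript.
    await inbox.put(Message("reviewer", "update", "read the diff, checking tests"))
    await inbox.put(Message("reviewer", "report", "2 risky changes in the auth flow"))


async def main_session() -> list[str]:
    inbox: asyncio.Queue = asyncio.Queue()
    subagent = asyncio.create_task(reviewer(inbox))
    received = []
    while True:
        msg = await inbox.get()
        received.append(f"[{msg.sender}/{msg.kind}] {msg.body}")
        if msg.kind == "report":
            break  # the review is finished
    await subagent
    return received


if __name__ == "__main__":
    for line in asyncio.run(main_session()):
        print(line)
```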

That matters more than it sounds.

Without coordination, multi-agent work turns into scattered notes and half-remembered reports. With coordination, it starts to feel like a small team.

A weird team, but still.

chains and review loops

Chains are just repeatable multi-step workflows.

Instead of saying:

Check whether this app is ready to show people.

You can make the steps explicit:

  1. inspect the repo
  2. search memory for prior blockers
  3. review the current landing page
  4. test the public flow in a browser
  5. check the docs or quickstart
  6. summarize the highest-risk friction
  7. decide the next human action

The chain is not valuable because it is fancy. It is valuable because the important steps do not get skipped.
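A chain can be as small as an ordered list of named steps that refuses to continue when one comes back empty. In this hypothetical sketch, each step is a function that receives the earlier results; the placeholder lambdas stand in for real agent, memory, or browser calls.

```python
from typing import Callable

# A step takes the results so far and returns its own result.
Step = tuple[str, Callable[[dict[str, str]], str]]


def run_chain(steps: list[Step]) -> dict[str, str]:
    """Run named steps in order, stopping loudly if any step returns nothing."""
    results: dict[str, str] = {}
    for name, step in steps:
        output = step(results)
        if not output:
            raise RuntimeError(f"step {name!r} returned nothing; chain stopped")
        results[name] = output
    return results


# Placeholder steps; real ones would call agents, memory search, or a browser.
ready_check: list[Step] = [
    ("inspect repo", lambda r: "repo summary"),
    ("search memory", lambda r: "prior blockers"),
    ("test public flow", lambda r: "browser notes"),
    ("summarize friction", lambda r: f"risks from {', '.join(r)}"),
]

if __name__ == "__main__":
    for name, output in run_chain(ready_check).items():
        print(f"{name}: {output}")
```

The final step can see every earlier result, which is what makes the summary trustworthy: it cannot be written before the checks ran.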

Review loops are the same idea.

A plan can be reviewed before it becomes work. A code change can be checked before it is called done. A launch can be looked at from a first-time user’s point of view before you send people to it.

At that point Pi starts to feel less like autocomplete and more like an operating system for work.

a real example: 21pins

One real example from my workflow is checking whether 21pins is ready for another user-facing push.

21pins is my attempt at making agent permissions and approvals less terrifying. If Pi and other agents are going to do real work, they need clearer permission boundaries, receipts, and approval flows.

A useful pass for 21pins might look like this:

  1. search memory for prior 21pins decisions and known blockers
  2. ask a scout to inspect the repo and current deployment notes
  3. ask a reviewer to check whether the trust/safety story makes sense
  4. use browser automation to open 21pins.com and test the public flow
  5. ask another agent to review docs and quickstart friction
  6. compare the findings against the current GTM priority
  7. decide the next human action: fix docs, run a pilot, or pause
  8. only then assign implementation work

No magic there.

Just a series of small jobs with checks between them.

The important part is that Pi is not only writing code. It is inspecting, remembering, testing, disagreeing, and reporting back.

That is the useful version for me.

how I would tell someone to get there

Do not start with everything.

Start with the boring stack:

  • Pi installed safely
  • workflow skills
  • memory
  • browser automation

Then add delegation when you feel the bottleneck.

A good rule:

If one session is waiting on research, review, or inspection that could happen independently, consider a subagent.

Bad reasons to add more agents:

  • it sounds cool
  • you want to feel like you have a team
  • you are avoiding making a decision
  • the task is unclear and you hope more agents will fix that

More agents do not fix unclear thinking. They multiply it.
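The rule and the list of bad reasons boil down to a small decision function. This is a hypothetical sketch: the `Candidate` fields are illustrative labels for the questions in the text, not settings in Pi.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    description: str
    kind: str              # "research", "review", "inspection", or "implementation"
    independent: bool      # can it run without waiting on the main session?
    clearly_defined: bool  # could you write down what "done" looks like?


def should_delegate(c: Candidate) -> bool:
    # More agents multiply unclear thinking, so an unclear task stays with you.
    if not c.clearly_defined:
        return False
    # Delegate independent research, review, or inspection; keep the rest in the main session.
    return c.kind in {"research", "review", "inspection"} and c.independent


if __name__ == "__main__":
    print(should_delegate(Candidate("review the launch plan", "review", True, True)))
```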

copy/paste: beginner upgrade

If you are using the beginner setup and want to add delegation carefully, paste this into Pi:

Help me add lightweight delegation to my Pi workflow.

I already want or have:
- workflow skills
- memory
- browser automation

Now help me add a safe delegation pattern.

I want to be able to:
- send one read-only research or review task to a fresh context
- get a concise report back
- decide myself what to do next

Do not set up fully automatic background work yet.
Start with one manual fresh-context review flow.
Explain each step before changing configuration.

copy/paste: intermediate operator workflow

If you are ready for parallel research and role-based tasks, paste this:

Help me set up an intermediate Pi operator workflow.

I want support for:
- scout-style repo inspection
- researcher-style topic investigation
- reviewer-style plan or code review
- worker-style execution only when the task is clearly defined
- parallel read-only tasks when they are independent
- multiple LLMs or models checking different parts of the work when available

Set this up so that:
- agents do not edit shared files in parallel unless I explicitly approve it
- research and review tasks return concise reports
- the main session remains the decision-maker
- all implementation work is verified before completion

After setup, show me one example workflow using a real project folder.

copy/paste: closest to my setup

This is closer to how I work:

Help me build a Pi operator stack similar to Kris Constable's workflow.

I want:
- async/background agents for long-running research, review, and audits
- parallel agents for independent read-only tasks
- named roles such as scout, researcher, planner, reviewer, worker, and oracle
- multiple LLMs used against each other where useful
- intercom-style coordination between sessions
- chains for multi-step workflows
- review loops before publishing or shipping

Rules:
- the main session stays in control
- subagents should receive narrow, self-contained tasks
- no secrets, payments, deploys, auth changes, or production data changes without explicit approval
- no parallel edits to the same files unless I approve it
- every implementation path needs verification before completion

Start by showing me the smallest safe version of this setup, then help me install or configure the pieces one at a time.
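The approval rule in that prompt is worth enforcing in code as well as in instructions. A hypothetical sketch: the tag names in `RISKY` and the `execute` function are assumptions for illustration, not part of Pi or 21pins.

```python
# Hypothetical tags for actions that always need a human yes.
RISKY = {"secrets", "payments", "deploy", "auth", "production_data"}


def requires_approval(tags: set[str]) -> bool:
    return bool(tags & RISKY)


def execute(action: str, tags: set[str], approved_by_human: bool = False) -> str:
    """Block risky actions unless a human explicitly approved them."""
    if requires_approval(tags) and not approved_by_human:
        reasons = ", ".join(sorted(tags & RISKY))
        return f"BLOCKED: {action} needs explicit approval ({reasons})"
    return f"ran: {action}"


if __name__ == "__main__":
    print(execute("update README", {"docs"}))
    print(execute("push to production", {"deploy"}))
```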

the important warning

This setup can make you much faster.

It can also make you much messier.

If you give vague instructions to one agent, you get one vague result.

If you give vague instructions to five agents, you get five vague results and a coordination problem.

The upgrade is not “more agents.”

The upgrade is clearer work:

  • smaller tasks
  • sharper roles
  • better memory
  • visible handoffs
  • verification before trust

That, to me, is the real operator stack.

