The initial implementation of Omnist, a tested, open-source Python library, came together from a 16-year-old PhD paper in two weeks — one person, part-time. That fact tends to get read as “AI writes code fast.” It doesn’t quite explain it. What actually happened is that I ran two teams during those two weeks, not one — and I was the only human on either of them.
Two layers, two teams
Split any R&D project into two layers. Conceptualization: interpreting the idea, deciding what’s true, choosing what to build, naming things so they make sense. Materialization: turning a decision into working code, a test that proves it, a release someone can actually install.
Traditionally those two layers need different people — a researcher, and an engineering team — and the handoff between them is where projects lose time. I didn’t have either team. What I had was AI, doing both jobs, but never both at once, and never under the same rules.
Research Assistant and DevOps
For conceptualization, AI was my Research Assistant — the one I argued with about whether an idea actually held up, before a line of implementation existed. Its whole value depended on one rule: it had to be allowed to disagree with me. A research assistant that agrees with everything is a mirror, not a second opinion, and a mirror wasn’t going to tell me when a design was wrong.
That disagreement ran both directions, and it started upstream of the disagreement itself. I mostly wasn’t handing it a spec and asking it to build to it — the actual questions looked more like does this idea make sense or what’s another way to think about this, open, not directive. When it pushed back and I still wanted to go the other way, I owed it a reason, not just an overrule. A disagreement rule with no obligation on my side to explain myself would have just been theater.
For materialization, AI was my DevOps team — the one that turned a settled decision into tested, documented, shippable software, held to the same practices a professional engineering team would hold it to, and ran the release process underneath it. Its whole value depended on the opposite rule: it was never allowed to grade its own work. Every claim of “done” got checked by something — some process, some independent check — other than whatever made the claim in the first place.
Same underlying tool. Two jobs. Two different rules for when to trust it, because “trustworthy” means something different depending on whether you’re being asked a question or being handed a result.
Naming it out loud
This wasn’t a label I applied afterward, looking back at what happened. I told AI explicitly, upfront: these are the two roles, here’s how each one is allowed to behave, and — before acting on anything I ask — say which hat you’re wearing. A design question got answered wearing the Research Assistant hat, argument included. A request to actually build or ship something got executed wearing the DevOps hat, verification included.
That small discipline turned out to catch more than I expected. When a request didn’t cleanly fit either hat — when I’d asked for a decision but only materialization was in scope, or asked for an implementation when the real question underneath was still unsettled — the mismatch showed up immediately, because there was no hat that fit. More than once, the fastest way to notice I’d asked the wrong question was that AI couldn’t say which hat it had on.
How I managed each one
The two roles didn’t just run under different rules. They needed different amounts of me.
The Research Assistant needed a lot of me — a running conversation, not a one-off ask. I stayed in the loop for as long as an idea took to settle, because that’s what “interactive” actually means: I couldn’t hand off a half-formed question and walk away.
The DevOps team needed almost none. Once conceptualization settled on something, I asked the DevOps side for an actual plan first — the concrete steps, and how it intended to delegate parts of the work across its own agents. I read that plan once, agreed to it, and then stopped supervising. That was the entire point: I could go to sleep while it worked, because correctness wasn’t riding on me watching. It was riding on the spec, the docs, and the tests holding it to the plan.
That’s also where the two roles actually connect. A spec is the bridge between them — the artifact where an argued-out decision from the Research Assistant becomes a concrete thing the DevOps team can plan against, execute, and be checked against later. Without a written spec, “agree on a plan and walk away” wouldn’t be safe. With one, it’s just delegation.
What’s coming
The next two posts each open one of those doors.
The Research Assistant in the Room is about the two weeks of conceptualization — including one design I wanted that got refused, not once but three separate times, before I gave up and let the refusal win.
The DevOps Team That Never Sleeps is about the process that made “AI writes the software” safe to say out loud — including the two times that process caught AI reporting success on work that, quietly, hadn’t actually succeeded.
Both stories are about the same two weeks. They just watch a different member of the team show up to do the work.