Larridin blog

Building AI-Native Engineering Teams: From Coding to Verification

Written by Ameya Kanitkar | Jan 12, 2026

There's plenty of advice on how individuals should use AI coding tools, but far less clarity on how teams should actually work together differently.

The Browser Company put it bluntly: "If you don't work Claude Code-native ASAP your team's going to get left behind." Over the last few weeks, something has clearly shifted. Frontier models now reliably generate non-trivial systems, refactors, and migrations—fast enough that the bottleneck is no longer writing code.

The real leverage comes when entire teams change how they work together. This isn’t just a tooling upgrade—it’s an operating model shift. Sprint planning, documentation, testing, reviews, and ownership all need to be redesigned for AI-native engineering teams, not optimized piecemeal for personal workflows. 

Below is the playbook we're now running. It's a synthesis of what I've read, observed, and started implementing with my team. Early days, but the direction feels right.

Change Your Defaults - Mindset Shift

The old default: you write code, you review code, you test it, you ship.

The new default: agents write the code. You build the system that verifies.

Once you internalize this shift, you can work from first principles. What do we actually need to do to make this work?

The New Workflow

1. Your Job Is Now the Spec

The key insight: spend a LOT more time on design and implementation planning, and a LOT less on actual coding.

The Superpowers framework (15.6k stars on GitHub) codifies this beautifully. Their brainstorming + write plan skill is essential for designing before implementing.

We've started doing pair sessions—not for coding, but for design and plan/spec review at the planning stage. Make sure your implementation plan is split into phases, where each phase builds on the last but isn't too big or too small. The incremental nature of this setup is crucial.

Harper actually outlined a lot of this 8 months ago! These tips are still valid and require a “process” change. One recommendation I received is also to be explicit about what the definition of “DONE” is at this stage. Having this kind of clarity makes it easier to run agents in a loop later.
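
As a hypothetical illustration of that "loop-ready" clarity, here is a minimal Python sketch that lints a plan file, assuming plans are markdown files where each phase is a "## Phase N:" heading followed by an explicit "Definition of done:" line. The file layout is an assumption for the sake of the example, not a format any framework prescribes.

```python
# Hypothetical sketch: a tiny lint for implementation plans. Assumes each plan
# is a markdown file where every phase is a "## Phase N:" heading followed by
# an explicit "Definition of done:" line.
import re
import sys
from pathlib import Path

PHASE_RE = re.compile(r"^## Phase \d+:", re.MULTILINE)
DONE_RE = re.compile(r"^Definition of done:", re.MULTILINE | re.IGNORECASE)

def check_plan(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the plan is loop-ready."""
    text = path.read_text()
    problems: list[str] = []
    phase_bodies = PHASE_RE.split(text)[1:]  # text following each phase heading
    if not phase_bodies:
        problems.append("plan has no '## Phase N:' sections")
    for i, body in enumerate(phase_bodies, start=1):
        if not DONE_RE.search(body):
            problems.append(f"phase {i} has no explicit 'Definition of done:'")
    return problems

if __name__ == "__main__":
    issues = check_plan(Path(sys.argv[1]))
    for issue in issues:
        print(f"PLAN ISSUE: {issue}")
    sys.exit(1 if issues else 0)
```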

2. Don't Trust a Single Model

At the design and spec stage, we have both Opus 4.5 and GPT 5.2 review the implementation plan. Don't move forward unless both models reach consensus.

This is also where human taste gets injected. For example, in a recent refactor I wanted some code duplication — it would make deprecating old code easier later and keep production paths stable. Stating these constraints explicitly at the planning stage, before any code gets written, is where your judgment shapes the outcome.
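
As a rough sketch of what that consensus gate can look like (using the official anthropic and openai Python SDKs; the model identifiers below are placeholders, not exact API names), the idea is simply that both reviewers must say APPROVE before implementation starts. If either objects, the plan gets revised and re-reviewed.

```python
# A minimal sketch of the two-model review gate, assuming the official
# anthropic and openai Python SDKs. Model ids are placeholders.
import anthropic
from openai import OpenAI

REVIEW_PROMPT = (
    "Review this implementation plan. Reply with the single word APPROVE "
    "if you would sign off on it; otherwise list the blocking issues:\n\n{plan}"
)

def opus_review(plan: str) -> str:
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-opus-4-5",          # placeholder model id
        max_tokens=2000,
        messages=[{"role": "user", "content": REVIEW_PROMPT.format(plan=plan)}],
    )
    return msg.content[0].text

def gpt_review(plan: str) -> str:
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-5.2",                  # placeholder model id
        messages=[{"role": "user", "content": REVIEW_PROMPT.format(plan=plan)}],
    )
    return resp.choices[0].message.content

def consensus(plan: str) -> bool:
    """Proceed only if both reviewers approve; otherwise surface their objections."""
    verdicts = {"opus": opus_review(plan), "gpt": gpt_review(plan)}
    approved = all(v.strip().upper().startswith("APPROVE") for v in verdicts.values())
    if not approved:
        for name, verdict in verdicts.items():
            print(f"--- {name} review ---\n{verdict}\n")
    return approved
```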

3. Test-Driven Development Is Non-Negotiable

From the Superpowers documentation: "NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST. If code is written before tests, delete it."

The priority for tests:

  1. End-to-end tests — Playwright tests that actually check things in the browser
  2. Integration tests — Verify components work together
  3. Unit tests — Optional, at your discretion

Write tests that fail first. Then let the AI write code until the tests pass. Tests provide verifiable boundaries for your AI agents. They're not just quality assurance—they're the constraints that make AI-generated code reliable.

I personally write tests that are extremely human-readable. The generated code may be less readable, but the tests are the one artifact that absolutely needs to be human-understandable.
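
Here is a sketch of the kind of human-readable, fail-first end-to-end test this implies, using pytest with Playwright's sync API (the pytest-playwright plugin supplies the page fixture). The URL and UI labels are invented for the example.

```python
# A fail-first, human-readable end-to-end test. Written before any
# implementation exists, so it fails until the agent makes it pass.
from playwright.sync_api import Page, expect

def test_user_can_archive_a_project(page: Page) -> None:
    page.goto("http://localhost:3000/projects")          # hypothetical app URL
    page.get_by_role("link", name="Quarterly Report").click()
    page.get_by_role("button", name="Archive project").click()

    # The archived project should confirm the action and then show up
    # under the "archived" filter instead of the active list.
    expect(page.get_by_text("Project archived")).to_be_visible()
    page.goto("http://localhost:3000/projects?filter=archived")
    expect(page.get_by_role("link", name="Quarterly Report")).to_be_visible()
```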

4. Increase Constraints, Reduce Errors

One pattern I've noticed: the more constraints you give the AI, the better the output.

For example, use strongly-typed languages. If you're choosing a language, consider Rust—it's extremely tight at compile time, which makes it easier for AI to write code that just works. If you're using Python, use it in an extremely typed way, with high linting standards.

For what it's worth, we are not migrating from TypeScript/Python, but we are embracing a lot more linting.
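
As a small, hypothetical illustration of what "extremely typed" Python looks like in practice (paired with something like mypy --strict and a strict ruff config), the point is that most agent mistakes then fail at check time rather than at runtime. The domain types here are made up.

```python
# Explicit types everywhere, enums over bare strings, frozen dataclasses.
# With `mypy --strict` and strict linting, an agent that passes the wrong
# type or misspells a status fails the checks before anything ships.
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum

class InvoiceStatus(Enum):
    DRAFT = "draft"
    SENT = "sent"
    PAID = "paid"

@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    amount: Decimal
    status: InvoiceStatus

def total_outstanding(invoices: list[Invoice]) -> Decimal:
    """Sum of invoices that have been sent but not yet paid."""
    return sum(
        (inv.amount for inv in invoices if inv.status is InvoiceStatus.SENT),
        start=Decimal("0"),
    )
```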

5. Always Use High-Powered Models - Yes, It Can Get Expensive. That's OK!

I'm finding Opus 4.5 and GPT 5.2 are substantially better for this kind of work. The only reason to consider lower-powered models is if latency is gating the human loop. Otherwise, always using the most powerful models (high thinking / ultra think, etc.) yields the best results.

6. Pre-Commit Verification

Anthropic has plugins for code simplification and review. Make sure these run before any commit, so code checking happens automatically rather than as an afterthought.
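
A generic stand-in for that idea (not Anthropic's plugins themselves) is a git pre-commit hook that refuses the commit unless the project's checks pass. The commands below assume ruff, mypy, and pytest with code in src/; swap in whatever your stack and review tooling use.

```python
#!/usr/bin/env python3
# Generic pre-commit gate: save as .git/hooks/pre-commit and make executable.
# Blocks the commit unless every check passes.
import subprocess
import sys

CHECKS: list[list[str]] = [
    ["ruff", "check", "."],          # lint
    ["mypy", "--strict", "src"],     # types (assumes code lives in src/)
    ["pytest", "-q"],                # tests
]

def main() -> int:
    for cmd in CHECKS:
        print(f"pre-commit: running {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print(f"pre-commit: '{cmd[0]}' failed; commit blocked")
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```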

7. Documentation as Context

Structure your code in a modular way—not too much code in any single module. I aim for a deep tree where running ls in any folder returns only a limited number of items.

Every single thing in the code needs to be documented and designed for an AI to understand. We maintain a rich docs/ folder with plans, designs, and sprint notes. The goal: agents should have not only the current designs but the history of how decisions were made and why the code evolved the way it did.
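
One way to operationalize this (a sketch, with example paths) is a small helper that gathers the relevant docs/ files plus the recent git history for a module and hands them to the agent as a single context blob.

```python
# Sketch: assemble "current design plus history" for one module.
# Paths and naming conventions are examples, not a required layout.
import subprocess
from pathlib import Path

def build_context(module: str, docs_dir: Path = Path("docs")) -> str:
    sections: list[str] = []

    # Current designs, plans, and sprint notes that mention this module.
    for doc in sorted(docs_dir.glob(f"**/*{module}*.md")):
        sections.append(f"# {doc}\n{doc.read_text()}")

    # How the code got here: recent commits touching the module.
    log = subprocess.run(
        ["git", "log", "--oneline", "-n", "30", "--", f"src/{module}"],
        capture_output=True, text=True, check=True,
    ).stdout
    sections.append(f"# Recent history of src/{module}\n{log}")

    return "\n\n".join(sections)
```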

8. Security

Every commit goes through a security agent audit with hard rules.

What this looks like in practice:

  • Static analysis on every commit. Run tools like Semgrep, CodeQL, or Bandit (for Python) automatically (a sketch follows this list). These catch known vulnerability patterns — SQL injection, hardcoded secrets, insecure deserialization. The AI won't avoid these on its own.
  • Security-specific prompts in your CLAUDE.md. Tell the agent what to avoid: "Never use eval(). Never hardcode credentials. Always parameterize database queries." Constraints in context reduce (but don't eliminate) bad patterns.
  • Human review at design/planning phase with a security lens. Someone on the team needs to look at AI-generated code specifically asking: "What could go wrong here?" This is different from a normal code review. You're not checking if it works — you're checking if it fails dangerously.
  • Dependency auditing. AI will happily add packages to solve problems. Run npm audit, pip-audit, or Snyk to catch known vulnerabilities in whatever it pulled in. Better yet, maintain an approved package list. 
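
For a Python codebase, a minimal version of that per-commit security gate might look like the sketch below: run Bandit and pip-audit and fail on findings. Semgrep, CodeQL, or an approved-package-list check would slot in the same way; the src/ layout is an assumption.

```python
#!/usr/bin/env python3
# Sketch of a per-commit / CI security gate for a Python codebase.
import subprocess
import sys

SECURITY_CHECKS: list[list[str]] = [
    ["bandit", "-r", "src", "-ll"],   # static analysis, medium+ severity only
    ["pip-audit"],                    # known CVEs in installed dependencies
]

def main() -> int:
    failed = False
    for cmd in SECURITY_CHECKS:
        print(f"security gate: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed = True
    if failed:
        print("security gate: findings above must be fixed or explicitly waived")
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```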

 

Context Engineering

Lance Martin's work on "Effective Agent Design" introduces what might be the most important concept in AI-native development: context engineering. Every engineer should build deep intuition for it.

The key principles:

  • Give agents a computer — filesystem and shell access are fundamental
  • Multi-layer action space — use a small number of atomic tools, push complex actions to bash/code
  • Progressive disclosure — don't load all tool definitions upfront; let agents discover what they need
  • Offload context — write results to files, summarize when needed (sketched after this list)
  • Cache aggressively — Manus calls "cache hit rate" the most important metric for production agents
  • Isolate context — use subagents for parallelizable tasks and long-running work
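
As one small example, here is a sketch of the "offload context" principle: instead of pasting a huge tool result into the agent's context, write it to a file and return a short summary plus the path so the agent can read details on demand. The truncation-based summary is a trivial stand-in.

```python
# Sketch: keep big tool outputs out of the context window.
import hashlib
from pathlib import Path

SCRATCH = Path("agent_scratch")

def offload(tool_name: str, result: str, max_inline_chars: int = 2_000) -> str:
    """Return the raw result if small, else a pointer to a file plus a preview."""
    if len(result) <= max_inline_chars:
        return result

    SCRATCH.mkdir(exist_ok=True)
    digest = hashlib.sha256(result.encode()).hexdigest()[:12]
    path = SCRATCH / f"{tool_name}-{digest}.txt"
    path.write_text(result)

    preview = result[:500]
    return (
        f"[{tool_name} output was {len(result)} chars; full text saved to {path}]\n"
        f"First 500 chars:\n{preview}"
    )
```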

The Ralph Wiggum Pattern

For very long-running tasks, Geoffrey Huntley describes a pattern: run agents repeatedly until a plan is satisfied. Context lives in files, and progress is communicated via git history. Use stop hooks to verify work after each iteration.
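
A rough sketch of that loop, assuming an agent CLI that can be driven non-interactively (e.g. claude -p "<prompt>") and a plan file with an agreed completion marker (both are assumptions here):

```python
# Sketch of the "run until the plan is satisfied" loop. All state lives in
# files and git; each iteration is a fresh agent run.
import subprocess
from pathlib import Path

PLAN_FILE = Path("docs/plans/current-plan.md")   # example path
MAX_ITERATIONS = 50

PROMPT = (
    f"Read {PLAN_FILE}, pick the next unfinished phase, implement it with TDD, "
    "commit your work, and append 'PLAN COMPLETE' to the plan file only when "
    "every phase's definition of done is met."
)

def plan_complete() -> bool:
    return "PLAN COMPLETE" in PLAN_FILE.read_text()

for i in range(MAX_ITERATIONS):
    if plan_complete():
        print(f"plan satisfied after {i} iterations")
        break
    subprocess.run(["claude", "-p", PROMPT], check=False)
    # Stand-in for a stop hook: verify the work before the next iteration.
    subprocess.run(["pytest", "-q"], check=False)
```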

Tooling and Infrastructure

Tooling is needed for safe agent execution. One useful mental model: give your agents a “harness” they can drop into, with access to everything they need to get the job done. Good sandboxing allows autonomous action without constant permission requests, preventing flow disruption.

Dockerize Everything

We are dockerizing dev environments. The goal is to push local changes to a VM, letting agents work asynchronously (e.g., while you sleep). Simple docker-compose setups are essential for this agent work.

 

Automation

Automation tooling offers significant value (for example, having agents code entire flows end to end). Clean CI/CD and complete environments are vital for both automation and security.

These are things you should be doing anyway, but they're a much higher priority now than before.

Anti-Patterns to Avoid

From the Superpowers repository's "Red Flags":

  • Never skip TDD — tests written after code pass immediately and prove nothing
  • Never proceed with unfixed review issues
  • Never dispatch multiple implementation subagents in parallel — they'll create conflicts
  • Never let the agent read plan files — provide full text instead
  • Never accept "close enough" on spec compliance
  • Never rationalize "just this once" for skipping process

Team Structure

  • Small in-office teams tend to work better. This area is changing literally daily, and quick feedback loops and shared learnings scale much faster with “tight,” in-person teams.
  • Create "Producer" roles — someone to coordinate how AI-native teams run
  • Treat teammates like artists — get them into flow, keep them in flow, help ideas ship

A recent X post from The Browser Company is full of insights on how this changes team composition.

The teams winning with AI coding aren't just using the tools—they're restructuring their entire workflow around them. The goal isn't to make engineers 10x faster. It's to make your entire team capable of shipping code.

Sources & References

Frameworks & Tools

Talks & Essays

People & X Accounts

Acknowledgments

Thanks to Justin McCarthy for sharing insights on running AI-native teams, Sam Schillace for his experience with Amplifier, Jesse Vincent for building Superpowers, and James Cham for always connecting interesting minds.