AI-Native Product Development — Cold Mountain Wiki

Overview

As AI coding tools mature, the product development lifecycle is being rethought. Anthropic's "docs to demos" approach (Catherine Wu) and community adoption patterns point to a consistent theme: the hard part is no longer building; it is figuring out what to build.

Anthropic's PDLC ("Docs to Demos")

Anthropic shipped 74 features in 52 days using this workflow:

Skip the PRD
Build a working prototype with Claude Code in hours
Ship it internally to the entire company
Watch what people actually do with it
Iterate based on real usage

Key stats: 90% of the code is written by AI, and engineers ship PRs with 2,000-3,000 lines fully generated by Claude. The bottleneck isn't engineering; it's deciding what to build.

Community Adoption

A Reddit discussion (r/startups, r/ProductManagement) surfaced practical patterns for AI-assisted planning:

What works:

AI as thinking partner, not document generator. "Write a rough outline yourself, then have it expand, poke holes, ask 'what am I missing?'"
Breaking complex features into chunks, identifying dependencies
Generating acceptance criteria, edge cases, failure modes, test plans
Constrained prompts: "given we're using Postgres with 50k users, outline the DB changes for role-based permissions" beats "write a spec for our auth system"
Summarizing Slack threads into decision logs

What doesn't work:

Generating full PRDs from scratch — "you end up spending more time fixing outputs than saving time"
Architecture decisions — AI gives "it depends" every time
Estimation — doesn't know your team's strengths or technical debt
Business-specific edge cases requiring domain judgment

Consensus pattern: AI for structure and synthesis, humans for accuracy and decisions. Use AI as "a first draft to argue with."

The PRD Debate

Strong opinions on both sides:

Kill the PRD: "Actually with this I agree, PRD in 95% is not needed. Create epic, describe what you want, tech lead proceeds."
Keep the PRD: "If you don't have a bible for larger work it's going to create more work in the long run." PRDs are "at their smallest a big prompt" now.
Middle ground: PRDs are changing shape but not dying. They're becoming the context document you feed to your agent.

The "Cursor for Product Management" Gap

Y Combinator's 2026 Requests for Startups (RFS) explicitly calls out this gap: "There's no system that supports the full loop of product discovery." They want a tool where you upload customer interviews and usage data, ask "what should we build next?", and get feature outlines backed by customer feedback, with development tasks broken down for coding agents.

PMs Building in Cursor (Practitioner Examples)

Several PMs have built bespoke AI workflows from scratch, each spending weeks on plumbing before the interesting work starts:

Dennis Yang (Chime): PRDs written in markdown inside Cursor, published to Confluence via MCP server, Jira epics auto-generated from the spec. Weekly status reports drafted in minutes from the same source files.
Zevi Arnovitz (Meta): No technical background. Runs Claude for planning, Gemini for UI generation, then has both models review each other's output (peer review loop built through trial and error). Engineers on his team now ask him to teach them his process.
Alan Wright: Queried PostHog directly inside Cursor, had the AI diagnose data, produce a markdown summary, push to Notion, and open a Linear ticket — a multi-hour process compressed to minutes.

The common pattern: none of these setups came off the shelf, so the gap between seeing a demo and replicating the workflow is still large.
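As an illustration of the spec-to-tracker step in Dennis Yang's setup, here is a minimal sketch that turns a markdown PRD into epic and story records. It assumes a hypothetical convention (`## ` headings are epics, `- ` bullets beneath them are stories); the `Epic` type and `spec_to_epics` function are made up for this example. It does not call Jira, Confluence, or any MCP server, and it is not how his pipeline actually works.

```python
from dataclasses import dataclass, field


@dataclass
class Epic:
    """A hypothetical tracker epic derived from one '## ' section of a PRD."""
    title: str
    stories: list[str] = field(default_factory=list)


def spec_to_epics(markdown: str) -> list[Epic]:
    """Turn '## ' headings into epics and '- ' bullets beneath them into stories.

    Assumes a deliberately simple PRD convention; real specs and real
    tracker integrations would need richer parsing.
    """
    epics: list[Epic] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            epics.append(Epic(title=line[3:].strip()))
        elif line.startswith("- ") and epics:
            # Bullets that appear before the first '## ' heading are ignored.
            epics[-1].stories.append(line[2:].strip())
    return epics


spec = """# Role-based permissions
Intro text is ignored.

## Admin role
- Admins can invite users
- Admins can revoke access

## Audit log
- Record every permission change
"""

for epic in spec_to_epics(spec):
    print(epic.title, "->", epic.stories)
```

Keeping the convention strict is the point: the markdown spec stays the single source of truth, and downstream artifacts such as tickets or status reports are regenerated from it rather than edited separately.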

Community PDLC Adoption

A team adapted Anthropic's PDLC approach and reported outcomes (marsel040, Apr 2026):

Enabled product engineers to plan and ship code with minimal developer reliance
Used AI tools to replace much of the spec/planning overhead
Result: higher output and better morale — fewer people stuck in planning paralysis

The pattern: AI for structure and synthesis, humans for accuracy and decisions.

AI-Native PM Tools

New tools designed for AI-augmented product development workflows:

Stitch (Google Labs) — AI UI design tool that turns text, voice, or image prompts into high-fidelity web/mobile designs with production-ready HTML/CSS. https://stitch.withgoogle.com
Monologue (Every / Dan Shipper) — AI-powered voice dictation for Mac and iOS. Cleans up speech, understands context, works 3x faster than typing. https://twitter.com/@usemonologue
Linear — Redesigned as "the product development system for teams and agents." Supports AI agent workflows natively in planning/tracking.
Canny — AI-powered customer feedback collection. Auto-captures, summarizes, and categorizes ideas from multiple sources. Helps teams prioritize features and close the feedback loop with users.
Spark (Productboard) — AI platform for PMs. Connects customer feedback to product ideas, helps teams make data-backed decisions.

AI Transformation Model (Notion)

A four-level maturity model for organizational AI adoption, created with Ben Levick (Ramp) and Geoffrey Litt (Notion):

Level	AI Role	What Changes	Impact
1. Thought Partner	Explore ideas, improve decisions	Individuals prompt AI ad-hoc	Faster output, better decisions
2. Assistant	Complete tasks, save time	Context-aware AI tools embedded in workflow	Hours saved per employee/week
3. Teammates	Automate recurring workflows	Teams deploy configurable agents with checkpoints	10-40% team capacity reclaimed
4. The System	Run critical workflows at scale	Multi-agent orchestration, self-improving	Operational leverage, revenue per employee

Key insights: companies operate at multiple levels simultaneously (engineering at L3 while marketing is at L2). Context is the foundation: consolidating tools and connecting data unlocks every level. Typically 10-20% of employees create agents that benefit the whole team. Levels build on each other rather than replacing each other.

AI Evals as the New PRD

Aneesh Chukla's framework (via Aakash Gupta): for AI features, evals are the PRD. "The way the best AI companies work is that the AI PM defines these evals and that is basically the PRD for the AI engineers." The workflow: define success criteria and expected behavior → build offline evals → only then launch to real users with online evals (observability platforms like Arize, TruLens). If offline evals fail, "you have not even created a product that can be actually launched."

This shifts PM work from spec writing to evaluation design — a fundamentally different skill. See Agent Proficiency.
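A minimal sketch of the "evals are the PRD" gate, assuming an AI feature can be treated as a function from prompt to text. The `EvalCase` type, `offline_eval` function, toy model, and 0.9 threshold are all hypothetical. Real offline evals often use larger datasets and graded or model-based judges, and online evals would run in an observability platform rather than in a script like this.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One expected-behavior case written by the PM (a line item of the 'PRD')."""
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def offline_eval(model: Callable[[str], str], cases: list[EvalCase], threshold: float) -> tuple[float, bool]:
    """Run every case against the model; return (pass_rate, ready_to_launch)."""
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    pass_rate = passed / len(cases)
    return pass_rate, pass_rate >= threshold


# Hypothetical stand-in for an AI feature; a real one would call an LLM.
def toy_model(prompt: str) -> str:
    return "REFUND_APPROVED" if "refund" in prompt.lower() else "ESCALATE"


cases = [
    EvalCase("Customer asks for a refund on a duplicate charge", lambda out: out == "REFUND_APPROVED"),
    EvalCase("Customer threatens legal action", lambda out: out == "ESCALATE"),
    EvalCase("Customer wants a refund after 2 years", lambda out: out == "ESCALATE"),
]

rate, ready = offline_eval(toy_model, cases, threshold=0.9)
print(f"pass rate {rate:.2f}, launch: {ready}")
```

Here the third case fails, so the pass rate is 2/3 and the feature does not clear the bar: in the framework's terms, there is not yet a product that can be launched.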

"Team OS" Pattern

Hannah Stalberg (DoorDash PM, 1,500+ hours in Claude Code) coined "Team OS" — a team-level knowledge base that helps everyone move faster. Claude Code becomes the interface to this shared context. Her key observation: "Claude Code is the most misleading name in AI" — it's not just for code, it's a general-purpose agent for product work. See Claude Code Skill Frameworks.

Vibe Experimentation

Aakash Gupta's concept (ProductCon 2025): the intersection of vibe coding and experimentation. Three implementation stacks: (1) prototype in Lovable → engineer in Cursor → experiment in LaunchDarkly/Amplitude, (2) full-stack vibe coding with integrated experimentation, (3) Claude Code-driven rapid prototyping with A/B testing. The idea: PMs and designers can prototype and test hypotheses without waiting for engineering sprints.
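Experiments like these depend on each user seeing a consistent variant. A common, generic approach is deterministic hash-based bucketing, sketched below; it is not the exact algorithm LaunchDarkly or Amplitude use, and the experiment and variant names are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically bucket a user so they see the same variant on every visit.

    Hashing the experiment name together with the user ID means the same
    user can land in different buckets for different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


variants = ["control", "prototype_b"]
for uid in ["u1", "u2", "u3"]:
    print(uid, assign_variant(uid, "onboarding-v2", variants))
```

Because assignment is a pure function of the experiment name and user ID, a vibe-coded prototype can be dropped in as a variant without storing any assignment state.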

AI-Native Companies in Practice (Ann Miura-Ko)

Ann Miura-Ko (Floodgate, Apr 2026) published field observations from visiting AI-native companies in San Francisco. Key findings:

The PM is disappearing. Across five companies visited in a single day, only one full-time PM — even in a 40-person company. Engineers talk to customers daily and own product decisions end-to-end. The PM role isn't being "augmented" — it's being absorbed into engineering and design.

The most dangerous side effect: the feature factory. When you can build anything a customer asks for in a day, the temptation to build everything is overwhelming. Multiple companies cited this as their biggest strategic risk. Solutions: agents that can only configure existing features through JSON (literally cannot create new code), squad-level North Star metrics to kill ideas before they ship, founders deciding where the product has opinions and where it's flexible. "When execution is nearly free, taste becomes the moat."
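One reading of the "configure but cannot create" guardrail: the agent's only output channel is a JSON object, and every key must name a feature that already exists. The sketch below assumes that design; the `FEATURE_SCHEMA` registry and `apply_agent_config` function are hypothetical, not taken from any company Miura-Ko visited.

```python
import json

# Hypothetical registry of features that already exist in the product.
# The agent may toggle or tune these values but has no way to add new ones.
FEATURE_SCHEMA = {
    "dark_mode": bool,
    "export_formats": list,
    "max_seats": int,
}


def apply_agent_config(raw_json: str) -> dict:
    """Parse an agent's JSON proposal; reject any key or type outside the schema.

    Invalid JSON raises json.JSONDecodeError, which is a subclass of ValueError.
    """
    proposal = json.loads(raw_json)
    if not isinstance(proposal, dict):
        raise ValueError("config must be a JSON object")
    for key, value in proposal.items():
        if key not in FEATURE_SCHEMA:
            raise ValueError(f"unknown feature {key!r}: agents cannot create features")
        expected = FEATURE_SCHEMA[key]
        # bool is a subclass of int in Python, so True/False must not pass as an int.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"{key!r} must be of type {expected.__name__}")
    return proposal


print(apply_agent_config('{"dark_mode": true, "max_seats": 25}'))
try:
    apply_agent_config('{"ai_chatbot": true}')
except ValueError as err:
    print("rejected:", err)
```

In this design the schema is where the founders' decisions about opinion versus flexibility live: anything they have not chosen to make configurable simply has no key.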

The stack is converging. Almost every company runs Slack + Claude Code + GitHub + Linear. Slack has become a central orchestration layer for agents: emoji reactions auto-create tickets, bots triage customer issues, and agents get tagged in threads and start working on fixes. Cursor came up only sporadically (six months ago it came up in every conversation). Troubling for coding platforms: engineers don't seem loyal to any particular tool.
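To make the emoji-to-ticket pattern concrete, here is a minimal sketch of the mapping step. It assumes the shape of Slack's Events API `reaction_added` payload (`reaction`, `user`, and an `item` carrying `channel` and `ts`); the `:ticket:` trigger emoji, the ticket fields, and `reaction_to_ticket` are hypothetical, and the Slack subscription and Linear API calls are deliberately left out.

```python
from typing import Optional

# Hypothetical team convention: reacting with :ticket: files an issue.
TRIGGER_EMOJI = "ticket"


def reaction_to_ticket(event: dict) -> Optional[dict]:
    """Map a Slack `reaction_added` event to a hypothetical ticket payload.

    Returns None for other event types, other emoji, or non-message items.
    Fetching the message text and creating the ticket are out of scope here.
    """
    if event.get("type") != "reaction_added" or event.get("reaction") != TRIGGER_EMOJI:
        return None
    item = event.get("item", {})
    if item.get("type") != "message":
        return None
    return {
        "title": f"Triage Slack message {item['ts']}",
        "source_channel": item["channel"],
        "source_ts": item["ts"],
        "reported_by": event.get("user"),
    }


event = {
    "type": "reaction_added",
    "user": "U123",
    "reaction": "ticket",
    "item": {"type": "message", "channel": "C456", "ts": "1712345678.000100"},
}
print(reaction_to_ticket(event))
```

Keeping the mapping a pure function makes the trigger rules easy to test separately from the Slack and tracker integrations around it.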

Non-engineers are building real things. An enterprise account manager asked an AI agent in Slack to automate account uploads the product team hadn't prioritized — done in an hour. An accounting team writing database queries via MCP. A Chief of Staff producing marketing materials in under 30 minutes. "The most underestimated shift isn't what AI does for engineers. It's what it does for everyone else."

The cost of experimentation has collapsed. A researcher tests 10 interface designs per day and throws 9 away. A designer generates competing iterations in under 6 minutes. A growth PM with zero coding experience built a full Meta Ads pipeline in two days. Companies simulate customers with AI personas before real users touch the product. One team runs hundreds of research interviews in a week instead of 50 in a quarter.

The result: companies iterating 3-5x faster. Both the build and learn steps are compressing. "The gap between companies that have internalized these practices and those still debating 'AI strategy' is enormous — and it's widening every week."

Delta Force Teams (Owner)

Deano (Owner, Apr 2026) articulates the most extreme version of the small-elite-team philosophy. Owner reached $15M ARR with 5 engineers — building a product surface where each component has entire companies built around it — and is now valued at $1B+ with a fraction of typical SaaS headcount.

The hiring bar: Modeled after Delta Force, not Navy SEALs. ~1,700 special forces candidates apply, ~6 make it (0.35%). Owner's application-to-offer rate: ~0.22%.

Why small teams demand this bar:

Standards are contagious — one average performer sets the new standard
Judgment replaces process — trust and independent decision-making only work if judgment is excellent
Ownership is non-negotiable — no one to hand off to, no layer to hide behind

AI amplifies the gap: "AI is not an equalizer, it's a multiplier. Exceptional people use it to compress months into days. Average people use it to move slightly faster and often in the wrong direction. The gap between exceptional and average isn't shrinking. It's exploding."

Operational principles:

Zero tolerance for tech debt — "Today's shortcuts become tomorrow's bottlenecks — paid not once, but every day." Form enables speed, like great athletes.
Zero tolerance for bug backlogs — "Most teams accept a 'healthy backlog' of known bugs. What that really means is customers are experiencing constant paper cuts."
No deadlines — "If someone needs a deadline to move fast, we've made a hiring mistake." Deadlines focus teams on the date rather than building something exceptional.
2 meetings/week, 1.5 hours total — Monday standup (align on one thing), Friday demo day. Everything else async.
Trust battery starts at 100% — "When you make someone earn trust before you extend it, you slow them down at the exact moment they have the most energy." Full ownership from day one.

Customer obsession framework: Everyone talks to at least one customer per week. "You should know your customer so well you know what gum they chew." The leader's job: not to sell the plan, but to make reality undeniable. "People don't align to plans. They align to reality they can see for themselves."

The "Nick Fury" leadership model: "My job isn't to be the superhero. It's to go find them, believe in them completely, and then unleash them." Only hire people better than you in some significant way. If you're the most talented person in the room, you've failed.

Key contrast with AI Organization Design: Block replaces hierarchy with an intelligence layer. Owner replaces hierarchy with extreme talent density and minimal process. Both bet against middle management but for different reasons — Block because AI can coordinate, Owner because exceptional people don't need coordination.

Living Software vs. Tool-Like Software

Jack Cheng (Every, Apr 2026) identifies a fundamental tension in AI-accelerated development: software exists on a spectrum between two types, and AI coding acceleration is breaking the social contract of one of them.

Tool-like software — Users expect stability, consistency, and predictability. A hammer doesn't change shape between uses. Traditional SaaS products trained users to expect this: the interface stays roughly the same, features are added gradually, muscle memory works. Tool-like software should "disappear" — you stop noticing it because it reliably does what you expect.

Living software — Users expect growth, adaptation, and evolution. Social media feeds, recommendation engines, AI assistants. The value comes from the system changing and improving. Users tolerate instability because they're getting something new.

The problem: AI coding acceleration is making tool-like software change at living-software speed. Features ship faster than users can absorb them. Interfaces shift before muscle memory forms. "The build step compressed, but the adoption step didn't." Users of accounting software don't want surprises — but the development team can now ship surprises weekly.

For builders of tool-like software: Slow down the release cadence even if you can build faster. Use the speed for quality, testing, and polish — not more features. "The speed is a gift. Spend it on fit, not on volume."

For builders of living software: Lean into the pace, but make the evolution legible. Show users what changed and why. AI assistants that silently get better confuse users; ones that explain their growth build trust.

The connection to the feature factory risk identified by Ann Miura-Ko: when execution is nearly free, the discipline to NOT ship becomes the differentiator. See also Business Moats in AI — the "opinionated perspective" moat is essentially taste about what to ship and what to hold back.

Planning in the AI Era

Karri Saarinen (Linear CEO, Apr 2026) challenges the narrative that planning is going out of fashion. His test: "could the same decision or activity have been made before AI?" Companies always had the option to run shorter planning cycles, yet many ended up with annual or half-year cycles anyway. The question is why, and whether AI actually solves the underlying need.

Planning isn't about the plans — it's an alignment and commitment exercise. It forces organizations to debate what matters, decide priorities, create shared meaning, and navigate organizational boundaries. AI may increase bandwidth and compress timelines, but the need to choose remains. "If it becomes easier to make more things, it also becomes easier to make the wrong things."

Linear's approach: six-month directional plans with the ability to change priorities any month or week. More experimentation alongside the plan, not instead of it. The risk of having no plan: building whatever comes easily and letting AI steer you toward what's easiest rather than what matters.

This connects to the tools-steer-you problem: AI tools are thinking tools, not just mechanical ones, and their ability to influence the direction of your work is greater than that of any previous tool. Vibe coding done well is following the grain of the tool, but without planning, you may never notice you've drifted.

Output Isn't Design

Saarinen invokes Christopher Alexander's definition of design: "good fit between form and context." The form is the solution; the context is the problem space — user needs, constraints, trade-offs, the environment the product lives in.

AI generates plausible output — polished mockups, working code, fluent copy. But output isn't design. "The form is there, the fit is not." AI can produce artifacts that look like solutions without understanding the problem they're supposed to solve.

The implication for product teams: AI makes the visible part of design (producing artifacts) trivially fast, which makes the invisible part (understanding context) relatively more valuable. Teams that skip problem understanding and jump straight to AI-generated solutions will ship products that look polished but miss the mark.

Design tool limitations: Image generation breaks down with iterations — it's hard to make the AI change one specific thing without it reshaping the whole output (the same problem happens in writing). Saarinen argues for better containment tools and semantic UI design, where you define layouts and patterns directly rather than drawing rectangles. "Most design work is about understanding how a feature fits into the existing system." Even products used through MCP, CLI, or API still need coherent concepts and workflows.

Domain matters: Different products require different levels of design polish. A frequent tactile tool like email needs heavy UX attention because users feel every paper cut. A backend service can have rougher UI and still be valuable. Many AI companies operate more like backend companies — the capability is the model, the harness is iterated behind the scenes. "It feels closer to classic UNIX systems."

This connects to the "docs to demos" workflow: the prototype is cheap, but knowing which prototype to build requires the same deep customer understanding it always did.

DESIGN.md — Text-Based Design Systems for Agents

When agents generate UI, each screen can look fine in isolation but feel like a different product when combined. George (prodmgmt.world, Apr 2026) describes DESIGN.md — a concept originating from Google's Stitch team — as a plain-text design system file that sits in the repo where agents can actually read it. It's the design counterpart to AGENTS.md: README.md tells humans what the project is, AGENTS.md tells coding agents how to work in the repo, DESIGN.md tells design and coding agents what the product should look like.

Two layers: Machine-readable YAML (colors, type, radius, spacing, component properties) and human-readable markdown (what the interface should feel like, which colors do which jobs, layout behavior, allowed and forbidden patterns). Agents need both — pure prose gives mood words without decisions, pure tokens give values without judgment.
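The token layer is the part a script can check. Below is a minimal sketch, assuming the YAML block has already been parsed into a dict (parsing it would need a YAML library such as PyYAML). The token names, colors, and `audit_colors` helper are hypothetical. It only reports which hex colors in generated CSS come from the palette and which were invented; whether a palette color is doing the right job still needs the prose layer and human judgment.

```python
import re

# Hypothetical token layer of a DESIGN.md, assumed already parsed from YAML.
TOKENS = {
    "color.primary": "#1a73e8",
    "color.accent": "#e8710a",
    "color.surface": "#ffffff",
    "color.text": "#202124",
}

# Matches 6-digit hex colors only; 3-digit shorthand, rgb() and named colors are ignored.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")


def audit_colors(css: str) -> dict:
    """Split colors used in generated CSS into palette colors and invented ones."""
    used = {match.lower() for match in HEX_COLOR.findall(css)}
    palette = {value.lower() for value in TOKENS.values()}
    return {
        "from_tokens": sorted(used & palette),
        "invented": sorted(used - palette),
    }


css = """
.button-primary { background: #1A73E8; color: #ffffff; }
.promo-banner { background: #ff00aa; }
"""
print(audit_colors(css))
```

Anything in `invented` is either a mistake to fix or a gap in the token layer worth deciding on explicitly.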

PM role: PMs don't need to be designers, but they can name product judgment: dense vs. spacious, playful vs. sober, action-heavy vs. review-heavy. The strongest DESIGN.md files encode specific decisions ("use one accent color per screen, reserved for the main action") rather than vibes ("clean and modern").

Self-review loop: The underused move is making the agent cite DESIGN.md back. After generating a screen, prompt it to list which rules the screen follows, where it invented new patterns, and where the file is silent. When the file is silent, that's the signal to extend it — the document becomes alive.
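The self-review step can be packaged as a reusable prompt template. This is a hypothetical helper, not a prompt from the source article; it simply wraps the three questions described in the paragraph above (rules followed, patterns invented, places where the file is silent) around the DESIGN.md text.

```python
def self_review_prompt(design_md: str, screen_description: str) -> str:
    """Build a follow-up prompt asking the agent to cite DESIGN.md back."""
    questions = [
        "Which DESIGN.md rules does this screen follow? Quote each rule.",
        "Where did you invent a pattern, color, or spacing value not in DESIGN.md?",
        "Where is DESIGN.md silent on a decision you had to make?",
    ]
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        "Here is our DESIGN.md:\n\n"
        f"{design_md.strip()}\n\n"
        f"You just generated: {screen_description.strip()}\n\n"
        f"Review that screen against DESIGN.md and answer:\n{numbered}"
    )


print(self_review_prompt(
    "Use one accent color per screen, reserved for the main action.",
    "a billing settings page",
))
```

Answers to the third question are the raw material for extending the file.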

Failure modes: Too vague (mood words without decisions), too visual without being operational (colors without roles), too rigid (no room for exceptions), or disconnected from real screens (written from brand theory, never tested against agent output).

This connects directly to Ann Miura-Ko's feature factory risk: when execution is nearly free, DESIGN.md becomes one mechanism for encoding the taste that prevents agent-generated UI from diverging into chaos.

The Bottleneck Cascade (Andrew Ng)

Andrew Ng (deeplearning.ai, Apr 2026) frames the PM bottleneck as just the first in a cascade. When AI coding speeds up 10-100x, every adjacent function becomes the new bottleneck in turn:

Product management: Engineer:PM ratios dropping from 8:1 toward 1:1. But even 1:1 creates a communication bottleneck — the fastest teams have engineers who do their own product thinking.
Marketing: Features ship so fast that marketing scrambles to communicate them. "A marketing bottleneck."
Legal compliance: Software built in a day, legal review takes a week. "A legal compliance bottleneck."
Design: Same pattern — design can't keep pace with build velocity.

The resolution: generalists. When small teams (2-10 people) must cover five specialties, individuals play roles outside their core. Deep specialization still exists, but everyone understands adjacent functions enough to unblock themselves. Proficiency with AI tools helps — they're thinking tools that let you reason through unfamiliar domains.

Colocation matters for maximum speed: "The highest speed is achieved by having everyone in the room, able to communicate instantaneously." Remote works, but same-room removes the last communication latency.

Risks

Adoption fatigue — "Shipping more stuff does not equal more product value. If users can't keep up, what's the point?"
Quality at speed — "PM reviews the build" doesn't scale. Without proper QA, rapid shipping creates tech debt.
Context loss — AI doesn't retain past decisions, so planning loses continuity over time without persistent context

Sources

"Anthropic shipped 74 features in 52 days. How we tried to adopt their PDLC to our org" — marsel040 (Apr 2026) (link)
"Anyone using AI for specs/technical planning?" — ml8020 (Reddit, Feb 2026) (link)
"Requests for Startups" — Y Combinator (2026) (link)
"The product development system for teams and agents" — linear.app (link)
"Build better products with customer feedback" — canny.io (link)
"Spark, the AI platform for product managers" — productboard.com (link)
"The AI Transformation Model" — John Hurley / Notion (Apr 2026) (link)
"How to 10x your productivity as a PM with AI tools" — Aakash Gupta (video, Apr 2026) (link)
"The Most Important New Skill for Product Managers in 2026: AI Evals Masterclass" — Aakash Gupta / Aneesh Chukla (video, Apr 2026) (link)
"How this PM Used Claude Code to Support 20 People" — Aakash Gupta / Hannah Stalberg (video, Apr 2026) (link)
"How We Build Product Teams at Owner" — Deano (tweet thread, Apr 2026) (link)
"The AI-pilled compounding startup" — Ann Miura-Ko (tweet, Apr 2026)
"Living Software" — Jack Cheng / Every (Apr 2026) (link)
"Some Notes on AI" — Karri Saarinen (tweet, Apr 2026) (link) — planning debate, output vs. design, expertise paradox, agentic coding reality check, design tool limitations, domain differences
"DESIGN.md | The One File AI Needs to Match Your UI" — George / prodmgmt.world (tweet, Apr 2026) (link) — DESIGN.md concept, PM checklist, failure modes, self-review prompting pattern
"Chief Product Officer in a Box: Introducing The AI PM OS" — George / prodmgmt.world (tweet, Mar 2026) (link) — Dennis Yang/Chime, Zevi Arnovitz/Meta, Alan Wright practitioner examples
"Big Pharma Bets Big on AI" — Andrew Ng / deeplearning.ai (newsletter, Apr 2026) (link) — PM bottleneck cascade, generalist engineers, marketing/legal bottlenecks, colocation advantage