Overview
As AI coding tools mature, the product development lifecycle is being rethought. Anthropic's "docs to demos" approach (Catherine Wu) and community adoption patterns reveal a consistent theme: the hard part is no longer building; it is figuring out what to build.
Anthropic's PDLC ("Docs to Demos")
Anthropic shipped 74 features in 52 days using this workflow:
- Skip the PRD
- Build a working prototype with Claude Code in hours
- Ship it internally to the entire company
- Watch what people actually do with it
- Iterate based on real usage
Key stats: 90% of code written by AI. Engineers ship PRs with 2,000-3,000 lines fully generated by Claude. The bottleneck isn't engineering — it's deciding what to build.
From Tool to Teammate (Mike Krieger)
Mike Krieger (Anthropic, head of Labs, co-founder of Instagram) describes the next evolution beyond "docs to demos": treating the model as a long-horizon teammate rather than an interactive assistant. The shift came with Fable-class models that can sustain multi-hour autonomous sessions and maintain system-level awareness across a codebase.
The delegation model. Krieger routinely sets up complex tasks before going to sleep and wakes up to completed work. The model handles obstacles autonomously — if a remote service goes down mid-task, it scaffolds a temporary backend, documents the workaround, and continues. This demands a new skill: decomposing intent upfront rather than iterating turn-by-turn. Architectural planning conversations with the model — including generating diagrams and shareable HTML summaries for team alignment — now precede execution.
Concurrent sessions. The natural working pattern becomes running 5–6 parallel Claude sessions on different tasks, or one long-running session that forks sub-agents for background work while keeping the main thread responsive. The choice between these modes is still personal and situational.
Verification loops. The core practice that makes delegation trustworthy:
- Every PR includes attached screenshots or video of the actual UI state — not just passing tests, but visual proof of real flows with real data
- Video captures are an underexplored tool: giving Claude screen recordings lets it spot animation jank and timing issues that screenshots miss
- Regression testing via "expressed workflows in text" — ideal user flows written as text specs that Claude repeatedly checks against
- Mock backends built by the model itself, kept in sync with the real upstream as code evolves — something that felt impractical before but now "Claude will read the changes and adapt the thing"
- After verification, engineers have follow-up conversations: "Can you make sure I deeply understand all the trade-offs you made?" The model produces whatever artifacts are needed to make its decisions comprehensible
Dynamic workflows. For large-scope tasks (e.g., porting a complex Python codebase to TypeScript over a weekend), Krieger uses workflow orchestration: Claude designs a multi-step plan (deep analysis → spec creation → module-by-module translation → incremental testing → adversarial review → gap check), expresses it in code for human review, then executes it autonomously. Sub-tasks within a workflow can be tuned to different effort levels — not every step needs maximum thinking depth.
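One way to picture "expresses it in code for human review" is a plan written as plain data that a person can read before anything runs. The following is a minimal sketch under stated assumptions: the `Step` class, the effort labels, and `render_plan` are hypothetical names for illustration, not Anthropic's orchestration tooling. The step names come from the plan described above; the effort assignments are invented.

```python
from dataclasses import dataclass

# Hypothetical effort labels; the source only says steps can be tuned
# to different effort levels.
EFFORT_LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class Step:
    name: str
    effort: str  # must be one of EFFORT_LEVELS

    def __post_init__(self):
        if self.effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown effort level: {self.effort}")


# Step names follow the plan described in the text; effort values are illustrative.
PORTING_PLAN = [
    Step("deep analysis", "high"),
    Step("spec creation", "high"),
    Step("module-by-module translation", "medium"),
    Step("incremental testing", "low"),
    Step("adversarial review", "high"),
    Step("gap check", "medium"),
]


def render_plan(plan):
    """Return a numbered summary a human can review before execution."""
    return "\n".join(
        f"{i}. {step.name} [effort={step.effort}]"
        for i, step in enumerate(plan, start=1)
    )


if __name__ == "__main__":
    print(render_plan(PORTING_PLAN))
```

Keeping the plan as data rather than prose makes the review step cheap: a reviewer can reorder steps or lower an effort level in one line before the run starts.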
The DRI model persists. Despite many Claudes per human, Anthropic still assigns directly responsible individuals (DRIs) to product areas. Humans hold context that spans products and timelines — what's coming down the pipeline, how pieces integrate, organizational intent. Each person maintains a personal dashboard of active Claude sessions, pending PRs, and review requests. This meta-maintenance layer is partly bespoke and partly standardizing.
PM/engineering diffusion. The PM–engineer boundary is blurring. PMs build working prototypes to settle product debates ("code wins arguments" — but now anyone can code). Engineers do more product thinking. The prototype is no longer just an engineering artifact; it's a communication tool where "jank in eight ways but look how this could work" opens up conversations that specs never could.
Bug closure loops. Claude handles end-to-end bug resolution: reads a Slack thread about an issue, fixes it, posts back ("Hey, this is Mike's Claude — here's the PR"), then follows up hours later when the deploy goes live ("You should go test it — is it fixed now?"). Critically, the model exercises judgment about severity — on a weekend memory leak, it advises "just rebalance the server for now" rather than attempting a full rearchitecture.
Non-technical builders. The most transformative shift: people outside engineering building real tools. A recruiter described it as "the first time in my life where the thing in my head and the thing in the world are right next to each other." A go-to-market team member has been iterating on a deeply integrated workflow tool for months, now deploying it across the entire GTM organization. The ceiling of complexity that non-technical builders can reach and sustain has risen dramatically with each model generation.
Software engineering is different, not over. What's changed: the act of writing code, the PM/eng split, the cycle time from idea to prototype. What hasn't: ownership, incident response, understanding production behavior, the craft of deciding what to build and whether it's good. Engineers still dream about elegant solutions — that feeling of loss is real — but they also ship "insane amounts of work." Both feelings coexist.
Community Adoption
Reddit discussions (r/startups, r/ProductManagement) surfaced practical patterns for AI-assisted planning:
What works:
- AI as thinking partner, not document generator. "Write a rough outline yourself, then have it expand, poke holes, ask 'what am I missing?'"
- Breaking complex features into chunks, identifying dependencies
- Generating acceptance criteria, edge cases, failure modes, test plans
- Constrained prompts: "given we're using Postgres with 50k users, outline the DB changes for role-based permissions" beats "write a spec for our auth system"
- Summarizing Slack threads into decision logs
What doesn't work:
- Generating full PRDs from scratch — "you end up spending more time fixing outputs than saving time"
- Architecture decisions — AI gives "it depends" every time
- Estimation — doesn't know your team's strengths or technical debt
- Business-specific edge cases requiring domain judgment
Consensus pattern: AI for structure and synthesis, humans for accuracy and decisions. Use AI as "a first draft to argue with."
The PRD Debate
Strong opinions on both sides:
- Kill the PRD: "Actually with this I agree, PRD in 95% is not needed. Create epic, describe what you want, tech lead proceeds."
- Keep the PRD: "If you don't have a bible for larger work it's going to create more work in the long run." PRDs are "at their smallest a big prompt" now.
- Middle ground: PRDs are changing shape but not dying. They're becoming the context document you feed to your agent.
The "Cursor for Product Management" Gap
Y Combinator's 2026 RFS explicitly calls out this gap: "There's no system that supports the full loop of product discovery." They want a tool where you upload customer interviews and usage data, ask "what should we build next?", and get feature outlines backed by customer feedback — with development tasks broken down for coding agents.
PMs Building in Cursor (Practitioner Examples)
Several PMs have built bespoke AI workflows from scratch, each spending weeks on plumbing before the interesting work starts:
- Dennis Yang (Chime): PRDs written in markdown inside Cursor, published to Confluence via MCP server, Jira epics auto-generated from the spec. Weekly status reports drafted in minutes from the same source files.
- Zevi Arnovitz (Meta): No technical background. Runs Claude for planning, Gemini for UI generation, then has both models review each other's output (peer review loop built through trial and error). Engineers on his team now ask him to teach them his process.
- Alan Wright: Queried PostHog directly inside Cursor, had the AI diagnose data, produce a markdown summary, push to Notion, and open a Linear ticket — a multi-hour process compressed to minutes.
The common pattern: each built their setup from scratch over weeks. The gap between seeing a demo and replicating the workflow is still large.
Community PDLC Adoption
A team adapted Anthropic's PDLC approach and reported outcomes (marsel040, Apr 2026):
- Enabled product engineers to plan and ship code with minimal developer reliance
- Used AI tools to replace much of the spec/planning overhead
- Result: higher output and better morale — fewer people stuck in planning paralysis
The pattern echoes the Reddit consensus: AI for structure and synthesis, humans for accuracy and decisions.
AI-Native PM Tools
New tools designed for AI-augmented product development workflows:
- Stitch (Google Labs) — AI UI design tool that turns text, voice, or image prompts into high-fidelity web/mobile designs with production-ready HTML/CSS. https://stitch.withgoogle.com
- Monologue (Every / Dan Shipper) — AI-powered voice dictation for Mac and iOS. Cleans up speech, understands context, works 3x faster than typing. https://twitter.com/@usemonologue
- Linear — Redesigned as "the product development system for teams and agents." Supports AI agent workflows natively in planning/tracking.
- Canny — AI-powered customer feedback collection. Auto-captures, summarizes, and categorizes ideas from multiple sources. Helps teams prioritize features and close the feedback loop with users.
- Spark (Productboard) — AI platform for PMs. Connects customer feedback to product ideas, helps teams make data-backed decisions.
AI Transformation Model (Notion)
A four-level maturity model for organizational AI adoption, created with Ben Levick (Ramp) and Geoffrey Litt (Notion):
| Level | AI Role | What Changes | Impact |
|---|---|---|---|
| 1. Thought Partner | Explore ideas, improve decisions | Individuals prompt AI ad-hoc | Faster output, better decisions |
| 2. Assistant | Complete tasks, save time | Context-aware AI tools embedded in workflow | Hours saved per employee/week |
| 3. Teammates | Automate recurring workflows | Teams deploy configurable agents with checkpoints | 10-40% team capacity reclaimed |
| 4. The System | Run critical workflows at scale | Multi-agent orchestration, self-improving | Operational leverage, revenue per employee |
Key insights: companies operate at multiple levels simultaneously (for example, engineering at L3 while marketing is at L2). Context is the foundation: consolidating tools and connecting data unlocks every level. Typically 10-20% of employees create agents that benefit the whole team. Levels build on each other; they don't replace each other.
AI Evals as the New PRD
Aneesh Chukla's framework (via Aakash Gupta): for AI features, evals are the PRD. "The way the best AI companies work is that the AI PM defines these evals and that is basically the PRD for the AI engineers." The workflow: define success criteria and expected behavior → build offline evals → only then launch to real users with online evals (observability platforms like Arize, TruLens). If offline evals fail, "you have not even created a product that can be actually launched."
This shifts PM work from spec writing to evaluation design — a fundamentally different skill. See Agent Proficiency. For a deeper treatment of eval methodology (floor-raising, LLM-as-judge pipelines, golden cases), see AI Evals.
How Evals Actually Work
Evaluation sits between running an experiment and shipping a change. You have a dataset, you've run your application against it, and now you judge whether the outputs are good. The Langfuse Academy's "AI Engineering Loop" frames this as a continuous cycle: ship an improvement → it produces new traces and monitoring signals → those feed into the next round of datasets and experiments.
The evolution pattern: Most teams follow the same trajectory. Start by manually reviewing outputs to build intuition for what "good" and "bad" look like in your specific application. Then identify specific failure modes worth checking for. Once you can define them precisely, automate with dedicated evaluators. Manual review is not a one-time step — mature setups incorporate continuous human review to catch new failure modes and keep automated evaluators calibrated.
Three evaluation methods:
- Manual evaluation — reading outputs and scoring them. This builds the understanding of where your application struggles and what quality means for your use case. Teams that skip this and jump straight to automation often end up measuring things that don't matter. Manual labels also serve as ground truth for validating automated evaluators later.
- Code-based evaluation — deterministic checks: valid JSON, required schema, keyword presence, length limits, SQL that executes without errors. Fast, cheap, perfectly consistent. The limitation: they cannot assess meaning.
- LLM-as-a-judge — uses a language model to score outputs on qualities that require understanding language: relevance, tone, summary fidelity. Imperfect and easy to get wrong — models don't automatically grade like human experts, need calibration against human preferences, and can share blind spots with the application LLM. But an LLM judge calibrated against human labels and backed by code-based checks is a reliable evaluator.
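The code-based method in the list above can be sketched in a few lines. This is a minimal illustration, not a prescribed evaluator: the function names, the required keys (`answer`, `sources`), and the length limit are all assumptions chosen for the example.

```python
import json


def run_code_checks(raw, required_keys=("answer", "sources"), max_chars=2000,
                    required_keyword=None):
    """Run deterministic checks on one output; return check name -> bool."""
    checks = {
        "valid_json": False,
        "has_required_keys": False,
        "within_length": len(raw) <= max_chars,
        # Keyword check is skipped (counts as passing) when no keyword is given.
        "has_keyword": required_keyword is None
        or required_keyword.lower() in raw.lower(),
    }
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return checks  # structural checks stay False if the output is not JSON
    checks["valid_json"] = True
    checks["has_required_keys"] = isinstance(parsed, dict) and all(
        key in parsed for key in required_keys
    )
    return checks


def passes(checks):
    """Binary verdict: the output passes only if every check passes."""
    return all(checks.values())
```

Checks like these say nothing about whether the answer is correct or relevant, which is why they are usually paired with manual review or a calibrated judge.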
Reference-based vs. reference-free: Both code and LLM evaluators can compare against a predefined expected output (reference-based) or assess the output on its own (reference-free). Reference-free evaluators can be applied to unseen production data, making them essential for monitoring live traffic.
When to automate: Ask whether the issue is a one-time fix or a generalization problem. If a simple prompt change resolves it, just make the change. If you can identify a failure mode you want to test for repeatedly across different inputs, that's when an evaluator makes sense. Prefer binary scores (pass/fail) over graded scales (1–5): binary forces a clear definition of acceptable vs. unacceptable, while scales introduce ambiguity about what separates a 3 from a 4 (source: https://langfuse.com/academy/evaluate).
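Binary scores also make it simple to check an automated judge against human labels. The sketch below assumes you already have paired pass/fail verdicts for the same outputs; `judge_agreement` and its metric names are hypothetical, and the numbers in the usage note are made up for illustration.

```python
def judge_agreement(human, judge):
    """Compare binary judge verdicts against human pass/fail labels.

    Both arguments are equal-length lists of bools (True = pass).
    """
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(human, judge))
    agree = sum(h == j for h, j in pairs)
    judge_on_human_pass = [j for h, j in pairs if h]
    judge_on_human_fail = [j for h, j in pairs if not h]
    return {
        "agreement": agree / len(pairs),
        # Share of human-approved outputs the judge also approved.
        "pass_recall": (sum(judge_on_human_pass) / len(judge_on_human_pass)
                        if judge_on_human_pass else None),
        # Share of human-rejected outputs the judge also rejected.
        "fail_recall": (sum(not j for j in judge_on_human_fail) / len(judge_on_human_fail)
                        if judge_on_human_fail else None),
    }
```

For example, with human labels `[True, True, False, False]` and judge verdicts `[True, False, False, False]`, the judge agrees on three of four items and is stricter than the human reviewers on passing outputs. Splitting agreement by label matters because a judge that rejects everything can still score well on a dataset that is mostly failures.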
Closing the loop: Some evaluators should move beyond offline experiments into production. Reference-free evaluators and user feedback signals applied to live traffic confirm that quality matches what you saw pre-deployment. If production behavior diverges, capture those cases in traces, turn them into dataset items, and run the next round of experiments.
Agent-Ready Requirements (/goal)
Claude Code's /goal and Codex's long-horizon mode formalize a pattern that was already emerging informally: the "Ralph Wiggum loop." Put an agent in a bash loop, give it the same repo context every time, tell it to read the spec and implementation plan, pick the next unchecked task, complete it, run a test, mark the task done only if the test passes, then start again.
The useful part is not a smarter model — it's the fresh context at the start of each run. Instead of trusting one bloated chat to remember everything through compaction, the loop reloads durable files: the spec, the plan, the task list, the test suite, the status notes. The conversation can rot, but the source of truth stays outside the conversation.
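The loop described above can be sketched as follows. The source describes a bash loop; Python is used here only for illustration. Everything named in the code is an assumption: the plan is taken to be a text file with markdown-style `- [ ]` checkboxes, and `run_agent` and `run_tests` are placeholder callables standing in for an agent invocation and a test command.

```python
import re
from pathlib import Path
from typing import Callable, Optional

# Assumed plan format: one task per line, "- [ ] task" or "- [x] task".
UNCHECKED = re.compile(r"^- \[ \] (.+)$", re.MULTILINE)


def next_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked task, or None when the plan is complete."""
    match = UNCHECKED.search(plan_text)
    return match.group(1) if match else None


def mark_done(plan_text: str, task: str) -> str:
    """Check off exactly one occurrence of the given task."""
    return plan_text.replace(f"- [ ] {task}", f"- [x] {task}", 1)


def run_loop(plan_path: Path, run_agent: Callable[[str], None],
             run_tests: Callable[[], bool], max_turns: int = 20) -> int:
    """Run up to max_turns turns; return the number of turns used.

    Each turn re-reads the plan from disk (fresh context), hands one task
    to the agent, and checks the task off only if the tests pass.
    """
    for turn in range(1, max_turns + 1):
        task = next_task(plan_path.read_text())
        if task is None:
            return turn - 1  # nothing left to do
        run_agent(task)
        if run_tests():
            # Re-read before writing in case the agent edited the plan file.
            plan_path.write_text(mark_done(plan_path.read_text(), task))
    return max_turns
```

The important property is that no state lives in the loop itself: the plan file on disk is the only record of progress, so a failed turn simply retries the same task with a fresh read.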
The core principle: the loop is only as good as the plan it reloads, the tests, the acceptance criteria, and the evidence it leaves behind. This pushes PM work past "write enough detail that an engineer understands the intent" toward "define done clearly enough that an agent can keep trying, a harness can inspect the evidence, and a human can tell whether the outcome is product-correct."
Weak goals read like wishes. A goal such as "/goal improve onboarding" gives the agent no way to know whether onboarding improved, so it starts optimizing for whatever is easiest to prove: cleaner screenshots, passing tests, fewer steps. A one-shot mistake is annoying; a loop can spend 40 turns making the wrong thing more internally consistent.
Strong goals give the loop a finish line, a proof method, and a boundary. The spec names observable behavior, the goal names validation commands and scope constraints, and the loop stops after N turns with a status report if blocked. A prompt asks for effort; a contract defines the condition where effort stops.
What PMs should stop handing agents: adjectives ("make it better/cleaner/easier") and vibes ("polish the onboarding flow"). Replace with observable states: "Reduce the empty-state decision path from four visible actions to two. Keep 'Import CSV' and 'Create manually.' Add a regression test that the empty state still exposes both setup paths."
Practical /goal template:
- Source of truth: spec file, implementation plan, status file
- Acceptance criteria: observable behaviors, negative cases, non-regression conditions
- Validation: test commands, lint/typecheck/build, browser/visual evidence if needed
- Boundaries: only edit specific paths, do not change named systems, preserve specific contracts
- Loop behavior: run validation after each change, update the status file with what changed / what passed / what's risky, stop after N turns if blocked
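The template above can be treated as a small data structure that refuses to render a goal with missing parts. This is a sketch only: `GoalContract` is a hypothetical name, the rendered text is plain prose rather than any documented /goal syntax, and the file paths and test command in the usage note are invented.

```python
from dataclasses import dataclass


@dataclass
class GoalContract:
    source_of_truth: list
    acceptance_criteria: list
    validation: list
    boundaries: list
    max_turns: int = 10

    def problems(self):
        """Return reasons this goal is too weak to loop on (empty means OK)."""
        issues = []
        if not self.acceptance_criteria:
            issues.append("no acceptance criteria")
        if not self.validation:
            issues.append("no validation command")
        if not self.boundaries:
            issues.append("no scope boundaries")
        return issues

    def render(self):
        """Assemble the goal text, or raise if the contract is incomplete."""
        issues = self.problems()
        if issues:
            raise ValueError("weak goal: " + ", ".join(issues))
        sections = [
            ("Source of truth", self.source_of_truth),
            ("Acceptance criteria", self.acceptance_criteria),
            ("Validation", self.validation),
            ("Boundaries", self.boundaries),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        lines.append(
            "Loop behavior: run validation after each change, update the "
            f"status file, stop after {self.max_turns} turns if blocked."
        )
        return "\n".join(lines)
```

A contract for the empty-state example might list `docs/spec.md` as the source of truth, "empty state shows exactly two actions: Import CSV and Create manually" as a criterion, a hypothetical `pytest tests/test_empty_state.py` as validation, and "only edit files under src/onboarding/" as the boundary.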
The status file is the durable memory layer — it records what changed, which checks passed or failed, what decisions the agent made, and what a human should inspect next. Every fresh turn reloads the spec and status instead of reconstructing the project from a decaying conversation.
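A status file like this can be as simple as one JSON object per line, appended after each turn. The sketch below is one possible shape, not a format the source specifies: the field names and the helper functions `append_status` and `load_status` are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_status(path: Path, changed, passed, failed, decisions, inspect_next):
    """Append one turn's record to the status file and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "changed": changed,            # what changed this turn
        "passed": passed,              # checks that passed
        "failed": failed,              # checks that failed
        "decisions": decisions,        # choices the agent made
        "inspect_next": inspect_next,  # what a human should look at
    }
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def load_status(path: Path):
    """Read all recorded turns, oldest first; empty list if no file yet."""
    if not path.exists():
        return []
    return [
        json.loads(line)
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
```

Appending rather than overwriting keeps the full history, so a human reviewing the run can see which checks flipped from failing to passing and when.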
Where /goal works best: migration work (concrete target, cheap validation), backlog clearing (queue of failing tests), file splitting (measurable constraints), brute-force testing (attack vectors until queue empty). Exploration mode can work too, but the goal should produce learning, not production code.
Where it fails: when the goal is an adjective, when there's no validation command, when scope boundaries are missing, or when the PM launches the loop and walks away without watching the first iterations. The first loops are calibration — they teach you how the agent interprets the plan. Watching them is how you make later unattended work less stupid.
Structured Memory for Product Context
The same article describes a pattern for agent memory in product work (implemented in PM OS v2): capture what changed, learn missing background context slowly, then recall the useful part before the next workflow begins.
Capture: selective abstraction, not raw recording. Categories: decisions, risks, assumptions, open questions, stakeholder context, changed recommendations. The system proposes facts worth keeping; the PM approves what to save. "A transcript is a recording. A decision is the abstraction that lets future work continue."
Slow context building ("daily-drip"): instead of a giant setup interview, the system asks one useful question at a time — "What is one thing I should learn about you, your work, or this project that would make future help sharper?" — and routes the answer to the right memory layer. Small question, small answer, better context over time.
Recall: a compact packet — recent decisions, active risks, open questions, assumptions, constraints — loaded before a workflow starts. Working memory, not archive search. This transforms generic workflow prompts ("help me prep for a roadmap meeting") into context-aware ones ("I see the live tension is the Q3 move from mobile to retention, plus design's concern that onboarding is being abandoned. Is this meeting mainly to align design, reset leadership expectations, or decide what happens to mobile commitments?").
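The capture and recall steps can be sketched as a small in-memory store. This is an illustration of the pattern under stated assumptions, not the PM OS v2 implementation: the class names, the category identifiers, and the `per_category` limit are all hypothetical.

```python
from dataclasses import dataclass

# Category names adapted from the list in the text.
CATEGORIES = (
    "decision", "risk", "assumption", "open_question",
    "stakeholder_context", "changed_recommendation",
)


@dataclass
class MemoryItem:
    category: str
    text: str
    approved: bool = False  # the PM approves what gets saved


class ProductMemory:
    def __init__(self):
        self._items = []

    def propose(self, category, text):
        """The system proposes a fact worth keeping; not yet recallable."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        item = MemoryItem(category, text)
        self._items.append(item)
        return item

    def approve(self, item):
        """The PM approves a proposed item so it can be recalled."""
        item.approved = True

    def recall_packet(self, per_category=3):
        """Return the most recent approved items, grouped by category."""
        packet = {}
        for item in reversed(self._items):  # newest first
            if not item.approved:
                continue
            bucket = packet.setdefault(item.category, [])
            if len(bucket) < per_category:
                bucket.append(item.text)
        return packet
```

The cap per category is what keeps recall a compact packet rather than an archive search: older items stay stored but do not crowd the context loaded before a workflow.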
The operating rule: define done, prove done, and keep the proof outside the chat.
"Team OS" Pattern
Hannah Stalberg (DoorDash PM, 1,500+ hours in Claude Code) coined "Team OS" — a team-level knowledge base that helps everyone move faster. Claude Code becomes the interface to this shared context. Her key observation: "Claude Code is the most misleading name in AI" — it's not just for code, it's a general-purpose agent for product work. See Claude Code Skill Frameworks.
Vibe Experimentation
Aakash Gupta's concept (ProductCon 2025): the intersection of vibe coding and experimentation. Three implementation stacks: (1) prototype in Lovable → engineer in Cursor → experiment in LaunchDarkly/Amplitude, (2) full-stack vibe coding with integrated experimentation, (3) Claude Code-driven rapid prototyping with A/B testing. The idea: PMs and designers can prototype and test hypotheses without waiting for engineering sprints.
AI-Native Companies in Practice (Ann Miura-Ko)
Ann Miura-Ko (Floodgate, Apr 2026) published field observations from visiting AI-native companies in San Francisco. Key findings:
The PM is disappearing. Across five companies visited in a single day, only one full-time PM — even in a 40-person company. Engineers talk to customers daily and own product decisions end-to-end. The PM role isn't being "augmented" — it's being absorbed into engineering and design.
The most dangerous side effect: the feature factory. When you can build anything a customer asks for in a day, the temptation to build everything is overwhelming. Multiple companies cited this as their biggest strategic risk. Solutions: agents that can only configure existing features through JSON (literally cannot create new code), squad-level North Star metrics to kill ideas before they ship, founders deciding where the product has opinions and where it's flexible. "When execution is nearly free, taste becomes the moat."
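The "configure existing features through JSON" guardrail amounts to validating agent output against an allowlist before applying it. The sketch below shows the idea; the feature names and the `validate_agent_config` function are invented for illustration and do not come from any of the companies described.

```python
import json

# Hypothetical registry: features that already exist, with their value types.
EXISTING_FEATURES = {
    "csv_import_enabled": bool,
    "max_seats": int,
    "default_locale": str,
}


def validate_agent_config(raw_json):
    """Accept settings only for features that already exist; reject the rest."""
    proposed = json.loads(raw_json)
    if not isinstance(proposed, dict):
        raise ValueError("config must be a JSON object")
    for key, value in proposed.items():
        if key not in EXISTING_FEATURES:
            raise ValueError(f"unknown feature: {key}")
        expected = EXISTING_FEATURES[key]
        # Exact type match, so a JSON boolean is not accepted where an int is expected.
        if type(value) is not expected:
            raise ValueError(f"{key} must be {expected.__name__}")
    return proposed
```

Because the agent can only emit values for keys in the registry, a customer request for a genuinely new capability fails validation and has to go through a human product decision.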
The stack is converging. Almost every company runs Slack + Claude Code + GitHub + Linear. Slack has become a central orchestration layer for agents: emoji reactions auto-create tickets, bots triage customer issues, agents get tagged in threads and start working on fixes. Cursor was mentioned only sporadically (six months ago it came up in every conversation). Troubling for coding platforms: engineers don't seem loyal to any particular tool.
Non-engineers are building real things. An enterprise account manager asked an AI agent in Slack to automate account uploads the product team hadn't prioritized — done in an hour. An accounting team writing database queries via MCP. A Chief of Staff producing marketing materials in under 30 minutes. "The most underestimated shift isn't what AI does for engineers. It's what it does for everyone else."
The cost of experimentation has collapsed. A researcher tests 10 interface designs per day and throws 9 away. A designer generates competing iterations in under 6 minutes. A growth PM with zero coding experience built a full Meta Ads pipeline in two days. Companies simulate customers with AI personas before real users touch the product. One team runs hundreds of research interviews in a week instead of 50 in a quarter.
The result: companies iterating 3-5x faster. Both the build and learn steps are compressing. "The gap between companies that have internalized these practices and those still debating 'AI strategy' is enormous — and it's widening every week."
Organizational AI Autonomy Levels (Ann Miura-Ko)
Miura-Ko (May 2026) extended her earlier field observations into a formal maturity framework, modeled after the levels of autonomy in autonomous vehicles. The core argument: "AI-pilled" is used as though it were binary, but companies differ in both intensity (how deeply AI is embedded across the organization) and technical capability (what AI is actually allowed to see, do, and change).
Four diagnostic questions cut across every level:
- What can AI see? Is the company's work legible to a machine, or does it live in undocumented meetings and inaccessible SaaS tools?
- What can AI do? Can it act on systems of record (open PRs, update CRMs, reconcile invoices) or only summarize what humans wrote?
- Who can extend the system? Are non-engineers shipping production internal tools, or does every workflow depend on a few power users?
- How has the organization changed? Or is it running the 2023 org chart with better autocomplete?
L0 — AI as theater. AI can see nothing structured, do nothing of consequence. The hard test: can AI complete any recurring business process end-to-end? Common false positive: a CEO giving excellent AI transformation speeches while running the same executive staff meetings and headcount plans.
L1 — Personal productivity. Each individual's AI sees only what that person feeds it. No org-level visibility, no action on systems of record. The hard test: if your best AI user left tomorrow, would their workflow remain? Common false positive: "80% of employees use AI weekly" — probably true and meaningless.
L2 — Team workflow. Shared context within teams (claude.md files, shared prompts, function-specific MCP integrations). AI handles functional workflows — sales prospecting, support triage, code review — but within team boundaries. The hard test: does any workflow cross team boundaries? Common false positive: "We have AI workflows in every department" — but they don't connect, producing AI-enhanced silos.
L3 — Organizational infrastructure. The whole organization is queryable. Agents act across systems — updating CRMs, opening PRs, routing tickets, reconciling invoices. Non-engineers don't just consume shared skills, they author them. The org chart looks materially different from 2023: zero-PM teams, PM-as-agent-orchestrator, or role convergence into "builders." Token-maxing over headcount-maxing. The hard test: can an agent answer — across systems — what shipped last sprint, who asked for it, what broke, what customers said, and what to do next, without convening a meeting? Common false positive: a landfill of meeting transcripts with no synthesis. "Capture is not legibility."
L4 — Compounding operating system. The system maintains its own context — agents update agents, skills marketplaces propagate wins, duplicate efforts get removed. Agents have policy-driven decision authority within scoped domains. Non-engineers ship production internal tools without filing tickets. Hierarchy collapses toward "channel managers" of agent workflows. The hard test: show a workflow that improved because the system learned from prior runs, not because a person manually improved it — plus three production tools shipped by non-engineers in the last quarter. Common false positive: agent sprawl. "A hundred brittle automations don't equal a compounding operating system." L4 requires managed compounding with lifecycle, observability, and evaluation.
L5 — Virtually self-driving organization. The operating loops sense reality, diagnose issues, initiate work, execute within delegated authority, update shared memory, and improve future behavior — with humans governing strategy, taste, risk, values, and exceptions. Six markers: the system (1) notices something without being asked, (2) synthesizes across sources, (3) decides whether action is warranted, (4) acts within delegated authority, (5) escalates when uncertainty exceeds authority, (6) updates shared memory so future behavior improves. The hard test: what important thing did the company notice, decide, act on, and learn from without a human initiating the process? Common false positive: "fake autonomy" — the system executes preconfigured rules while humans still do all the noticing.
Asymmetry as diagnostic. Companies rarely answer all four questions at the same level. The asymmetry reveals where the next intervention should focus: AI might see a lot but can't act, or act a lot but only engineers can extend it, or the org chart changed but the substrate is thin.
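The asymmetry check can be made concrete by scoring each of the four questions separately and looking at the spread. This is a sketch, not part of Miura-Ko's framework: the question keys, the `diagnose` function, and the rule that the lowest-scoring question is where to intervene next are assumptions made for illustration.

```python
# Short keys for the four diagnostic questions (names are hypothetical).
QUESTIONS = ("see", "do", "extend", "org_change")


def diagnose(levels):
    """Summarize per-question levels (0-5) and flag the lagging questions."""
    missing = [q for q in QUESTIONS if q not in levels]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    for q in QUESTIONS:
        if not 0 <= levels[q] <= 5:
            raise ValueError(f"{q} must be between 0 and 5")
    floor = min(levels[q] for q in QUESTIONS)
    ceiling = max(levels[q] for q in QUESTIONS)
    # Assumption: the lowest-scoring questions are the next place to focus.
    lagging = [q for q in QUESTIONS if levels[q] == floor] if ceiling > floor else []
    return {
        "floor": floor,
        "ceiling": ceiling,
        "spread": ceiling - floor,
        "lagging": lagging,
    }
```

A company scoring 3 on what AI can see but 1 on what it can do would show a spread of 2 with "do" flagged, matching the "sees a lot but can't act" case.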
This framework refines the Notion AI Transformation Model (four levels, from thought partner to system) with sharper diagnostic questions and harder tests at each level. It also formalizes what Miura-Ko's earlier field observations described anecdotally — the companies she visited were operating at L3–L4, while most companies claiming to be "AI-pilled" are at L1. The L4 description of non-engineers shipping production tools matches her earlier observation that "the most underestimated shift isn't what AI does for engineers — it's what it does for everyone else." Lin's five bottlenecks maps a different axis — the sequential order companies move through — while Miura-Ko's levels describe the state at each plateau.
Delta Force Teams (Owner)
Deano (Owner, Apr 2026) articulates the most extreme version of the small-elite-team philosophy. Owner reached $15M ARR with 5 engineers — building a product surface where each component has entire companies built around it — and is now valued at $1B+ with a fraction of typical SaaS headcount.
The hiring bar: Modeled after Delta Force, not Navy SEALs. ~1,700 special forces apply, ~6 make it (0.35%). Owner's application-to-offer rate: ~0.22%. The heuristics: "If it's not a hell yes, it's a hell no." "Would you put your job on the line for this person?" "Would I invite this person to my wedding?"
Why small teams demand this bar:
- Standards are contagious — one average performer sets the new standard
- Judgment replaces process — trust and independent decision-making only work if judgment is excellent
- Ownership is non-negotiable — no one to hand off to, no layer to hide behind
AI amplifies the gap: "AI is not an equalizer, it's a multiplier. Exceptional people use it to compress months into days. Average people use it to move slightly faster and often in the wrong direction. The gap between exceptional and average isn't shrinking. It's exploding."
Operational principles:
- Zero tolerance for tech debt — "Today's shortcuts become tomorrow's bottlenecks — paid not once, but every day." Form enables speed, like great athletes.
- Zero tolerance for bug backlogs — "Most teams accept a 'healthy backlog' of known bugs. What that really means is customers are experiencing constant paper cuts."
- No deadlines — "If someone needs a deadline to move fast, we've made a hiring mistake." Deadlines focus teams on the date rather than building something exceptional.
- 2 meetings/week, 1.5 hours total — Monday standup (align on one thing), Friday demo day. Everything else async.
- Trust battery starts at 100% — "When you make someone earn trust before you extend it, you slow them down at the exact moment they have the most energy." Full ownership from day one.
Customer obsession framework: Everyone talks to at least one customer per week. "You should know your customer so well you know what gum they chew." The leader's job: not to sell the plan, but to make reality undeniable. "People don't align to plans. They align to reality they can see for themselves."
The "Nick Fury" leadership model: "My job isn't to be the superhero. It's to go find them, believe in them completely, and then unleash them." Only hire people better than you in some significant way. If you're the most talented person in the room, you've failed.
Key contrast with AI Organization Design: Block replaces hierarchy with an intelligence layer. Owner replaces hierarchy with extreme talent density and minimal process. Both bet against middle management but for different reasons — Block because AI can coordinate, Owner because exceptional people don't need coordination.
Living Software vs. Tool-Like Software
Jack Cheng (Every, Apr 2026) identifies a fundamental tension in AI-accelerated development: software exists on a spectrum between two types, and AI coding acceleration is breaking the social contract of one of them.
Tool-like software — Users expect stability, consistency, and predictability. A hammer doesn't change shape between uses. Traditional SaaS products trained users to expect this: the interface stays roughly the same, features are added gradually, muscle memory works. Tool-like software should "disappear" — you stop noticing it because it reliably does what you expect.
Living software — Users expect growth, adaptation, and evolution. Social media feeds, recommendation engines, AI assistants. The value comes from the system changing and improving. Users tolerate instability because they're getting something new.
The problem: AI coding acceleration is making tool-like software change at living-software speed. Features ship faster than users can absorb them. Interfaces shift before muscle memory forms. "The build step compressed, but the adoption step didn't." Users of accounting software don't want surprises — but the development team can now ship surprises weekly.
For builders of tool-like software: Slow down the release cadence even if you can build faster. Use the speed for quality, testing, and polish — not more features. "The speed is a gift. Spend it on fit, not on volume."
For builders of living software: Lean into the pace, but make the evolution legible. Show users what changed and why. AI assistants that silently get better confuse users; ones that explain their growth build trust.
The connection to the feature factory risk identified by Ann Miura-Ko: when execution is nearly free, the discipline to NOT ship becomes the differentiator. See also Business Moats in AI — the "opinionated perspective" moat is essentially taste about what to ship and what to hold back.
Planning in the AI Era
Karri Saarinen (Linear CEO, Apr 2026) challenges the narrative that planning is going out of fashion. His test: "could the same decision or activity could have been made before AI?" Companies always had the option to do shorter planning cycles, yet many ended up with annual or half-year cycles anyway. The question is why — and whether AI actually solves the underlying need.
Planning isn't about the plans — it's an alignment and commitment exercise. It forces organizations to debate what matters, decide priorities, create shared meaning, and navigate organizational boundaries. AI may increase bandwidth and compress timelines, but the need to choose remains. "If it becomes easier to make more things, it also becomes easier to make the wrong things."
Linear's approach: six-month directional plans with the ability to change priorities any month or week. More experimentation alongside the plan, not instead of it. The risk of no-plan: building whatever comes easily, letting AI steer you toward what's easiest rather than what matters.
This connects to the tools-steer-you problem: AI tools are thinking tools, not just mechanical ones. Their ability to influence your work direction is greater than any previous tool. Vibe coding done well is following the grain of the tool — but without planning, you may never notice you've drifted.
Output Isn't Design
Saarinen invokes Christopher Alexander's definition of design: "good fit between form and context." The form is the solution; the context is the problem space — user needs, constraints, trade-offs, the environment the product lives in.
AI generates plausible output — polished mockups, working code, fluent copy. But output isn't design. "The form is there, the fit is not." AI can produce artifacts that look like solutions without understanding the problem they're supposed to solve.
The implication for product teams: AI makes the visible part of design (producing artifacts) trivially fast, which makes the invisible part (understanding context) relatively more valuable. Teams that skip problem understanding and jump straight to AI-generated solutions will ship products that look polished but miss the mark.
Design tool limitations: Image generation breaks down with iterations — it's hard to make the AI change one specific thing without it reshaping the whole output (the same problem happens in writing). Saarinen argues for better containment tools and semantic UI design, where you define layouts and patterns directly rather than drawing rectangles. "Most design work is about understanding how a feature fits into the existing system." Even products used through MCP, CLI, or API still need coherent concepts and workflows.
Domain matters: Different products require different levels of design polish. A frequent tactile tool like email needs heavy UX attention because users feel every paper cut. A backend service can have rougher UI and still be valuable. Many AI companies operate more like backend companies — the capability is the model, the harness is iterated behind the scenes. "It feels closer to classic UNIX systems."
This connects to the "docs to demos" workflow: the prototype is cheap, but knowing which prototype to build requires the same deep customer understanding it always did.
The AI-Generated Sameness Problem
The concrete symptom of the output-isn't-design gap: AI-built products are converging on a recognizable "AI-generated SaaS" aesthetic. Different founders, different industries, but the same cards, dashboards, layouts, and visual hierarchy. The code works, but the interfaces are interchangeable. This happens because most AI coding tools understand prompts but not design systems — they don't ask "does this component already exist?", "does this follow the typography scale?", or "is this pattern reusable?" A senior frontend engineer's mental model is a system of compounding decisions; an AI tool's default is a collection of plausible components.
The emerging practitioner response is a specialization stack — separate tools for reasoning (architecture, planning), implementation (turning plans into code), and frontend refinement (design consistency, component reuse, design-system awareness). Each layer has a clear job; none tries to replace the others. The argument: the future belongs to AI stacks, not single AI tools, because specialized tools solving specialized problems outperform generalist tools attempting everything. This mirrors the pace layers insight — different layers of the stack should move at different speeds and be governed by different constraints.
The four-way sync pattern addresses Saarinen's iteration problem directly: instead of code → refresh → inspect → repeat, some tools now maintain bidirectional sync between design canvas, source code, and rendered UI — editing any one updates the others. This shortens the feedback loop for the dozens of frontend decisions made daily and keeps the design system as the source of truth rather than the prompt.
DESIGN.md — Text-Based Design Systems for Agents
When agents generate UI, each screen can look fine in isolation but feel like a different product when combined. George (prodmgmt.world, Apr 2026) describes DESIGN.md — a concept originating from Google's Stitch team — as a plain-text design system file that sits in the repo where agents can actually read it. It's the design counterpart to AGENTS.md: README.md tells humans what the project is, AGENTS.md tells coding agents how to work in the repo, DESIGN.md tells design and coding agents what the product should look like.
Two layers: Machine-readable YAML (colors, type, radius, spacing, component properties) and human-readable markdown (what the interface should feel like, which colors do which jobs, layout behavior, allowed and forbidden patterns). Agents need both — pure prose gives mood words without decisions, pure tokens give values without judgment.
PM role: PMs don't need to be designers, but they can name product judgment: dense vs. spacious, playful vs. sober, action-heavy vs. review-heavy. The strongest DESIGN.md files encode specific decisions ("use one accent color per screen, reserved for the main action") rather than vibes ("clean and modern").
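A decision like "one accent color per screen, reserved for the main action" is specific enough to check mechanically. The sketch below is one way that might look, assuming the token layer has already been loaded into a Python dict and that a generated screen can be described as a list of elements with a role and a color. The token names, the screen shape, and both helper functions are hypothetical and are not part of any DESIGN.md specification.

```python
# Hypothetical token layer, shown as a dict rather than parsed YAML.
TOKENS = {
    "colors": {
        "accent": "#2563eb",
        "surface": "#ffffff",
        "text": "#111827",
    }
}


def accent_violations(screen, tokens=TOKENS):
    """Return ids of elements that use the accent color without being the main action."""
    accent = tokens["colors"]["accent"]
    return [
        element["id"]
        for element in screen["elements"]
        if element.get("color") == accent and element.get("role") != "main_action"
    ]


def unknown_colors(screen, tokens=TOKENS):
    """Return colors used on the screen that the token layer does not define."""
    allowed = set(tokens["colors"].values())
    used = {element["color"] for element in screen["elements"] if "color" in element}
    return sorted(used - allowed)


# Hypothetical description of one agent-generated screen.
screen = {
    "elements": [
        {"id": "save_button", "role": "main_action", "color": "#2563eb"},
        {"id": "help_link", "role": "link", "color": "#2563eb"},
        {"id": "badge", "role": "status", "color": "#ff00aa"},
        {"id": "title", "role": "heading", "color": "#111827"},
    ]
}

print(accent_violations(screen))  # ['help_link']
print(unknown_colors(screen))     # ['#ff00aa']
```

A vibe such as "clean and modern" offers nothing comparable to test, which is one practical reason to prefer decisions over mood words.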
Self-review loop: The underused move is making the agent cite DESIGN.md back. After generating a screen, prompt it to list which rules the screen follows, where it invented new patterns, and where the file is silent. When the file is silent, that's the signal to extend it — the document becomes alive.
Failure modes: Too vague (mood words without decisions), too visual without being operational (colors without roles), too rigid (no room for exceptions), or disconnected from real screens (written from brand theory, never tested against agent output).
This connects directly to Ann Miura-Ko's feature factory risk: when execution is nearly free, DESIGN.md becomes one mechanism for encoding the taste that prevents agent-generated UI from diverging into chaos.
Brand-as-Code (Atomic Kits)
Little Plains (Emmett Shine, Alex Leiphart) ships brand identity not as PDFs or design files but as structured code — YAML, JSON, Markdown, HTML, CSS, and SVG organized into an "atomic kit" that any AI agent can read and build from. The deliverable is a zip file with two folders: /human (traditional brand guidelines) and /agent (machine-readable instructions encoding positioning, voice, visual system, motion specs, and UI components). The same brand in two formats — one for understanding, one for extending.
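As a rough illustration of the two-folder deliverable, the sketch below builds a tiny kit in memory and checks that both top-level folders are present. Only the /human and /agent folder names come from the description above. Every file name inside them, the placement of magic_trick.md under /agent, and the helper functions are assumptions made for the example.

```python
import io
import zipfile

REQUIRED_FOLDERS = {"human", "agent"}


def top_level_folders(zip_bytes):
    """Return the set of top-level folder names inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name.split("/", 1)[0] for name in archive.namelist() if "/" in name}


def missing_folders(zip_bytes):
    """Return required folders that the archive does not contain."""
    return sorted(REQUIRED_FOLDERS - top_level_folders(zip_bytes))


def build_example_kit():
    """Build a small in-memory kit; all file names here are hypothetical."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("human/guidelines.html", "<h1>Brand guidelines</h1>")
        archive.writestr("agent/voice.md", "# Voice\nPlain, direct, warm.")
        archive.writestr("agent/tokens.json", '{"accent": "#2563eb"}')
        archive.writestr("agent/magic_trick.md", "")
    return buffer.getvalue()


kit = build_example_kit()
print(missing_folders(kit))  # []
```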
The practical test: a founder drops the agent folder into Cursor, types one prompt, and gets a near-complete landing page with correct fonts, hover states, motion matching the spec, and copy following the brand voice — because the strategic thinking is embedded in the file structure, not described in a deck someone has to interpret.
Value concentrates upstream. In traditional engagements, value accrues gradually as deliverables pile up. With atomic kits, the highest-value work happens at the very start: deep research, positioning judgment, the bold calls about who the brand is for, what it says, and how it sounds. Once that thinking is encoded, downstream outputs run on its logic. The engagement's value curve looks like a funnel on its side rather than a steady climb.
The magic_trick.md problem. Everything an atomic kit produces is a recombination of its inputs — consistent, branded, shippable, and gravitating toward the median. Correctness and originality are not the same thing. Little Plains keeps a magic_trick.md file in every kit: the placeholder for a human creative idea that couldn't have been predicted from the system's inputs. A launch film concept from a film-noir binge, a billboard line that lands harder than the system would produce on its own. Without it, the output is correct and forgettable. The analogy: a well-produced agentic system is an enormous, perfectly cross-referenced record library — useless without a DJ who knows which record to pull in which room on which night.
HTML as the right medium. A component expressed in structured HTML is parseable by any modern agent. An SVG has its logic in geometry. A CSS file encodes visual language structurally. Flat raster exports — PNG, JPEG, PDF — can be looked at but not built from. When systems are structured as HTML, CSS, and SVG, they're built for how agents read and construct, not just how humans view.
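To make "parseable" concrete, here is a minimal sketch using Python's standard-library HTML parser to pull component names and classes out of a fragment. The data-component attribute, the class names, and the markup itself are invented for illustration. The same information locked inside a PNG or PDF export would not be recoverable this way.

```python
from html.parser import HTMLParser


class ComponentCollector(HTMLParser):
    """Collect elements that declare a component name via a data attribute."""

    def __init__(self):
        super().__init__()
        self.components = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "data-component" in attributes:
            self.components.append({
                "tag": tag,
                "name": attributes["data-component"],
                "classes": (attributes.get("class") or "").split(),
            })


# Hypothetical brand component markup.
markup = """
<section data-component="hero" class="hero hero--dark">
  <h1>Headline</h1>
  <a data-component="cta-button" class="btn btn--accent" href="#start">Start</a>
</section>
"""

collector = ComponentCollector()
collector.feed(markup)
for component in collector.components:
    print(component)
```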
Pricing implications. If the kit is the product, pricing moves from time-and-materials toward software-like models: scoped strategy on the front end, ongoing licensing for the living system. The kit isn't a finished asset — it's a system that produces finished assets. What's sold is the encoded thinking that makes every future artifact feel like the same brand.
This extends the DESIGN.md pattern from UI consistency to full brand coherence, and operationalizes the code-native visual generation thesis at the brand-system level. Where DESIGN.md ensures agent-generated screens look like the same product, an atomic kit ensures agent-generated everything — landing pages, campaigns, product UI, motion — feels like the same brand. The magic_trick.md concept also reinforces why taste becomes the moat when execution is nearly free: the system handles correctness, but only human judgment produces the memorable departure from the median.
Code-Native Visual Generation
The most interesting visual AI tools have stopped trying to generate the final output and instead generate the source code behind it. Where pixel-native generation (diffusion models) produces images or videos directly, code-native generation produces a symbolic representation — SVG, HTML/CSS, React components, Lottie JSON, Blender scripts, USD scene graphs — that is then rendered by an engine. The visual output is still pixels, but the source of truth is a structured, editable artifact.
This distinction matters because production workflows care about what happens after generation. A generated image is useful as an output, but a generated visual program is useful as an artifact: it can be edited, reused, versioned, integrated into the software stack, and validated against constraints. If one curve in a logo is wrong, editing an SVG path is precise; inpainting a raster image is a gamble.
The test-time compute advantage. Code-native generation creates a tight iteration loop: Code → Render → Inspect → Revise. The model produces the artifact, renders it, sees what broke, and patches the source. Each iteration improves the underlying artifact, not just the rendered output. This is fundamentally different from pixel-native generation, where more inference usually means sampling more outputs and picking the best one. In code-native generation, the model is debugging a visual program in a closed-loop, verifiable environment, which is why it benefits directly from more tokens and test-time compute (source: https://www.a16z.news/p/the-next-frontier-of-visual-ai-is).
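The loop described above can be sketched in a few lines. This is a toy under stated assumptions: there is no real renderer or model here. The inspect function stands in for "render and look" by checking geometry in the parsed SVG, and the revise function stands in for the model's patch with a deterministic fix. The point is only that each round edits the source artifact rather than sampling a fresh output.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def inspect(svg_source, size=100):
    """Stand-in for render + inspect: report circles that spill outside the canvas."""
    root = ET.fromstring(svg_source)
    problems = []
    for circle in root.iter(f"{{{SVG_NS}}}circle"):
        cx, cy, r = (float(circle.get(key)) for key in ("cx", "cy", "r"))
        if cx - r < 0 or cy - r < 0 or cx + r > size or cy + r > size:
            problems.append(circle.get("id"))
    return problems


def revise(svg_source, problem_ids, size=100):
    """Stand-in for the model's patch: shrink each offending circle until it fits."""
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_source)
    for circle in root.iter(f"{{{SVG_NS}}}circle"):
        if circle.get("id") in problem_ids:
            cx, cy = float(circle.get("cx")), float(circle.get("cy"))
            circle.set("r", str(min(cx, cy, size - cx, size - cy)))
    return ET.tostring(root, encoding="unicode")


# Hypothetical first draft: the "sun" circle overflows the 100x100 canvas.
draft = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">'
    '<circle id="sun" cx="80" cy="20" r="30"/>'
    '<circle id="dot" cx="50" cy="50" r="10"/>'
    "</svg>"
)

source, rounds = draft, 0
while (problems := inspect(source)) and rounds < 5:
    source = revise(source, problems)
    rounds += 1

print(rounds, inspect(source))  # 1 []
```

Only the radius of the offending circle changes between rounds; the second circle is untouched, which is the precision that raster inpainting cannot promise.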
Market organization by runtime. Each rendering environment creates a different product wedge: browsers (HTML/CSS, React), SVG renderers (vector graphics, logos), Lottie players (motion design), Blender and game engines (3D scenes), simulators (articulated assets). Examples include Quiver (SVG logo generation where designers edit paths in Figma), Paper (all UI represented as code), and OmniLottie (turning Lottie JSON into model-friendly command sequences for editable animation generation).
3D as the strongest case. A rendered image of a chair is not a chair — it's a picture of one. For 3D assets to be useful in games, simulations, or editing tools, they need consistent geometry, materials, part hierarchy, and scene context. Code-native generation is a natural fit because the iteration loop can propose geometry, render it, inspect whether parts make sense across views, and revise the underlying representation. Projects like VIGA (using Blender as a rendering and feedback environment with semantic tools for observation, modification, and memory) and Articraft3D (framing articulated 3D generation as writing programs that define parts, joints, and tests) show the direction.
The hybrid future. Pixel-native models will remain best for realism, texture, and exploration. Code-native systems will be better for structure, iteration, and production integration. The most useful workflows will combine both. This connects to Saarinen's point in Output Isn't Design — image generation breaks down with iterations because it's hard to change one specific thing without reshaping the whole output. Code-native generation solves exactly this problem by giving the model (and the human) source-level access to what went wrong.
Generative UI Patterns
When agents stop describing results and start showing them, the frontend becomes something the agent draws at runtime rather than something designers ship in advance. Three architectural patterns have emerged, each with different trade-offs on control, scalability, and brand consistency.
The protocol stack. Three protocols divide the work: MCP connects agents to tools, A2A connects agents to each other, and AG-UI connects agents to users via SSE streaming. A2UI (Google's spec for agents emitting UI as schema) rides on AG-UI. State flows both ways on the same stream — user edits propagate to the agent, agent mutations propagate to the user.
Pattern 1: Controlled. Pre-build React components, bind each to a tool name. The agent picks the tool and the component renders inline with the agent's args as props. Your design system stays in charge. The cost: every registered component sits in the agent's context window (~400 tokens per tool description). At 25 components, that's 10,000 tokens of tax per turn before the user says anything. The agent also starts confusing semantically similar components — "pie chart" and "donut chart" both "show proportions." Ship Controlled for ten or fewer high-value flows where design precision matters.
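The context tax in the Controlled pattern is simple multiplication, shown here as a quick check of the figure above. The 400-token value is the approximate per-tool cost quoted in the text, not a measured constant, and the function name is made up for the example.

```python
TOKENS_PER_TOOL_DESCRIPTION = 400  # approximate figure quoted in the text


def controlled_context_tax(component_count, tokens_per_tool=TOKENS_PER_TOOL_DESCRIPTION):
    """Tokens spent on tool descriptions each turn when every component is its own tool."""
    return component_count * tokens_per_tool


for count in (10, 25, 50):
    print(count, controlled_context_tax(count))
# 10 4000
# 25 10000
# 50 20000
```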
Pattern 2: Declarative (A2UI). The agent emits a JSON schema describing the UI; a component catalog maps schema nodes to renderers. One tool, many UIs. Token cost stays flat regardless of how many card types exist — the agent sees a single function, not 50 tool descriptions. The catalog is the contract: definitions list allowed components with Zod schemas for props, and renderers fill in the React. The trade-off is that the LLM owns the layout, so output varies run to run within the catalog's constraints. Ship Declarative for the long tail of dashboards, results, forms, and widgets.
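A minimal sketch of the catalog-as-contract idea follows. It deliberately swaps in plain Python type checks and HTML strings where the text describes Zod schemas and React renderers, and the node shape (type, props, children) is an assumption for illustration rather than the A2UI schema itself.

```python
from html import escape


def render_card(props, children):
    return f'<div class="card"><h3>{escape(props["title"])}</h3>{"".join(children)}</div>'


def render_metric(props, children):
    return f'<p class="metric">{escape(props["label"])}: {escape(str(props["value"]))}</p>'


# The catalog is the contract: allowed component types, required props, renderer.
CATALOG = {
    "card": {"required": {"title": str}, "render": render_card},
    "metric": {"required": {"label": str, "value": (int, float)}, "render": render_metric},
}


def render(node):
    """Validate a schema node against the catalog, then render it and its children."""
    entry = CATALOG.get(node.get("type"))
    if entry is None:
        raise ValueError(f"component not in catalog: {node.get('type')!r}")
    props = node.get("props", {})
    for name, expected in entry["required"].items():
        if not isinstance(props.get(name), expected):
            raise ValueError(f"{node['type']}.{name} is missing or has the wrong type")
    children = [render(child) for child in node.get("children", [])]
    return entry["render"](props, children)


# Hypothetical agent output: one tool call, arbitrary layout within the catalog.
agent_output = {
    "type": "card",
    "props": {"title": "Revenue"},
    "children": [{"type": "metric", "props": {"label": "MRR", "value": 1200}}],
}
print(render(agent_output))
# <div class="card"><h3>Revenue</h3><p class="metric">MRR: 1200</p></div>
```

Adding a new card type means adding one catalog entry; the agent-facing tool surface stays the same, which is where the flat token cost comes from.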
Pattern 3: Open-ended. No catalog, no schema. The agent writes raw HTML rendered inside a sandboxed iframe. Maximum flexibility, minimum brand consistency. "Neo-brutalist on Tuesday, iOS 4 clone on Wednesday" — style rules in the prompt nudge toward a brand but don't guarantee it. Ship Open-ended only for throwaway interactions: one-shot visualizations, sandboxed experiments, disposable explanations. Never as the primary surface.
The decision heuristic: Pixel-perfect mockups for this flow → Controlled. Dozens of card types to ship → Declarative. One-shot visualization the user will never see twice → Open-ended. Can't decide → default to Declarative, then upgrade to Controlled for the top three flows.
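Written as a function, the heuristic might look like the sketch below. The parameter names are invented, and the precedence when several conditions hold at once (mockups first, then card count, then one-shot) is an assumption; the text only states the individual rules and the default.

```python
def choose_pattern(pixel_perfect_mockups=False, many_card_types=False, one_shot=False):
    """Map the three questions in the heuristic to a generative UI pattern."""
    if pixel_perfect_mockups:
        return "Controlled"
    if many_card_types:
        return "Declarative"
    if one_shot:
        return "Open-ended"
    return "Declarative"  # the stated default when undecided


print(choose_pattern(pixel_perfect_mockups=True))  # Controlled
print(choose_pattern(one_shot=True))               # Open-ended
print(choose_pattern())                            # Declarative
```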
The core insight: "The mistake isn't picking the wrong pattern. It's not knowing you picked one." Most teams default to Controlled because their framework defaults to it, hit the wall at 25 components, and reach for Open-ended because it demos well. Neither was a decision — both were drift.
This connects directly to the DESIGN.md pattern and Brand-as-Code — both encode taste into files that constrain agent-generated UI. Generative UI is the runtime layer those systems inform. It also extends Code-Native Visual Generation: where code-native tools produce editable artifacts (SVG, React components), generative UI patterns determine how agents select and compose those artifacts for the user in real time.
The Bottleneck Cascade (Andrew Ng)
Andrew Ng (deeplearning.ai, Apr 2026) frames the PM bottleneck as just the first in a cascade. When AI coding speeds up 10-100x, every adjacent function becomes the new bottleneck in turn:
- Product management: Engineer:PM ratios dropping from 8:1 toward 1:1. But even 1:1 creates a communication bottleneck — the fastest teams have engineers who do their own product thinking.
- Marketing: Features ship so fast that marketing scrambles to communicate them. "A marketing bottleneck."
- Legal compliance: Software built in a day, legal review takes a week. "A legal compliance bottleneck."
- Design: Same pattern — design can't keep pace with build velocity.
The resolution: generalists. When small teams (2-10 people) must cover five specialties, individuals play roles outside their core. Deep specialization still exists, but everyone understands adjacent functions enough to unblock themselves. Proficiency with AI tools helps — they're thinking tools that let you reason through unfamiliar domains.
Colocation matters for maximum speed: "The highest speed is achieved by having everyone in the room, able to communicate instantaneously." Remote works, but same-room removes the last communication latency.
The Five Bottlenecks of Becoming AI-Native (Alfred Lin)
Alfred Lin (Sequoia, May 2026) extends the bottleneck cascade concept into a sequential transformation model. Where Ng identifies parallel bottlenecks that emerge when engineering speeds up, Lin maps the specific order companies move through — each bottleneck only becomes visible once the previous one is resolved.
Enabled vs. native. Lin draws on the cloud era's distinction: cloud-enabled meant lifting existing applications onto AWS; cloud-native meant designing for the cloud from the first line of code. The same divide applies now. AI-enabled companies add an assistant to the same sales team and measure emails saved. AI-native companies ask: if we started from scratch today, how would we build each layer? "A true redesign is the single largest factor separating companies that capture value from AI from those that do not" (source: https://x.com/Alfred_Lin/status/2036433182774727105).
The five bottlenecks, in order:
- Adoption: Do people actually use the tools? Companies start with token-usage leaderboards to encourage adoption, then discover the real question: is usage productive or just token-maxing? Leadership must shape adoption culturally. Chainguard's CEO expects every engineering manager to have token usage near the median of their direct reports. Below the median, leaders lack lived experience to coach; far above it, they need to be teaching.
- Engineering velocity: Measured by pull requests. Top 5-10% of builders are now five times more productive than a year ago; the median is up 20%. The job is making the velocity of the top decile accessible to the rest of the team.
- Product velocity and experience quality: Engineers can ship, but can the company decide what to build fast enough? Roadmaps that took quarters now take weeks. PMs, designers, and leaders with great taste become the rate limiters. "Most companies stall here. They scale engineering velocity into a firehose pointed at the wrong target."
- The development operating system: Building custom internal tools such as eval harnesses, ticket-routing agents, code-review agents, security-review agents, and on-call triage agents. AI-native companies keep finding small bottlenecks, removing them, and encoding solutions into their development OS. Crucially, they accept that most of this tooling won't last: the tool shipped last quarter is already aging. "Schumpeter's creative destruction now applies inside the company."
- Team organization: How to break up and reassemble work in the AI age. Waterfall was invented for large armies of engineers submitting to weekly builds. Two-pizza teams and agile emerged with cloud and microservices. "We have yet to reach an agreement on how best-in-class development will be done in the age of AI, so it is ours to create and own."
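The Chainguard heuristic mentioned under Adoption compares a manager's token usage with the median of their direct reports. A rough sketch of that comparison is below. The 25% tolerance band that defines "near" is purely an assumption, as are the function name and the sample numbers; the source gives no threshold.

```python
from statistics import median


def manager_usage_signal(manager_tokens, report_tokens, tolerance=0.25):
    """Compare a manager's token usage with the median of their direct reports."""
    midpoint = median(report_tokens)
    if manager_tokens < midpoint * (1 - tolerance):
        return "below"      # may lack the lived experience to coach
    if manager_tokens > midpoint * (1 + tolerance):
        return "far_above"  # should be teaching the team
    return "near"


reports = [100, 200, 300, 400, 500]  # hypothetical weekly token counts, median 300
print(manager_usage_signal(150, reports))  # below
print(manager_usage_signal(320, reports))  # near
print(manager_usage_signal(900, reports))  # far_above
```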
The meta-insight: becoming is harder than declaring. Lin notes that Shopify, Duolingo, and Klarna all followed the same arc — a bold internal memo becomes a public artifact, headlines follow, then months later a clarification or walkback. Klarna paused its hiring freeze. Duolingo walked back the replacement framing. What's visible is the tip of the iceberg; beneath it is the wrestling of actually transforming.
This connects to Ng's bottleneck cascade (which focuses on functional bottlenecks: PM, marketing, legal, design) and to the Delta Force Teams pattern (which solves bottleneck #5 through extreme talent density). Lin's development OS bottleneck (#4) maps directly to Noah Brier's pace layers framework — the custom tooling layer moves fast but must be constrained by slower-moving standards and architecture.
Software Company, Not Software Factory
Noah Brier (Alephic) pushes back on the "software factory" metaphor popularized by StrongDM's autonomous code generation system and Dan Shapiro's "Dark Factory" framework. The factory metaphor optimizes for defect elimination — Six Sigma, identical outputs, minimal variance. But software's hardest problem isn't producing code with fewer defects; it's building something people want. Product-market fit, not defect rate, is what matters.
The better metaphor is Andy Warhol's Factory, not Henry Ford's. Both pursued throughput, but Ford eliminated variance while Warhol ensured all work aligned with a single creative vision. It's a software company, not a software factory — the core challenge is keeping an entire system of humans and agents building toward the same vision, from architecture down to individual lines of code.
The alignment problem predates AI. Brier faced this at Percolate (2011), scaling from zero to 100 people in under three years. His job shifted from building the product to building a company capable of building the product. Culture was the strongest lever — what Ben Horowitz defined as "how your company makes decisions when you're not there." He spent half his time on living culture documents, onboarding sessions, and internal tools that routed knowledge to the right people.
AI didn't solve this — it reshaped it. Coding agents produce working code that feels written by someone who hasn't been onboarded: ignoring obvious abstractions, violating stylistic norms present in the codebase. "It looks like a new engineer on the team who hasn't been properly onboarded." Teams write onboarding docs and run training for human colleagues but rarely do it for agents.
Pace layers for AI engineering. Inspired by Stewart Brand's pace layers framework — where society changes at different speeds, from nature (millennia) to fashion (days) — Brier models the AI engineering stack as a cultural stack where slower layers inform faster ones: standards → architectures → specs → plans → code. Lower layers move slowly and constrain upper layers; upper layers move fast within those constraints. The alignment work is building and maintaining the slower layers so that fast-moving code generation stays coherent.
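One way to read "slower layers inform faster ones" operationally is to order whatever context an agent receives from the slowest layer to the fastest, so constraints appear before the work they constrain. The sketch below assumes each layer is available as a plain string; the function, the section formatting, and the sample documents are hypothetical and are not drawn from Brier's own tooling.

```python
PACE_LAYERS = ["standards", "architectures", "specs", "plans", "code"]  # slowest to fastest


def assemble_context(documents):
    """Order documents from the slowest layer to the fastest so constraints come first."""
    unknown = set(documents) - set(PACE_LAYERS)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    sections = [
        f"## {layer}\n{documents[layer]}"
        for layer in PACE_LAYERS
        if layer in documents
    ]
    return "\n\n".join(sections)


# Hypothetical documents, supplied in no particular order.
documents = {
    "plans": "Ship the export button this week.",
    "standards": "All money values are integers in minor units.",
    "specs": "Export produces CSV with a header row.",
}
context = assemble_context(documents)
print(context)
```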
This connects to the DESIGN.md pattern (encoding taste into files agents can read), Agent-Ready Requirements (specs as durable context reloaded each loop), and the feature factory risk (without cultural constraints, speed produces divergence). Where Ann Miura-Ko observed that "taste becomes the moat" when execution is free, Brier's framework specifies how taste propagates: through layered standards that move at different speeds, not through top-down review of every output.
Risks
- Adoption fatigue — "Shipping more stuff does not equal more product value. If users can't keep up, what's the point?"
- Quality at speed — "PM reviews the build" doesn't scale. Without proper QA, rapid shipping creates tech debt.
- Context loss — AI doesn't retain past decisions, so planning loses continuity over time without persistent context
Sources
- "Anthropic shipped 74 features in 52 days. How we tried to adopt their PDLC to our org" — marsel040 (Apr 2026) (link)
- "Anyone using AI for specs/technical planning?" — ml8020 (Reddit, Feb 2026) (link)
- "Requests for Startups" — Y Combinator (2026) (link)
- "The product development system for teams and agents" — linear.app (link)
- "Build better products with customer feedback" — canny.io (link)
- "Spark, the AI platform for product managers" — productboard.com (link)
- "The AI Transformation Model" — John Hurley / Notion (Apr 2026) (link)
- "How to 10x your productivity as a PM with AI tools" — Aakash Gupta (video, Apr 2026) (link)
- "The Most Important New Skill for Product Managers in 2026: AI Evals Masterclass" — Aakash Gupta / Aneesh Chukla (video, Apr 2026) (link)
- "How this PM Used Claude Code to Support 20 People" — Aakash Gupta / Hannah Stalberg (video, Apr 2026) (link)
- "How We Build Product Teams at Owner" — Deano (tweet thread, Apr 2026) (link)
- "The AI-pilled compounding startup" — Ann Miura-Ko (tweet, Apr 2026)
- "Everyone wants to be AI-pilled. Most Companies Are Still Level 1" — Ann Miura-Ko (tweet, May 2026) — organizational AI autonomy levels (L0–L5), four diagnostic questions, AV-inspired maturity framework
- "Living Software" — Jack Cheng / Every (Apr 2026) (link)
- "Some Notes on AI" — Karri Saarinen (tweet, Apr 2026) (link) — planning debate, output vs. design, expertise paradox, agentic coding reality check, design tool limitations, domain differences
- "DESIGN.md | The One File AI Needs to Match Your UI" — George / prodmgmt.world (tweet, Apr 2026) (link) — DESIGN.md concept, PM checklist, failure modes, self-review prompting pattern
- "Chief Product Officer in a Box: Introducing The AI PM OS" — George / prodmgmt.world (tweet, Mar 2026) (link) — Dennis Yang/Chime, Zevi Arnovitz/Meta, Alan Wright practitioner examples
- "Big Pharma Bets Big on AI" — Andrew Ng / deeplearning.ai (newsletter, Apr 2026) (link) — PM bottleneck cascade, generalist engineers, marketing/legal bottlenecks, colocation advantage
- "Evals, explained" — Lotte / Langfuse Academy (May 2026) (link) — three evaluation methods, manual-to-automated evolution, reference-based vs. reference-free, binary scoring, production eval loop
- "/goal for Product Managers" — George / prodmgmt.world (tweet, May 2026) (link) — Ralph Wiggum loops, agent-ready requirements, strong vs. weak goals, /goal template, PM OS v2 structured memory (capture, daily-drip, recall)
- "The Culture of AI Engineering" — Noah Brier / Every (May 2026) — software company vs. software factory, Warhol vs. Ford metaphor, pace layers for AI engineering, agent onboarding as culture problem
- "The Long Becoming" — Alfred Lin / Sequoia (tweet, May 2026) — five sequential bottlenecks of AI-native transformation, enabled vs. native framing, development OS as internal tooling layer, becoming harder than declaring
- "The Next Frontier of Visual AI Is Code" — Yoko Li / a16z (May 2026) (link) — code-native vs. pixel-native visual generation, Code→Render→Inspect→Revise loop, test-time compute advantage, 3D as strongest case, market organization by runtime
- "The HTML Brand: Input-Based Outcomes" — Emmett Shine / Little Plains (Jun 2026) (link) — brand-as-code atomic kits, value-upstream model, magic_trick.md, HTML as agent-readable brand medium, licensing-model implications
- "Generative UI Is the New Frontend" — Shubham Saboo (tweet, Jun 2026) (link) — three generative UI patterns (Controlled, Declarative/A2UI, Open-ended), AG-UI protocol stack, token economics of component registration, brand inconsistency in open-ended rendering
- "The Claude Code + Cursor + Kombai Workflow" — Suryansh Tiwari (tweet, Jun 2026) (link) — AI-generated sameness problem, specialization stack (reasoning/implementation/refinement), four-way design-code sync pattern
- "How Anthropic Uses Claude Fable 5 With Mike Krieger" — Every / Dan Shipper (video, Jun 2026) — delegation model (overnight tasks, concurrent sessions), verification loops (screenshots, video, mock backends), dynamic workflows, DRI model, PM/eng diffusion, non-technical builder empowerment, software engineering evolving not dying