The most comprehensive longitudinal study of code churn in the AI era comes from GitClear's research across 211 million lines of changed code. Their findings, updated in their 2025 AI code quality analysis, track a clear trajectory.
Before AI coding assistants gained significant adoption (pre-2023), code churn held relatively steady at approximately 3.3%. This meant that for every 1,000 lines of code merged, roughly 33 lines were modified, reverted, or deleted within two weeks. This baseline represented the natural rate of correction in professional software development -- bugs caught in production, minor refactors, requirements that shifted shortly after implementation.
As AI coding tool adoption accelerated, churn followed:
| Year | Code Churn Rate | Change from Baseline |
|---|---|---|
| Pre-2023 (baseline) | ~3.3% | -- |
| 2024 | ~5.7% | +73% |
| 2025 | ~7.1% | +115% |
The trend is not subtle. Code churn has more than doubled in two years. And because total code volume has also increased substantially -- AI tools enable developers to produce significantly more code per day -- the absolute volume of churned code has grown even faster than the rate suggests.
Consider a team that merges 10,000 lines of code per week. At the pre-AI baseline of 3.3%, approximately 330 lines would turn over within two weeks. At the current rate of 7.1%, that number rises to 710 lines -- more than double the rework. Multiply this across an engineering organization with dozens of teams, and the hidden cost of AI-driven churn becomes substantial.
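To make the arithmetic explicit, here is a minimal sketch; the 10,000-lines-per-week figure is the illustrative number above, not a benchmark:

```typescript
// Illustrative arithmetic only: lines of merged code expected to churn
// within two weeks, at the pre-AI baseline rate vs. the current rate.
const mergedLinesPerWeek = 10_000;

const baselineChurnRate = 0.033; // ~3.3% pre-2023 baseline (GitClear)
const currentChurnRate = 0.071;  // ~7.1% in 2025 (GitClear)

const baselineChurnedLines = mergedLinesPerWeek * baselineChurnRate; // ≈ 330
const currentChurnedLines = mergedLinesPerWeek * currentChurnRate;   // ≈ 710

console.log(`Baseline rework: ~${Math.round(baselineChurnedLines)} lines`);
console.log(`Current rework:  ~${Math.round(currentChurnedLines)} lines`);
```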
Not all code churn is created equal. GitClear's data identifies three specific categories that have grown disproportionately since AI coding tools became widespread.
AI coding assistants frequently generate code in the wrong location. The logic is correct, but it is placed in a file, module, or layer where it does not belong architecturally. A human engineer later moves it to the appropriate location. The original code is deleted; the moved code is added. Both operations register as churn.
This happens because AI tools optimize for correctness at the function level without understanding codebase organization. An AI assistant asked to implement a utility function may place it in the controller layer because that is where the prompt was issued, not in the shared utilities module where it belongs.
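A hypothetical illustration of the pattern -- the file paths and the `formatCurrency` helper are invented for this sketch, not drawn from GitClear's data:

```typescript
// src/controllers/invoiceController.ts
// An AI assistant, prompted from this file, defines a generic helper here
// because this is where the conversation happened -- not because it belongs here.
export function formatCurrency(amountCents: number, currency = "USD"): string {
  return new Intl.NumberFormat("en-US", { style: "currency", currency }).format(
    amountCents / 100,
  );
}

// Weeks later, a reviewer moves the helper to src/utils/currency.ts, where the
// other formatting utilities live. The deletion here and the addition there
// both register as churn, even though the logic never changed.
```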
AI tools are pattern-matching engines. When asked to solve a problem, they generate code that resembles solutions they have seen in training data -- or earlier in the same codebase. This frequently results in code that duplicates existing functionality rather than reusing it.
The duplicate code works. It passes tests. But a team member later discovers the duplication during review or refactoring and consolidates the implementations. The redundant copy is deleted. Churn.
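A hypothetical sketch of how the duplication accrues; the `slugify` and `toUrlSlug` helpers and the file paths are invented for illustration:

```typescript
// src/utils/strings.ts -- existing helper, already used throughout the codebase.
export function slugify(input: string): string {
  return input
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// src/features/articles/createArticle.ts -- AI-generated weeks later.
// The assistant never saw strings.ts, so it re-implements the same behavior
// under a new name. Tests pass; review misses it; a later refactor deletes it.
function toUrlSlug(title: string): string {
  return title.toLowerCase().trim().replace(/\s+/g, "-");
}
```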
GitClear's data shows that copy-pasted code and moved code together account for a significant share of the increase in churn since AI adoption accelerated. These categories were relatively stable in the pre-AI era because human developers, familiar with their codebase, naturally avoided these patterns.
The third category is the most telling. Code that is merged and then substantially rewritten within days suggests the original implementation was not quite right -- it solved the surface-level problem but missed edge cases, violated conventions, or introduced subtle issues that only became apparent during integration or subsequent development.
This pattern is particularly common with AI-generated code, for reasons rooted in how the tools operate.
The data shows what is happening. Understanding why requires examining the mechanics of how AI coding tools interact with development workflows.
AI coding assistants generate code by predicting what comes next based on patterns in training data and the immediate context window. They do not understand the codebase's architecture, the team's conventions, or the system's constraints. They mimic patterns rather than reasoning from principles.
This means AI-generated code tends to be locally correct but globally unaware. It solves the immediate problem in a way that is syntactically clean and functionally adequate, but may conflict with how the rest of the system is structured. When a human engineer with full system context encounters this code, they rewrite it -- not because it is wrong, but because it does not fit.
GitHub's own research shows that developers using AI tools complete tasks faster and report higher satisfaction. But faster task completion means more PRs to review. When code volume increases by 30-70% in high-adoption organizations, review capacity does not scale proportionally.
The result is predictable: reviewers spend less time per PR. AI-generated code, which is typically well-formatted and syntactically clean, receives even less scrutiny because it "looks right." Issues that a thorough review would catch -- architectural misalignment, duplication, missing edge cases -- slip through and surface later as churn.
Most AI coding tools operate with limited context. They see the current file, perhaps a few related files, and the prompt. They do not have access to the team's architectural decisions, the module dependency graph, or the history of why certain patterns were chosen over alternatives.
This information asymmetry is the fundamental driver of AI-induced churn. A human developer who has worked on a codebase for months carries implicit knowledge about how things should be done. An AI assistant starts from zero context with every prompt.
A behavioral pattern has emerged in AI-assisted development: developers accept AI suggestions quickly -- because the code looks reasonable -- and plan to "clean it up later." This deferred cleanup is rational at the individual level (the developer ships faster today) but creates systemic churn at the team level (someone has to do the cleanup, and that cleanup registers as code modification).
Rising churn is not a reason to abandon AI coding tools. It is a reason to instrument, measure, and manage the quality of AI-assisted output. The following strategies address the root causes identified above.
You cannot manage what you do not measure. Code Turnover Rate -- the percentage of merged code that is reverted, deleted, or substantially rewritten within 30 or 90 days -- is the metric that makes churn visible and actionable.
Critically, Code Turnover Rate should be segmented by authorship: AI-generated code tracked separately from human-written code. This segmentation reveals whether rising churn is an AI quality problem, a review process problem, or something else entirely.
Healthy teams maintain AI-generated code turnover within 1.5x of human-written code turnover. Teams above 2.0x have a systematic quality problem that needs intervention (Larridin internal benchmark).
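A minimal sketch of how the segmentation and ratio could be computed, assuming you already have per-change records tagged with authorship and the lines reworked within the window (the record shape and the numbers below are illustrative):

```typescript
// Illustrative shape: one record per merged change, tagged with authorship
// and with the lines reverted, deleted, or rewritten within 30 days.
interface MergedChange {
  author: "ai" | "human";
  linesAdded: number;
  linesReworkedWithin30d: number;
}

function turnoverRate(changes: MergedChange[]): number {
  const added = changes.reduce((sum, c) => sum + c.linesAdded, 0);
  const reworked = changes.reduce((sum, c) => sum + c.linesReworkedWithin30d, 0);
  return added === 0 ? 0 : reworked / added;
}

function aiToHumanTurnoverRatio(changes: MergedChange[]): number {
  const ai = turnoverRate(changes.filter((c) => c.author === "ai"));
  const human = turnoverRate(changes.filter((c) => c.author === "human"));
  return human === 0 ? Infinity : ai / human;
}

// Interpretation per the benchmark above: below 1.5x is healthy,
// above 2.0x signals a systematic quality problem.
const ratio = aiToHumanTurnoverRatio([
  { author: "ai", linesAdded: 4_000, linesReworkedWithin30d: 360 },   // 9.0%
  { author: "human", linesAdded: 6_000, linesReworkedWithin30d: 270 }, // 4.5%
]);
console.log(ratio.toFixed(2)); // 2.00 -> at the intervention threshold
```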
Much of AI-induced churn stems from under-specified prompts. A prompt that says "implement the search function" gives the AI no information about where the function should live, what patterns to follow, or what constraints to respect.
Better prompts reduce churn by giving AI tools the context they lack:
- "Implement the search function in `src/services/search.ts`, following the pattern established in `src/services/filter.ts`."
- "Use the `QueryBuilder` class already defined in `src/utils/query.ts` rather than creating a new query construction approach."

Teams that invest in prompt engineering practices see measurably lower churn on AI-generated code (Larridin internal benchmark).
AI-generated code needs more review scrutiny, not less -- even though it often looks cleaner than human-written code. Review standards should explicitly target the failure modes described above: architectural placement, duplication of existing functionality, and handling of the edge cases that only surface during integration.
Quality gates create automated checkpoints that catch churn-inducing patterns before they merge -- duplicated logic, code placed outside its architectural home, or diffs large enough to warrant closer review.
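A minimal sketch of one such gate: a diff-size check that routes oversized PRs to closer review. It assumes a CI environment with `origin/main` fetched; the 400-line threshold and the script itself are illustrative, not a specific product's tooling:

```typescript
// quality-gate.ts -- flag PRs whose diff is large enough to deserve
// extra review attention before merge.
import { execSync } from "node:child_process";

const MAX_ADDED_LINES = 400;

// Summarize added/deleted lines per file for the PR against the target branch.
const numstat = execSync("git diff --numstat origin/main...HEAD", {
  encoding: "utf8",
});

// Each numstat line is "<added>\t<deleted>\t<path>"; binary files show "-".
const addedLines = numstat
  .trim()
  .split("\n")
  .filter(Boolean)
  .map((line) => parseInt(line.split("\t")[0], 10))
  .filter((n) => !Number.isNaN(n))
  .reduce((sum, n) => sum + n, 0);

if (addedLines > MAX_ADDED_LINES) {
  console.error(
    `PR adds ${addedLines} lines (limit ${MAX_ADDED_LINES}): route to senior review.`,
  );
  process.exit(1); // fail the check so the PR is flagged, not silently merged
}
console.log(`PR adds ${addedLines} lines: within threshold.`);
```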
These gates do not slow down AI-assisted development. They redirect review attention to the PRs most likely to produce churn, which is more efficient than applying equal scrutiny to everything.
A single churn measurement is a data point. A trend over weeks and months is actionable intelligence. Teams should track:
- Code Turnover Rate over 30- and 90-day windows, segmented by AI-generated versus human-written code
- The ratio of AI-generated to human-written code turnover, against the 1.5x and 2.0x thresholds above
- Which churn categories are growing fastest -- moved code, duplicated code, or rapid rewrites
The Developer AI Impact Framework incorporates churn tracking as a core quality signal, enabling engineering leaders to see these trends alongside productivity and output metrics.
Code churn doubling is not a failure of AI coding tools. It is a predictable consequence of a technology that dramatically increases code production without automatically ensuring that the additional code is architecturally sound, non-duplicative, and aligned with the existing codebase.
The teams that will benefit most from AI coding tools are not the ones that generate the most code. They are the ones that generate code which sticks -- code that does not churn. Measuring and managing churn is how engineering leaders distinguish genuine productivity gains from inflated output.
As developer roles shift from code authoring to code orchestration and verification -- a transition explored in From Coding to Verification: How Developer Roles Are Changing -- churn management becomes a core engineering competency. The developers who write the best prompts, design the best review processes, and maintain the lowest churn ratios will define what "productive" means in the AI era.
Code churn measures the total rate at which code is modified or deleted within a short window after being written -- it counts all changes, including healthy refactoring. Code Turnover Rate is a more specific metric that isolates rework: code that was merged and then reverted, deleted, or substantially rewritten, suggesting the original change was flawed or unnecessary. Churn is descriptive; turnover is diagnostic.
According to GitClear's analysis of 211 million lines of code, code churn rose from a pre-AI baseline of approximately 3.3% to 5.7% in 2024 and 7.1% in 2025 -- more than doubling in two years. The absolute volume of churned code has grown even faster because total code output has also increased with AI adoption.
Rising churn does not necessarily mean the tools are failing to deliver. AI coding tools do increase output and speed. The issue is that some of that additional output does not stick -- it gets rewritten, moved, or deleted shortly after being merged. The net productivity gain depends on whether the increase in output exceeds the increase in rework. Without measuring churn, organizations cannot answer this question and risk overestimating AI's impact.
There is no universal benchmark, but teams should monitor the ratio of AI-generated code churn to human-written code churn. A ratio below 1.5x indicates that AI code quality is close to human baseline. A ratio above 2.0x suggests systematic quality issues with AI-generated code that need to be addressed through better prompts, stricter review, or improved tooling (Larridin internal benchmark).
Churn can be reduced without slowing AI-assisted development down. The most effective interventions -- better prompt engineering, targeted review standards, and automated quality gates -- improve the quality of AI-generated code at the point of creation rather than adding friction to the entire workflow. Teams that invest in these practices typically see churn fall without sacrificing the speed benefits of AI tools.