TL;DR

  • Innovation Rate measures the percentage of PRs and commits that ship new features versus bug fixes versus infrastructure and maintenance work. It reveals how a team spends its engineering capacity -- building new capabilities or maintaining existing ones.
  • AI tools should shift innovation rate upward over time. If velocity is rising but innovation rate is declining, AI is generating rework and maintenance burden, not reducing it.
  • DORA metrics cannot distinguish between a deploy that launches a new product capability and one that patches a broken dependency. Innovation Rate fills this gap by classifying what kind of work each PR represents.
  • Innovation Rate pairs with Code Turnover Rate as a Pillar 4 (Quality) metric in Larridin's Developer AI Impact Framework, revealing whether AI-generated code creates lasting value or technical debt that drags the team back into firefighting.

What Is Innovation Rate?

Innovation Rate is a developer productivity metric that measures the percentage of pull requests and commits that ship new features versus bug fixes versus infrastructure, configuration, and maintenance work. It answers a question that volume metrics cannot: of all the code your team is producing, how much of it moves the product forward?

The formula is straightforward:

Innovation Rate = (Feature PRs / Total PRs) x 100

Innovation Rate is a distribution metric, not a value metric. It measures how a team allocates its engineering capacity across work types. Because it operates on PR counts rather than PR size, it works best at the team level over multi-week windows where the size variance across PRs averages out. For teams that need size-adjusted precision, pair Innovation Rate with Complexity-Adjusted Throughput (CAT) to weight each PR by difficulty.

Innovation Rate is most useful as a full distribution across five categories:

  • Feature -- New product capabilities, user-facing functionality, new API endpoints, new workflows.
  • Bug Fix -- Patches, regression fixes, incident responses, error handling corrections.
  • Refactoring / Tech Debt -- Planned code restructuring, architecture improvements, performance optimization, and technical debt paydown that is neither a new feature nor a bug fix.
  • Test -- Standalone test additions, test infrastructure improvements, and test coverage expansion not bundled within a feature PR. Tests that ship as part of a feature PR are classified under Feature, not here.
  • Infrastructure / Config -- Dependency updates, CI/CD pipeline changes, build configuration, monitoring setup, environment configuration, tooling changes.

A team with a 55% Innovation Rate ships features in more than half its PRs. A team at 20% spends the vast majority of its engineering time keeping the lights on. Both teams might have identical PR counts, identical lines of code, and identical DORA scores -- but they are producing fundamentally different outcomes.

Why Innovation Rate Matters

Volume metrics are blind to purpose

Traditional productivity metrics -- PRs merged, lines of code, commits per week -- count activity without asking what that activity accomplishes. A team that ships 40 PRs in a week looks productive. But if 30 of those PRs are bug fixes triggered by fragile AI-generated code from the previous sprint, that team is running in place.

Innovation Rate makes this visible. It does not replace volume metrics; it contextualizes them.

DORA does not distinguish value from maintenance

DORA metrics -- deployment frequency, lead time for changes, change failure rate, mean time to recovery -- measure the mechanics of delivery. They tell you how fast code moves from commit to production and how reliably that pipeline operates. What they cannot tell you is whether the code flowing through that pipeline creates new value or merely preserves existing functionality.

A team can achieve elite DORA scores while spending 80% of its capacity on bug fixes and infrastructure. Deployment frequency is high because patches ship quickly. Lead time is low because fixes are small. Change failure rate is low because maintenance changes carry less risk than new features. The DORA dashboard looks excellent. The product has not gained a meaningful new capability in months.

AI should shift the ratio -- and you need to verify that it does

The core promise of AI coding tools is that they free engineers from repetitive, low-complexity work so they can spend more time on creative, high-value feature development. If that promise holds, Innovation Rate should rise after AI adoption: AI handles boilerplate, generates test scaffolding, automates configuration changes, and accelerates simple bug fixes -- leaving engineers with more capacity for feature work.

But the opposite can happen. If AI-generated code is lower quality, it creates a secondary wave of bug fixes. If AI produces code that is poorly integrated with existing systems, it generates infrastructure work to reconcile the gaps. If AI accelerates the creation of features that ship without adequate testing, those features boomerang back as incidents and hotfixes.

Innovation Rate is the metric that catches this failure mode. Rising velocity with declining innovation rate is the clearest signal that AI is generating maintenance, not reducing it.

How to Compute Innovation Rate

Step 1: Classify each PR

Every merged pull request gets assigned to one of five categories: Feature, Bug Fix, Refactoring/Tech Debt, Test, or Infrastructure/Config. There are multiple approaches to classification:

Label-based classification. Teams apply labels (feature, bugfix, test, infra) during PR creation. This is simple and explicit, but compliance is inconsistent -- developers skip labels under time pressure, and label taxonomies drift over time without enforcement.

Title and branch name heuristics. Parse PR titles and branch names for keywords. Branches starting with feature/, fix/, refactor/, chore/, or test/ map directly to categories. PR titles containing "fix", "patch", "hotfix", or "bug" map to Bug Fix; "refactor", "cleanup", "tech debt" map to Refactoring. This works well for teams with strong branch naming conventions and fails silently for teams without them.
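A branch-and-title heuristic pass might look like the following sketch. The prefix and keyword lists are illustrative; tune them to your own naming conventions, and treat a non-match as "ambiguous" rather than forcing a guess.

```python
import re

# Branch prefixes map directly to categories (illustrative list).
BRANCH_PREFIXES = {
    "feature/": "feature",
    "fix/": "bugfix",
    "refactor/": "refactor",
    "test/": "test",
    "chore/": "infra",
}

# Title keywords, checked only when no branch prefix matches.
TITLE_KEYWORDS = [
    (re.compile(r"\b(fix|patch|hotfix|bug)\b", re.I), "bugfix"),
    (re.compile(r"\b(refactor|cleanup|tech debt)\b", re.I), "refactor"),
]

def classify_by_heuristics(branch, title):
    """Return a category string, or None when no heuristic matches."""
    for prefix, category in BRANCH_PREFIXES.items():
        if branch.startswith(prefix):
            return category
    for pattern, category in TITLE_KEYWORDS:
        if pattern.search(title):
            return category
    return None  # ambiguous: hand off to the next classification layer
```

Returning None on a miss is the important design choice: it lets this pass fail loudly into the next layer instead of silently misclassifying PRs from teams without naming conventions.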

File path analysis. Changes to *.test.*, *_test.*, or files in /tests/ directories map to Test. Changes to CI configuration, Dockerfiles, Terraform, or package.json map to Infrastructure/Config. Changes to application source code map to Feature or Bug Fix depending on other signals. This approach captures obvious cases but struggles with ambiguous ones.

LLM-based classification. Use a language model to read the PR title, description, and diff summary, then classify the PR into a category. This is the most accurate automated approach for ambiguous cases -- an LLM can distinguish between a source code change that adds a new endpoint (Feature) and one that fixes a null pointer exception (Bug Fix) in ways that heuristics cannot. The trade-off is cost and latency, though both are minimal for classification tasks.

Recommended approach: layered automation. Use file path analysis and branch naming heuristics as a first pass. Apply LLM-based classification to PRs that remain ambiguous. Allow developer override for edge cases. This captures 85-90% of PRs automatically and reserves human judgment for the cases that genuinely need it.
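The layered pipeline can be sketched as a chain of rules, each returning a category or None. The rule bodies below are deliberately trivial stand-ins; `llm_rule` in particular is a placeholder for whatever model call you use, not a real API.

```python
# Illustrative stand-ins for the three layers described above.
def path_rule(pr):
    """File path analysis: a PR touching only test files is a Test PR."""
    if pr["files"] and all(p.endswith(("_test.py", ".test.ts")) for p in pr["files"]):
        return "test"
    return None

def branch_rule(pr):
    """Branch naming heuristic (simplified to one prefix here)."""
    return "feature" if pr["branch"].startswith("feature/") else None

def llm_rule(pr):
    """Placeholder for an LLM classification call; assumed to always
    return a category."""
    return "bugfix"

def classify_pr(pr):
    """Cheap deterministic rules first; the LLM only sees PRs that
    remain ambiguous after both heuristic passes."""
    for rule in (path_rule, branch_rule):
        category = rule(pr)
        if category is not None:
            return category, "heuristic"
    return llm_rule(pr), "llm"
```

Tracking which layer produced each label (the second tuple element) is useful in practice: it tells you what fraction of PRs the cheap passes are actually covering.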

Step 2: Code Type Bucketing at the commit level

For deeper granularity, classify at the commit level rather than the PR level. A single PR often contains a mix of feature code, tests, and configuration changes. Code Type Bucketing splits each commit's changed lines across the buckets:

  • Feature lines -- New application logic, UI components, API implementations.
  • Bug fix lines -- Patches to existing code that correct incorrect behavior.
  • Refactoring lines -- Restructured code that changes internal design without altering external behavior.
  • Test lines -- Unit tests, integration tests, end-to-end tests.
  • Infrastructure lines -- Build configuration, CI/CD, deployment scripts, dependency management.

This per-commit decomposition produces a more accurate picture than PR-level classification alone. It also resolves the mixed-PR problem: a PR labeled "Feature" might contain 60% feature code, 30% tests, and 10% configuration. At the commit level, those tests are correctly attributed rather than lumped under "Feature." Note that tests bundled inside a feature PR are healthy engineering practice -- the commit-level view captures this nuance while the PR-level view classifies by primary intent.
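A minimal sketch of per-commit bucketing by file path, assuming you have a map of changed files to line counts from the diff. The path patterns are illustrative, and this sketch collapses the feature/bugfix/refactoring distinction into one bucket since paths alone cannot separate them; a real classifier would also use the signals discussed in Step 1.

```python
def bucket_commit(changed_files):
    """changed_files: dict mapping file path -> lines changed in this
    commit. Returns lines changed per bucket."""
    buckets = {"feature": 0, "test": 0, "infra": 0}
    infra_markers = ("Dockerfile", ".github/", "package.json", ".tf")
    for path, lines in changed_files.items():
        if "/tests/" in path or ".test." in path or path.endswith("_test.py"):
            buckets["test"] += lines
        elif any(marker in path for marker in infra_markers):
            buckets["infra"] += lines
        else:
            # Application source: feature vs. bugfix vs. refactoring
            # needs other signals (commit message, diff content).
            buckets["feature"] += lines
    return buckets
```

For the mixed "Feature" PR described above, this would attribute the test and config lines to their own buckets instead of lumping everything under Feature.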

Step 3: Aggregate

Roll Innovation Rate up across the dimensions that matter to your organization:

  • Per engineer -- Useful for understanding individual work allocation, not for ranking. An engineer with a low Innovation Rate might be the team's designated incident responder or infrastructure lead -- both valuable roles.
  • Per team -- The most actionable level. A team consistently below 35% Innovation Rate is spending the majority of its time on maintenance and should investigate root causes.
  • Per repo -- Reveals which codebases are innovation-driving and which are maintenance sinks. Mature, stable repos naturally have lower Innovation Rates than active product development repos.
  • By week -- Track trends over time. Innovation Rate should be stable or rising after AI adoption, not declining.
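The by-week roll-up can be sketched as a grouping over classified PRs. The PR records here are illustrative (merge date plus category label); ISO weeks are used as the grouping key.

```python
from collections import defaultdict
from datetime import date

def weekly_innovation_rate(prs):
    """prs: iterable of (merge_date, category) pairs.
    Returns {(iso_year, iso_week): innovation rate %} sorted by week."""
    by_week = defaultdict(lambda: [0, 0])  # week -> [feature count, total]
    for merged_on, category in prs:
        week = merged_on.isocalendar()[:2]  # (ISO year, ISO week number)
        by_week[week][1] += 1
        if category == "feature":
            by_week[week][0] += 1
    return {w: round(100.0 * f / t, 1) for w, (f, t) in sorted(by_week.items())}

weekly_innovation_rate([
    (date(2026, 1, 5), "feature"),
    (date(2026, 1, 6), "bugfix"),
    (date(2026, 1, 12), "feature"),
    (date(2026, 1, 13), "feature"),
    (date(2026, 1, 14), "infra"),
])
```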

Benchmarks

These benchmarks reflect patterns across engineering organizations as of early 2026. Healthy ratios vary by team maturity, product stage, and organizational context -- a team in active incident response will naturally have a different distribution than a team building a greenfield product.

Metric                         Healthy    Watch      Warning    Critical
Innovation Rate (% features)   >50%       35-50%     20-35%     <20%
Bug Fix %                      10-20%     20-30%     30-45%     >45%
Refactoring / Tech Debt %      5-15%      15-25%     25-35%     >35%
Test % (standalone)            5-15%      <5%        <3%        0%
Infrastructure %               10-20%     20-30%     30-40%     >40%

A healthy product team ships more than half its PRs as new features, keeps bug fix work under 20%, invests 5-15% in deliberate refactoring, maintains standalone test coverage expansion at 5-15%, and limits infrastructure churn to under 20%. Note that tests bundled within feature PRs are not counted in the Test row -- that row captures standalone test efforts only.
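The Innovation Rate bands above translate into a simple threshold check. This sketch hardcodes the thresholds from the benchmark table for the Innovation Rate row only; the band boundary treatment (which band owns the exact boundary values) is an assumption.

```python
def innovation_band(rate_pct):
    """Map an Innovation Rate (%) to its benchmark band."""
    if rate_pct > 50:
        return "Healthy"
    if rate_pct >= 35:
        return "Watch"
    if rate_pct >= 20:
        return "Warning"
    return "Critical"
```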

Context matters. A platform team responsible for CI/CD, deployment infrastructure, and developer tooling will naturally have a higher Infrastructure % and lower Innovation Rate than a product team. These benchmarks apply primarily to product-facing engineering teams. Platform and infrastructure teams should define their own baselines relative to their mission.

Innovation Rate + AI: What to Watch For

AI's impact on Innovation Rate is the most direct test of whether AI tools are fulfilling their core promise. Three patterns emerge:

AI-assisted PRs with higher Innovation Rate than human-only PRs. This is the positive signal. It means AI is genuinely accelerating feature work -- helping engineers prototype faster, generate boilerplate for new endpoints, scaffold new components, and move through Easy feature work quickly so they can focus on Hard feature problems. When you segment Innovation Rate by AI attribution and see a higher feature percentage in AI-assisted work, AI is functioning as an innovation accelerator.

AI-assisted PRs concentrated in bug fixes and infrastructure. This is a neutral-to-negative signal. AI is being used for maintenance -- fixing issues, updating configurations, patching dependencies. This is not inherently bad (AI is good at these tasks), but it means AI is not yet shifting the team's capacity toward new value creation. The intervention is enablement: train engineers to use AI for feature development, not just cleanup.

Rising velocity with declining Innovation Rate. This is the red flag. More code is being produced, but a growing share of that code is bug fixes and maintenance. The most common cause: AI-generated code from previous sprints is creating a secondary wave of defects. Each sprint ships more AI-assisted features, but those features generate enough bugs that the following sprint's capacity is consumed by fixes. The team produces more total PRs each quarter, and the percentage of those PRs that are features drops each quarter. This is the AI maintenance treadmill, and Innovation Rate is how you detect it before it becomes entrenched.
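The red-flag pattern can be checked mechanically by comparing consecutive measurement windows. The thresholds below (10% velocity growth, 5-point rate drop) are illustrative, not prescribed by the framework.

```python
def maintenance_treadmill(prev, curr, min_velocity_gain=0.10, min_rate_drop=5.0):
    """prev/curr: (total_prs, innovation_rate_pct) for two consecutive
    windows. Flags rising velocity paired with a falling Innovation Rate."""
    velocity_up = curr[0] >= prev[0] * (1 + min_velocity_gain)
    rate_down = prev[1] - curr[1] >= min_rate_drop
    return velocity_up and rate_down

maintenance_treadmill((100, 52.0), (125, 41.0))  # red flag: more PRs, fewer features
maintenance_treadmill((100, 52.0), (125, 54.0))  # healthy growth
```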

How Innovation Rate Fits the Developer AI Impact Framework

Innovation Rate is a Pillar 4 (Quality) metric in Larridin's Developer AI Impact Framework. It serves as a companion to Code Turnover Rate, and together they answer a two-part question about AI's impact on engineering quality.

Code Turnover Rate asks: Is AI-generated code durable, or does it get rewritten within 30-90 days?

Innovation Rate asks: Is AI-generated work creating new capabilities, or is it generating maintenance?

A team with low code turnover and high innovation rate is in an ideal state: AI helps them build new features, and those features stick. A team with high code turnover and low innovation rate is in the worst state: AI-generated code churns, and the team spends its time fixing what AI built instead of building what customers need.

The four failure modes map cleanly:

Code Turnover Rate   Innovation Rate   Diagnosis
Low                  High              Healthy -- AI accelerates durable feature work
Low                  Low               Maintenance trap -- stable code, but no new value
High                 High              Speed without quality -- features ship but churn
High                 Low               Crisis -- AI is generating unstable maintenance code
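The four failure modes amount to a two-key lookup. This sketch assumes you have already binarized each metric against your own thresholds; the diagnosis strings mirror the matrix above.

```python
# The 2x2 diagnosis map: (code turnover, innovation rate) -> reading.
DIAGNOSIS = {
    ("low", "high"):  "Healthy -- AI accelerates durable feature work",
    ("low", "low"):   "Maintenance trap -- stable code, but no new value",
    ("high", "high"): "Speed without quality -- features ship but churn",
    ("high", "low"):  "Crisis -- AI is generating unstable maintenance code",
}

def diagnose(turnover, innovation):
    return DIAGNOSIS[(turnover, innovation)]
```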

Innovation Rate also connects to Complexity-Adjusted Throughput: a team can have high CAT (shipping complex work) but low Innovation Rate (that complex work is all bug fixes on legacy systems). CAT tells you how much hard work gets done. Innovation Rate tells you what kind of hard work it is.

Read the full Developer AI Impact Framework -->

Frequently Asked Questions

What is innovation rate in software engineering?

Innovation Rate is a metric that measures the percentage of pull requests and commits that ship new features versus bug fixes, refactoring, tests, and infrastructure work. It classifies each PR into one of five categories -- Feature, Bug Fix, Refactoring/Tech Debt, Test, and Infrastructure/Config -- and tracks the distribution over time. A team with a 55% Innovation Rate spends the majority of its engineering capacity building new product capabilities. A team at 20% spends most of its time on maintenance and firefighting. Innovation Rate reveals how engineering capacity is actually allocated, independent of volume metrics like lines of code or PR counts.

What is a good innovation rate for engineering teams?

A healthy Innovation Rate for product-facing engineering teams is above 50% -- meaning more than half of merged PRs deliver new features. The remaining capacity should split roughly as 10-20% bug fixes, 5-15% deliberate refactoring, 5-15% standalone tests, and 10-20% infrastructure. Teams at 35-50% are in a watch zone and should investigate whether maintenance work is crowding out feature development. Below 20% is critical: the team is spending the vast majority of its time keeping existing systems running rather than building new capabilities. These benchmarks apply to product teams; platform and infrastructure teams will have different healthy distributions based on their mission.

How do you classify PRs as features vs bug fixes?

The most effective approach is layered automation: file path analysis and branch naming heuristics as a first pass, LLM-based classification for ambiguous cases, and developer override for edge cases. Branch names like feature/ and fix/ map directly to categories. File paths in /tests/ or changes to CI configuration map to Test and Infrastructure respectively. For PRs that remain ambiguous -- a source code change that could be a new feature or a fix -- an LLM can read the PR title, description, and diff summary to classify with high accuracy. This layered approach captures 85-90% of PRs automatically.

Does AI improve or hurt innovation rate?

AI should improve Innovation Rate by automating low-value maintenance tasks and freeing engineers for feature work -- but it can hurt Innovation Rate if AI-generated code is low quality. The positive case: AI handles boilerplate, generates test scaffolding, automates dependency updates, and accelerates simple bug fixes, leaving engineers with more capacity for creative feature development. The negative case: AI-generated code ships without adequate review, creates defects, and triggers a secondary wave of bug fixes that consumes the capacity AI was supposed to free. The distinguishing metric is whether Innovation Rate rises or falls after AI adoption. Rising velocity with declining Innovation Rate is the clearest signal that AI is generating maintenance rather than reducing it.

How is innovation rate different from velocity?

Velocity measures how much work gets done; Innovation Rate measures what kind of work it is. A team can have high velocity -- many PRs merged, high Complexity-Adjusted Throughput, strong DORA metrics -- while spending 80% of that velocity on bug fixes and infrastructure. Velocity says the team is productive. Innovation Rate says the team is productive at maintenance, not innovation. The two metrics are complementary: velocity without Innovation Rate tells you the team is busy, and Innovation Rate without velocity tells you the team is focused on the right work but might not be doing enough of it. You need both to understand whether engineering capacity is being invested effectively.