Larridin Blog

Developer Intelligence: What Is the AI Slop Index?

Written by Larridin | Jun 28, 2026

The AI Slop Index is Larridin's quality signal for low-durability AI-assisted code. It scores each pull request from 1 to 5 on maintainability and correctness risk, where higher is worse. That helps engineering leaders see when AI is increasing code volume while also increasing review burden, cleanup work, and future maintenance cost.

AI has made it easy to create a lot of code. That is useful. It also creates a new failure mode: teams can ship more code than their review, architecture, and maintenance systems can absorb.

The same pattern has already played out in content. AI made publishing cheap, but it did not make good judgment cheap. Engineering now has the code version of that problem. A team can produce more PRs, more tests, more boilerplate, and more surface area while quietly reducing the durability of the software.

The AI Slop Index exists to catch that early.

Key Findings

| Finding | What it means for engineering leaders |
| --- | --- |
| AI Slop Index is a 1-5 PR-level quality score. | A higher score means more maintenance and correctness risk in the reviewed change. |
| It is not a code-volume metric. | The question is not "how much code did AI write?" but "how much future burden did this code create?" |
| The score has five dimensions. | Larridin looks at signal-to-noise, unnecessary abstractions, unreviewed paste, defensive bloat, and reinventing the wheel. |
| The overall score is holistic. | It is not a formula over the five components; it reflects the full quality pattern in the pull request. |
| It works best as a leading signal. | Review time, rework, code turnover, incidents, and tech debt often show up later. AI Slop Index gives leaders an earlier warning. |

Evidence and Methodology

Larridin scores AI slop at the pull-request level. Each scored PR receives an overall score from 1 to 5:

| Score | Interpretation |
| --- | --- |
| 1 | Clean code with little or no slop. |
| 2 | Minor slop patterns that do not materially affect maintainability. |
| 3 | Noticeable slop that can create maintenance friction. |
| 4 | Significant slop that can hinder future work. |
| 5 | Heavy slop that can become a persistent source of confusion, bugs, or rework. |

The score is designed around maintenance and correctness risk relative to the norms of the repository, not around whether AI wrote the change. That distinction matters. The problem is not that a change was written with AI. The problem is a change that looks productive at merge time but creates avoidable cost later.

Larridin evaluates five supporting dimensions:

| Dimension | What it catches |
| --- | --- |
| Signal-to-noise | Filler code, obvious comments, tutorial-style explanations, and boilerplate that increases line count without increasing meaning. |
| Unnecessary abstractions | Helper layers, wrappers, configuration objects, or indirection that make simple logic harder to understand. |
| Unreviewed paste | Code that looks dropped in without integration: unused imports, style drift, duplicated logic, or API usage that contradicts local patterns. |
| Defensive bloat | Error handling and guards that are disproportionate to the real failure modes. |
| Reinventing the wheel | Custom utilities or patterns that duplicate existing codebase or ecosystem capabilities. |
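To make the dimensions concrete, here is a small illustrative sketch in Python. Both functions and their names are hypothetical, written for this post rather than taken from any real PR, and how Larridin's scorer would actually rate them is not something this example claims to show. The first version stacks up obvious comments, disproportionate guards, and a hand-rolled loop; the second does the same job with the built-in `sum()`.

```python
# Hypothetical "sloppy" version: obvious comments (signal-to-noise),
# guards and a catch-all fallback (defensive bloat), and a hand-written
# loop that duplicates a built-in (reinventing the wheel).
def total_price_sloppy(prices):
    # Initialize the total to zero
    total = 0
    try:
        # Make sure prices is not None
        if prices is not None:
            # Make sure prices is a list
            if isinstance(prices, list):
                # Loop over every price in the list
                for price in prices:
                    # Skip missing prices
                    if price is not None:
                        # Add the price to the total
                        total = total + price
    except Exception:
        # Fallback: silently report zero if anything goes wrong
        return 0
    return total


# Tighter version: same result for valid input, and bad input fails loudly.
def total_price(prices: list[float]) -> float:
    return sum(prices)


print(total_price_sloppy([1.0, 2.5]))  # 3.5
print(total_price([1.0, 2.5]))         # 3.5
print(total_price_sloppy(["oops"]))    # 0, the type error is swallowed
```

The two versions agree on valid input. The difference shows up on bad input: the sloppy version returns 0 and hides the problem, while the shorter one raises a `TypeError` where the bad data entered.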

Larridin supports two scoring paths. A diff-only scorer can evaluate the PR from the change itself. A repo-aware agent scorer can inspect existing utilities and conventions before assigning the score. In warehouse rollups, Larridin prefers the agent-based score when it is available and falls back to the diff-only score when it is not.
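The preference rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Larridin's implementation: the function name, the `ChosenScore` type, and the `"agent"` / `"diff_only"` labels are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class ChosenScore(NamedTuple):
    score: float
    source: str  # "agent" or "diff_only" (hypothetical labels)


def choose_rollup_score(
    agent_score: Optional[float],
    diff_only_score: Optional[float],
) -> Optional[ChosenScore]:
    """Prefer the repo-aware agent score; fall back to the diff-only score."""
    if agent_score is not None:
        return ChosenScore(agent_score, "agent")
    if diff_only_score is not None:
        return ChosenScore(diff_only_score, "diff_only")
    return None  # the PR has not been scored by either path


print(choose_rollup_score(2.0, 4.0))   # ChosenScore(score=2.0, source='agent')
print(choose_rollup_score(None, 4.0))  # ChosenScore(score=4.0, source='diff_only')
```

Keeping the source next to the score makes it possible to check later whether a trend is a real change or just a shift in which scorer was available.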

At the API layer, AI Slop Index is exposed as a time series with the overall score and component scores. Rollups are weighted by the number of slop-scored PRs, so a team with more scored changes contributes proportionally more to the aggregate.
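As a worked example of PR-weighted rollups, the sketch below averages two teams' scores for one time bucket. The `SlopPoint` shape, the field names, and the team names are hypothetical; the actual API schema is not described in this post.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SlopPoint:
    """One team's entry in a single time bucket (hypothetical shape)."""
    team: str
    scored_prs: int               # number of slop-scored PRs in the bucket
    overall: float                # mean overall score across those PRs
    components: dict[str, float]  # mean score per dimension


def weighted_mean(
    points: list[SlopPoint],
    value_of: Callable[[SlopPoint], float],
) -> Optional[float]:
    """Average a value across teams, weighted by each team's scored PR count."""
    total_prs = sum(p.scored_prs for p in points)
    if total_prs == 0:
        return None  # nothing was scored, so there is no aggregate
    return sum(value_of(p) * p.scored_prs for p in points) / total_prs


points = [
    SlopPoint("platform", 30, 2.0, {"defensive_bloat": 1.5}),
    SlopPoint("growth", 10, 4.0, {"defensive_bloat": 3.5}),
]

# (2.0 * 30 + 4.0 * 10) / 40 = 2.5, not the unweighted 3.0
print(weighted_mean(points, lambda p: p.overall))  # 2.5
# (1.5 * 30 + 3.5 * 10) / 40 = 2.0
print(weighted_mean(points, lambda p: p.components["defensive_bloat"]))  # 2.0
```

An unweighted average of the two teams would report 3.0. Weighting by scored PRs pulls the aggregate toward the team that shipped three times as many scored changes.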

Concrete Operator Scenario

A VP Engineering rolls out AI coding tools and sees the surface-level numbers improve. PR count is up. More tests are being added. Cycle time initially looks acceptable. The team feels faster.

Then the drag starts to appear.

Reviewers spend more time untangling generated abstractions. PRs are larger than they need to be. Some tests assert implementation details rather than product behavior. A few changes are accepted because they look plausible, then rewritten two weeks later. Engineers are still shipping, but more of the work is cleanup around code that should have been simpler.

This is where AI Slop Index is useful. It does not wait for a production incident or a quarterly tech-debt review. It shows whether the PRs flowing through the system are becoming noisier, more defensive, less integrated, or more likely to create future maintenance work.

Measurement Approach

Use AI Slop Index as a leading quality signal, then compare it with lagging outcomes.

The most useful operating view is not a single team-wide number in isolation. It is the relationship between slop, review behavior, and durability:

| Signal | How to read it |
| --- | --- |
| AI Slop Index rising while AI code share rises | AI output may be increasing faster than review quality or engineering discipline. |
| Slop rising with PR cycle time | Reviewers may be spending more time interpreting generated code. |
| Slop rising with 30-day code turnover | AI-assisted changes may be less durable after merge. |
| High unreviewed paste | Teams may be accepting generated output before integrating it with local conventions. |
| High defensive bloat | AI may be adding guards, fallbacks, or error handling that obscure the real system behavior. |
| High reinventing the wheel | Generated code may be missing existing internal utilities or platform patterns. |
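The three "rising together" readings above can be checked mechanically. The following is a rough sketch that assumes each metric is available as a list of per-period values in time order. The metric keys, the function names, and the use of a simple least-squares slope as the definition of "rising" are choices made for this example, not part of Larridin's product.

```python
def trend(values: list[float]) -> float:
    """Least-squares slope of the values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical metric keys mapped to the readings from the table above.
READINGS = {
    "ai_code_share": "AI output may be increasing faster than review quality.",
    "pr_cycle_time": "Reviewers may be spending more time interpreting generated code.",
    "code_turnover_30d": "AI-assisted changes may be less durable after merge.",
}


def read_signals(slop: list[float], companions: dict[str, list[float]]) -> list[str]:
    """Return the readings for companion metrics that rise while slop rises."""
    if trend(slop) <= 0:
        return []  # slop is flat or falling, so none of the readings apply
    return [
        READINGS[name]
        for name, series in companions.items()
        if name in READINGS and trend(series) > 0
    ]


slop = [2.1, 2.3, 2.6, 2.9]
companions = {
    "ai_code_share": [0.2, 0.3, 0.4, 0.5],          # rising
    "pr_cycle_time": [30.0, 29.0, 28.0, 27.0],      # falling
    "code_turnover_30d": [0.10, 0.12, 0.15, 0.18],  # rising
}
for reading in read_signals(slop, companions):
    print(reading)
```

With these made-up numbers the sketch prints the AI code share and code turnover readings and skips cycle time, which is falling. A real analysis would want more periods and a noise threshold before treating a positive slope as a trend.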

This is also why traditional leading indicators are weaker in the AI era. Test coverage can be inflated by tests that assert implementation details rather than product behavior. More tests do not automatically mean better tests. More code does not automatically mean more product. More PRs do not automatically mean more engineering value.

AI Slop Index focuses on whether the code itself looks like durable work.

Caveats And Failure Modes

AI Slop Index should not be used as a surveillance metric for ranking individual engineers. That creates the wrong incentive. Engineers will optimize for making the score look good instead of improving the system that produces better code.

It should also not be treated as a complete quality model. Some high-risk changes are not sloppy. Some clean-looking changes still contain design errors. Some exploratory work should look rough before it hardens. The score is a signal, not a verdict.

The safest use is at the workflow, team, repository, and trend level:

| Bad use | Better use |
| --- | --- |
| "Which engineer writes the sloppiest code?" | "Which workflows produce the most review drag?" |
| "Block all PRs above a score of 3." | "Review high-scoring PRs and identify repeated failure modes." |
| "AI is bad because slop went up." | "AI output is outpacing our review and integration practices." |
| "Coverage is up, so quality is fine." | "Coverage is up, but slop and turnover are also up, so durability is questionable." |

The goal is not to shame AI use. The goal is to separate useful AI assistance from cheap code volume.

What To Do Next

Start by tracking AI Slop Index alongside AI code share, PR cycle time, review pushback, and 30-day code turnover. Look for divergence: places where AI-assisted output is rising while slop, rework, or review burden is also rising.

Then inspect the component scores. If signal-to-noise is high, focus on review standards and prompt patterns. If unnecessary abstractions are high, tighten architectural review. If unreviewed paste is high, check whether engineers are integrating generated code with local conventions. If defensive bloat is high, ask whether AI is hiding uncertainty behind fallback code. If reinventing the wheel is high, improve context and retrieval around internal utilities.
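The component-to-action mapping in the paragraph above can be written down as a simple lookup. This is a sketch only: the component keys, the `follow_ups` function, and the threshold of 3.0 for "high" are assumptions for illustration, and the right cut-off will depend on the repository's own norms.

```python
# Follow-up actions taken from the paragraph above, keyed by hypothetical
# component names.
FOLLOW_UPS = {
    "signal_to_noise": "Focus on review standards and prompt patterns.",
    "unnecessary_abstractions": "Tighten architectural review.",
    "unreviewed_paste": "Check whether generated code is integrated with local conventions.",
    "defensive_bloat": "Ask whether AI is hiding uncertainty behind fallback code.",
    "reinventing_the_wheel": "Improve context and retrieval around internal utilities.",
}


def follow_ups(
    component_scores: dict[str, float],
    threshold: float = 3.0,  # assumed cut-off for "high"; tune per repository
) -> list[tuple[str, str]]:
    """Return (component, action) pairs for high components, worst first."""
    flagged = sorted(
        (
            name
            for name, score in component_scores.items()
            if name in FOLLOW_UPS and score >= threshold
        ),
        key=lambda name: component_scores[name],
        reverse=True,
    )
    return [(name, FOLLOW_UPS[name]) for name in flagged]


team_scores = {
    "signal_to_noise": 2.1,
    "unnecessary_abstractions": 1.8,
    "unreviewed_paste": 3.6,
    "defensive_bloat": 3.2,
    "reinventing_the_wheel": 2.4,
}
for name, action in follow_ups(team_scores):
    print(f"{name}: {action}")
```

For these example scores the sketch lists unreviewed paste first and defensive bloat second. The scores are keyed by team rather than by engineer, in line with the caveat about not ranking individuals.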

The leadership question is simple:

Are AI tools helping the team produce more durable software, or are they creating more code for the organization to carry?

AI Slop Index gives engineering leaders an earlier way to answer that question.

FAQ

Is AI Slop Index a measure of how much code AI wrote?

No. AI code share measures how much code is AI-assisted. AI Slop Index measures the quality risk of the resulting pull requests.

Is a higher AI Slop Index good or bad?

Higher is worse. The score runs from 1 to 5, where 1 means little or no slop and 5 means heavy slop.

Does AI Slop Index replace code turnover or review metrics?

No. It complements them. Code turnover, review time, and incidents often show lagging cost. AI Slop Index helps detect the patterns that can create that cost before they fully show up downstream.

Should teams use AI Slop Index to judge individual engineers?

No. It is more useful as a system signal: which workflows, repositories, prompts, and review practices are producing durable AI-assisted code, and which are not.