Rick Rubin barely plays any instruments. He doesn't know how to work a mixing board.
When Anderson Cooper asked him on 60 Minutes what, exactly, he does in a recording studio, Rubin's answer was disarmingly simple. "I like the idea of getting the point across with the least amount of information possible," Rubin said. He calls himself "a reducer, instead of a producer."
Cooper pushed. So you don't play music, you don't know the equipment. What are people paying for?
"The confidence that I have in my taste, and my ability to express what I feel, has proven helpful for artists."
One of the most successful music producers in history defines his job as subtraction. He has shaped landmark albums for Johnny Cash, the Beastie Boys, Adele, Jay-Z, and Metallica. And in his own words, his value comes from knowing what to remove.
When Rubin worked with Cash in the 1990s, Cash was considered a relic. Dropped by his label. Playing county fairs. Making bloated Nashville records buried under string sections and background vocals.
Rubin stripped all of it away. He sat Cash down with a single acoustic guitar in a living room and pressed record. No band. No overdubs. No production tricks.
The result resurrected Cash's career and produced some of the most acclaimed albums of the decade. Rubin didn't play a single note. He just knew what the music should sound like. And he knew when it did.
Software has inherited the same split.
AI can write code, run tests, debug failures, and iterate for hours without stopping. The question is whether anyone in the room knows what "right" sounds like. Rubin's entire career is the answer. The person who can evaluate is worth more than the person who produces. The music industry learned that in the 1990s. Software is learning it now.
The Year We Proved It
Andrej Karpathy coined the term "vibe coding" in February 2025, describing a new way of working. "Fully give in to the vibes, embrace exponentials, and forget that the code even exists." By the end of the year, 90% of development teams were using AI in their workflows, according to Google's DORA report. AI was generating roughly 41% of all new code.
The numbers told a story of acceleration. Faster prototypes. Faster pull requests. Faster everything.
Then METR published its study.
In July 2025, the nonprofit research organization ran a randomized controlled trial with 16 experienced open-source developers. These weren't beginners. They averaged five years and 1,500 commits on their repositories, working on codebases over a million lines of code.
METR gave them 246 real tasks and randomly assigned each to allow or disallow AI tools. Developers using AI took 19% longer.
Not 19% faster. Slower.
Before starting, those same developers predicted AI would make them 24% faster. After finishing, they still believed they'd been sped up by 20%. They were wrong about their own productivity, in the wrong direction, by a wide margin.
A separate study by Uplevel, analyzing nearly 800 developers using GitHub Copilot, found no significant improvement in cycle time or throughput. It did find a higher bug rate.
Speed that feels like progress but isn't.
The problem wasn't the tools; it was calibration. AI made producing code faster while quietly degrading the ability to evaluate it.
The Vibe Coding Hangover
Karpathy was careful with his original framing. Vibe coding, he said, was "not too bad for throwaway weekend projects." The caveat got lost.
By late 2025, roughly 10,000 startups had tried to build production applications entirely with AI. The results were predictable to anyone paying attention.
Leonel Acevedo built an entire startup called Enrichlead using Cursor with zero handwritten code. Within 72 hours of launch, users bypassed his paywall by changing a single value in the browser console. He couldn't audit the 15,000 lines of AI-generated code because he hadn't written any of it. He shut the project down.
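Enrichlead's actual code was never published, but the class of flaw is familiar. A minimal sketch, with entirely hypothetical names, of what a client-side "paywall" looks like:

```javascript
// Hypothetical sketch of the class of flaw described above: the
// paywall is a flag held in client-side state, so anyone can flip
// it from the browser's DevTools console. Names are illustrative,
// not taken from Enrichlead's code.
const app = { isPro: false };

function canAccessPremium(state) {
  // The server is never consulted; the client is simply trusted.
  return state.isPro === true;
}

console.log(canAccessPremium(app)); // false: the paywall "works"

// One line typed into the console defeats it:
app.isPro = true;
console.log(canAccessPremium(app)); // true, and no payment happened
```

The fix is structural, not cosmetic: entitlement checks have to live on the server, where the user can't edit them. Spotting that requires reading the code, which is exactly what the project's owner couldn't do.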
Kent Beck, the creator of Test-Driven Development, described AI agents as an "unpredictable genie." He had a specific complaint. The agents keep deleting tests in order to make them pass.
The agents don't fix the bugs. They delete the checks that would catch them.
I've been calling this The Confidence Gap. The distance between what AI output looks like and what it actually does. The interfaces look polished. The code compiles. The tests pass (if the agent hasn't quietly removed the ones that wouldn't).
Everything appears to work right up until it doesn't.
Karpathy himself quietly demonstrated the limit. In October 2025, eight months after coining the term, he hand-coded his next serious project. AI agents, he said, "just didn't work well enough at all" on intellectually intense, non-boilerplate code.
Even the person who named it knew when to stop vibing.
What Jaws Knew About Two Frames
In 1974, Steven Spielberg was a 27-year-old director in deep trouble.
The three mechanical sharks he'd built for Jaws kept breaking down in saltwater. The shoot ran over schedule and over budget. Universal was considering killing the picture.
The footage landed on the editing bench of Verna Fields, a 56-year-old veteran with 37 films to her name. Younger directors called her "Mother Cutter." She cut films in a converted pool house in the San Fernando Valley.
Fields looked at the footage and made a decision that saved the movie. Don't show the shark.
Instead of forcing the malfunctioning prop into every scene, she reframed the film's terror around the creature's absence. She cut from underwater POV shots to frightened faces. She used John Williams' two-note theme to signal the shark without ever showing it. She let yellow barrels tracking across the surface stand in for the monster below.
Spielberg later described the precision this required. The shark would only look real in 36 frames, he said. Not 38. "That two-frame difference was the difference between something really scary and something that looked like a great white floating turd."
Two frames. At 24 frames per second, that is one-twelfth of a second.
The boundary between terror and comedy, measured in twelfths of a second. Fields' taste, her instinct for exactly when to cut away, made that call hundreds of times across the film.
She didn't shoot a single frame of the movie. She decided which frames the audience would see.
Fields won the Academy Award for Best Editing. Jaws became the first movie to gross $100 million. And one of its most famous moments, the Ben Gardner head scene, was filmed in Fields' backyard swimming pool with a rubber prop and some milk powder to cloud the water. The ocean location was wrapped and the budget was gone.
Taste doesn't need a big budget. It needs the ability to perceive the difference between 36 and 38.
Taste can also mean knowing when not to cut.
During the recording of U2's The Joshua Tree, producer Brian Eno grew so frustrated with "Where the Streets Have No Name" that he planned to erase the master tapes. That one song had consumed 40% of the album's studio time. Co-producer Daniel Lanois was using a blackboard to chart time signature changes. Eno called it "screwdriver work" and decided a fresh start would be better.
He cued up the tapes and prepared to record over them.
Engineer Pat McCarthy walked into the control room carrying a tray of tea, saw what was about to happen, dropped the tray, and physically restrained Eno.
The tapes survived. The song became one of rock's most iconic tracks.
Eno's instinct to destroy and rebuild was a taste judgment. McCarthy's instinct to preserve was also a taste judgment. The finished product came from the tension between them.
Taste isn't always subtraction. Sometimes it's knowing what to protect. And it works best when it's tested by other perspectives, not held in isolation.
Taste, Defined
In 1757, the philosopher David Hume wrote an essay called "Of the Standard of Taste." He was trying to answer a question that feels newly urgent: if taste is subjective, can there be any standard for it?
His answer was yes, with conditions. A "true judge," Hume argued, needs five qualities: delicacy, practice, comparison, freedom from prejudice, and good sense.
That reads less like 18th-century philosophy and more like a job description for someone reviewing AI-generated output in 2026.
Delicacy means noticing small wrongness. The two-frame difference. Practice means having scar tissue from enough failures that you've internalized patterns of what works and what doesn't. Comparison means knowing what "good" looks like because you've studied a range of work. Freedom from prejudice means evaluating what's in front of you, not what you expected or hoped for. Good sense means connecting aesthetic judgment to real-world consequences.
Ira Glass articulated a modern version of the same idea. Every creative person, he said, starts with a gap between their taste and their ability to execute. Your taste is what got you into the work. Your ability hasn't caught up yet.
AI closes the execution side of that gap. You can now produce things that look professional without years of practice. But it doesn't touch the taste side.
If anything, it widens the gap. You can generate a hundred variations in the time it used to take to produce one.
Without taste, you're just drowning in options.
Knowing which things to make. Evaluating whether those things work. Recognizing the specific moment when something crosses from good enough to right. Pattern recognition across failures. Comfort with ambiguity. The ability to explain why something feels wrong, not just that it does.
Those are the skills. They don't come from a prompt.
The Skill That Remains
When Kodak put a camera in everyone's hands in 1900 with the $1 Brownie, it sold 10 million units in five years. Photography went from exclusive craft to commodity overnight.
One of the most famous images of post-war Britain was shot by photojournalist Bert Hardy using that same cheap Brownie on a trip to Blackpool in 1951. The picture proved what the market would spend the next century confirming: when everyone can take a technically adequate photo, the differentiator is compositional judgment.
That pattern keeps repeating.
When GarageBand gave everyone a recording studio, the value of someone like Rick Rubin went up, not down. When digital publishing gave everyone a printing press, the writers who built audiences on Substack weren't the most prolific. They were the ones with a distinctive voice and editorial judgment. Substack went from 50,000 paid subscribers to 2 million in three years.
Music tells the same story at larger scale. Spotify receives 120,000 new tracks per day. Of all the music on the platform, 86% generates zero income.
Production got cheap. The value migrated to curation. Playlist editors became the gatekeepers.
The person who can evaluate is worth more than the person who can generate.
A paper published in January 2026 by researchers at Central European University and the Kiel Institute documented the starkest version of this split. Traffic to Tailwind CSS's documentation dropped 40% from early 2023, despite Tailwind being more popular than ever.
People are using it through AI intermediaries without visiting the docs. They're building with Tailwind. They're not learning Tailwind.
Usage up. Comprehension down.
This is the Rubin problem in code. The AI generates Tailwind classes, and the developer ships them without understanding why those specific utility classes produce the visual result they do. The code works, but the reasoning is absent.
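The gap is concrete. Each Tailwind utility class is a named alias for a CSS declaration; the mappings below are Tailwind's documented defaults, and the expansion helper is purely illustrative:

```javascript
// Tailwind utility classes are shorthand for CSS declarations.
// The values here are Tailwind's documented defaults; the helper
// function is a hypothetical illustration, not part of Tailwind.
const utilityToCss = {
  "px-4":        "padding-left: 1rem; padding-right: 1rem",
  "py-2":        "padding-top: 0.5rem; padding-bottom: 0.5rem",
  "rounded-lg":  "border-radius: 0.5rem",
  "bg-blue-600": "background-color: #2563eb",
  "text-white":  "color: #ffffff",
};

// Expanding a class string surfaces the reasoning the classes encode.
function explain(classString) {
  return classString
    .split(/\s+/)
    .map((c) => `${c} → ${utilityToCss[c] ?? "(unknown utility)"}`);
}

console.log(explain("px-4 py-2 rounded-lg bg-blue-600 text-white"));
```

A developer who can't produce that expansion from memory is building with Tailwind without learning Tailwind, which is exactly the split the traffic data shows.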
The funding model that sustains the project erodes because AI has decoupled consumption from understanding.
When the AI mediates the usage, the people who actually know why something works become the scarce resource.
The Reducer's Question
Rubin was once asked what he wants people to feel about his productions.
"I want them to say, 'This is the best thing I've ever heard,'" he said, "and not know why."
That's what taste looks like from the outside. Invisible. The audience doesn't see the 200 rejected takes, the songs stripped down to voice and guitar, the decision to cut at frame 36 instead of 38. They just know something feels right.
The skills that matter most don't come from a prompt. They come from the accumulated weight of decisions made, patterns studied, and the slow, unglamorous work of learning to tell right from almost-right.
Anyone can generate. The scarce skill is knowing what's right.