World’s First LLM Benchmark for Creativity Finds AI Tools Are More Similar Than You Think

October 21, 2025

The 4As, IAA and other leading industry groups teamed up with Springboards to compare how top AI tools like ChatGPT, Gemini and Claude perform on creative tasks

New York, NY – October 21, 2025 – A comprehensive new study by Springboards, an AI platform inspiring creativity in advertising, found that popular AI tools like ChatGPT, Gemini, Claude and others perform much more similarly on creative tasks than many people think. Creativity Benchmark, conducted in collaboration with the 4As, ACA, APG, D&AD, IAA, IPA, and The One Club for Creativity, challenges the idea that there's a single "best" AI tool for creative work and shows agencies need more efficient ways to test AI tools for their specific needs.

Sixteen different AI systems – from OpenAI, Google, Anthropic, Meta, DeepSeek, Alibaba and others – were tested on real marketing challenges across 100 notable brands. Over 600 creative professionals from ad agencies, marketing teams, and strategy firms made over 11,000 comparisons to judge which outputs worked best. The biggest surprise? There was no clear winner. The differences between the "best" and "worst" AI tools were much smaller than expected.

"Everyone assumes some AI tools are way better than others for creative work," said Pip Bingemann, CEO and co-founder of Springboards. "But our tests showed the results were pretty close. Why? Because these models are machines designed to recognize patterns and give you the most probable answer—and 'probable' has never been called 'creative.' Keeping humans in the loop and optimizing for a wider range of varied ideas is crucial.”

The study looked at three types of creative challenges: finding surprising insights about consumers, creating big campaign ideas, and coming up with bold, attention-grabbing concepts.

Key Findings:

  • Different AI Tools Win at Different Tasks: No single AI system was best at everything. Some were better at strategic thinking, others at wild, creative ideas. This means agencies might want to use different tools for different jobs.
  • Variety of Ideas Matters Most: Some AI tools generated lots of different creative options for the same brief. Others kept suggesting similar ideas over and over. For real creative work, having many different options is just as important as having good ones.
  • AI Can't Judge Creative Work Well: When researchers had AI systems evaluate creative ideas, they gave very different scores than human experts. This means agencies can't rely on AI to pick the best creative concepts – they still need human judgment.
  • Standard Creativity Tests Don't Work for Marketing: Traditional creativity tests used in psychology don't predict which AI will be better at marketing-specific creative tasks. Brand work requires its own way of measuring creativity.
  • Creative Preferences Vary by Location: Interestingly, creative professionals in different countries preferred different AI tools, suggesting that cultural differences affect what people consider good creative work.

“LLMs aren’t a one-size-fits-all solution—they're general purpose tools that require human creativity to unlock breakthrough outcomes," said Jeremy Lockhorn, SVP, Creative Technologies & Innovation, 4As. "These findings suggest agencies and brands should continue to evaluate which models are best suited for creative work - and that a multi-model approach may well be the best path forward."

“This study highlights that creativity isn’t about which AI you use, it’s about how you use it,” remarked Tony Hale, CEO, Advertising Council Australia. “The results reinforce what we see across the industry: the human spark remains essential to transforming good ideas into great ones. For agencies, the real opportunity is learning how to collaborate with these systems to expand, not replace, creative thinking.”

Methodology

The study involved 678 advertising professionals from diverse backgrounds, who made blind A/B judgments between pairs of ideas in a format likened to a "Tinder for Ideas." The data, collected over four weeks starting June 10, 2025, comprised 11,012 human comparisons across various brands, prompts, and models. The comparisons were analyzed with Bradley-Terry modeling, and cosine distance was used to score the diversity of each model's outputs.
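This release does not say how the Bradley-Terry model was fitted. As a rough illustration only, the sketch below fits Bradley-Terry strengths from (winner, loser) pairs using the standard minorization-maximization (MM) update. In this model, the probability that item i beats item j is s_i / (s_i + s_j), so a higher strength means that item's ideas were preferred more often. The function name `bradley_terry` and the labels `model_a`/`model_b` are hypothetical, and this is not Springboards' code.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=500, tol=1e-9):
    """Fit Bradley-Terry strengths from a list of (winner, loser) pairs.

    Uses the MM update s_i <- W_i / sum_j n_ij / (s_i + s_j), where W_i is
    the number of wins for i and n_ij is how often i and j were compared.
    Assumes winner != loser and that every item takes part in a comparison.
    """
    wins = defaultdict(int)         # total wins per item
    pair_counts = defaultdict(int)  # comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}  # the opponent in this pair
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom
        # The scale is arbitrary, so rescale strengths to sum to len(items).
        total = sum(new.values())
        new = {i: v * len(items) / total for i, v in new.items()}
        converged = max(abs(new[i] - strength[i]) for i in items) < tol
        strength = new
        if converged:
            break
    return strength


if __name__ == "__main__":
    # Hypothetical votes: model_a wins 3 of 4 head-to-head comparisons.
    votes = [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
    s = bradley_terry(votes)
    print(round(s["model_a"] / s["model_b"], 3))  # 3.0
```

With only two items, the fitted strength ratio equals the win ratio. With many models and sparse comparisons, the same update pools evidence across all pairs. That pooling is why Bradley-Terry suits large sets of blind A/B judgments like the ones described above.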

The research used four different ways to test AI creativity:

  • Real Creative Professionals Made the Calls: Nearly 700 people working in advertising, marketing, and strategy compared AI-generated ideas side-by-side. They didn't know which AI created which idea, so they couldn't play favorites. The study covered ideas for 100 major brands across 12 different business categories.
  • Tested How Many Different Ideas AI Can Create: Researchers asked each AI system to create 10 different responses to the same creative brief, then measured how different those responses were from each other. Some AI tools generated very similar ideas every time, while others came up with lots of variety.
  • Checked If AI Can Judge Its Own Work: The team had three leading AI systems evaluate the same creative ideas that humans had already scored, to see if AI judges agreed with human experts. They didn't.
  • Tried Standard Creativity Tests: The AI systems took adapted versions of creativity tests that psychologists use on humans, measuring things like how many ideas they generate and how original those ideas are.
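The diversity test scores how far apart a model's responses to the same brief are, using cosine distance. The release does not specify the embedding model or how the distances were aggregated. The sketch below assumes each response has already been turned into a nonzero embedding vector and averages the pairwise cosine distance across those vectors. The function names and toy vectors are hypothetical stand-ins. In the study, each model produced 10 responses per brief.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def diversity_score(embeddings):
    """Mean pairwise cosine distance across a set of response embeddings.

    Higher values mean the responses are spread further apart.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real text embeddings.
    similar = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.1]]
    varied = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.2]]
    print(diversity_score(similar) < diversity_score(varied))  # True
```

A model that keeps repeating the same idea produces embeddings that point in nearly the same direction, so its score stays close to 0. A model with more varied ideas scores higher.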

All tests used the same settings and compared current AI systems from companies like OpenAI, Google, Anthropic, and Meta.

To access the full research white paper, visit https://arxiv.org/abs/2509.09702

To learn more about the results and explore the original research, visit creativitybenchmark.ai

About Springboards

Springboards is an AI-powered platform built to inspire creativity in advertising. The platform empowers teams to explore more ideas, without sacrificing the craft of great work. Founded by industry veterans Pip Bingemann, Amy Tucker, and Kieran Browne, Springboards has already partnered with 150+ agencies globally and secured $3 million USD in seed funding from Blackbird Ventures. For more information, visit the Springboards website or contact hello@springboards.ai.
