Research

177 Major Media Sites Reveal: Blocking AI Crawlers Correlates With Lower Optimization

Analysis of 177 major media sites shows that sites blocking GPTBot score 42.0/100 on average, versus 47.9/100 for non-blockers. Blocking doesn't equal optimization.

David Lis

2026-01-31 · 6 min read

Since ChatGPT launched web search, the media industry has been divided: block AI crawlers or embrace them? Our analysis of 177 major media sites, from The New York Times to small regional publishers, reveals a surprising pattern that challenges conventional wisdom.

The Blocker Paradox

Here's what the data shows:

Comparison of sites blocking GPTBot (42.0/100) vs sites allowing GPTBot (47.9/100)

Sites that allow GPTBot score 14% higher on AI readiness, on average, than sites that block it

  • Sites Blocking GPTBot: 32.8% (58 of 177 sites actively block AI crawlers)
  • Average Score (Blockers): 42.0/100
  • Average Score (Non-Blockers): 47.9/100

Key Finding: Sites that allow AI crawlers score 14% higher on AI readiness than sites that block them (47.9 vs 42.0). This isn't causation; it's a correlation that reveals a deeper pattern.

Why Blockers Score Lower

The correlation isn't because blocking causes lower scores. Instead, it reveals two different strategic approaches:

  • Defensive blockers: Sites focused on preventing AI access without optimizing their structured data. They block bots but ignore schema.org markup, E-E-A-T signals, and content structure.
  • Strategic embracers: Sites that recognize AI search as a distribution channel and actively optimize for it. These sites implement Article schema, author markup, and proper content hierarchy.
  • The middle ground: Some sites (like Bloomberg, scoring 65/100) both block crawlers AND maintain excellent structured data, proving you can do both.
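Which camp a site falls into can be checked mechanically: the blocking half of the signal is just a robots.txt rule. A minimal sketch using Python's standard library (it parses robots.txt text directly; a real scanner would fetch the file first):

```python
from urllib import robotparser

def blocks_gptbot(robots_txt: str, url: str = "https://example.com/") -> bool:
    """Return True if the given robots.txt rules disallow GPTBot from the URL."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch("GPTBot", url)

sample = """
User-agent: GPTBot
Disallow: /
"""
print(blocks_gptbot(sample))  # True: this rule set fully blocks GPTBot
```

The structured-data half of the signal (schema markup, E-E-A-T indicators) requires inspecting page content, which is why blocking status alone says nothing about optimization.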

Methodology: 177 Sites Analyzed

We analyzed major media sites across six categories:

| Category            | Sites | Avg Score | % Blocking GPTBot |
| ------------------- | ----- | --------- | ----------------- |
| National News       | 32    | 48.2      | 34.4%             |
| Business & Finance  | 33    | 42.8      | 30.3%             |
| Technology          | 28    | 46.8      | 28.6%             |
| Sports              | 25    | 44.0      | 32.0%             |
| Culture & Lifestyle | 30    | 43.3      | 33.3%             |
| Entertainment       | 29    | 45.5      | 34.5%             |
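The headline figures can be recovered from the category breakdown. A quick sanity-check sketch (rows copied from the table above; the blocker count can land one site off because the per-category percentages are rounded):

```python
# (category, sites, avg_score, pct_blocking) rows from the category table
rows = [
    ("National News", 32, 48.2, 34.4),
    ("Business & Finance", 33, 42.8, 30.3),
    ("Technology", 28, 46.8, 28.6),
    ("Sports", 25, 44.0, 32.0),
    ("Culture & Lifestyle", 30, 43.3, 33.3),
    ("Entertainment", 29, 45.5, 34.5),
]

total_sites = sum(n for _, n, _, _ in rows)
blocking = sum(round(n * pct / 100) for _, n, _, pct in rows)
avg_score = sum(n * s for _, n, s, _ in rows) / total_sites

print(total_sites)          # 177
print(blocking)             # 57 or 58, depending on percentage rounding
print(round(avg_score, 1))  # close to the 45.0/100 overall average
```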

The Schema Crisis

Beyond blocking behavior, we found a shocking gap in basic optimization:

96% of media sites lack Article schema markup

This mirrors our earlier finding that 98.3% of e-commerce sites lack Product schema. The pattern is clear: industries are blocking AI without doing the basic optimization work.
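Checking for Article schema is mechanical: look for JSON-LD script blocks whose @type is Article or a subtype. A minimal standard-library sketch (a production scanner would use a real HTML parser instead of a regex, and would also handle @graph containers):

```python
import json
import re

ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting"}

def has_article_schema(html: str) -> bool:
    """Scan <script type="application/ld+json"> blocks for Article-type markup."""
    pattern = r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>'
    for block in re.findall(pattern, html, re.DOTALL | re.IGNORECASE):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD counts as missing markup
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") in ARTICLE_TYPES:
                return True
    return False

page = '<script type="application/ld+json">{"@type": "NewsArticle"}</script>'
print(has_article_schema(page))  # True
```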

Top Performers: What They Do Differently

The highest-scoring media sites share common traits:

Bloomberg (65/100):
✓ Comprehensive Article schema
✓ Author markup with credentials
✓ Publishing/modified dates
✓ Clear content hierarchy
✗ Blocks GPTBot (strategic choice)

The Verge (62/100):
✓ NewsArticle schema on all articles
✓ Detailed author bios with links
✓ Image optimization with alt text
✓ Allows AI crawlers

Reuters (58/100):
✓ Strong E-E-A-T signals
✓ Timestamped content
✓ Topic categorization
✓ Allows AI crawlers
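These traits translate directly into markup. A minimal NewsArticle JSON-LD payload covering the schema, author credentials, and timestamps listed above (all names, URLs, and dates are illustrative placeholders, not taken from any of these sites):

```python
import json

# Illustrative values only; real markup uses the article's actual metadata.
news_article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline",
    "datePublished": "2026-01-31T09:00:00Z",
    "dateModified": "2026-01-31T12:00:00Z",
    "author": {
        "@type": "Person",
        "name": "Jane Reporter",  # hypothetical author
        "url": "https://example.com/authors/jane-reporter",
        "jobTitle": "Senior Correspondent",
    },
    "publisher": {"@type": "Organization", "name": "Example News"},
    "image": "https://example.com/lead.jpg",
}

# Embed the output in a <script type="application/ld+json"> tag in the page head.
print(json.dumps(news_article, indent=2))
```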

What This Means for Publishers

The data suggests three strategic options:

  1. Optimize then decide: Implement Article schema, author markup, and E-E-A-T signals first. Then make an informed decision about blocking based on your specific goals.
  2. Block strategically: If you block, do it intentionally (like Bloomberg) while maintaining excellent structured data. Don't let blocking become an excuse to ignore optimization.
  3. Embrace and optimize: Allow AI crawlers while actively optimizing for AI visibility. This maximizes your chances of appearing in AI-generated responses.
Bottom line: Blocking AI crawlers is a strategic choice, but it shouldn't replace the fundamental work of structured data optimization.
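For the "block strategically" option, the blocking itself is a few lines of robots.txt. A fragment that disallows two widely documented AI crawlers (GPTBot and CCBot) while leaving everything else open; vendors occasionally add or rename user agents, so verify the current names before deploying:

```
# Block AI crawlers (strategic choice); other bots remain allowed.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else
User-agent: *
Allow: /
```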

Comparing Industries: Media vs E-commerce

Our previous analysis of 409 e-commerce sites provides interesting contrast:

| Metric           | E-commerce (409 sites) | Media (177 sites)       |
| ---------------- | ---------------------- | ----------------------- |
| Average AI Score | 25.6/100               | 45.0/100                |
| Schema Adoption  | 1.7% (Product)         | 4.0% (Article)          |
| GPTBot Blocking  | ~15%*                  | 32.8%                   |
| Primary Gap      | Missing Product markup | Missing Article markup  |

Note: *Estimated based on sample analysis of e-commerce sites

Media sites score 76% higher than e-commerce sites on average, but both industries share the same fundamental problem: structured data neglect.

Next: Tourism Industry Analysis

We're currently analyzing 290+ tourism and hospitality sites (hotels, travel agencies, destinations). Early indicators suggest this industry may face the most severe AI visibility challenges, as OTAs (Booking.com, Expedia) dominate AI search results while independent properties remain invisible.

Subscribe to follow the research as we expand our benchmark database across industries.

Audit Your Site's AI Readiness

Want to know how your media site compares to these benchmarks? Our AI Audit Scanner analyzes:

  • Schema.org markup (Article, NewsArticle, Organization)
  • E-E-A-T signals (author credentials, publication dates)
  • Robots.txt configuration and AI crawler access
  • Content structure and hierarchy
  • Competitive positioning against industry averages
Scan Your Site Now

Get your comprehensive AI readiness report in seconds


Methodology Note: Sites analyzed between January 27-29, 2026. Scores are based on 15+ signals, including schema markup, robots.txt configuration, E-E-A-T indicators, and content structure. Each site was crawled once; scores reflect a snapshot analysis, not historical trends.

Ready to Check Your AI Visibility?

See how your media site compares to the 177 sites we analyzed. Get a detailed AI readiness report in seconds.