177 Major Media Sites Reveal: Blocking AI Crawlers Correlates With Lower Optimization
Analysis of 177 major media sites shows sites blocking GPTBot score 42/100 vs 47.9/100 for non-blockers. Blocking doesn't equal optimization.
David Lis
2026-01-31 · 6 min read
Since ChatGPT launched web search, the media industry has been divided: block AI crawlers or embrace them? Our analysis of 177 major media sites, from The New York Times to small regional publishers, reveals a surprising pattern that challenges conventional wisdom.
The Blocker Paradox
Here's what the data shows:
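Sites blocking GPTBot average 42/100 on AI readiness versus 47.9/100 for non-blockers. Put differently, blockers score about 12% lower, or non-blockers score about 14% higher.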
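The size of the gap depends on which group is used as the baseline. A minimal sketch of the arithmetic, using only the two averages quoted above (the function and variable names are illustrative, not part of our scoring tool):

```python
# Average AI readiness scores reported in this analysis.
blocker_avg = 42.0
non_blocker_avg = 47.9


def relative_gap(value: float, baseline: float) -> float:
    """Return how far `value` sits from `baseline`, as a percentage of `baseline`."""
    return (value - baseline) / baseline * 100


# Blockers measured against non-blockers.
blockers_vs_non_blockers = relative_gap(blocker_avg, non_blocker_avg)
# Non-blockers measured against blockers.
non_blockers_vs_blockers = relative_gap(non_blocker_avg, blocker_avg)

print(f"Blockers vs non-blockers: {blockers_vs_non_blockers:.1f}%")   # -12.3%
print(f"Non-blockers vs blockers: {non_blockers_vs_blockers:.1f}%")   # 14.0%
```

The absolute difference is 5.9 points either way; only the percentage changes with the baseline.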
Why Blockers Score Lower
The correlation does not mean that blocking causes lower scores. Instead, it reflects two different strategic approaches, plus a middle ground:
- Defensive blockers: Sites focused on preventing AI access without optimizing their structured data. They block bots but ignore schema.org markup, E-E-A-T signals, and content structure.
- Strategic embracers: Sites that recognize AI search as a distribution channel and actively optimize for it. These sites implement Article schema, author markup, and proper content hierarchy.
- The middle ground: Some sites (like Bloomberg, scoring 65/100) both block crawlers and maintain excellent structured data, proving you can do both.
Methodology: 177 Sites Analyzed
We analyzed major media sites across six categories:
| Category | Sites | Avg Score | % Blocking GPTBot |
|---|---|---|---|
| National News | 32 | 48.2 | 34.4% |
| Business & Finance | 33 | 42.8 | 30.3% |
| Technology | 28 | 46.8 | 28.6% |
| Sports | 25 | 44.0 | 32.0% |
| Culture & Lifestyle | 30 | 43.3 | 33.3% |
| Entertainment | 29 | 45.5 | 34.5% |
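For readers who want to check how the category rows roll up, here is a rough sketch that recomputes the site count and a site-weighted average score from the table above. The figures are copied from the table; the variable names are illustrative. Because the per-category averages are already rounded, the result can differ from the overall figure by a few tenths of a point.

```python
# (category, number of sites, average score) copied from the table above.
categories = [
    ("National News", 32, 48.2),
    ("Business & Finance", 33, 42.8),
    ("Technology", 28, 46.8),
    ("Sports", 25, 44.0),
    ("Culture & Lifestyle", 30, 43.3),
    ("Entertainment", 29, 45.5),
]

# Total number of sites across all categories.
total_sites = sum(sites for _, sites, _ in categories)

# Weight each category average by its number of sites.
weighted_avg = sum(sites * avg for _, sites, avg in categories) / total_sites

print(f"Total sites: {total_sites}")
print(f"Site-weighted average score: {weighted_avg:.1f}")
```

Weighting by site count matters here because the categories are not the same size; a plain mean of the six averages would treat Sports (25 sites) the same as Business & Finance (33 sites).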
The Schema Crisis
Beyond blocking behavior, we found a shocking gap in basic optimization:
96% of media sites lack Article schema markup
Only 4% of analyzed sites implement the NewsArticle or Article schema that helps AI systems understand content structure, authorship, and publishing dates.
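To make this concrete, here is a minimal sketch of what NewsArticle markup can look like when generated as JSON-LD. The property names (`headline`, `datePublished`, `dateModified`, `author`, `publisher`) come from schema.org; the helper functions, the example values, and the URL are hypothetical placeholders, not taken from any site in the study.

```python
import json


def build_news_article(headline, author_name, author_url,
                       publisher_name, published, modified):
    """Build a schema.org NewsArticle description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": published,   # ISO 8601 timestamps
        "dateModified": modified,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "publisher": {"@type": "Organization", "name": publisher_name},
    }


def to_script_tag(data):
    """Wrap the JSON-LD in the script tag that goes into the page's HTML."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values for illustration only.
article = build_news_article(
    headline="Example headline",
    author_name="Jane Doe",
    author_url="https://example.com/authors/jane-doe",
    publisher_name="Example News",
    published="2026-01-31T08:00:00+00:00",
    modified="2026-01-31T10:30:00+00:00",
)
print(to_script_tag(article))
```

This covers exactly the three things mentioned above: content structure (the type and headline), authorship, and publishing dates. A production implementation would typically add more properties, such as an image and the article URL.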
This mirrors our earlier finding that 98.3% of e-commerce sites lack Product schema. The pattern is clear: industries are blocking AI without doing the basic optimization work.
Top Performers: What They Do Differently
The highest-scoring media sites share common traits:
Bloomberg (65/100):
✓ Comprehensive Article schema
✓ Author markup with credentials
✓ Publishing/modified dates
✓ Clear content hierarchy
✗ Blocks GPTBot (strategic choice)
The Verge (62/100):
✓ NewsArticle schema on all articles
✓ Detailed author bios with links
✓ Image optimization with alt text
✓ Allows AI crawlers
Reuters (58/100):
✓ Strong E-E-A-T signals
✓ Timestamped content
✓ Topic categorization
✓ Allows AI crawlers
What This Means for Publishers
The data suggests three strategic options:
- Optimize then decide: Implement Article schema, author markup, and E-E-A-T signals first. Then make an informed decision about blocking based on your specific goals.
- Block strategically: If you block, do it intentionally (like Bloomberg) while maintaining excellent structured data. Don't let blocking become an excuse to ignore optimization.
- Embrace and optimize: Allow AI crawlers while actively optimizing for AI visibility. This maximizes your chances of appearing in AI-generated responses.
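Whichever option you choose, it helps to verify what your robots.txt actually tells each crawler. The sketch below uses Python's standard `urllib.robotparser` module to test a robots.txt body against a user agent. The sample rules and the example URL are assumptions for illustration; the sketch parses text you supply rather than fetching a live file.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, user_agent: str,
           url: str = "https://example.com/news/story") -> bool:
    """Return True if `robots_txt` lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks GPTBot but allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(allows(SAMPLE_ROBOTS_TXT, "GPTBot"))     # False
print(allows(SAMPLE_ROBOTS_TXT, "Googlebot"))  # True
```

Note that robots.txt is advisory: it states your preference, and this check only confirms that the preference is expressed the way you intended.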
Comparing Industries: Media vs E-commerce
Our previous analysis of 409 e-commerce sites provides an interesting contrast:
| Metric | E-commerce (409 sites) | Media (177 sites) |
|---|---|---|
| Average AI Score | 25.6/100 | 45.0/100 |
| Schema Adoption | 1.7% (Product) | 4.0% (Article) |
| GPTBot Blocking | ~15%* | 32.8% |
| Primary Gap | Missing Product markup | Missing Article markup |
Media sites score 76% higher than e-commerce sites on average, but both industries share the same fundamental problem: structured data neglect.
Next: Tourism Industry Analysis
We're currently analyzing 290+ tourism and hospitality sites (hotels, travel agencies, destinations). Early indicators suggest this industry may face the most severe AI visibility challenges, as OTAs (Booking.com, Expedia) dominate AI search results while independent properties remain invisible.
Subscribe to follow the research as we expand our benchmark database across industries.
Audit Your Site's AI Readiness
Want to know how your media site compares to these benchmarks? Our AI Audit Scanner analyzes:
- Schema.org markup (Article, NewsArticle, Organization)
- E-E-A-T signals (author credentials, publication dates)
- Robots.txt configuration and AI crawler access
- Content structure and hierarchy
- Competitive positioning against industry averages
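As a rough illustration of the first check in the list, the sketch below collects the schema.org types declared in a page's JSON-LD blocks using only the Python standard library. It is not the scanner's actual implementation: the class and function names are hypothetical, and it only reads top-level `@type` values (it does not walk `@graph` structures or microdata).

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD rather than failing the scan.


def schema_types(html: str) -> set:
    """Return the set of top-level @type values found in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()
    for block in collector.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            declared = item.get("@type") if isinstance(item, dict) else None
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
    return found


# Hypothetical page fragment for illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "NewsArticle", "headline": "Example"}
</script>
</head><body><p>Story text</p></body></html>
"""

types = schema_types(SAMPLE_HTML)
print("Has Article markup:", bool(types & {"Article", "NewsArticle"}))
```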
Get your comprehensive AI readiness report in seconds