GPTBot URL Analysis

What is OpenAI's GPTBot crawling on our Wikibase instance?

Total Requests: 677,400 (over 14 days)
Data Transferred: 7.1 GB (~10.4 KB avg per request)
Unique Items: 10,913 (distinct Q-numbers accessed)
Success Rate: 83.6% (566,075 successful requests)

URL Category Breakdown

[Chart: share of requests by URL category. Item pages 54.8%, Special:Log 19.1%, WhatLinksHere 11.0%, EntityData 7.7%, Other 7.4%]

Category               Requests   %       Data     Details
Item pages             371,006    54.8%   4.5 GB   Viewing/history of Item:Q* pages
Special:Log            129,374    19.1%   1.7 GB   Browsing site activity logs
Special:WhatLinksHere   74,741    11.0%   477 MB   Finding links to items/properties
Special:EntityData      52,453     7.7%   68 MB    Structured data exports (JSON, RDF, etc.)
Property pages           9,402     1.4%   54 MB    Viewing Property:P* definitions
Item talk pages          9,052     1.3%   18 MB    Discussion pages
Special:UserLogin        7,954     1.2%   32 MB    Login page redirects
Entity redirects         6,816     1.0%   1.7 MB   /entity/Q* linked data URLs
Other                   16,602     2.5%   110 MB   RecentChanges, NewItem, etc.

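
The report does not say how requests were assigned to these categories. A minimal sketch of one way to do it, assuming each request target can be matched against a short ordered list of patterns. The `Item:` and `Property:` prefixes and `/entity/` come from the table above; the `Item_talk:` spelling, the pattern order, and the helper names `classify` and `tally` are illustrative assumptions.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Order matters: special pages are checked first because their URLs often
# embed an item title (e.g. Special:WhatLinksHere/Item:Q42).
# These patterns are assumptions, not the rules used for the report.
CATEGORY_PATTERNS = [
    ("Special:EntityData", re.compile(r"Special:EntityData")),
    ("Special:WhatLinksHere", re.compile(r"Special:WhatLinksHere")),
    ("Special:UserLogin", re.compile(r"Special:UserLogin")),
    ("Special:Log", re.compile(r"Special:Log(?![A-Za-z])")),
    ("Entity redirects", re.compile(r"^/entity/Q\d+")),
    ("Item talk pages", re.compile(r"Item_talk:Q\d+")),
    ("Item pages", re.compile(r"Item:Q\d+")),
    ("Property pages", re.compile(r"Property:P\d+")),
]


def classify(request_target: str) -> str:
    """Return the first matching category name, or "Other"."""
    decoded = unquote(request_target)  # crawlers may send %3A instead of ":"
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.search(decoded):
            return name
    return "Other"


def tally(request_targets) -> Counter:
    """Count requests per category."""
    return Counter(classify(target) for target in request_targets)
```

Because the first match wins, a URL such as a log page filtered by item title is counted once, under Special:Log, rather than as an item page.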
Key Finding
GPTBot is primarily interested in item history and relationships. About 85% of requests (54.8% + 19.1% + 11.0% = 84.9%) are for viewing items, browsing activity logs, or checking what links to an item. It's exploring the knowledge graph structure rather than just fetching raw entity data.

EntityData Format Breakdown

When fetching structured entity data, GPTBot requests multiple formats:

Format      Requests   Description
.n3         11,742     Notation3 (RDF)
.rdf         8,929     RDF/XML
.json        8,726     JSON format
.jsonld      7,424     JSON-LD (linked data)
.nt          7,068     N-Triples
.ttl         1,188     Turtle (RDF)
(default)    6,862     No extension specified
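
One plausible way to derive the format column is to read the file extension from the last path segment of each Special:EntityData request. The sketch below assumes exactly that; the function name `entity_data_format` is hypothetical, and requests that select a format some other way (for example through a query parameter) would land in "(default)" here.

```python
import posixpath
from urllib.parse import urlsplit


def entity_data_format(request_target: str) -> str:
    """Return the extension of the requested entity file, or "(default)"."""
    path = urlsplit(request_target).path       # drop any query string
    last_segment = path.rsplit("/", 1)[-1]     # e.g. "Q42.json"
    _, extension = posixpath.splitext(last_segment)
    return extension if extension else "(default)"
```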

HTTP Status Codes

Status   Count     %       Meaning
200      566,075   83.6%   Success
301       74,511   11.0%   Permanent redirect
404       24,417    3.6%   Not found
303        6,851    1.0%   See other (content negotiation)
302        4,618    0.7%   Temporary redirect
5xx          927    0.1%   Server errors
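
As a sketch of how a breakdown like this can be computed, the function below counts status codes, folds every 5xx code into a single bucket as the table does, and reports each bucket's share of the total. The name `status_breakdown` and the rounding to one decimal place are assumptions.

```python
from collections import Counter


def status_breakdown(statuses):
    """Map each status bucket to (count, percent of all requests)."""
    buckets = Counter(
        "5xx" if 500 <= status <= 599 else str(status) for status in statuses
    )
    total = sum(buckets.values())
    return {
        bucket: (count, round(100 * count / total, 1))
        for bucket, count in buckets.most_common()
    }
```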

Hourly Distribution (UTC-4)

[Bar chart: request volume by hour of day, hours 0 to 23]

Activity peaks at 4pm (16:00) with 36,744 requests. Volume is otherwise fairly consistent throughout the day, with dips around 2-3am and 6-8pm.
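
Since the hours are reported in UTC-4, each log timestamp has to be shifted before bucketing. A minimal sketch, assuming timestamps in nginx's `$time_local` format (such as `10/Oct/2024:20:15:00 +0000`, a made-up example) and a fixed UTC-4 offset with no daylight-saving adjustment; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed offset taken from the section heading; no DST handling (assumption).
REPORT_TZ = timezone(timedelta(hours=-4))


def hour_of_day(time_local: str) -> int:
    """Convert an nginx $time_local string to an hour (0-23) in UTC-4."""
    parsed = datetime.strptime(time_local, "%d/%b/%Y:%H:%M:%S %z")
    return parsed.astimezone(REPORT_TZ).hour


def hourly_counts(timestamps):
    """Return a 24-element list of request counts, indexed by hour."""
    counts = Counter(hour_of_day(stamp) for stamp in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]
```

Month abbreviations are parsed with `%b`, which depends on the process locale, so this assumes an English or C locale.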

Sample URLs by Category

Item Pages (viewing history)

Special:Log (activity logs)

Special:WhatLinksHere (backlinks)

Special:EntityData (structured data)

Crawling Pattern
GPTBot appears to be systematically exploring the Wikibase knowledge graph by:
  1. Fetching item pages and their edit histories
  2. Following "What Links Here" to discover related items
  3. Checking activity logs to find recently changed items
  4. Downloading entity data in multiple formats (JSON and several RDF serializations)

Analysis based on 677,400 requests from GPTBot/1.2 over a 14-day period. Data extracted from nginx access logs.
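
The extraction step itself is not described. As a rough sketch, assuming the logs use nginx's default "combined" format (the source does not say), GPTBot requests could be pulled out with a regular expression and a user-agent check. The pattern, the function name `parse_gptbot_line`, and the returned field names are illustrative.

```python
import re
from typing import Optional

# Assumes nginx's default "combined" log format; a custom log_format
# directive would need a different pattern.
COMBINED_LINE = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_gptbot_line(line: str) -> Optional[dict]:
    """Parse one access-log line; return None unless it is a GPTBot request."""
    match = COMBINED_LINE.match(line)
    if match is None or "GPTBot" not in match["agent"]:
        return None
    # The request field looks like "GET /path HTTP/1.1"; keep only the target.
    parts = match["request"].split(" ")
    return {
        "time": match["time"],
        "target": parts[1] if len(parts) >= 2 else "",
        "status": int(match["status"]),
        "bytes": int(match["bytes"]),
    }
```

Matching on the user-agent string alone trusts what the client claims to be; it does not verify that a request actually came from OpenAI.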