What is OpenAI's GPTBot crawling on our Wikibase instance?
| Category | Requests | % | Data | Details |
|---|---|---|---|---|
| Item pages | 371,006 | 54.8% | 4.5 GB | Viewing/history of Item:Q* pages |
| Special:Log | 129,374 | 19.1% | 1.7 GB | Browsing site activity logs |
| Special:WhatLinksHere | 74,741 | 11.0% | 477 MB | Finding links to items/properties |
| Special:EntityData | 52,453 | 7.7% | 68 MB | Structured data exports (JSON, RDF, etc.) |
| Property pages | 9,402 | 1.4% | 54 MB | Viewing Property:P* definitions |
| Item talk pages | 9,052 | 1.3% | 18 MB | Discussion pages |
| Special:UserLogin | 7,954 | 1.2% | 32 MB | Login page redirects |
| Entity redirects | 6,816 | 1.0% | 1.7 MB | /entity/Q* linked data URLs |
| Other | 16,602 | 2.5% | 110 MB | RecentChanges, NewItem, etc. |
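A breakdown like the one above can be reproduced by bucketing each requested URL by its page title. The sketch below is a minimal illustration, not the script used for this analysis. It assumes pages are served under `/wiki/<Title>` or via `index.php?title=<Title>`, and that the talk namespace is spelled `Item_talk:`; both are guesses about this wiki's configuration, and `classify` and `page_title` are hypothetical helper names.

```python
from urllib.parse import urlsplit, parse_qs, unquote

# (category, title prefix) pairs; the first matching prefix wins.
# The prefixes are assumptions about the wiki's namespace names.
TITLE_PREFIXES = [
    ("Item talk pages", "Item_talk:"),
    ("Item pages", "Item:"),
    ("Property pages", "Property:"),
    ("Special:Log", "Special:Log"),
    ("Special:WhatLinksHere", "Special:WhatLinksHere"),
    ("Special:EntityData", "Special:EntityData"),
    ("Special:UserLogin", "Special:UserLogin"),
]


def page_title(url: str) -> str:
    """Extract the page title from a /wiki/ path or a ?title= query string."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "title" in query:
        return query["title"][0]
    path = unquote(parts.path)
    if path.startswith("/wiki/"):
        return path[len("/wiki/"):]
    return ""


def classify(url: str) -> str:
    """Map a request URL to one of the report's categories."""
    if urlsplit(url).path.startswith("/entity/"):
        return "Entity redirects"
    title = page_title(url).replace(" ", "_")
    for category, prefix in TITLE_PREFIXES:
        if title.startswith(prefix):
            return category
    return "Other"
```

Feeding every request URL through `classify` and counting the results with `collections.Counter` would give the per-category request totals.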
When fetching structured entity data, GPTBot requests multiple formats:
| Format | Requests | Description |
|---|---|---|
| .n3 | 11,742 | Notation3 (RDF) |
| .rdf | 8,929 | RDF/XML |
| .json | 8,726 | JSON format |
| .jsonld | 7,424 | JSON-LD (linked data) |
| .nt | 7,068 | N-Triples |
| .ttl | 1,188 | Turtle (RDF) |
| (default) | 6,862 | No extension specified |
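The per-format counts can be derived from the file extension that follows `Special:EntityData/` in the request path. The following is a rough sketch under that assumption. `entity_data_format` is a hypothetical helper, and it ignores requests that select a format through a query parameter rather than an extension.

```python
import posixpath
from urllib.parse import urlsplit

MARKER = "Special:EntityData"


def entity_data_format(url: str):
    """Return the requested extension (e.g. '.ttl'), '(default)' when none
    is given, or None when the URL is not a Special:EntityData request."""
    path = urlsplit(url).path
    idx = path.find(MARKER)
    if idx == -1:
        return None
    # Everything after the marker is the entity file name, e.g. 'Q42.json'.
    name = path[idx + len(MARKER):].lstrip("/")
    ext = posixpath.splitext(name)[1]
    return ext if ext else "(default)"
```

Counting the non-`None` results over all requests yields one row per extension plus a `(default)` row.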
| Status | Count | % | Meaning |
|---|---|---|---|
| 200 | 566,075 | 83.6% | Success |
| 301 | 74,511 | 11.0% | Permanent redirect |
| 404 | 24,417 | 3.6% | Not found |
| 303 | 6,851 | 1.0% | See other (content negotiation) |
| 302 | 4,618 | 0.7% | Temporary redirect |
| 5xx | 927 | 0.1% | Server errors |
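The status table groups all server errors into a single `5xx` row. One way to compute such a breakdown from a list of status codes is sketched below; `status_breakdown` is an illustrative name, not part of any tool mentioned in this report.

```python
from collections import Counter


def status_breakdown(statuses):
    """Return {label: (count, percent)} with all 5xx codes merged into '5xx',
    ordered from most to least frequent."""
    counts = Counter("5xx" if 500 <= s <= 599 else str(s) for s in statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        label: (n, round(100 * n / total, 1))
        for label, n in counts.most_common()
    }
```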
Request volume by hour of day:
Activity peaks at 4pm (16:00) with 36,744 requests. Volume is otherwise relatively consistent throughout the day, with dips around 2-3am and 6-8pm.
Analysis based on 677,400 requests from GPTBot/1.2 over a 14-day period. Data extracted from nginx access logs.
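For readers who want to run a similar analysis, the extraction step might look like the sketch below. It assumes nginx writes its default `combined` log format and that GPTBot can be identified by the substring `GPTBot` in the user agent. The function names are hypothetical, and hours are reported in whatever timezone the log timestamps use.

```python
import re
from collections import Counter
from datetime import datetime

# nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
LINE_RE = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Parse one access-log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["bytes"] = int(rec["bytes"])
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    parts = rec["request"].split()
    # The request line is "METHOD URL PROTOCOL"; keep only the URL.
    rec["url"] = parts[1] if len(parts) >= 2 else ""
    return rec


def gptbot_requests(lines):
    """Yield parsed records whose user agent mentions GPTBot."""
    for line in lines:
        rec = parse_line(line)
        if rec is not None and "GPTBot" in rec["agent"]:
            yield rec


def requests_by_hour(records):
    """Count requests per hour of day (0-23)."""
    return Counter(rec["time"].hour for rec in records)
```

Passing an open log file to `gptbot_requests` and the result to `requests_by_hour` gives the hourly distribution. The `status`, `bytes`, and `url` fields of each record supply the inputs for the other tables.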