Investigate articles without QID #24
Open
Description
The schema for the Wikipedia Enterprise dumps lists the QID field (main_entity) as optional.
All articles should have a QID, but apparently there are cases where they don't.
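A minimal sketch of how such pages could be picked out of a dump, assuming the dump is read as newline-delimited JSON. The field layout is an assumption, not taken from this issue: `main_entity` is treated as an object with an `identifier` key, and `name` and `url` as the article title and link. The helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def extract_qid(record: dict) -> Optional[str]:
    """Return the Wikidata QID from an article record, or None if absent."""
    # Assumption: `main_entity` is an object carrying an `identifier` key.
    entity = record.get("main_entity")
    if not isinstance(entity, dict):
        return None
    qid = entity.get("identifier")
    # Only accept values shaped like a QID: "Q" followed by ASCII digits.
    if isinstance(qid, str) and qid.startswith("Q"):
        digits = qid[1:]
        if digits.isascii() and digits.isdigit():
            return qid
    return None


def pages_without_qid(
    ndjson_lines: Iterable[str],
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (name, url) for every record that has no usable QID."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if extract_qid(record) is None:
            yield record.get("name"), record.get("url")
```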
It's not just articles that are so minor they don't have a Wikidata item. In the 20230801 dump, for example, the parser logged these two pages:
[2023-08-04T17:58:48Z INFO om_wikiparser] Page without wikidata qid: "Wiriadinata Airport" (https://en.wikipedia.org/wiki/Wiriadinata_Airport)
[2023-08-04T17:59:11Z INFO om_wikiparser] Page without wikidata qid: "Uptown (Brisbane)" (https://en.wikipedia.org/wiki/Uptown_(Brisbane))
Both articles were edited on 2023-07-31, around when the dump was created:
- https://en.wikipedia.org/w/index.php?title=Wiriadinata_Airport&action=history#mw-diff-1168065176
- https://en.wikipedia.org/w/index.php?title=Uptown,_Brisbane&action=history#mw-oldid-1168039440
Is this the main cause of these cases, or is there something else?
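One way to test this hypothesis would be to measure what share of the QID-less pages were edited shortly before the dump. The sketch below assumes each record carries a `date_modified` ISO 8601 timestamp (an assumption about the schema) and that the dump name such as `20230801` is its creation date. The function names and the two-day window are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def edited_near_dump(date_modified: str, dump_date: str, window_days: int = 2) -> bool:
    """True if the last edit falls within `window_days` before the dump day,
    or on the dump day itself."""
    # Assumption: timestamps look like "2023-07-31T12:00:00Z".
    modified = datetime.fromisoformat(date_modified.replace("Z", "+00:00"))
    if modified.tzinfo is None:
        # Treat timestamps without an offset as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    dump = datetime.strptime(dump_date, "%Y%m%d").replace(tzinfo=timezone.utc)
    start = dump - timedelta(days=window_days)
    end = dump + timedelta(days=1)  # include the whole dump day
    return start <= modified < end


def share_edited_near_dump(records, dump_date: str, window_days: int = 2) -> float:
    """Fraction of the given QID-less records that were edited near the dump."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum(
        1 for r in records
        if edited_near_dump(r["date_modified"], dump_date, window_days)
    )
    return hits / len(records)
```

A share close to 1.0 would support recent edits as the main cause; a low share would suggest something else is going on.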
Is there some data we can preserve across dumps to prevent this, such as keeping the old QID link when the current dump has none?
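A rough sketch of the carry-forward idea, assuming a mapping from article URL to the last QID seen in any earlier dump. The function name `resolve_qid` and the choice of URL as the key are hypothetical, not part of the existing parser.

```python
from typing import Dict, Optional


def resolve_qid(url: str, current_qid: Optional[str], known: Dict[str, str]) -> Optional[str]:
    """Prefer the QID from the current dump; otherwise reuse a remembered one."""
    if current_qid is not None:
        # Refresh the mapping so later dumps can fall back to this value.
        known[url] = current_qid
        return current_qid
    # No QID in this dump: fall back to the one seen previously, if any.
    return known.get(url)
```

Keying by URL would miss pages that were moved between dumps, as the differing titles "Uptown (Brisbane)" and "Uptown,_Brisbane" above suggest, so a more stable key may be needed. A remembered QID can also be stale if the link was removed on purpose.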