Implement gatherling.com scraper #15
Description
This was suggested by @Aliquanto3 as a way to get Premodern data into our dataset, and bakert mentioned on Discord that we could also use it for Penny Dreadful.
There are two possible approaches.
Plan A: Database dump. According to bakert, they generate a database dump (with some data redacted) every 24h here:
https://pennydreadfulmagic.com/static/dev-db.sql.gz
This is a MariaDB data dump, so the process should be fairly simple:
- Automate the creation of a Docker container running MariaDB that imports this dump
- Extract data from the DB dump
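A minimal sketch of the first step, assuming we use the official `mariadb` Docker image. That image executes `*.sql` and `*.sql.gz` files mounted into `/docker-entrypoint-initdb.d` the first time a container starts with an empty data directory, so the gzipped dump can be mounted as-is. The function names, the container name, and the `pennydreadful` database name are hypothetical; `MARIADB_DATABASE` only matters if the dump doesn't select its own database.

```python
import urllib.request
from pathlib import Path

# URL from this issue; the dump is reportedly regenerated every 24h.
DUMP_URL = "https://pennydreadfulmagic.com/static/dev-db.sql.gz"


def download_dump(dest: Path) -> Path:
    """Download the gzipped MariaDB dump to `dest` (hypothetical helper)."""
    urllib.request.urlretrieve(DUMP_URL, dest)
    return dest


def build_docker_command(
    dump_path: Path,
    password: str,
    database: str = "pennydreadful",  # hypothetical name, only used if the dump has no USE statement
    container: str = "pd-dump",       # hypothetical container name
) -> list[str]:
    """Build a `docker run` command that imports the dump on first startup.

    The official mariadb image runs *.sql / *.sql.gz files found in
    /docker-entrypoint-initdb.d when its data directory is empty.
    """
    return [
        "docker", "run", "--detach",
        "--name", container,
        "--env", f"MARIADB_ROOT_PASSWORD={password}",
        "--env", f"MARIADB_DATABASE={database}",
        "--publish", "3306:3306",
        # Mount the dump read-only under the init directory, keeping the .sql.gz suffix.
        "--volume", f"{dump_path.resolve()}:/docker-entrypoint-initdb.d/{dump_path.name}:ro",
        "mariadb:latest",
    ]
```

Usage would be roughly `subprocess.run(build_docker_command(download_dump(Path("dev-db.sql.gz")), "change-me"), check=True)`, after which the data can be extracted with any MariaDB client on port 3306. Since the import only happens on an empty data directory, refreshing the data means removing and recreating the container.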
We could also explore using embedded MariaDB, which would simplify a few things, but it doesn't look like Windows builds are provided for it.
Plan B: Scraping. There's an `eventinfo` route that seems to contain all the info we need for an individual tournament:
https://gatherling.com/api.php?action=eventinfo&event=Pre-Modern%20Monthly%20League%2011.05
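A minimal sketch of calling that route, assuming it responds with JSON (not verified against the API source). `build_eventinfo_url` and `fetch_event_info` are hypothetical helper names; `quote_via=quote` is used so spaces are encoded as `%20`, matching the example link above.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

API_URL = "https://gatherling.com/api.php"


def build_eventinfo_url(event_name: str) -> str:
    """Build an eventinfo URL; quote() encodes spaces as %20 rather than '+'."""
    query = urlencode({"action": "eventinfo", "event": event_name}, quote_via=quote)
    return f"{API_URL}?{query}"


def fetch_event_info(event_name: str, timeout: float = 30.0) -> dict:
    """Fetch one event and decode the body, assuming the route returns JSON."""
    with urllib.request.urlopen(build_eventinfo_url(event_name), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `build_eventinfo_url("Pre-Modern Monthly League 11.05")` reproduces the URL above, so event names only need to be known in their display form.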
The only thing missing is a way to list older events. There's an event list page, but it doesn't fit the way the scraper works very well, since there's no way to navigate by date:
https://gatherling.com/eventreport.php
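Since the list can't be navigated by date, one workaround is to collect every event name it links to and filter by date only after fetching each event's info. The sketch below assumes, without having checked the page, that `eventreport.php` links to individual events with an `event=` query parameter; `EventLinkParser` and `extract_event_names` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class EventLinkParser(HTMLParser):
    """Collect event names from <a href="...?event=NAME"> links (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # parse_qs decodes %20 and '+' back into spaces.
        names = parse_qs(urlparse(href).query).get("event")
        # Keep the first occurrence of each name, preserving page order.
        if names and names[0] not in self.events:
            self.events.append(names[0])


def extract_event_names(html: str) -> list[str]:
    parser = EventLinkParser()
    parser.feed(html)
    parser.close()
    return parser.events
```

Each extracted name could then be passed to the `eventinfo` route, and the scraper would keep or skip the event based on whatever date field that response contains.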
There's no documentation for the API, but the code is available on GitHub, so we can also check whether other routes could help:
https://github.com/PennyDreadfulMTG/gatherling/blob/dev/gatherling/api.php