These publications are important documents of the public record and should be archived for future analysis.
For each item published in BHP's feed of posts at https://www.bhp.com/media-and-insights/reports-and-presentations/, this scraper collects:
- title as `name`
- web address as `url`
- date and time it was collected in UTC, as `scraped_at`
- date and time published in UTC, as `published`
- main body html as `content`
- description as `summary`
- another place where this article is available, archive.org for example, as `syndication`
- the name of the organisation publishing as `org`
These attribute names are loosely based on the Microformats `h-entry`, and `h-card` for `org`.
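
As a rough illustration of the record described above, the sketch below builds one row with those eight attribute names and stores it with Python's standard `sqlite3` module. It is not taken from this scraper's source: the helper names `make_record` and `save_record`, the `data` table name, the choice of `url` as the unique key, and the example.org address are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone

# The eight attribute names listed above, in the same order.
FIELDS = ["name", "url", "scraped_at", "published",
          "content", "summary", "syndication", "org"]


def make_record(name, url, published, content, summary,
                syndication=None, org="BHP"):
    """Build one row. `published` must be a timezone-aware datetime."""
    return {
        "name": name,
        "url": url,
        # Collection time, recorded in UTC as an ISO 8601 string.
        "scraped_at": datetime.now(timezone.utc).isoformat(),
        # Publication time, converted to UTC before storing.
        "published": published.astimezone(timezone.utc).isoformat(),
        "content": content,
        "summary": summary,
        "syndication": syndication,
        "org": org,
    }


def save_record(conn, record):
    """Insert the row, replacing any earlier row with the same url."""
    columns = ", ".join(FIELDS)
    # Assumed schema: untyped columns, one row per url.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS data ({columns}, UNIQUE(url))"
    )
    placeholders = ", ".join(":" + field for field in FIELDS)
    conn.execute(
        f"INSERT OR REPLACE INTO data ({columns}) VALUES ({placeholders})",
        record,
    )
    conn.commit()


if __name__ == "__main__":
    # An in-memory database keeps the demo free of side effects;
    # a real run would open a file on disk instead.
    conn = sqlite3.connect(":memory:")
    row = make_record(
        name="Example report",
        url="https://example.org/report",  # hypothetical address
        published=datetime(2020, 1, 1, tzinfo=timezone.utc),
        content="<p>Body</p>",
        summary="Short description",
    )
    save_record(conn, row)
    print(conn.execute("SELECT name, url, org FROM data").fetchall())
```

Keying on `url` means a repeated run updates an existing row rather than duplicating it, which is one plausible way to keep an archive stable across runs.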
This scraper runs on the magnificent morph.io.