Formatter / schema.org / Add croissant spec 🥐 #8939
"Croissant 🥐 is a high-level format for machine learning datasets that combines metadata, resource file descriptions, data structure, and default ML semantics into a single file; it works with existing datasets to make them easier to find, use, and support with tools. Croissant builds on schema.org, and its Dataset vocabulary, a widely used format to represent datasets on the Web, and make them searchable."

Croissant extends schema.org; this improvement reviews the current schema.org formatter to support additional 🥐 metadata available in ISO format. This is mainly about adding:

* croissant fileObject based on online resources with a download protocol
* croissant recordSet based on the feature catalogue

Refactor the JSON-LD formatter to use the same base formatter for both ISO19139 and ISO19115-3 to facilitate maintenance (similar to the citation and DCAT formatters).
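To make the two additions concrete, here is a minimal sketch of how online resources and feature catalogue attributes might map onto Croissant `FileObject` and `RecordSet` entries. It is not the XSLT from this PR: the function name `build_croissant`, the input dictionaries, the `"DOWNLOAD"` protocol test, the simplified `@context`, and the blanket `sc:Text` data type are all assumptions made for illustration.

```python
import json

# Simplified context; the full Croissant context declares many more terms.
CROISSANT_CONTEXT = {
    "@vocab": "https://schema.org/",
    "sc": "https://schema.org/",
    "cr": "http://mlcommons.org/croissant/",
}


def build_croissant(title, online_resources, feature_attributes):
    """Build a Croissant-style JSON-LD dict from hypothetical ISO-derived inputs."""
    # Assumption: a resource is downloadable when its protocol mentions DOWNLOAD.
    downloads = [
        r for r in online_resources
        if "DOWNLOAD" in r.get("protocol", "").upper()
    ]
    file_objects = [
        {
            "@type": "cr:FileObject",
            "@id": r["name"],
            "name": r["name"],
            "contentUrl": r["url"],
            "encodingFormat": r.get("mime", "application/octet-stream"),
        }
        for r in downloads
    ]
    doc = {
        "@context": CROISSANT_CONTEXT,
        "@type": "sc:Dataset",
        "name": title,
        "conformsTo": "http://mlcommons.org/croissant/1.0",
    }
    if file_objects:
        doc["distribution"] = file_objects
    # A record set needs a file to read from, so skip it when nothing is downloadable.
    if file_objects and feature_attributes:
        source_id = file_objects[0]["@id"]
        doc["recordSet"] = [{
            "@type": "cr:RecordSet",
            "name": "default",
            "field": [
                {
                    "@type": "cr:Field",
                    "name": a["name"],
                    "description": a.get("definition", ""),
                    "dataType": "sc:Text",  # placeholder type, an assumption
                    "source": {
                        "fileObject": {"@id": source_id},
                        "extract": {"column": a["name"]},
                    },
                }
                for a in feature_attributes
            ],
        }]
    return doc


if __name__ == "__main__":
    resources = [
        {"name": "boreholes.csv", "url": "https://example.org/boreholes.csv",
         "protocol": "WWW:DOWNLOAD", "mime": "text/csv"},
        {"name": "wms", "url": "https://example.org/wms", "protocol": "OGC:WMS"},
    ]
    attributes = [{"name": "depth", "definition": "Depth in metres"}]
    print(json.dumps(build_croissant("Boreholes", resources, attributes), indent=2))
```

Only the CSV resource becomes a `FileObject` here; the WMS link is ignored because it has no download protocol.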
Formatters that produce JSON may emit an invalid document, because the XSLT process outputs text that is written directly to the response. Ensure the JSON is valid and format it. Log any error so that failures can be monitored and the formatter improved where encoding is not handled well. In a future version, consider using XSLT 3.0, which supports JSON output (https://www.w3.org/TR/xslt-30/#json).
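The validate, format, and log step can be sketched as follows. GeoNetwork itself is Java, so this Python version only illustrates the idea; the function name `validate_and_format` and the logger name are hypothetical, and returning `None` on failure is one possible policy rather than what the PR does.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("formatter.json")  # hypothetical logger name


def validate_and_format(raw: str, indent: int = 2) -> Optional[str]:
    """Parse text emitted by a formatter and return it pretty-printed.

    Returns None and logs the parse position when the text is not valid JSON.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        logger.error(
            "Formatter produced invalid JSON at line %d, column %d: %s",
            exc.lineno, exc.colno, exc.msg,
        )
        return None
    # ensure_ascii=False keeps non-ASCII metadata (accents, emoji) readable.
    return json.dumps(parsed, indent=indent, ensure_ascii=False)


if __name__ == "__main__":
    print(validate_and_format('{"name": "Données 🥐"}'))
    # An unescaped quote inside a string value is a typical text-output failure.
    print(validate_and_format('{"title": "a "quoted" word"}'))
```

Parsing and re-serialising with a JSON library, instead of patching the text, means the response is either valid JSON or a logged error.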
…e easy to add as dependency.
* Add basic Dublin Core support
* Improve JSON encoding
jodygarnett left a comment:
Please include docs in PR.
Updated the search engines section with details on robots.txt and sitemap usage, including examples for better indexing.
Additional details about SEO & schema.org added. See https://github.com/geonetwork/core-geonetwork/blob/44-croissant/docs/manual/docs/tutorials/introduction/extra/index.md#search-engines

In GN5, maybe at some point we should create a dedicated section for each formatter to explain its context and usage.
@fxprunayre thanks. One comment:
Note my mention of "published" (and the added link for JSON-LD). Of course I could be wrong, but this is what I have found. Feel free to correct me here :)
I might even expand that (as not everyone knows what "schema.org" is) to: "GeoNetwork includes a representation of that record in the schema.org (Structured Data for the Web) framework, encoded as JSON-LD, on any HTML representation of a published metadata record."
And further expanded (as the JSON-LD is only embedded on API-rendered HTML pages, not "any" record's HTML page): "GeoNetwork includes a representation of that record in the schema.org (Structured Data for the Web) framework, encoded as JSON-LD, on any HTML representation (through the GeoNetwork API / sitemap) of a published metadata record."
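For readers who want to check which pages actually carry the embedded JSON-LD, a small extractor along these lines could help. It is a sketch using only the Python standard library; the class and function names are hypothetical, and the sample HTML is invented, not GeoNetwork output.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD document embedded in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


if __name__ == "__main__":
    sample = (
        '<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Dataset", "name": "Demo"}'
        "</script></head><body></body></html>"
    )
    print(extract_jsonld(sample))
```

An empty list for a record's HTML page would indicate that no structured data is embedded in that particular representation.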
Out of interest, is this still on track, @fxprunayre?





Source for the Croissant description quoted above: https://docs.mlcommons.org/croissant/
Improve formatters that produce JSON output by ensuring the output is valid JSON, formatting it, and logging any error, so that errors can be tracked and poorly handled encoding can be improved.
schema.org improvements:

* inLanguage corresponds to the resource language, not the metadata language.
* … producer (e.g. provider, producer, copyrightHolder, publisher, author, funder)
* … temporalCoverage) if no corresponding element in input document

Similar initiatives:
Checklist

* … main branch, backports managed with label
* … README.md files
* … pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

Funded by BRGM & Ifremer.