Skip to content

Conversation

@ml-evs
Copy link
Member

@ml-evs ml-evs commented Nov 7, 2025

Closes #575

I've also tweaked some of the rst roles in the spec to maybe allow us to scrape the spec for all fields (in a2cb422). It might be worth adding some special role to the headers that define fields to make this even easier, otherwise I'll concoct some grep incantation to try and do the same job without needing any changes to the spec.

@ml-evs ml-evs added blocking-release This is a PR or issue that presently blocks the release of next version of the spec. topic/infrastructure Issues/PRs relating to deployment and validation of the specification labels Nov 7, 2025
@ml-evs ml-evs force-pushed the ml-evs/add-schema-checker branch 2 times, most recently from 0baf57e to a2cb422 Compare November 7, 2025 16:59
@ml-evs ml-evs mentioned this pull request Nov 12, 2025
9 tasks
@ml-evs ml-evs marked this pull request as ready for review November 16, 2025 23:53
@ml-evs
Copy link
Member Author

ml-evs commented Nov 16, 2025

This PR now has a checker for:

  • building an validating existing schemas in CI
  • scraping the spec for property names and checking they have schemas (to be replaced eventually when we generate the spec from the schemas themselves) -- currently failing as we await trajectories fields.

@ml-evs ml-evs requested review from merkys and vaitkus November 16, 2025 23:54
Copy link
Member

@merkys merkys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for developing this @ml-evs! This PR adds the much-needed automatic validation of the specification. However, I believe, there are certain cases when it might run into false-positives and false-negatives. Moreover, the code is quite tightly coupled with v1.2 and v1.3, thus it would need updating for each subsequent versions.

I think the optimade-property-tools might be reused for such tasks as entry type/property lookup inside the property definitions. If not, I have recently written some code for the COD implementation and could reuse it to expose simple executables for such tasks.

In any case, I am OK with merging the PR as it is, as it would help us in developing the v1.3.0 specification and schemas. However, I would keep a mental note to revisit the code later.

check_property() {
local entry_type=$1
local property=$2
local version=${3:-v1.2} # Default to v1.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard-coding, maybe we can extract version numbers (here and below for v13_path) from optimade.rst or CHANGELOG.md? Otherwise we may forget bumping them.

local version=${3:-v1.2} # Default to v1.2

# Check in the main entry type directory first
local primary_path="schemas/src/defs/$version/properties/optimade/$entry_type/$property.yaml"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the proper way to check for the existence of a property is:

  1. Take the entry types and their URLs from schemas/src/defs/$version/standards/optimade.yaml
  2. Consult the entry type URLs (schemas/src/defs/$DEFINITION_VERSION/entrytypes/optimade/$ENTRY_TYPE.yaml) for properties

Important point here is that $version and $DEFINITION_VERSION might not be the same (see schemas/src/defs/v1.3/standards/optimade.yaml for example). I have recently implemented this for COD, maybe I could backport the code here. But maybe we want to do something really simple like find schemas/src/defs/*/$property.yaml instead?

else
# Check if it exists in any entry type
found=false
for et in structures files references calculations; do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This again is hard-coded, will need updating for every new entry type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but its chicken-and-egg whether we add it here or wait for a folder name to be made with the relevant schemas. At least we could move it to the top of the file so its clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked again and this is unnecessarily hardcoded, as it's already extracted from the spec further up. Might be a bit trickier to extract versions but I'll fix this at least.

@ml-evs ml-evs force-pushed the ml-evs/add-schema-checker branch from 0c905db to ad21fe3 Compare November 17, 2025 14:02
@rartino rartino removed the blocking-release This is a PR or issue that presently blocks the release of next version of the spec. label Dec 5, 2025
ml-evs and others added 8 commits December 7, 2025 15:23
This commit adds scripts to verify that all properties defined in the specification (optimade.rst) have corresponding schema definition files.

- extract_entry_fields.sh: Extracts field names from the Entry List section of optimade.rst, looking for properties with ~~~~ underlines
- check_property_definitions.sh: Compares extracted fields against schema definitions in schemas/src/defs/, reporting missing definitions and orphaned schema files
- Updated Makelocal-audit to include audit_property_definitions target in the main audit suite

The checker validates that:
1. All properties with ~~~~ formatting have corresponding YAML schema files
2. Common properties exist in the core/ directory
3. Entry-specific properties exist in the appropriate optimade/<entry_type>/ directories

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The scripts now continue checking all properties and report all errors at the end, rather than exiting on the first error.
@ml-evs ml-evs force-pushed the ml-evs/add-schema-checker branch from 5560ee1 to e8b3f8b Compare December 7, 2025 15:23
@ml-evs
Copy link
Member Author

ml-evs commented Dec 7, 2025

After the meeting we decided to remove the blocking-release label from this PR -- I'm wondering if instead I should merge in the "uncontroversial" part which is just calling the make schemas tool in our CI, and leave the other "checking for fields in the spec that don't have definitions" part for a future PR.

@ml-evs ml-evs mentioned this pull request Dec 7, 2025
@ml-evs ml-evs marked this pull request as draft December 7, 2025 15:44
@merkys
Copy link
Member

merkys commented Dec 8, 2025

After the meeting we decided to remove the blocking-release label from this PR -- I'm wondering if instead I should merge in the "uncontroversial" part which is just calling the make schemas tool in our CI, and leave the other "checking for fields in the spec that don't have definitions" part for a future PR.

Splitting off uncontroversial make schemas part sounds like a good idea to me!

@ml-evs ml-evs changed the title Add schema checker job to CI Add extra schema checker jobs to CI Dec 8, 2025
@ml-evs ml-evs added the status/delay There is a consensus on this, but shouldn't be included at this point (possibly deferred beyond 1.0) label Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status/delay There is a consensus on this, but shouldn't be included at this point (possibly deferred beyond 1.0) topic/infrastructure Issues/PRs relating to deployment and validation of the specification

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Validate machine-readable property definitions in CI of this repo

4 participants