Skip to content

Conversation

@rartino
Copy link
Contributor

@rartino rartino commented Nov 22, 2025

These are the property definitions for the trajectory endpoint. They compile, but I will do some further checking myself.

Note: this PR is based on top of the commits in #579, so that PR should be merged before this one. Hence I've marked this as a draft PR. If you look at the full diffs now, you see both PRs merged; which isn't a good way to review this PR.

@rartino rartino added the blocking-release This is a PR or issue that presently blocks the release of next version of the spec. label Nov 22, 2025
Copy link
Member

@merkys merkys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware this is a draft, but I left a pair of comments about the implementation. I noticed a lot of properties from structures are copied, which might add additional maintenance burden. Would not it be possible to re-declare all trajectories properties as lists of already existing structures properties:

# ...
$id: "https://schemas.optimade.org/defs/v1.3/properties/optimade/trajectories/structure_features"
# ...
type:
  - "array"
items:
  $$inherit: "/v1.2/properties/optimade/structures/structure_features"

@rartino
Copy link
Contributor Author

rartino commented Nov 24, 2025

Would not it be possible to re-declare all trajectories properties as lists of already existing structures properties

I thought that is what I did. My version of properties/optimade/trajectories/structure_features.yaml also contains the precise lines you suggest:

x-optimade-type: "list"
...
type:
  - "array"
...
items:
  $$inherit: "/v1.2/properties/optimade/structures/structure_features"

There are a few things that may be confusing here though.

First, since this PR already is based off #579 you see those changes in the change set until that PR is merged, which makes it look like this PR affects a lot of non-trajectory properties, but that is just how it looks now.

Second, all the added definitions of the fields under the trajectories endpoint repeat to a large degree the human-readable descriptions from the original properties, only wrapping them in some boilerplate for "a list of". This follows something we've done so far for other "nested" property definitions. Basically, it is nice if when you look up a property definition if you see a human-readable explanation that cover the complete field, so you don't have to "drill down" through a tree of nested definitions to get the complete picture. I realize that it may seem a bit heavy in this PR since we decided to "make every structure property a list property", but I expect this to be an exception and this is a principle we want to uphold in the general case.

@ml-evs
Copy link
Member

ml-evs commented Dec 7, 2025

I've just merged #579 so think this is now ready to be marked as "ready for review" -- will try out the schema checker job and review ASAP.

@ml-evs ml-evs marked this pull request as ready for review December 7, 2025 15:24
@ml-evs ml-evs requested review from merkys and ml-evs December 7, 2025 15:24
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just merged this PR into #577 and unless I'm missing something the check is telling us that definitions for nframes and reference_frames are missing definitions here? All the other warnings in that check can be ignored.

@merkys
Copy link
Member

merkys commented Dec 8, 2025

@rartino

I thought that is what I did. My version of properties/optimade/trajectories/structure_features.yaml also contains the precise lines you suggest

Oh, I seem to have missed them. Thanks for pointing this out.

First, since this PR already is based off #579 you see those changes in the change set until that PR is merged, which makes it look like this PR affects a lot of non-trajectory properties, but that is just how it looks now.

Second, all the added definitions of the fields under the trajectories endpoint repeat to a large degree the human-readable descriptions from the original properties, only wrapping them in some boilerplate for "a list of". This follows something we've done so far for other "nested" property definitions. Basically, it is nice if when you look up a property definition if you see a human-readable explanation that cover the complete field, so you don't have to "drill down" through a tree of nested definitions to get the complete picture. I realize that it may seem a bit heavy in this PR since we decided to "make every structure property a list property", but I expect this to be an exception and this is a principle we want to uphold in the general case.

Right, this must have confused me. While I in principle agree that it is nice to mirror the descriptions of properties wrapped in arrays in trajectories, we risk the descriptions diverging. What is even more complicating, is that the descriptions are not plain copy-pastes, viz. the difference between schemas/src/defs/v1.2/properties/optimade/structures/space_group_it_number.yaml and schemas/src/defs/v1.3/properties/optimade/trajectories/space_group_it_number.yaml:

  Space group number for the structure assigned by the International Tables for Crystallography Vol. A.

  **Requirements/Conventions**:

  - The integer value MUST be between 1 and 230.
  - MUST be `null` if `nperiodic_dimensions` is not equal to 3.

vs.

  A list of space_group_it_number items.
  A space_group_it_number item is a space group number for the structure assigned by the International Tables for Crystallography Vol. A.

  For each space_group_it_number item the following applies:

  **Requirements/Conventions**:

  - The integer value MUST be between 1 and 230.
  - MUST be `null` if `nperiodic_dimensions` is not equal to 3.

Strictly speaking, the last line here might need an adjustment to read "MUST be null if correlated value of nperiodic_dimensions is not equal to 3" (nperiodic_dimensions for trajectories is a list).

The point I am trying to make here is that these property definitions might be difficult to both read and maintain. I think it would suffice to say "A list of items equivalent to space_group_it_number for structures" or something along the lines.

@rartino
Copy link
Contributor Author

rartino commented Dec 8, 2025

@merkys

I think it is part of the point that in the context of the lists of frames, the meaning of the items are not quite "equivalent" to the structure entries, e.g., in how the descriptions refer to other fields.

Strictly speaking, the last line here might need an adjustment to read "MUST be null if correlated value of nperiodic_dimensions is not equal to 3" (nperiodic_dimensions for trajectories is a list).

This could indeed be clarified. But, I do not see how this issue goes away by deferring the definition to structures/space_group_it_number; it seems to make it worse? We would still have to explain somewhere that the nperiodic_dimensions mentioned in the deferred definition now refers to a value in a possibly present parallel list, right?

I'm worried that this moves us away from helpful human-readable descriptions "just tell me what this field is?" into less friendly and more programmatic descriptions: "trajectory field A is a list of things whose definitions you lookup at X and then re-interpret according to Y".

The point I am trying to make here is that these property definitions might be difficult to both read and maintain. I think it would suffice to say "A list of items equivalent to space_group_it_number for structures" or something along the lines.

I disagree about the "read" part, since I think it is more helpful to have fields defined in their entirety. I agree that the overlap in texts is an extra burden to maintain. However, if we update a definition under structures, we will anyway have to edit the corresponding definition under trajectories to point it to the new definition under structures. When doing that, it would not be the worst thing to have to read trough the trajectory version of the text and decide what those changes mean in the context of parallel lists. I'd suggest "explicit is better than implicit" here.

Also, hopefully we're not going to edit our structure fields particularly frequently.

I'm also concerned that the "fractured descriptions" we would end up with won't be crystal clear on which e.g., structures/space_group_it_number we refer to when/if there are multiple versions. The correct answer is that in the json property definition, we compile in the sub-definitions, so the user is in that case meant to drill down into the json definition file for the correct sub-description. But for this to be user-friendly when we show a text we would probably have to give the IRI in the description text.

@rartino
Copy link
Contributor Author

rartino commented Dec 8, 2025

@ml-evs Good catch! I've added the missing ones. Can you re-update #577 with this PR and re-check?

Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All working in #577 now -- thanks @rartino.

I've given the definitions a once over and can't see any issues by eye. If @merkys is happy I suggest we merge and release this!

@rartino
Copy link
Contributor Author

rartino commented Dec 9, 2025

@merkys Do we agree that rewriting these description fields without changing anything else is a patch-level upgrade of a property definition? In that case, perhaps it can be ok to accept the definitions as is to get the release out? We can then discuss the pro:s and con:s of these texts on the next web meet and merge patch upgrades as needed?

@merkys
Copy link
Member

merkys commented Dec 9, 2025

@rartino

I think it is part of the point that in the context of the lists of frames, the meaning of the items are not quite "equivalent" to the structure entries, e.g., in how the descriptions refer to other fields.

I agree, but I do not think that the current formulation of trajectories' property descriptions highlights this point.

Strictly speaking, the last line here might need an adjustment to read "MUST be null if correlated value of nperiodic_dimensions is not equal to 3" (nperiodic_dimensions for trajectories is a list).

This could indeed be clarified. But, I do not see how this issue goes away by deferring the definition to structures/space_group_it_number; it seems to make it worse? We would still have to explain somewhere that the nperiodic_dimensions mentioned in the deferred definition now refers to a value in a possibly present parallel list, right?

I'm worried that this moves us away from helpful human-readable descriptions "just tell me what this field is?" into less friendly and more programmatic descriptions: "trajectory field A is a list of things whose definitions you lookup at X and then re-interpret according to Y".

I see your point in the benefit of knowing what a field means by looking in its description. However, even the main specification text does not attempt to provide the precise descriptions of each supported property of trajectories. Instead, it says:

all custom properties defined in the Structures Entries_ endpoint are also used for trajectories, with the following difference: each property is extended by wrapping it in a list, so that each custom property of a :entry:structures resource becomes a list with an additional first dimension of size nframes_ (with dimension name dim_frames, as defined in the property definition). This allows these properties to be defined for each frame, and thus possibly change during the trajectory.

My suggestion to use pointers to property descriptions in structures is not something different from what is being said here.

I disagree about the "read" part, since I think it is more helpful to have fields defined in their entirety. I agree that the overlap in texts is an extra burden to maintain. However, if we update a definition under structures, we will anyway have to edit the corresponding definition under trajectories to point it to the new definition under structures. When doing that, it would not be the worst thing to have to read trough the trajectory version of the text and decide what those changes mean in the context of parallel lists. I'd suggest "explicit is better than implicit" here.

I agree that it is beneficial to have fields defined in their entirety. However, in my opinion, current proposal is suboptimal in a sense that the reader will have to mentally convert the occurrences of "structure" to "trajectory step" and accommodate the parallel lists which are used to describe the trajectory.

Also, hopefully we're not going to edit our structure fields particularly frequently.

Sure. Nevertheless, even introduction of new properties may touch others (e.g. #539).

I'm also concerned that the "fractured descriptions" we would end up with won't be crystal clear on which e.g., structures/space_group_it_number we refer to when/if there are multiple versions. The correct answer is that in the json property definition, we compile in the sub-definitions, so the user is in that case meant to drill down into the json definition file for the correct sub-description. But for this to be user-friendly when we show a text we would probably have to give the IRI in the description text.

The solution would be to provide the definition version number, the same which is given in URI in items.$$inherit.

Copy link
Member

@merkys merkys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the issues I see with having trajectories' property descriptions echoing the ones from structures', I approve this PR. I do not think this PR (and the release) should be blocked.

@rartino rartino merged commit 81d6b65 into Materials-Consortia:develop Dec 10, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocking-release This is a PR or issue that presently blocks the release of next version of the spec.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants