Saving dtypes as metadata for roundtrip consistency#262
Saving dtypes as metadata for roundtrip consistency#262yfukai wants to merge 13 commits intoroyerlab:mainfrom
Conversation
|
Notes:
|
|
Both tasks are done. I'll review the code for SQLgraph and mark this open for review. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #262 +/- ##
==========================================
- Coverage 87.92% 87.76% -0.16%
==========================================
Files 56 56
Lines 4471 4610 +139
Branches 789 810 +21
==========================================
+ Hits 3931 4046 +115
- Misses 344 360 +16
- Partials 196 204 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| ordered_keys = [key for key in preferred_order if key in schemas] | ||
| ordered_keys.extend(key for key in table_class.__table__.columns.keys() if key not in ordered_keys) | ||
| ordered_keys.extend(key for key in schemas if key not in ordered_keys) | ||
| return {key: schemas[key] for key in ordered_keys} |
There was a problem hiding this comment.
What do you think of this implementation? This way, we can avoid checking if it's in ordered_keys, which is a list, and it takes a bit longer to check if the column is there.
| ordered_keys = [key for key in preferred_order if key in schemas] | |
| ordered_keys.extend(key for key in table_class.__table__.columns.keys() if key not in ordered_keys) | |
| ordered_keys.extend(key for key in schemas if key not in ordered_keys) | |
| return {key: schemas[key] for key in ordered_keys} | |
| result = {} | |
| # return dictionary in preferred order | |
| for source in ( | |
| preferred_order, | |
| table_class.__table__.columns.keys(), | |
| schemas, | |
| ): | |
| for key in source: | |
| if key in schemas: | |
| result.setdefault(key, schemas[key]) | |
| return result |
| else: | ||
| nodes_df = nodes_df.select([pl.col(c) for c in self._node_attr_schemas() if c in nodes_df.columns]) |
There was a problem hiding this comment.
@yfukai , it's not clear to me why this is necessary.
Is it because it might have private attributes?
Could you include a comment on this and the equivalent edge_attr_code?
| edge_schemas = self.__edge_attr_schemas | ||
| # Process arguments and create validated schema | ||
| schema = process_attr_key_args(key_or_schema, dtype, default_value, self.__edge_attr_schemas) | ||
|
|
||
| # Store schema | ||
| self.__edge_attr_schemas[schema.key] = schema | ||
| schema = process_attr_key_args(key_or_schema, dtype, default_value, edge_schemas) | ||
|
|
||
| # Add column to database | ||
| self._add_new_column(self.Edge, schema) | ||
| edge_schemas[schema.key] = schema | ||
| self.__edge_attr_schemas = edge_schemas | ||
|
|
||
| def remove_edge_attr_key(self, key: str) -> None: | ||
| if key not in self.edge_attr_keys(): | ||
| raise ValueError(f"Edge attribute key {key} does not exist") | ||
|
|
||
| edge_schemas = self.__edge_attr_schemas | ||
| self._drop_column(self.Edge, key) | ||
| self.__edge_attr_schemas.pop(key, None) | ||
| edge_schemas.pop(key, None) | ||
| self.__edge_attr_schemas = edge_schemas |
There was a problem hiding this comment.
I was a bit lost at first here, but this seems required because self.__edge_attr_schemas has a setter and getter.
Do you think it would be worth being more explicit than my original implementation with @property and .setter? We could address this in another PR.
This is a follow-up on the dtype, assuming #260 has been merged.
In this PR, the graphs store the dtypes of the fields as metadata, enabling robust recovery of the original dtypes during round-trip
from_otherconversions.