Allow dumping a model's first inference input/output data to pickle files #4027
Conversation
Pull Request Overview
This PR adds functionality to dump the first inference input/output data to pickle files for model validation with minimal datasets. This capability is specifically requested by the NPU development team to facilitate model debugging and validation.
Key changes:
- Added `_dump_first_infer_data` parameter support across launcher and evaluator components to enable first-inference data capture
- Implemented data dumping logic that stores input/output tensors along with associated metadata (annotations, identifiers) to pickle files
- Refined error message formatting in the presenters module to improve readability and consistency
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tools/accuracy_checker/accuracy_checker/presenters.py | Improved error message formatting for metric comparisons with cleaner structure and conditional handling |
| tools/accuracy_checker/accuracy_checker/launcher/openvino_launcher.py | Added first inference data dumping capability with InferData namedtuple and automatic async mode disable |
| tools/accuracy_checker/accuracy_checker/launcher/loaders/loader.py | Added InferDataBatch namedtuple to store annotations, identifiers, and metadata |
| tools/accuracy_checker/accuracy_checker/launcher/loaders/__init__.py | Exported InferDataBatch for use in other modules |
| tools/accuracy_checker/accuracy_checker/evaluators/model_evaluator.py | Implemented first inference data storage with dataset preprocessing |
| tools/accuracy_checker/accuracy_checker/evaluators/custom_evaluators/base_models.py | Added inference data dumping to the base model class |
| tools/accuracy_checker/accuracy_checker/evaluators/custom_evaluators/base_custom_evaluator.py | Added inference data dumping support for custom evaluators |
| tools/accuracy_checker/accuracy_checker/argparser.py | Added CLI argument for specifying inference data dump path |
```python
friendly_outputs = {out_node.get_node().friendly_name: out_res for out_node, out_res in outputs.items()}
self._dump_first_inference(InferData(feed_dict, friendly_outputs))
results.append(friendly_outputs)
```
The dictionary comprehension for friendly_outputs is computed even when inference data dumping is disabled. Since _dump_first_inference returns early when _dump_first_infer_data is None, this creates unnecessary overhead for every inference. Consider moving the dictionary comprehension inside the dump method or checking the flag before creating it.
Suggested change:
```diff
- friendly_outputs = {out_node.get_node().friendly_name: out_res for out_node, out_res in outputs.items()}
- self._dump_first_inference(InferData(feed_dict, friendly_outputs))
- results.append(friendly_outputs)
+ # Only compute friendly_outputs if dumping is enabled or for results
+ if self._dump_first_infer_data is not None:
+     friendly_outputs = {out_node.get_node().friendly_name: out_res for out_node, out_res in outputs.items()}
+     self._dump_first_inference(InferData(feed_dict, friendly_outputs))
+     results.append(friendly_outputs)
+ else:
+     friendly_outputs = {out_node.get_node().friendly_name: out_res for out_node, out_res in outputs.items()}
+     results.append(friendly_outputs)
```
```python
self.reset_memory_state = self.get_value_from_config('reset_memory_state')
self._dump_first_infer_data = self.config.get('_dump_first_infer_data', None)
if self._dump_first_infer_data is not None:
    self.async_mode = False
```
Silently disabling async mode when dumping is enabled could cause unexpected performance differences in testing. Consider logging a warning message to inform users that async mode has been disabled due to inference data dumping.
Suggested change:
```diff
- self.async_mode = False
+ self.async_mode = False
+ warning(
+     "Async mode has been disabled because inference data dumping is enabled "
+     "(_dump_first_infer_data is set). This may affect performance results."
+ )
```
```diff
  @staticmethod
- def store_predictions(stored_predictions, predictions):
+ def store_predictions(stored_predictions, predictions, mode='ab'):
```
The mode parameter changes the method's behavior significantly ('ab' for appending vs 'wb' for overwriting), but this is not validated. Adding 'wb' mode creates a risk of data loss if used incorrectly. Consider creating a separate method for the write-once use case or validating the mode parameter.
Suggested change:
```diff
- def store_predictions(stored_predictions, predictions, mode='ab'):
+ def store_predictions(stored_predictions, predictions, mode='ab'):
+     if mode not in ('ab', 'wb'):
+         raise ValueError(f"Invalid mode '{mode}' for store_predictions. Only 'ab' (append) and 'wb' (overwrite) are allowed.")
```
```diff
  message = "{} {}".format(message, result_message)
- elif abs_threshold <= diff_with_ref[0] or rel_threshold <= diff_with_ref[1]:
+ elif abs_threshold <= diff_with_ref[0] or (rel_threshold and rel_threshold <= diff_with_ref[1]):
      fail_message = "[FAILED: {}]".format(error_text)
```
The condition abs_threshold <= diff_with_ref[0] will fail when abs_threshold is None due to the comparison with a numeric value. This should check abs_threshold is not None and abs_threshold <= diff_with_ref[0] to avoid TypeError.
Suggested change:
```diff
- elif abs_threshold <= diff_with_ref[0] or (rel_threshold and rel_threshold <= diff_with_ref[1]):
+ elif (abs_threshold is not None and abs_threshold <= diff_with_ref[0]) or (rel_threshold and rel_threshold <= diff_with_ref[1]):
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
```python
dump_inf_data = {'input': input_data, 'output': output_data}
with open(self._dump_first_infer_data, 'wb') as file:
    pickle.dump(dump_inf_data, file)
```
The _dump_first_infer_data method lacks documentation explaining its purpose, parameters, and the structure of the dumped pickle data. Add a docstring describing the method's behavior and the format of the dumped data.
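A documented helper along those lines could look like the following sketch. The function name and signature are hypothetical — only the dumped `{'input': ..., 'output': ...}` layout is taken from the code quoted above:

```python
import pickle

def dump_first_infer_data(path, feed_dict, outputs):
    """Store the first inference request's input/output data for offline debugging.

    Args:
        path (str or Path): destination pickle file.
        feed_dict (dict): input name -> tensor mapping fed to the request.
        outputs (dict): output name -> result tensor mapping.

    Dumped pickle layout (mirrors the launcher code above):
        {'input': <feed_dict>, 'output': <outputs>}
    """
    with open(path, 'wb') as file:
        pickle.dump({'input': feed_dict, 'output': outputs}, file)
```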
```python
if self._dump_first_infer_data:
    with open(self._dump_first_infer_data, 'wb') as file:
        print_info(f'Storing first inference data to {self._dump_first_infer_data}')
        dump_inf_data = {'input': feed_dict, 'output': res_outputs}
        pickle.dump(dump_inf_data, file)
    self._dump_first_infer_data = None
```
The pickle dumping logic is duplicated between openvino_launcher.py and base_models.py. Extract this into a shared utility function to avoid code duplication and ensure consistent behavior.
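One way to extract the duplicated one-shot logic, sketched with hypothetical names (a module-level helper both call sites could import); it also mirrors the "clear the path after the first dump" behavior shown in the snippet above:

```python
import pickle

def dump_once(state, input_data, output_data, printer=print):
    """Shared one-shot dump helper (hypothetical name): writes
    {'input': ..., 'output': ...} to state['dump_path'] on the first call,
    then clears the path so later inferences skip the work entirely."""
    path = state.get('dump_path')
    if not path:
        return False
    printer(f'Storing first inference data to {path}')
    with open(path, 'wb') as file:
        pickle.dump({'input': input_data, 'output': output_data}, file)
    state['dump_path'] = None  # one-shot: disable after the first dump
    return True
```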
```python
print_info(f"Storing first annotation to {dump_infer_data_path}")
with open(dump_infer_data_path, mode='wb') as content:
```
The method writes to the same file path (dump_infer_data_path) that is used by inference data dumping in openvino_launcher.py and base_models.py. This creates a race condition where annotation data may overwrite inference data or vice versa. Use separate file paths or combine the data into a single write operation.
Suggested change:
```diff
- print_info(f"Storing first annotation to {dump_infer_data_path}")
- with open(dump_infer_data_path, mode='wb') as content:
+ annotation_path = str(Path(dump_infer_data_path).with_suffix('.annotation.pkl'))
+ print_info(f"Storing first annotation to {annotation_path}")
+ with open(annotation_path, mode='wb') as content:
```
```python
return list(subsample_set)

def store_first_annotation(self, annotation, identifier, dump_infer_data_path):
```
The store_first_annotation method lacks documentation explaining its purpose, parameters, and the structure of the stored data. Add a docstring describing the expected types for annotation and identifier, and the format of the pickle data.
Suggested change:
```diff
  def store_first_annotation(self, annotation, identifier, dump_infer_data_path):
+     """
+     Stores the first annotation and its identifier to a pickle file for later inspection or debugging.
+
+     Args:
+         annotation (BaseRepresentation): The annotation object to store. Typically an instance of a subclass of BaseRepresentation.
+         identifier (Any): The identifier associated with the annotation. Usually a string or object uniquely identifying the annotation.
+         dump_infer_data_path (str or Path): The file path where the pickled data will be stored.
+
+     Pickle Data Format:
+         The pickled file will contain a dictionary with the following structure:
+         {
+             'annotation': {
+                 'class_name': <str>,  # Name of the annotation class
+                 ...  # All attributes of the annotation, with numpy arrays converted to lists
+             },
+             'identifier': <Any>  # The identifier value
+         }
+     """
```
```python
print_info(f'Storing first inference data to {self._dump_first_infer_data}')
dump_inf_data = {'input': input_data, 'output': output_data}
with open(self._dump_first_infer_data, 'wb') as file:
```
The file path from _dump_first_infer_data is used directly without validation. Add validation to ensure the path is safe and doesn't allow writing to unauthorized locations through path traversal attacks.
Suggested change:
```diff
- print_info(f'Storing first inference data to {self._dump_first_infer_data}')
- dump_inf_data = {'input': input_data, 'output': output_data}
- with open(self._dump_first_infer_data, 'wb') as file:
+ # Validate the path to prevent path traversal and absolute paths
+ dump_path = Path(self._dump_first_infer_data)
+ safe_base = Path.cwd()
+ try:
+     resolved_path = dump_path.resolve()
+ except Exception as e:
+     warning(f"Failed to resolve dump path: {e}. Skipping dump.")
+     self._dump_first_infer_data = None
+     return
+ if not str(resolved_path).startswith(str(safe_base)):
+     warning(f"Unsafe dump path detected: {resolved_path}. Skipping dump.")
+     self._dump_first_infer_data = None
+     return
+ print_info(f'Storing first inference data to {resolved_path}')
+ dump_inf_data = {'input': input_data, 'output': output_data}
+ with open(resolved_path, 'wb') as file:
```
The NPU development team requested a solution that allows validating models with a minimal dataset.
This change lets accuracy_checker be executed with a `_dump_first_infer_data` parameter passed to openvino_launcher (or to each OpenVINO model from network_info). As a result, the input and output data from the first inference request are stored in the `_dump_first_infer_data` pickle file; additionally, the model evaluator stores the annotations, identifiers, and metadata related to the input data.
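A quick way to inspect the resulting dump offline, sketched below. Only the `{'input': ..., 'output': ...}` layout is taken from the snippets quoted in the review; the file name and the sample tensors are hypothetical stand-ins for the path passed as `_dump_first_infer_data`:

```python
import pickle

# Write a sample dump in the layout the launcher produces.
sample = {
    'input': {'data': [[0.0, 0.1, 0.2]]},
    'output': {'prob': [[0.7, 0.3]]},
}
with open('first_infer.pkl', 'wb') as f:
    pickle.dump(sample, f)

# Inspect the dump: load it and list the tensor names per section.
with open('first_infer.pkl', 'rb') as f:
    dump = pickle.load(f)

names = {section: sorted(dump[section]) for section in ('input', 'output')}
print(names)  # → {'input': ['data'], 'output': ['prob']}
```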