Pull request overview
This PR updates the CTP pixel de-identification pipeline to address failures with compressed DICOM transfer syntaxes by replacing the decompressor stage with a transcoder stage placed before pixel anonymization, and adds tests intended to exercise multiple transfer syntaxes end-to-end.
Changes:
- Swap `DicomDecompressor` for `DicomTranscoder` (EVRLE target) before `DicomPixelAnonymizer` in pixel pipelines.
- Add helpers to generate DICOM inputs across several transfer syntaxes.
- Add new integration-style tests for pixel and non-pixel de-id across transfer syntaxes.
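
As a rough illustration of what "across several transfer syntaxes" can mean in practice, the sketch below lists a few standard DICOM transfer syntax UIDs and flags the ones whose pixel data is encapsulated and would therefore need transcoding before pixel anonymization. The names `TRANSFER_SYNTAXES`, `NATIVE_PIXEL_DATA`, and `needs_transcoding` are hypothetical and are not taken from `test_ctp.py`; the set of syntaxes the PR actually tests is not shown here.

```python
# Standard DICOM transfer syntax UIDs (from the DICOM standard, PS3.6).
# The dictionary keys are illustrative labels, not identifiers from the PR.
TRANSFER_SYNTAXES = {
    "ImplicitVRLittleEndian": "1.2.840.10008.1.2",
    "ExplicitVRLittleEndian": "1.2.840.10008.1.2.1",  # EVRLE, the transcoder target
    "JPEGBaseline8Bit": "1.2.840.10008.1.2.4.50",
    "JPEGLosslessSV1": "1.2.840.10008.1.2.4.70",
    "JPEG2000Lossless": "1.2.840.10008.1.2.4.90",
    "JPEG2000": "1.2.840.10008.1.2.4.91",
    "RLELossless": "1.2.840.10008.1.2.5",
}

# Syntaxes whose Pixel Data is stored natively (not encapsulated).
NATIVE_PIXEL_DATA = {
    "1.2.840.10008.1.2",    # Implicit VR Little Endian
    "1.2.840.10008.1.2.1",  # Explicit VR Little Endian
    "1.2.840.10008.1.2.2",  # Explicit VR Big Endian (retired)
}


def needs_transcoding(transfer_syntax_uid: str) -> bool:
    """Return True if pixel data in this syntax is not stored natively."""
    return transfer_syntax_uid not in NATIVE_PIXEL_DATA


if __name__ == "__main__":
    for name, uid in TRANSFER_SYNTAXES.items():
        print(f"{name:24s} {uid:26s} transcode={needs_transcoding(uid)}")
```

A table like this could feed a parametrized test so that each syntax is exercised by the same pipeline assertions.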
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| ctp.py | Updates pixel pipeline templates to transcode to uncompressed syntax before pixel anonymization. |
| test_ctp.py | Adds DICOM generation utilities and new tests targeting multiple transfer syntaxes. |
When running large sets of files through de-identification, the task would fail, and many files would produce EOF errors in the ctp.txt log file.
The EOF errors were traced to the DICOM Decompressor stage of the CTP pipeline being run on JPEG 2000 compressed DICOM files. The decompressor did not handle the YBR_RCT color space correctly: it forced a conversion to the RGB color space, which set the expected pixel length of the file incorrectly. During the DICOMAnonymizer pipeline stage, these files would then hit an EOF error and be written to quarantine. The files were not properly closed, leading to memory leaks as more files were corrupted. On large sets of these files, this would eventually crash the entire Java process.
The DICOM Decompressor stage has been replaced with a DICOM Transcoder stage, which handles the compressed files without issue. A test has been added to ensure that compressed files are processed without being quarantined due to the EOF error when reading them.
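
To make the "expected pixel length" failure mode more concrete, here is a minimal sketch of how a reader typically derives the byte length of uncompressed Pixel Data from the image header attributes, and why a mismatch surfaces as an EOF. It does not reproduce the decompressor's actual bug, whose exact arithmetic is not described above; the function names are hypothetical and the sketch assumes byte-aligned Bits Allocated.

```python
def expected_pixel_data_length(rows, columns, samples_per_pixel,
                               bits_allocated, number_of_frames=1):
    """Byte length of native (uncompressed) Pixel Data implied by the header.

    Assumes Bits Allocated is a multiple of 8. DICOM values have even
    length, so an odd byte count is padded by one byte.
    """
    if bits_allocated % 8 != 0:
        raise ValueError("sketch only handles byte-aligned Bits Allocated")
    length = (rows * columns * samples_per_pixel
              * (bits_allocated // 8) * number_of_frames)
    return length + (length % 2)


def would_hit_eof(actual_length, **image_attrs):
    """True if a reader trusting the header would run past the stored bytes."""
    return actual_length < expected_pixel_data_length(**image_attrs)


if __name__ == "__main__":
    header = dict(rows=512, columns=512, samples_per_pixel=3, bits_allocated=8)
    print(expected_pixel_data_length(**header))
    # A file holding fewer bytes than the header implies reads as truncated.
    print(would_hit_eof(512 * 512, **header))
```

If the header attributes and the stored pixel bytes disagree after a stage rewrites the file, a downstream stage that trusts the header will read past the end of the data, which is consistent with the EOF errors described above.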