Fix decompressor issue in ctp pipeline #70

Draft
qsodia wants to merge 5 commits into main from qs/solve-bulk-deid-stalling

Conversation

@qsodia (Collaborator) commented Mar 13, 2026

When running de-identification on large sets of files, the task would fail, and many files would produce EOF errors in the ctp.txt log file.
The EOF errors were traced to the DICOM Decompressor stage of the CTP pipeline running on JPEG 2000 compressed DICOM files. The decompressor did not handle the YBR_RCT color space correctly: it forced a conversion to the RGB color space, which set the expected pixel length of the file incorrectly. During the DICOMAnonymizer pipeline stage, these files would raise an EOF error and be written to the quarantine. The files were not properly closed, leading to memory leaks as more files were corrupted. On large sets of these files, this would eventually crash the entire Java process.
The DICOM Decompressor stage has been removed in favor of a DICOM Transcoder that can handle the compressed files without issue. A test has been added to ensure that the compressed files are processed without being quarantined due to the EOF error when reading the files.
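
To illustrate the failure mode described above: for uncompressed pixel data, a reader derives the expected byte count from the image header, and a wrong expectation can lead to reading past the end of the data. The sketch below is a minimal Python illustration of that general mechanism, not CTP's Java code. `expected_pixel_bytes` and `read_exact` are hypothetical helper names, and the exact length mismatch CTP produced is not stated in this PR.

```python
import io


def expected_pixel_bytes(rows, columns, samples_per_pixel, bits_allocated, frames=1):
    """Byte length of uncompressed pixel data implied by the header values.

    Assumes bits_allocated is a multiple of 8 and ignores the single
    padding byte DICOM adds when the total length is odd.
    """
    if bits_allocated % 8 != 0:
        raise ValueError("this sketch only handles whole-byte samples")
    return rows * columns * samples_per_pixel * (bits_allocated // 8) * frames


def read_exact(stream, length):
    """Read exactly `length` bytes or raise EOFError (hypothetical helper)."""
    data = stream.read(length)
    if len(data) != length:
        raise EOFError(f"expected {length} bytes, got {len(data)}")
    return data


if __name__ == "__main__":
    # Header claims a 3-sample (color) image, but only 1 sample per pixel
    # was actually written: the reader runs out of data.
    claimed = expected_pixel_bytes(4, 4, samples_per_pixel=3, bits_allocated=8)
    written = bytes(expected_pixel_bytes(4, 4, samples_per_pixel=1, bits_allocated=8))
    try:
        read_exact(io.BytesIO(written), claimed)
    except EOFError as exc:
        print("EOF:", exc)
```

Whenever the header and the stored bytes disagree in this direction, any stage that trusts the header hits end-of-file, which is consistent with the errors showing up later in the pipeline rather than in the stage that wrote the file.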

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the CTP pixel de-identification pipeline to address failures with compressed DICOM transfer syntaxes by replacing the decompressor stage with a transcoder stage placed before pixel anonymization, and adds tests intended to exercise multiple transfer syntaxes end-to-end.

Changes:

  • Swap DicomDecompressor for DicomTranscoder (targeting Explicit VR Little Endian, EVRLE) before DicomPixelAnonymizer in pixel pipelines.
  • Add helpers to generate DICOM inputs across several transfer syntaxes.
  • Add new integration-style tests for pixel and non-pixel de-id across transfer syntaxes.
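
For context on the transfer-syntax coverage mentioned in the list above, here is a small sketch of how test inputs might be classified. The UIDs are standard DICOM transfer syntax UIDs. The particular set chosen, the `TRANSFER_SYNTAXES` table, and the `is_compressed` and `needs_transcoding` helpers are assumptions for illustration and are not taken from test_ctp.py.

```python
# Target syntax named in the review summary (EVRLE).
EXPLICIT_VR_LITTLE_ENDIAN = "1.2.840.10008.1.2.1"

# UID -> (human-readable name, pixel data is encapsulated/compressed).
# The selection of syntaxes here is an assumption, not the PR's test matrix.
TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": ("Implicit VR Little Endian", False),
    "1.2.840.10008.1.2.1": ("Explicit VR Little Endian", False),
    "1.2.840.10008.1.2.5": ("RLE Lossless", True),
    "1.2.840.10008.1.2.4.50": ("JPEG Baseline (Process 1)", True),
    "1.2.840.10008.1.2.4.70": ("JPEG Lossless (Process 14, Selection Value 1)", True),
    "1.2.840.10008.1.2.4.90": ("JPEG 2000 Image Compression (Lossless Only)", True),
    "1.2.840.10008.1.2.4.91": ("JPEG 2000 Image Compression", True),
}


def is_compressed(uid):
    """True if the syntax stores pixel data in compressed (encapsulated) form."""
    return TRANSFER_SYNTAXES[uid][1]


def needs_transcoding(uid, target=EXPLICIT_VR_LITTLE_ENDIAN):
    """True if a file in `uid` differs from the transcoder's target syntax."""
    return uid != target


def compressed_cases():
    """UIDs a test could parametrize over to exercise the transcoder path."""
    return sorted(uid for uid in TRANSFER_SYNTAXES if is_compressed(uid))
```

A test suite could feed `compressed_cases()` to a parametrized test and assert that none of the resulting files end up in the quarantine.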

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| ctp.py | Updates pixel pipeline templates to transcode to uncompressed syntax before pixel anonymization. |
| test_ctp.py | Adds DICOM generation utilities and new tests targeting multiple transfer syntaxes. |

@qsodia qsodia requested a review from ReeceStevens March 16, 2026 16:30
@qsodia qsodia requested a review from ReeceStevens March 16, 2026 21:40

3 participants