Adding option to preserve all-digit text in CVAT as string#7258
Adding option to preserve all-digit text in CVAT as string#7258andresjguevara wants to merge 1 commit intovoxel51:developfrom
Conversation
WalkthroughThis pull request adds a Sequence DiagramsequenceDiagram
actor User
participant Loader as CVAT Loader
participant AttrDetector as Attribute<br/>Detector
participant Parser as Value Parser
participant Dataset as FiftyOne<br/>Dataset
User->>Loader: load_cvat_image_annotations(xml, preserve_cvat_text_strings=True)
Loader->>AttrDetector: Identify text-typed attributes per label
AttrDetector-->>Loader: text_attribute_names_by_label map
loop For each annotation
Loader->>Parser: Parse attribute value<br/>(input_type, value, preserve flag)
alt preserve_cvat_text_strings=True
Parser-->>Loader: Return raw string
else preserve_cvat_text_strings=False
Parser-->>Loader: Coerce to numeric/bool
end
end
Loader->>Dataset: Populate with typed attributes
Dataset-->>User: Annotations loaded
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
FYI, there is another PR for this issue #7252 |
|
Burhan-Q Thank you! I just realized after looking at the status of the issue. Let me know if I should just close this PR. Thanks for you attention |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
fiftyone/utils/cvat.py (2)
7689-7755:⚠️ Potential issue | 🟠 MajorThread this flag through the CVAT dataset importers as well.
These helpers now support the opt-in, but
CVATImageDatasetImporter.setup()andCVATVideoDatasetImporter.__next__()still call them with defaults. As written,Dataset.from_dir(..., fo.types.CVATImageDataset/fo.types.CVATVideoDataset)cannot preserve CVATtextattrs and still coerces values like"0020"on the high-level XML import path.Also applies to: 7758-7835
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@fiftyone/utils/cvat.py` around lines 7689 - 7755, The importers call the helper loaders with defaults so the preserve_cvat_text_strings opt-in is ignored; update CVATImageDatasetImporter.setup() and CVATVideoDatasetImporter.__next__() to pass the importer-level flag through into load_cvat_image_annotations (and the analogous load_cvat_video_annotations) by threading a preserve_cvat_text_strings attribute (e.g. self.preserve_cvat_text_strings or a parameter on setup/constructor) and using it when calling CVATImage.from_image_dict / the video loader; ensure both call sites referenced (CVATImageDatasetImporter.setup, CVATVideoDatasetImporter.__next__) forward the flag instead of relying on the loader default so CVAT text attributes are preserved when requested.
7376-7388:⚠️ Potential issue | 🟠 MajorPreserved
textattrs are still re-typed in theattribute:branch.This now preserves raw strings at parse time, but Line 7387 immediately sends
attribute:values back throughCVATAttribute.to_attribute(), which re-inferrs the type from the content. That means API-loaded attrs stored inlabel.attributescan still lose"0020"and diverge from the XML path.💡 Suggested fix
for attr in attrs: name = attr_id_map_rev[attr["spec_id"]] input_type = _get_cvat_attr_input_type( attr_input_type_map, cvat_id, attr["spec_id"] ) value = _parse_cvat_attribute_value( attr["value"], input_type=input_type, preserve_cvat_text_strings=preserve_cvat_text_strings, ) if value is not None: if name.startswith("attribute:"): name = name[len("attribute:") :] - fo_attr = CVATAttribute(name, value).to_attribute() + if ( + preserve_cvat_text_strings + and _is_cvat_text_input_type(input_type) + ): + fo_attr = fol.CategoricalAttribute(value=value) + else: + fo_attr = CVATAttribute(name, value).to_attribute() self.fo_attributes[name] = fo_attr else: self.attributes[name] = value🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@fiftyone/utils/cvat.py` around lines 7376 - 7388, The code preserves raw text in _parse_cvat_attribute_value but then re-infers types by calling CVATAttribute.to_attribute() in the "attribute:" branch; change the branch in the loop that handles names starting with "attribute:" so that when preserve_cvat_text_strings is True you do not call CVATAttribute.to_attribute(), instead store the parsed raw string value directly (or construct a FO attribute that explicitly marks it as a preserved string) into self.fo_attributes[name]; keep the existing CVATAttribute.to_attribute() path only when preserve_cvat_text_strings is False so API-loaded attrs do not lose values like "0020".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unittests/utils_tests.py`:
- Around line 1188-1196: The XML fixture in tests/unittests/utils_tests.py omits
the input_type for the attributes named "bib" and "score", making the test rely
on fallback semantics; update the fixture to include explicit input_type
attributes for <attribute><name>bib</name> and <attribute><name>score</name> so
they match the CVAT input_type expected by the test assertions (e.g., set
input_type="text" or the specific type the assertions assume), ensuring the test
validates declared input types rather than implicit defaults.
---
Outside diff comments:
In `@fiftyone/utils/cvat.py`:
- Around line 7689-7755: The importers call the helper loaders with defaults so
the preserve_cvat_text_strings opt-in is ignored; update
CVATImageDatasetImporter.setup() and CVATVideoDatasetImporter.__next__() to pass
the importer-level flag through into load_cvat_image_annotations (and the
analogous load_cvat_video_annotations) by threading a preserve_cvat_text_strings
attribute (e.g. self.preserve_cvat_text_strings or a parameter on
setup/constructor) and using it when calling CVATImage.from_image_dict / the
video loader; ensure both call sites referenced (CVATImageDatasetImporter.setup,
CVATVideoDatasetImporter.__next__) forward the flag instead of relying on the
loader default so CVAT text attributes are preserved when requested.
- Around line 7376-7388: The code preserves raw text in
_parse_cvat_attribute_value but then re-infers types by calling
CVATAttribute.to_attribute() in the "attribute:" branch; change the branch in
the loop that handles names starting with "attribute:" so that when
preserve_cvat_text_strings is True you do not call CVATAttribute.to_attribute(),
instead store the parsed raw string value directly (or construct a FO attribute
that explicitly marks it as a preserved string) into self.fo_attributes[name];
keep the existing CVATAttribute.to_attribute() path only when
preserve_cvat_text_strings is False so API-loaded attrs do not lose values like
"0020".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2023cdc0-3213-465f-9de7-887455b30996
📒 Files selected for processing (3)
docs/source/integrations/cvat.rstfiftyone/utils/cvat.pytests/unittests/utils_tests.py
| <attribute> | ||
| <name>bib</name> | ||
| <values></values> | ||
| </attribute> | ||
| <attribute> | ||
| <name>score</name> | ||
| <values>low | ||
| high</values> | ||
| </attribute> |
There was a problem hiding this comment.
Align the XML fixture with the flag contract (explicit input_type).
Line 1188-Line 1196 omits input_type for bib/score, but the assertions validate behavior that is supposed to be gated by declared CVAT input type. This makes the test ambiguous and can drift from the fallback semantics for missing metadata.
Proposed fix
<attribute>
<name>bib</name>
+ <input_type>text</input_type>
<values></values>
</attribute>
<attribute>
<name>score</name>
+ <input_type>select</input_type>
<values>low
high</values>
</attribute>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <attribute> | |
| <name>bib</name> | |
| <values></values> | |
| </attribute> | |
| <attribute> | |
| <name>score</name> | |
| <values>low | |
| high</values> | |
| </attribute> | |
| <attribute> | |
| <name>bib</name> | |
| <input_type>text</input_type> | |
| <values></values> | |
| </attribute> | |
| <attribute> | |
| <name>score</name> | |
| <input_type>select</input_type> | |
| <values>low | |
| high</values> | |
| </attribute> |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unittests/utils_tests.py` around lines 1188 - 1196, The XML fixture in
tests/unittests/utils_tests.py omits the input_type for the attributes named
"bib" and "score", making the test rely on fallback semantics; update the
fixture to include explicit input_type attributes for
<attribute><name>bib</name> and <attribute><name>score</name> so they match the
CVAT input_type expected by the test assertions (e.g., set input_type="text" or
the specific type the assertions assume), ensuring the test validates declared
input types rather than implicit defaults.
|
No worries @andresjguevara and maybe you can set your PR to draft for now. If the other PR doesn't get updated or is stalled, we can investigate alternatives. I will say the other proposed change is much smaller, and even with that one I suggested trimming it more (see this comment for details if interested). Appreciate you opening the PR and there are other issues you're very welcome to contribute towards! |
|
@Burhan-Q Thank you! Will set as Draft and review the other PR and your suggestion to get myself familiarized. Thanks |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #7258 +/- ##
===========================================
- Coverage 99.10% 98.90% -0.20%
===========================================
Files 85 102 +17
Lines 23353 29510 +6157
===========================================
+ Hits 23143 29186 +6043
- Misses 210 324 +114
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
🔗 Related Issues
#7221
📋 What changes are proposed in this pull request?
This PR adds a CVAT-specific, backward-compatible opt-in for annotation loading:
dataset.load_annotations(anno_key, preserve_cvat_text_strings=True)When enabled, CVAT attributes declared as
input_type == "text"are preserved as raw strings instead of being schema-blind coerced (e.g."0020"stays"0020"). Default behavior is unchanged. Non-text CVAT attributes keep current parsing behavior, and when schema metadata is missing, parsing falls back to existing behavior.The change is scoped to CVAT ingestion paths (including XML consistency), with updated docstrings/signatures and CVAT docs.
🧪 How is this patch tested? If it is not, please explain why.
Added/updated tests cover:
"100" -> 100,"1.5" -> 1.5,"False" -> False,"0020" -> 20)textattributes remain exact strings ("0020","100","False")preserve_cvat_text_strings=Truereaches CVAT parsing flowManual validation also confirmed representative before/after parsing outcomes for CVAT text attributes.
📝 Release Notes
Is this a user-facing change that should be mentioned in the release notes?
Adds an opt-in CVAT annotation import flag,
preserve_cvat_text_strings, to preserve raw string values for CVATtextattributes duringdataset.load_annotations(). This prevents string-fidelity loss (for example, leading zeros) while keeping default behavior unchanged.What areas of FiftyOne does this PR affect?
fiftyonePython library changesSummary by CodeRabbit
New Features
preserve_cvat_text_stringsparameter for CVAT annotation loading, enabling retention of original string values in text attributes without automatic type conversion.Documentation
preserve_cvat_text_stringsoption and usage examples for CVAT tasks.