Skip to content

Fix CVAT text attributes coerced to int on annotation import#7252

Open
joaquinhuigomez wants to merge 4 commits intovoxel51:developfrom
joaquinhuigomez:fix/cvat-text-attribute-coercion
Open

Fix CVAT text attributes coerced to int on annotation import#7252
joaquinhuigomez wants to merge 4 commits intovoxel51:developfrom
joaquinhuigomez:fix/cvat-text-attribute-coercion

Conversation

@joaquinhuigomez
Copy link
Copy Markdown

@joaquinhuigomez joaquinhuigomez commented Mar 28, 2026

Fixes #7221

_parse_value() aggressively coerces string values to numeric types. When a CVAT text attribute contains all-digit strings like "0001", it gets cast to int(1), losing leading zeros and semantic meaning.

Added an attr_type parameter to _parse_value() so callers can indicate the CVAT attribute type. When the source type is "text", numeric coercion is skipped and the original string is preserved.

The attribute type information is now threaded from the CVAT task labels (which carry input_type on each attribute spec) through the annotation parsing pipeline via a new attr_type_map dict. All existing callers that don't provide type info continue to work as before — the parameter defaults to None, preserving backward compatibility.

Summary by CodeRabbit

  • Bug Fixes
    • Improved CVAT annotation import: attribute values are now parsed using declared input types so text stays text (empty strings become null), numbers and booleans keep correct types, and tag/classification attributes use their proper types for more faithful imports.
    • Added an option to preserve the legacy text-coercion behavior when needed.

_parse_value() aggressively coerces string values to numeric types.
When a CVAT text attribute contains all-digit strings like "0001",
it gets cast to int(1), losing leading zeros and semantic meaning.

Added an attr_type parameter to _parse_value() so callers can indicate
the CVAT attribute type. When the source type is "text", numeric
coercion is skipped and the original string is preserved.

Threaded the attribute type information from the CVAT task labels
through the annotation parsing pipeline via attr_type_map.
@joaquinhuigomez joaquinhuigomez requested a review from a team as a code owner March 28, 2026 02:12
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 28, 2026

Walkthrough

The PR propagates CVAT attribute input types through the annotation download pipeline and adds a coerce_text_attrs option. _get_attr_class_maps now returns attr_type_map, which download_annotations forwards into _parse_shapes_tags, _parse_annotation, and constructors (CVATShape, CVATTag, CVATLabel). _parse_value(value, attr_type=None) was added and preserves string values when attr_type == "text". When coerce_text_attrs is enabled, the code disables type-aware parsing by setting attr_type_map = None before parsing.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant FiftyOneDownloader as "FiftyOne\nDownloader"
    participant CVATAPI as "CVAT\nAPI"
    participant Parser as "CVAT\nParser"
    participant ShapeTag as "CVATShape / CVATTag"
    participant Dataset

    User->>FiftyOneDownloader: request download_annotations(task_id, coerce_text_attrs?)
    FiftyOneDownloader->>CVATAPI: fetch task, annotations, attribute schema
    CVATAPI-->>FiftyOneDownloader: annotations + attr_type_map
    alt coerce_text_attrs == true
        FiftyOneDownloader->>Parser: _parse_shapes_tags(annotations, attr_type_map=None)
    else
        FiftyOneDownloader->>Parser: _parse_shapes_tags(annotations, attr_type_map)
    end
    Parser->>ShapeTag: instantiate CVATShape/CVATTag with attr_type_map
    ShapeTag-->>Parser: _parse_value(value, attr_type) — preserve text when attr_type=="text"
    Parser->>Dataset: create labels with parsed attributes
    Dataset-->>User: dataset updated
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description covers linked issue, what changes are proposed, but lacks testing details and release notes checkboxes required by the template. Complete the testing section and release notes section with checkboxes marked and appropriate details filled in per the template.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and specifically describes the main fix: preventing CVAT text attributes with digits from being coerced to integers during annotation import.
Linked Issues check ✅ Passed Code changes directly address issue #7221 by threading attr_type through the parsing pipeline to preserve text attributes as strings without numeric coercion.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing CVAT text attribute handling in the REST download path; XML importer path is intentionally left for follow-up as noted in PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
fiftyone/utils/cvat.py (2)

7684-7704: ⚠️ Potential issue | 🟠 Major

Return raw values for CVAT text attributes.

This only skips numeric coercion. Text attrs still fall through to the boolean/None normalization below, so values like "false", "None", and "" are still rewritten on import. Return early when attr_type == "text" so the original CVAT string survives untouched.

🔧 Proposed fix
 def _parse_value(value, attr_type=None):
-    if attr_type != "text":
-        try:
-            return int(value)
-        except:
-            pass
-
-        try:
-            return float(value)
-        except:
-            pass
+    if attr_type == "text":
+        return value
+
+    try:
+        return int(value)
+    except:
+        pass
+
+    try:
+        return float(value)
+    except:
+        pass
 
     if etau.is_str(value):
         if value in ("True", "true"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fiftyone/utils/cvat.py` around lines 7684 - 7704, The _parse_value function
currently skips numeric coercion for attr_type == "text" but still normalizes
booleans and None-like strings; modify _parse_value to return the original value
immediately when attr_type == "text" so text attributes are not rewritten (i.e.,
add an early return for attr_type == "text" that returns the raw value) and keep
the existing numeric and boolean/None coercion for non-text types.

6043-6073: ⚠️ Potential issue | 🟠 Major

Honor input_type in the scalar-tag branch.

Now that attr_type_map is threaded into _parse_annotation(), the scalar path should use it too. Line 6049 still calls _parse_value() without the attribute's CVAT input_type, so StringField scalars stored as text attrs will still coerce "0001" to 1; the first such value can also pin prev_type to int and cause later non-numeric strings in the same field to be dropped.

🔧 Proposed fix
                 if assigned_scalar_attrs:
                     num_attrs = len(anno["attributes"])
                     attr_ind = 0
+                    _attr_types = (
+                        attr_type_map.get(anno["label_id"], {})
+                        if attr_type_map
+                        else {}
+                    )
                     while label is None and attr_ind < num_attrs:
+                        attr = anno["attributes"][attr_ind]
                         label = _parse_value(
-                            anno["attributes"][attr_ind]["value"]
+                            attr["value"],
+                            attr_type=_attr_types.get(attr["spec_id"]),
                         )
                         attr_ind += 1
                         if label is not None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fiftyone/utils/cvat.py` around lines 6043 - 6073, The scalar branch currently
calls _parse_value(anno["attributes"][attr_ind]["value"]) without telling it the
CVAT input type, so pass the attribute's input_type from attr_type_map when
calling _parse_value (use the attribute id from anno["attributes"][attr_ind] to
look up attr_type_map and call _parse_value(value, input_type=that_input_type)
or fallback to None if missing); update the call inside the
assigned_scalar_attrs loop that references _parse_value, and keep the existing
prev_type/type-checking logic unchanged so parsing honors CVAT StringField vs
numeric input types.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@fiftyone/utils/cvat.py`:
- Around line 7684-7704: The _parse_value function currently skips numeric
coercion for attr_type == "text" but still normalizes booleans and None-like
strings; modify _parse_value to return the original value immediately when
attr_type == "text" so text attributes are not rewritten (i.e., add an early
return for attr_type == "text" that returns the raw value) and keep the existing
numeric and boolean/None coercion for non-text types.
- Around line 6043-6073: The scalar branch currently calls
_parse_value(anno["attributes"][attr_ind]["value"]) without telling it the CVAT
input type, so pass the attribute's input_type from attr_type_map when calling
_parse_value (use the attribute id from anno["attributes"][attr_ind] to look up
attr_type_map and call _parse_value(value, input_type=that_input_type) or
fallback to None if missing); update the call inside the assigned_scalar_attrs
loop that references _parse_value, and keep the existing prev_type/type-checking
logic unchanged so parsing honors CVAT StringField vs numeric input types.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 13514b55-8188-405a-9f6b-f3513727fa5e

📥 Commits

Reviewing files that changed from the base of the PR and between 1f45d4c and 6b21492.

📒 Files selected for processing (1)
  • fiftyone/utils/cvat.py

Early return str(value) for text-type attributes in _parse_value to
prevent numeric coercion of text values like '0001'.

Thread attr_type through the scalar-tag branch so tag annotations
also benefit from type-aware parsing, matching the existing pattern
used in the shapes branch.
@joaquinhuigomez
Copy link
Copy Markdown
Author

Addressed both CodeRabbit findings:

  1. Early return for text type in _parse_value — changed from if attr_type != "text" guard to if attr_type == "text": return str(value) so text attributes skip boolean/None normalization entirely.

  2. Thread attr_type through scalar-tag branch — the tag annotation path at line ~6049 now looks up the attribute's input type from attr_type_map before calling _parse_value, matching the pattern already used in CVATShape.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
fiftyone/utils/cvat.py (1)

4893-4911: ⚠️ Potential issue | 🟠 Major

Mirror input_type propagation in the XML importers.

This only populates type metadata for REST downloads. CVATTaskLabels.from_labels_dict(), CVATImageAnno._parse_anno_dict(), and CVATVideoAnno._parse_anno_dict() still drop XML input_type and call _parse_value() without it, so load_cvat_image_annotations() / load_cvat_video_annotations() will still coerce "0001" to 1.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fiftyone/utils/cvat.py` around lines 4893 - 4911, The XML import path doesn't
propagate per-attribute input_type, causing XML imports to lose type metadata
and coerce string numeric values incorrectly; update
CVATTaskLabels.from_labels_dict, CVATImageAnno._parse_anno_dict, and
CVATVideoAnno._parse_anno_dict to extract attribute "input_type" from the XML
labels (mirror the REST logic used in _get_attr_class_maps to build
attr_type_map) and ensure every call that parses attribute values passes that
input_type into _parse_value (so
load_cvat_image_annotations/load_cvat_video_annotations receive and use the same
attr_type_map).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@fiftyone/utils/cvat.py`:
- Around line 6046-6060: The code currently treats an empty string value for
text scalar attributes as a valid parsed label, turning an unset scalar into ""
on round-trip; in the loop in fiftyone.utils.cvat where _attr_types is computed
and label is set via _parse_value, skip attributes whose attr_type == "text" and
whose raw attr["value"] is "" (i.e., continue to next attr instead of calling
_parse_value), so empty text scalars remain unset; refer to the
variables/functions _attr_types, attr_type_map, attr (attr["value"],
attr["spec_id"]), attr_type, and _parse_value to locate and implement this
conditional skip.

---

Outside diff comments:
In `@fiftyone/utils/cvat.py`:
- Around line 4893-4911: The XML import path doesn't propagate per-attribute
input_type, causing XML imports to lose type metadata and coerce string numeric
values incorrectly; update CVATTaskLabels.from_labels_dict,
CVATImageAnno._parse_anno_dict, and CVATVideoAnno._parse_anno_dict to extract
attribute "input_type" from the XML labels (mirror the REST logic used in
_get_attr_class_maps to build attr_type_map) and ensure every call that parses
attribute values passes that input_type into _parse_value (so
load_cvat_image_annotations/load_cvat_video_annotations receive and use the same
attr_type_map).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1e9479de-7e85-481e-8605-23ca2f52af20

📥 Commits

Reviewing files that changed from the base of the PR and between 6b21492 and 73e5eb4.

📒 Files selected for processing (1)
  • fiftyone/utils/cvat.py

Empty string values for text-type attributes should return None
(matching the existing behavior for non-text types at line 7714)
so that unset CVAT text scalars don't round-trip as "".
@joaquinhuigomez
Copy link
Copy Markdown
Author

Fixed the empty-string text scalar issue — _parse_value now returns None for "" when attr_type == "text", consistent with the existing behavior for non-text types (line 7714).

Regarding the XML importer path (load_cvat_image_annotations / load_cvat_video_annotations): that's a separate code path from the REST API download flow this PR targets. The XML format doesn't carry input_type metadata the same way the REST API does, so threading it there would require a different approach and broader changes. I'd prefer to keep this PR scoped to the REST download path that #7221 reported against, and handle the XML path in a follow-up if needed.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
fiftyone/utils/cvat.py (1)

4893-4911: ⚠️ Potential issue | 🟠 Major

Keep _get_attr_class_maps() backward compatible or update the remaining callers.

Line 4911 changes this helper to return three values, but tests/intensive/cvat_tests.py still unpacks two. That path now fails with ValueError: too many values to unpack before the new parsing logic runs.

Minimal caller update outside this hunk
- attr_id_map, class_id_map = api._get_attr_class_maps(task_id)
+ attr_id_map, _, class_id_map = api._get_attr_class_maps(task_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fiftyone/utils/cvat.py` around lines 4893 - 4911, The helper
_get_attr_class_maps was changed to return three values (attr_id_map,
attr_type_map, class_map_rev) which breaks existing callers that expect two;
revert _get_attr_class_maps to return the original two-tuple (attr_id_map,
attr_type_map) to preserve backward compatibility and move the class_map_rev
construction into a new helper (e.g. _build_class_map_rev or _get_class_map_rev)
that returns class_map_rev when callers need it; update any new code paths to
call the new helper explicitly and leave existing callers (e.g.
tests/intensive/cvat_tests.py) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@fiftyone/utils/cvat.py`:
- Around line 4722-4724: XML import paths call _parse_value without type
metadata causing digit-only text attributes to be coerced; in
load_cvat_image_annotations and load_cvat_video_annotations, fetch the attribute
maps via self._get_attr_class_maps(task_id) (to obtain attr_type_map) and pass
that attr_type_map into calls to _parse_value (instead of calling
_parse_value(value) with no type info). Update the places that currently call
_parse_value without type metadata to use _parse_value(value, attr_type_map,
attr_id_map) (or the existing _parse_value signature that accepts type maps) so
parsing is type-aware; also apply the same change to the other XML import call
sites that use _parse_value without attr_type_map (references: functions
_parse_value, load_cvat_image_annotations, load_cvat_video_annotations, and
_get_attr_class_maps).

---

Outside diff comments:
In `@fiftyone/utils/cvat.py`:
- Around line 4893-4911: The helper _get_attr_class_maps was changed to return
three values (attr_id_map, attr_type_map, class_map_rev) which breaks existing
callers that expect two; revert _get_attr_class_maps to return the original
two-tuple (attr_id_map, attr_type_map) to preserve backward compatibility and
move the class_map_rev construction into a new helper (e.g. _build_class_map_rev
or _get_class_map_rev) that returns class_map_rev when callers need it; update
any new code paths to call the new helper explicitly and leave existing callers
(e.g. tests/intensive/cvat_tests.py) unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c984dbc9-fc22-47fc-a806-7d396da77607

📥 Commits

Reviewing files that changed from the base of the PR and between 73e5eb4 and 9ff28ab.

📒 Files selected for processing (1)
  • fiftyone/utils/cvat.py

Comment on lines +4722 to 4724
attr_id_map, attr_type_map, _class_map_rev = (
self._get_attr_class_maps(task_id)
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Apply the same type-aware parsing to XML imports.

This fix only works for the REST download path, where attr_type is available from task metadata. Line 1702 and Line 2436 still call _parse_value() without type metadata, so load_cvat_image_annotations() and load_cvat_video_annotations() will still coerce digit-only text attrs from XML imports.

Also applies to: 7693-7695

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fiftyone/utils/cvat.py` around lines 4722 - 4724, XML import paths call
_parse_value without type metadata causing digit-only text attributes to be
coerced; in load_cvat_image_annotations and load_cvat_video_annotations, fetch
the attribute maps via self._get_attr_class_maps(task_id) (to obtain
attr_type_map) and pass that attr_type_map into calls to _parse_value (instead
of calling _parse_value(value) with no type info). Update the places that
currently call _parse_value without type metadata to use _parse_value(value,
attr_type_map, attr_id_map) (or the existing _parse_value signature that accepts
type maps) so parsing is type-aware; also apply the same change to the other XML
import call sites that use _parse_value without attr_type_map (references:
functions _parse_value, load_cvat_image_annotations,
load_cvat_video_annotations, and _get_attr_class_maps).

Copy link
Copy Markdown
Member

@Burhan-Q Burhan-Q left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work identifying and fixing this issue. The root cause analysis is spot on, and the fix in _parse_value() itself is reasonable.

I had one design concern I wanted to raise: the attr_type_map is currently threaded as a parameter through

download_annotations_parse_shapes_tags_parse_annotationCVATShape/CVATTagCVATLabel.__init___parse_value

which adds up to about 20 places it's needed in the code, spanning 8 functions. Most of these are pure pass-thru _parse_shapes_tags and _parse_annotation don't use the map directly, they just relay it.

Since _parse_shapes_tags, _parse_annotation, and download_annotations are all methods on CVATAnnotationAPI, I think it might be simpler to store the map as instance state (self._attr_type_map) on the API object when it's built per-task in download_annotations. That way, _parse_annotation can read self._attr_type_map directly and only pass it to the CVATShape, CVATTag, and CVATLabel constructors that actually need it. This would eliminate the parameter from the signatures of _parse_shapes_tags and _parse_annotation entirely, cutting down the number of areas impacted by the proposed changes.

I also had a concern about backward compatibility. There may be users who have pipelines where text attributes containing text values like "42" or "true" are intentionally coerced to int or bool values by the current behavior (see Hyrum's Law). Making text preservation automatic would silently change their results.
One way I can think to handle this would be to gate the behavior behind an opt-in config parameter on CVATBackendConfig, something like preserve_text_values=False (see note below). When enabled, download_annotations would populate self._attr_type_map from the CVAT task labels; when disabled (the default), self._attr_type_map stays None and existing coercion behavior is preserved. The Config base class handles serialization automatically for public attributes, so it would round-trip through annotate() and load_annotations() without any additional plumbing. This would need to be updated in the documentation as well, but would permit users to opt in with:

dataset.annotate(anno_key, ..., preserve_text_values=True)  # see preferred argument name below

One final note regarding the changes on this PR. Including unit tests to cover the new, and old, behaviors will be immensely helpful. Adding or updating tests will ensure that changes, if merged, will have safeguards to ensure the new behavior going forward.

To summarize, the two suggestions are:

  1. Store attr_type_map on CVATAnnotationAPI to cut the intermediate pass-through plumbing
  2. Default to existing behavior, let users enable text preservation explicitly, likely needs documentation updates for CVAT
  3. Include unit tests to cover all proposed changes, (for either the current proposed changes or based on my recommendations)

Happy to discuss if I'm missing something about why the current approach is preferable. I appreciate the thorough work tracing the type info through the pipeline. Ultimately merging the changes is not up to me, but I'm providing feedback with what I think could give the best odds of the changes being accepted.


[edit]

Important

Using opt-in arg name coerce_text_attrs is preferred over the suggested preserve_text_values. The design eeps the same behavior, coerce_text_attrs=False as the default which uses the same behavior as today (no coercing), coerce_text_attrs=True to enforce coercing CVAT #text attributes as string values.

@brimoor
Copy link
Copy Markdown
Member

brimoor commented Apr 1, 2026

@Burhan-Q FYI I added some thoughts to the motivating issue here.

TLDR, I think this PR makes sense. On nomenclature, I have a slight preference for coerce_text_attrs=False, but preserve_text_attrs=True is okay too if you prefer that.

On these points:

  1. Store attr_type_map on CVATAnnotationAPI to cut the intermediate pass-through plumbing
  2. Default to existing behavior, let users enable text preservation explicitly, likely needs documentation updates for CVAT
  3. Include unit tests to cover all proposed changes, (for either the current proposed changes or based on my recommendations)

my personal opinion is:

  1. ❌ Not necessary to merge
  2. ✅ Definitely necessary!
  3. ❌ Not necessary to merge, but if tests are added, the place to do that is here

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.90%. Comparing base (c35f162) to head (9ff28ab).
⚠️ Report is 2285 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7252      +/-   ##
===========================================
- Coverage    99.10%   98.90%   -0.21%     
===========================================
  Files           85      102      +17     
  Lines        23353    29472    +6119     
===========================================
+ Hits         23143    29148    +6005     
- Misses         210      324     +114     
Flag Coverage Δ
python 98.90% <ø> (-0.21%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Burhan-Q
Copy link
Copy Markdown
Member

Burhan-Q commented Apr 1, 2026

I agree with argument name coerce_text_attrs=False for the default behavior, with coerce_text_attrs=True as the new opt-in behavior. I've appended my original comment to reflect my agreement.

On the point of adding attr_type_map to CVATAnnotationAPI, it was a recommendation to simplify the changes. Sounds like that's not necessary, but I do still think it would be a nice addition personally, but I won't block on it.

@joaquinhuigomez please make the appropriate updates when you're able to. If you need some time, that's understandable. Please leave a comment or react with 👍 on this comment to acknowledge you're able to make updates soon, otherwise I'll assume you're busy if there's no response after a few days. You could also ping me on the #github-contribution channel on the Discord if that's easier https://discord.gg/fiftyone-community

@joaquinhuigomez
Copy link
Copy Markdown
Author

On it — will update with coerce_text_attrs=False as default.

@joaquinhuigomez
Copy link
Copy Markdown
Author

Added coerce_text_attrs=False to download_annotations. When False (default), text attributes are preserved as strings. Set to True for the old coercion behavior.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@fiftyone/utils/cvat.py`:
- Around line 3361-3367: The public load path doesn't expose the
coerce_text_attrs flag: update fiftyone.utils.annotations.load_annotations() to
accept a coerce_text_attrs parameter (default False) and thread it through when
calling results.backend.download_annotations(results,
coerce_text_attrs=coerce_text_attrs); also update the abstract method signature
AnnotationBackend.download_annotations(self, results, coerce_text_attrs=False)
and concrete implementations (e.g., CVATBackend.download_annotations /
fiftyone/utils/cvat.py::download_annotations) to accept and pass the flag;
alternatively persist the choice on the results/run config and read it inside
results.backend.download_annotations if you prefer configuration-based
propagation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e33d4e3c-78a2-47e2-ae7a-6aeda08ed0fc

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff28ab and 9f2259f.

📒 Files selected for processing (1)
  • fiftyone/utils/cvat.py

Comment on lines +3361 to +3367
def download_annotations(self, results, coerce_text_attrs=False):
api = self.connect_to_api()

logger.info("Downloading labels from CVAT...")
annotations = api.download_annotations(results)
annotations = api.download_annotations(
results, coerce_text_attrs=coerce_text_attrs
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Expose coerce_text_attrs on the public load path.

fiftyone.utils.annotations.load_annotations() still calls results.backend.download_annotations(results) with no way to pass this flag, and the abstract AnnotationBackend.download_annotations() signature still omits it. As written, the legacy opt-in is unreachable from the standard API even though the backend/API now expose it. Please thread the option through the public helper or persist it on the run config/results.

Also applies to: 4649-4657

🧰 Tools
🪛 Ruff (0.15.7)

[warning] 3361-3361: Boolean default positional argument in function definition

(FBT002)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fiftyone/utils/cvat.py` around lines 3361 - 3367, The public load path
doesn't expose the coerce_text_attrs flag: update
fiftyone.utils.annotations.load_annotations() to accept a coerce_text_attrs
parameter (default False) and thread it through when calling
results.backend.download_annotations(results,
coerce_text_attrs=coerce_text_attrs); also update the abstract method signature
AnnotationBackend.download_annotations(self, results, coerce_text_attrs=False)
and concrete implementations (e.g., CVATBackend.download_annotations /
fiftyone/utils/cvat.py::download_annotations) to accept and pass the flag;
alternatively persist the choice on the results/run config and read it inside
results.backend.download_annotations if you prefer configuration-based
propagation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] load_annotations from CVAT "text" attribute treated as "int" when using only digits

3 participants