
Conversation

@vaibhav45sktech commented Jan 4, 2026

Description

This PR updates the CLI help text for --title, --abstract, and --description to align with the DBpedia databus data model hierarchy.

Currently, the CLI uses generic "Dataset" terminology, which causes confusion between Artifacts (the logical container) and Versions (the specific release). Users often mistake --title for a version-specific name, or --abstract for a global description.

These changes clarify the scope of each argument without breaking existing CLI compatibility.

Changes

I have updated the help strings to explicitly map the arguments to their databus concepts:

| Argument | Old Help Text | New Help Text |
| --- | --- | --- |
| `--title` | Dataset title | Artifact Label: the permanent name of the data series (applies to all versions) |
| `--abstract` | Dataset abstract max 200 chars | Version Abstract: a short summary (max 200 chars) specific to this timestamped release |
| `--description` | Dataset description | Version Description: detailed metadata for this specific release (supports Markdown) |

Solves Issue #21
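
For illustration, a minimal sketch of how the new help strings could be wired up, assuming the CLI is built with click (the help texts are taken from the table above; the decorator layout itself is an assumption, not the actual contents of databusclient/cli.py):

```python
import click

# Hypothetical excerpt -- the real deploy command has more options; only the
# three help strings changed by this PR are shown here.
@click.command()
@click.option(
    "--title",
    help="Artifact Label: the permanent name of the data series (applies to all versions)",
)
@click.option(
    "--abstract",
    help="Version Abstract: a short summary (max 200 chars) specific to this timestamped release",
)
@click.option(
    "--description",
    help="Version Description: detailed metadata for this specific release (supports Markdown)",
)
def deploy(title, abstract, description):
    """Deploy a dataset version to the Databus."""
    click.echo(f"title={title!r} abstract={abstract!r} description={description!r}")
```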

Summary by CodeRabbit

  • Documentation

    • Help text for deploy command metadata options updated with clearer, version-aware descriptions.
  • Refactor

    • Deploy API and CLI updated to use clearer artifact metadata parameter names and explicit argument mapping during deployment, maintaining existing deploy flows and behavior.


@coderabbitai bot commented Jan 4, 2026

📝 Walkthrough

CLI help texts were clarified for deploy options; API parameter names in deploy functions were renamed from title/abstract/description to artifact_title/artifact_abstract/artifact_description. Calls to create_dataset were updated to use explicit keyword arguments. No other functional changes were introduced.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CLI: deploy help text & call-site (`databusclient/cli.py`) | Updated help strings for `--title`, `--abstract`, `--description` to indicate artifact vs. version semantics; updated the call to `api_deploy.create_dataset` to use keyword arguments (`version_id=...`, `artifact_title=...`, `artifact_abstract=...`, `artifact_description=...`, `license_url=...`, `distributions=...`). |
| API: deploy parameter rename & propagation (`databusclient/api/deploy.py`) | Renamed public parameters `title`, `abstract`, `description` to `artifact_title`, `artifact_abstract`, `artifact_description` in `create_dataset`, `deploy_from_metadata`, and the related internal artifact/dataset graph construction and docstrings; updated internal calls and fields to use the new names. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the primary change, clarifying CLI help text to align with the databus hierarchy, which is the main focus of the PR. |
| Description check | ✅ Passed | The description is comprehensive and well-structured, explaining the problem, the specific changes made with a clear table, and linking to the related issue. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
databusclient/cli.py (2)

14-56: Consider several improvements to the parsing logic.

  1. Warning mechanism (Line 49): Using print() for warnings writes directly to stdout, which is not ideal for library code. Consider using Python's logging module or collecting warnings in the return value for better testability and control.

  2. Compression heuristic is fragile (Lines 36-39): The classification of .tar as compression is questionable, since .tar is an archive format rather than a compression format; files like .tar.gz combine both. This heuristic may misclassify extensions and lead to incorrect metadata (a rough alternative is sketched after the suggested improvements below).

  3. Missing docstring examples: The docstring should include concrete examples showing input and expected output to clarify the expected format and behavior.

  4. No validation of key=value pairs (Line 43): The code splits on '=' but doesn't validate that the key or value is non-empty. Consider adding validation to reject malformed variants like "=value" or "key=".

  5. Inconsistent extension handling: Line 37's comment suggests conditional removal of the leading dot ("if needed"), but the code always strips it. Clarify the expected format or document why the leading dot is always removed.

🔎 Suggested improvements

Replace print() with proper logging:

+import logging
+
+logger = logging.getLogger(__name__)
+
 def parse_distribution_str(dist_str: str):
     """
     Parses a distribution string with format:
     URL|key=value|...|.extension
     
     Returns a dictionary suitable for the deploy API.
+    
+    Examples:
+        >>> parse_distribution_str("http://example.com/data.json|lang=en|.json|.gz")
+        {'url': 'http://example.com/data.json', 'variants': {'lang': 'en'}, 
+         'formatExtension': 'json', 'compression': 'gz'}
     """
     ...
         else:
-             print(f"WARNING: Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.")
+             logger.warning(f"Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.")

Add validation for key=value pairs:

         elif '=' in part:
             key, value = part.split('=', 1)
+            if not key.strip() or not value.strip():
+                logger.warning(f"Invalid key=value pair '{part}' in distribution.")
+                continue
             variants[key.strip()] = value.strip()
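
A rough sketch of a stricter classification, as an alternative to the heuristic criticized in point 2 above (the extension set and function name are hypothetical, not part of this PR):

```python
# Hypothetical helper: treat .tar as a format/archive extension and only
# well-known codecs as compression, instead of guessing from the suffix shape.
KNOWN_COMPRESSIONS = {"gz", "bz2", "xz", "zst"}

def classify_extension(ext: str) -> tuple[str, str]:
    """Return ("compression", ext) or ("formatExtension", ext) for a '.ext' part."""
    ext = ext.lstrip(".")
    if ext in KNOWN_COMPRESSIONS:
        return "compression", ext
    return "formatExtension", ext

# classify_extension(".tar") -> ("formatExtension", "tar")
# classify_extension(".gz")  -> ("compression", "gz")
```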

129-137: Remove developer scaffolding comments and consider error handling.

  1. Lines 129, 137: The "--- CHANGE START/END ---" comments are developer scaffolding and should be removed before merging.

  2. Line 131: Consider adding error handling around parse_distribution_str calls. If parsing fails (e.g., on a malformed distribution string), the user should receive a clear error message rather than an uncaught exception (a hedged sketch follows the proposed cleanup below).

  3. Behavioral change: This modification changes the contract between the CLI and api_deploy.create_dataset. Previously, raw strings were passed; now parsed dictionaries are passed. While this appears to be intentional and supported by changes in deploy.py, it's worth verifying that all downstream consumers can handle this change.

🔎 Proposed cleanup
-        # --- CHANGE START ---
-        # Parse the input strings into structured objects
         parsed_distributions = [parse_distribution_str(d) for d in distributions]
         
-        # Note: api_deploy.create_dataset now accepts this list of dicts
         dataid = api_deploy.create_dataset(
             version_id, title, abstract, description, license_url, parsed_distributions
         )
-        # --- CHANGE END ---
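
A hedged sketch of the error handling suggested in point 2 (the wrapper name is hypothetical, and ValueError as the parser's failure mode is an assumption; click.UsageError is used so the CLI prints a readable message instead of a traceback):

```python
import click

# Hypothetical wrapper around the parsing step from the cleanup above.
def parse_distributions_or_fail(distributions, parse_one):
    parsed = []
    for dist_str in distributions:
        try:
            parsed.append(parse_one(dist_str))
        except ValueError as exc:  # assumed failure mode of parse_distribution_str
            raise click.UsageError(
                f"Could not parse distribution string {dist_str!r}: {exc}"
            ) from exc
    return parsed

# Usage (inside the deploy command):
#     parsed_distributions = parse_distributions_or_fail(distributions, parse_distribution_str)
```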
databusclient/api/queries.py (1)

54-93: Consider adding validation for malformed key=value pairs.

Similar to parse_distribution_str in cli.py, this function splits on '=' (Line 87) but doesn't validate that both key and value are non-empty. Consider adding a check to handle edge cases like "=value" or "key=".

🔎 Suggested validation
         if "=" in part:
             key, value = part.split("=", 1)
+            if not key.strip() or not value.strip():
+                # Skip malformed key=value pairs
+                continue
             variants[key.strip()] = value.strip()
tests/test_parse_distribution.py (1)

1-275: Excellent test coverage with minor gaps.

The test suite is comprehensive and well-structured:

  • Clear test organization with descriptive test names
  • Good coverage of parsing logic: URL extraction, variants, extensions, compression
  • Appropriate use of mocking to avoid network dependencies
  • Integration tests validate end-to-end behavior with the deploy API

Minor suggestions for additional test coverage:

  1. Error handling: Consider adding tests for truly malformed inputs that might cause exceptions (e.g., None input, non-string input, extremely long strings).

  2. Edge case - URLs with delimiter: Test behavior when the URL itself contains the '|' delimiter character (e.g., "http://example.com/file?param=a|b|.json"). The current implementation would split incorrectly.

  3. Empty parts: Test distribution strings with empty parts between delimiters (e.g., "http://example.com/file||.json" with double pipe).

🔎 Example additional tests
def test_url_with_pipe_character(self):
    """Test URL containing pipe character in query string."""
    # This is an edge case that might need special handling
    result = parse_distribution_str("http://example.com/file?param=a|b")
    # Current implementation would incorrectly split on the pipe in the URL
    assert result["url"] == "http://example.com/file?param=a|b"

def test_empty_parts_between_delimiters(self):
    """Test handling of empty parts between delimiters."""
    result = parse_distribution_str("http://example.com/file||.json")
    # Should gracefully handle empty parts
    assert result["formatExtension"] == "json"
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2c3f1c and c6316e2.

📒 Files selected for processing (5)
  • databusclient/api/deploy.py
  • databusclient/api/queries.py
  • databusclient/cli.py
  • tests/test_deploy.py
  • tests/test_parse_distribution.py
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/cli.py (1)
databusclient/api/deploy.py (1)
  • create_dataset (304-465)
tests/test_parse_distribution.py (2)
databusclient/cli.py (1)
  • parse_distribution_str (14-56)
databusclient/api/deploy.py (1)
  • _get_file_info_from_dict (176-208)
🔇 Additional comments (4)
databusclient/cli.py (1)

73-75: LGTM! Help text improvements align with PR objectives.

The updated help text successfully clarifies the distinction between Artifact-level metadata (permanent name) and Version-level metadata (release-specific). The terminology now explicitly maps to the DBpedia Databus data model.

databusclient/api/deploy.py (1)

304-387: LGTM! Backwards-compatible API extension.

The extension of distributions parameter to accept Union[List[str], List[Dict]] is well-implemented:

  • Backwards compatible with existing string-based distributions
  • Clear documentation in the docstring (lines 334-337)
  • Appropriate use of isinstance() check to branch on format (line 368)
  • Both branches produce the same tuple structure for consistent downstream processing

The approach correctly maintains the existing validation (lines 389-393) for both formats.
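
As a rough illustration of the dual-format handling described above (not the actual deploy.py code; the field names follow the parse examples earlier in this review):

```python
from typing import Dict, List, Union

# Illustrative sketch only: accept either a legacy distribution string or an
# already parsed dict, and normalize to the dict form for downstream processing.
def normalize_distribution(dist: Union[str, Dict]) -> Dict:
    if isinstance(dist, dict):
        return dist
    url, _, modifiers = dist.partition("|")
    return {"url": url, "raw_modifiers": modifiers}

def normalize_distributions(distributions: Union[List[str], List[Dict]]) -> List[Dict]:
    return [normalize_distribution(d) for d in distributions]
```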

tests/test_deploy.py (1)

15-16: LGTM! Good test hygiene.

Extracting the example URL into a named constant improves test maintainability and readability.

databusclient/api/queries.py (1)

1-51: This module introduces unused code unrelated to the PR's stated objective.

ONTOLOGIES_QUERY and parse_content_variants_string() are defined in this new module but are not imported or used anywhere in the codebase. Since the PR aims to clarify CLI help text, this feature addition appears out of scope.

Either:

  1. Move this to a separate PR with proper documentation of the feature's purpose and integration points, or
  2. Integrate it into the codebase (add imports where needed and update the PR description to reflect this feature work)

Additionally, document why these functions are being added if they're intended for future use.

@vaibhav45sktech force-pushed the docs/clarify-cli-terminology branch from c6316e2 to aca668e on January 4, 2026 08:05
@vaibhav45sktech (Author) commented Jan 6, 2026

Greetings @Integer-Ctrl @m1ci, could you please review this PR?

@vaibhav45sktech (Author) commented

> Greetings @Integer-Ctrl @m1ci, could you please review this PR?

Kindly review it.

@Integer-Ctrl self-requested a review on January 10, 2026 11:00
@Integer-Ctrl (Contributor) left a comment

As you said, it is a good idea to keep the existing CLI arguments for now to avoid breaking changes. However, perhaps you could change the variable names in the method create_dataset in the deploy.py file:

  • title -> artifact_title
  • abstract -> artifact_abstract
  • description -> artifact_description

This would improve clarity.
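
For illustration, the rename would only touch the signature and docstring, e.g. (a sketch based on the parameter list mentioned in this PR; the remaining details and body are elided):

```python
# Sketch of the suggested rename in databusclient/api/deploy.py.
def create_dataset(
    version_id: str,
    artifact_title: str,        # was: title
    artifact_abstract: str,     # was: abstract
    artifact_description: str,  # was: description
    license_url: str,
    distributions: list,
):
    ...
```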

@coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
databusclient/api/deploy.py (4)

287-296: Update docstring to match the new parameter names.

The docstring still documents the old parameter names (title, abstract, description) instead of the new names (artifact_title, artifact_abstract, artifact_description).

📝 Proposed fix for the docstring
     version_id: str
         The version ID representing the Dataset. Needs to be in the form of $DATABUS_BASE/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION
-    title: str
-        The title text of the dataset
-    abstract: str
-        A short (one or two sentences) description of the dataset
-    description: str
-        A long description of the dataset. Markdown syntax is supported
+    artifact_title: str
+        The artifact title (permanent name of the data series, applies to all versions)
+    artifact_abstract: str
+        The artifact abstract (a short description of the artifact)
+    artifact_description: str
+        The artifact description (detailed metadata for the artifact). Markdown syntax is supported
     license_url: str
         The license of the dataset as a URI.

501-507: Update the call to create_dataset with the new parameter names.

The deploy_from_metadata function still calls create_dataset using the old parameter names (title, abstract, description), which will cause a TypeError since those parameters no longer exist.

🔧 Proposed fix
     dataset = create_dataset(
         version_id=version_id,
-        title=title,
-        abstract=abstract,
-        description=description,
+        artifact_title=title,
+        artifact_abstract=abstract,
+        artifact_description=description,
         license_url=license_url,
         distributions=distributions,
     )

269-281: Inconsistency: PR description doesn't match the actual changes.

The PR description states: "The changes preserve CLI compatibility and only modify help strings." However, this file changes the public API function signature of create_dataset by renaming parameters, which is a breaking change for any code that calls this function with keyword arguments.

If the intent is truly to only update CLI help text without breaking the API, consider keeping the original parameter names in the function signature and only updating the CLI layer where argument parsing happens.


269-281: Critical: Incomplete parameter rename causes NameError and TypeError at runtime.

The function signature renames title, abstract, and description to artifact_title, artifact_abstract, and artifact_description, but the refactoring is incomplete:

  1. Function body references undefined variables (lines 383-387, 393-405): The code still uses the old parameter names, which will raise NameError at runtime.

  2. All callers use old parameter names:

    • deploy_from_metadata() (line 501)
    • databusclient/cli.py:84
    • tests/test_deploy.py:94

    These will fail with TypeError (unexpected keyword arguments).

  3. Docstring is outdated (lines 282-315): Still documents the old parameter names, creating API documentation mismatch.

Additionally, this is a breaking API change that contradicts the PR description's claim of "preserve CLI compatibility."

Required fixes

Update lines 383-387 and 393-405 to use the renamed parameters:

     artifact_graph = {
         "@id": artifact_id,
         "@type": "Artifact",
-        "title": title,
-        "abstract": abstract,
-        "description": description,
+        "title": artifact_title,
+        "abstract": artifact_abstract,
+        "description": artifact_description,
     }
     dataset_graph = {
         "@type": ["Version", "Dataset"],
         "@id": _versionId,
         "hasVersion": version,
-        "title": title,
-        "abstract": abstract,
-        "description": description,
+        "title": artifact_title,
+        "abstract": artifact_abstract,
+        "description": artifact_description,
         "license": license_url,
         "distribution": distribution_list,
     }

Update all callers in deploy_from_metadata(), databusclient/cli.py, and tests/test_deploy.py to use the new parameter names.

Update the docstring (lines 282-315) to reflect the new parameter names.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aca668e and 3838bdb.

📒 Files selected for processing (2)
  • databusclient/api/deploy.py
  • databusclient/cli.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • databusclient/cli.py

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
databusclient/api/deploy.py (1)

470-508: Docstring semantics conflict with parameter naming.

The docstrings correctly describe the intended semantics per the PR:

  • artifact_title: "permanent name of the data series (applies to all versions)"
  • artifact_abstract: "Version Abstract: ... specific to this timestamped release"
  • artifact_description: "Version Description: ... for this specific release"

However, the parameter names artifact_abstract and artifact_description are misleading given they're described as version-specific. Consider either:

  1. Renaming to version_abstract/version_description to match the docstring semantics, or
  2. Updating the docstrings if these are indeed meant to be artifact-level metadata
databusclient/cli.py (2)

100-102: Consider using keyword arguments for consistency with Mode 1.

Mode 1 explicitly maps CLI parameters to API parameters using keyword arguments (lines 85-90), but Mode 2 uses positional arguments. For consistency and clarity, consider using keyword arguments here as well.

✨ Suggested refactor for consistency
         api_deploy.deploy_from_metadata(
-            metadata, version_id, title, abstract, description, license_url, apikey
+            metadata=metadata,
+            version_id=version_id,
+            artifact_title=title,
+            artifact_abstract=abstract,
+            artifact_description=description,
+            license_url=license_url,
+            apikey=apikey,
         )

122-124: Same consistency suggestion applies here.

Consider using keyword arguments for consistency with Mode 1.

✨ Suggested refactor for consistency
         api_deploy.deploy_from_metadata(
-            metadata, version_id, title, abstract, description, license_url, apikey
+            metadata=metadata,
+            version_id=version_id,
+            artifact_title=title,
+            artifact_abstract=abstract,
+            artifact_description=description,
+            license_url=license_url,
+            apikey=apikey,
         )
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3838bdb and a1e1449.

📒 Files selected for processing (2)
  • databusclient/api/deploy.py
  • databusclient/cli.py
🔇 Additional comments (5)
databusclient/api/deploy.py (3)

269-311: Parameter renaming looks good, but docstrings could be more precise.

The parameter renaming from title/abstract/description to artifact_title/artifact_abstract/artifact_description is cleanly implemented. The docstrings at lines 291-296 could be slightly improved to clarify that these values are applied to both the Artifact and Version graphs (as seen in the implementation below).


380-387: LGTM!

The artifact graph correctly uses the renamed parameters for its metadata fields.


391-400: Verify semantic intent: Version graph uses artifact-named parameters.

The Version/Dataset graph uses artifact_abstract and artifact_description, but the PR's CLI help text describes these as "Version Abstract" and "Version Description" (specific to the timestamped release). The current implementation assigns the same values to both the Artifact graph (lines 383-385) and the Version graph (lines 395-397).

If the intent is that abstract/description are truly version-specific while title is artifact-level, then:

  1. The parameter names should reflect this (e.g., keep artifact_title but use version_abstract, version_description)
  2. Or the Artifact graph shouldn't include abstract/description

Please confirm this is the intended behavior for the Databus data model.
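
To make the alternative concrete, a sketch of what a per-level split could look like (graph keys mirror the diffs above; moving abstract/description onto the Version graph only is a hypothetical option, not what the PR currently does):

```python
# Hypothetical split: only the permanent title lives on the Artifact, while
# the release-specific abstract/description live on the Version graph.
def build_graphs(artifact_id, version_id, version, artifact_title,
                 version_abstract, version_description,
                 license_url, distribution_list):
    artifact_graph = {
        "@id": artifact_id,
        "@type": "Artifact",
        "title": artifact_title,
    }
    dataset_graph = {
        "@type": ["Version", "Dataset"],
        "@id": version_id,
        "hasVersion": version,
        "title": artifact_title,
        "abstract": version_abstract,
        "description": version_description,
        "license": license_url,
        "distribution": distribution_list,
    }
    return artifact_graph, dataset_graph
```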

databusclient/cli.py (2)

28-30: Help text updates align well with PR objectives.

The updated help strings clearly communicate the semantic distinction:

  • --title: Artifact-level (permanent, applies to all versions)
  • --abstract: Version-level (specific to this release, max 200 chars)
  • --description: Version-level (detailed, supports Markdown)

84-91: LGTM!

The explicit keyword argument mapping is clear and correctly connects CLI parameter names to the renamed API parameters.
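
For reference, a sketch of that mapping on the CLI side (mirroring the keyword-argument list quoted in the walkthrough; the import path and wrapper function are assumptions, not the verbatim file contents):

```python
from databusclient.api import deploy as api_deploy  # assumed import, as used in cli.py

# Sketch of the explicit CLI-to-API mapping: CLI option values on the right,
# renamed API parameters on the left.
def call_create_dataset(version_id, title, abstract, description,
                        license_url, parsed_distributions):
    return api_deploy.create_dataset(
        version_id=version_id,
        artifact_title=title,
        artifact_abstract=abstract,
        artifact_description=description,
        license_url=license_url,
        distributions=parsed_distributions,
    )
```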

@vaibhav45sktech (Author) commented

> As you said, it is a good idea to keep the existing CLI arguments for now to avoid breaking changes. However, perhaps you could change the variable names in the method create_dataset in the deploy.py file:
>
>   • title -> artifact_title
>   • abstract -> artifact_abstract
>   • description -> artifact_description
>
> This would improve clarity.

Greetings @Integer-Ctrl, thanks for the feedback! I have updated deploy.py to rename the variables as suggested (title -> artifact_title, etc.) and updated the internal logic to match.

I also updated the function calls in cli.py to correctly map the CLI arguments to these new parameter names. Additionally, I changed the help text from Artifact Label to Artifact Title as requested.

@Integer-Ctrl (Contributor) left a comment

I have commented on the issue #21

TL;DR: Either

  • Share your opinion on the topic
  • Change descriptions/help texts/docstrings to something like "Artifact & Version..."
  • Keep as it is -> cancel PR

@vaibhav45sktech (Author) commented

> I have commented on the issue #21
>
> TL;DR: Either
>
>   • Share your opinion on the topic
>   • Change descriptions/help texts/docstrings to something like "Artifact & Version..."
>   • Keep as it is -> cancel PR

@Integer-Ctrl Thank you for the feedback on Issue #21!

I understand from your comment that you want the help text to clarify the dual nature of these parameters (they apply to both Artifact AND Version levels).

Just to confirm before I implement the changes: Would you like me to update the help text to use a format like:

Artifact & Version Title: the permanent name of the data series (applies to all versions)
Artifact & Version Abstract: a short summary (max 200 chars) specific to this timestamped release
Artifact & Version Description: detailed metadata for this specific release (supports Markdown)

Or would you prefer a different format to make the Artifact vs Version distinction clearer?

I want to make sure I implement exactly what you're looking for.

Thanks!
