Skip to content

Conversation

@vaibhav45sktech
Copy link

@vaibhav45sktech vaibhav45sktech commented Jan 4, 2026

Description-

This PR updates the CLI help text for --title, --abstract, and --description to align with the DBpedia databus data model hierarchy.

Currently, the CLI uses generic "Dataset" terminology, which causes confusion between Artifacts (the logical container) and Versions (the specific release). Users often mistake --title for a version-specific name, or --abstract for a global description.

These changes clarify the scope of each argument without breaking existing CLI compatibility.

Changes

I have updated the help strings to explicitly map the arguments to their databus concepts:

Argument Old Help Text New Help Text
--title Dataset title Artifact Label: the permanent name of the data series (applies to all versions)
--abstract Dataset abstract max 200 chars Version Abstract: a short summary (max 200 chars) specific to this timestamped release
--description Dataset description Version Description: detailed metadata for this specific release (supports Markdown)

Solves Issue #21

@coderabbitai
Copy link

coderabbitai bot commented Jan 4, 2026

📝 Walkthrough

Walkthrough

Help text for the deploy command's --title, --abstract, and --description options has been updated to clarify their scope and usage. The --title option now describes itself as an "Artifact Label," --abstract as a "Version Abstract," and --description as "Version Description" with clarifications about their applicability across versions.

Changes

Cohort / File(s) Summary
Deploy command help text updates
databusclient/cli.py
Updated help strings for --title, --abstract, and --description options to clarify their scope (artifact-level vs. version-level) and content guidelines

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'docs: Clarify CLI help text to align with databus hierarchy' directly summarizes the main change—updating CLI help text for terminology clarity.
Docstring Coverage ✅ Passed Docstring coverage is 97.06% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description comprehensively covers all required sections with clear details about changes, rationale, and mapping to the Databus data model.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
databusclient/cli.py (2)

14-56: Consider several improvements to the parsing logic.

  1. Warning mechanism (Line 49): Using print() for warnings writes directly to stdout, which is not ideal for library code. Consider using Python's logging module or collecting warnings in the return value for better testability and control.

  2. Compression heuristic is fragile (Lines 36-39): The classification of .tar as compression is questionable since .tar is often treated as an archive format, not a compression format. Files like .tar.gz have both archiving and compression. This heuristic may misclassify extensions and lead to incorrect metadata.

  3. Missing docstring examples: The docstring should include concrete examples showing input and expected output to clarify the expected format and behavior.

  4. No validation of key=value pairs (Line 43): The code splits on '=' but doesn't validate that the key or value is non-empty. Consider adding validation to reject malformed variants like "=value" or "key=".

  5. Inconsistent extension handling: Line 37's comment suggests conditional removal of the leading dot ("if needed"), but the code always strips it. Clarify the expected format or document why the leading dot is always removed.

🔎 Suggested improvements

Replace print() with proper logging:

+import logging
+
+logger = logging.getLogger(__name__)
+
 def parse_distribution_str(dist_str: str):
     """
     Parses a distribution string with format:
     URL|key=value|...|.extension
     
     Returns a dictionary suitable for the deploy API.
+    
+    Examples:
+        >>> parse_distribution_str("http://example.com/data.json|lang=en|.json|.gz")
+        {'url': 'http://example.com/data.json', 'variants': {'lang': 'en'}, 
+         'formatExtension': 'json', 'compression': 'gz'}
     """
     ...
         else:
-             print(f"WARNING: Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.")
+             logger.warning(f"Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.")

Add validation for key=value pairs:

         elif '=' in part:
             key, value = part.split('=', 1)
+            if not key.strip() or not value.strip():
+                logger.warning(f"Invalid key=value pair '{part}' in distribution.")
+                continue
             variants[key.strip()] = value.strip()

129-137: Remove developer scaffolding comments and consider error handling.

  1. Lines 129, 137: The "--- CHANGE START/END ---" comments are developer scaffolding and should be removed before merging.

  2. Line 131: Consider adding error handling around parse_distribution_str calls. If parsing fails (e.g., malformed distribution string), the user should receive a clear error message rather than an uncaught exception.

  3. Behavioral change: This modification changes the contract between the CLI and api_deploy.create_dataset. Previously, raw strings were passed; now parsed dictionaries are passed. While this appears to be intentional and supported by changes in deploy.py, it's worth verifying that all downstream consumers can handle this change.

🔎 Proposed cleanup
-        # --- CHANGE START ---
-        # Parse the input strings into structured objects
         parsed_distributions = [parse_distribution_str(d) for d in distributions]
         
-        # Note: api_deploy.create_dataset now accepts this list of dicts
         dataid = api_deploy.create_dataset(
             version_id, title, abstract, description, license_url, parsed_distributions
         )
-        # --- CHANGE END ---
databusclient/api/queries.py (1)

54-93: Consider adding validation for malformed key=value pairs.

Similar to parse_distribution_str in cli.py, this function splits on '=' (Line 87) but doesn't validate that both key and value are non-empty. Consider adding a check to handle edge cases like "=value" or "key=".

🔎 Suggested validation
         if "=" in part:
             key, value = part.split("=", 1)
+            if not key.strip() or not value.strip():
+                # Skip malformed key=value pairs
+                continue
             variants[key.strip()] = value.strip()
tests/test_parse_distribution.py (1)

1-275: Excellent test coverage with minor gaps.

The test suite is comprehensive and well-structured:

  • Clear test organization with descriptive test names
  • Good coverage of parsing logic: URL extraction, variants, extensions, compression
  • Appropriate use of mocking to avoid network dependencies
  • Integration tests validate end-to-end behavior with the deploy API

Minor suggestions for additional test coverage:

  1. Error handling: Consider adding tests for truly malformed inputs that might cause exceptions (e.g., None input, non-string input, extremely long strings).

  2. Edge case - URLs with delimiter: Test behavior when the URL itself contains the '|' delimiter character (e.g., "http://example.com/file?param=a|b|.json"). The current implementation would split incorrectly.

  3. Empty parts: Test distribution strings with empty parts between delimiters (e.g., "http://example.com/file||.json" with double pipe).

🔎 Example additional tests
def test_url_with_pipe_character(self):
    """Test URL containing pipe character in query string."""
    # This is an edge case that might need special handling
    result = parse_distribution_str("http://example.com/file?param=a|b")
    # Current implementation would incorrectly split on the pipe in the URL
    assert result["url"] == "http://example.com/file?param=a|b"

def test_empty_parts_between_delimiters(self):
    """Test handling of empty parts between delimiters."""
    result = parse_distribution_str("http://example.com/file||.json")
    # Should gracefully handle empty parts
    assert result["formatExtension"] == "json"
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2c3f1c and c6316e2.

📒 Files selected for processing (5)
  • databusclient/api/deploy.py
  • databusclient/api/queries.py
  • databusclient/cli.py
  • tests/test_deploy.py
  • tests/test_parse_distribution.py
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/cli.py (1)
databusclient/api/deploy.py (1)
  • create_dataset (304-465)
tests/test_parse_distribution.py (2)
databusclient/cli.py (1)
  • parse_distribution_str (14-56)
databusclient/api/deploy.py (1)
  • _get_file_info_from_dict (176-208)
🔇 Additional comments (4)
databusclient/cli.py (1)

73-75: LGTM! Help text improvements align with PR objectives.

The updated help text successfully clarifies the distinction between Artifact-level metadata (permanent name) and Version-level metadata (release-specific). The terminology now explicitly maps to the DBpedia Databus data model.

databusclient/api/deploy.py (1)

304-387: LGTM! Backwards-compatible API extension.

The extension of distributions parameter to accept Union[List[str], List[Dict]] is well-implemented:

  • Backwards compatible with existing string-based distributions
  • Clear documentation in the docstring (lines 334-337)
  • Appropriate use of isinstance() check to branch on format (line 368)
  • Both branches produce the same tuple structure for consistent downstream processing

The approach correctly maintains the existing validation (lines 389-393) for both formats.

tests/test_deploy.py (1)

15-16: LGTM! Good test hygiene.

Extracting the example URL into a named constant improves test maintainability and readability.

databusclient/api/queries.py (1)

1-51: This module introduces unused code unrelated to the PR's stated objective.

ONTOLOGIES_QUERY and parse_content_variants_string() are defined in this new module but are not imported or used anywhere in the codebase. Since the PR aims to clarify CLI help text, this feature addition appears out of scope.

Either:

  1. Move this to a separate PR with proper documentation of the feature's purpose and integration points, or
  2. Integrate it into the codebase (add imports where needed and update the PR description to reflect this feature work)

Additionally, document why these functions are being added if they're intended for future use.

@vaibhav45sktech vaibhav45sktech force-pushed the docs/clarify-cli-terminology branch from c6316e2 to aca668e Compare January 4, 2026 08:05
@vaibhav45sktech
Copy link
Author

vaibhav45sktech commented Jan 6, 2026

Greetings @Integer-Ctrl could you please review this pr ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant