-
Notifications
You must be signed in to change notification settings - Fork 12
docs: Clarify CLI help text to align with databus hierarchy #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
docs: Clarify CLI help text to align with databus hierarchy #47
Conversation
📝 WalkthroughWalkthroughHelp text for the deploy command's Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
databusclient/cli.py (2)
14-56: Consider several improvements to the parsing logic.
Warning mechanism (Line 49): Using
print()for warnings writes directly to stdout, which is not ideal for library code. Consider using Python'sloggingmodule or collecting warnings in the return value for better testability and control.Compression heuristic is fragile (Lines 36-39): The classification of
.taras compression is questionable since.taris often treated as an archive format, not a compression format. Files like.tar.gzhave both archiving and compression. This heuristic may misclassify extensions and lead to incorrect metadata.Missing docstring examples: The docstring should include concrete examples showing input and expected output to clarify the expected format and behavior.
No validation of key=value pairs (Line 43): The code splits on
'='but doesn't validate that the key or value is non-empty. Consider adding validation to reject malformed variants like"=value"or"key=".Inconsistent extension handling: Line 37's comment suggests conditional removal of the leading dot ("if needed"), but the code always strips it. Clarify the expected format or document why the leading dot is always removed.
🔎 Suggested improvements
Replace
print()with proper logging:+import logging + +logger = logging.getLogger(__name__) + def parse_distribution_str(dist_str: str): """ Parses a distribution string with format: URL|key=value|...|.extension Returns a dictionary suitable for the deploy API. + + Examples: + >>> parse_distribution_str("http://example.com/data.json|lang=en|.json|.gz") + {'url': 'http://example.com/data.json', 'variants': {'lang': 'en'}, + 'formatExtension': 'json', 'compression': 'gz'} """ ... else: - print(f"WARNING: Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.") + logger.warning(f"Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.")Add validation for key=value pairs:
elif '=' in part: key, value = part.split('=', 1) + if not key.strip() or not value.strip(): + logger.warning(f"Invalid key=value pair '{part}' in distribution.") + continue variants[key.strip()] = value.strip()
129-137: Remove developer scaffolding comments and consider error handling.
Lines 129, 137: The
"--- CHANGE START/END ---"comments are developer scaffolding and should be removed before merging.Line 131: Consider adding error handling around
parse_distribution_strcalls. If parsing fails (e.g., malformed distribution string), the user should receive a clear error message rather than an uncaught exception.Behavioral change: This modification changes the contract between the CLI and
api_deploy.create_dataset. Previously, raw strings were passed; now parsed dictionaries are passed. While this appears to be intentional and supported by changes indeploy.py, it's worth verifying that all downstream consumers can handle this change.🔎 Proposed cleanup
- # --- CHANGE START --- - # Parse the input strings into structured objects parsed_distributions = [parse_distribution_str(d) for d in distributions] - # Note: api_deploy.create_dataset now accepts this list of dicts dataid = api_deploy.create_dataset( version_id, title, abstract, description, license_url, parsed_distributions ) - # --- CHANGE END ---databusclient/api/queries.py (1)
54-93: Consider adding validation for malformed key=value pairs.Similar to
parse_distribution_strincli.py, this function splits on'='(Line 87) but doesn't validate that both key and value are non-empty. Consider adding a check to handle edge cases like"=value"or"key=".🔎 Suggested validation
if "=" in part: key, value = part.split("=", 1) + if not key.strip() or not value.strip(): + # Skip malformed key=value pairs + continue variants[key.strip()] = value.strip()tests/test_parse_distribution.py (1)
1-275: Excellent test coverage with minor gaps.The test suite is comprehensive and well-structured:
- Clear test organization with descriptive test names
- Good coverage of parsing logic: URL extraction, variants, extensions, compression
- Appropriate use of mocking to avoid network dependencies
- Integration tests validate end-to-end behavior with the deploy API
Minor suggestions for additional test coverage:
Error handling: Consider adding tests for truly malformed inputs that might cause exceptions (e.g., None input, non-string input, extremely long strings).
Edge case - URLs with delimiter: Test behavior when the URL itself contains the
'|'delimiter character (e.g.,"http://example.com/file?param=a|b|.json"). The current implementation would split incorrectly.Empty parts: Test distribution strings with empty parts between delimiters (e.g.,
"http://example.com/file||.json"with double pipe).🔎 Example additional tests
def test_url_with_pipe_character(self): """Test URL containing pipe character in query string.""" # This is an edge case that might need special handling result = parse_distribution_str("http://example.com/file?param=a|b") # Current implementation would incorrectly split on the pipe in the URL assert result["url"] == "http://example.com/file?param=a|b" def test_empty_parts_between_delimiters(self): """Test handling of empty parts between delimiters.""" result = parse_distribution_str("http://example.com/file||.json") # Should gracefully handle empty parts assert result["formatExtension"] == "json"
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
databusclient/api/deploy.pydatabusclient/api/queries.pydatabusclient/cli.pytests/test_deploy.pytests/test_parse_distribution.py
🧰 Additional context used
🧬 Code graph analysis (2)
databusclient/cli.py (1)
databusclient/api/deploy.py (1)
create_dataset(304-465)
tests/test_parse_distribution.py (2)
databusclient/cli.py (1)
parse_distribution_str(14-56)databusclient/api/deploy.py (1)
_get_file_info_from_dict(176-208)
🔇 Additional comments (4)
databusclient/cli.py (1)
73-75: LGTM! Help text improvements align with PR objectives.The updated help text successfully clarifies the distinction between Artifact-level metadata (permanent name) and Version-level metadata (release-specific). The terminology now explicitly maps to the DBpedia Databus data model.
databusclient/api/deploy.py (1)
304-387: LGTM! Backwards-compatible API extension.The extension of
distributionsparameter to acceptUnion[List[str], List[Dict]]is well-implemented:
- Backwards compatible with existing string-based distributions
- Clear documentation in the docstring (lines 334-337)
- Appropriate use of
isinstance()check to branch on format (line 368)- Both branches produce the same tuple structure for consistent downstream processing
The approach correctly maintains the existing validation (lines 389-393) for both formats.
tests/test_deploy.py (1)
15-16: LGTM! Good test hygiene.Extracting the example URL into a named constant improves test maintainability and readability.
databusclient/api/queries.py (1)
1-51: This module introduces unused code unrelated to the PR's stated objective.
ONTOLOGIES_QUERYandparse_content_variants_string()are defined in this new module but are not imported or used anywhere in the codebase. Since the PR aims to clarify CLI help text, this feature addition appears out of scope.Either:
- Move this to a separate PR with proper documentation of the feature's purpose and integration points, or
- Integrate it into the codebase (add imports where needed and update the PR description to reflect this feature work)
Additionally, document why these functions are being added if they're intended for future use.
c6316e2 to
aca668e
Compare
|
Greetings @Integer-Ctrl could you please review this pr ? |
Description-
This PR updates the CLI help text for
--title,--abstract, and--descriptionto align with the DBpedia databus data model hierarchy.Currently, the CLI uses generic "Dataset" terminology, which causes confusion between Artifacts (the logical container) and Versions (the specific release). Users often mistake
--titlefor a version-specific name, or--abstractfor a global description.These changes clarify the scope of each argument without breaking existing CLI compatibility.
Changes
I have updated the help strings to explicitly map the arguments to their databus concepts:
--titleDataset titleArtifact Label: the permanent name of the data series (applies to all versions)--abstractDataset abstract max 200 charsVersion Abstract: a short summary (max 200 chars) specific to this timestamped release--descriptionDataset descriptionVersion Description: detailed metadata for this specific release (supports Markdown)Solves Issue #21