feat(dataset-api): extend upload_dataset and add upload_dataset_parts #64
base: main
Conversation
Pull request overview
This pull request extends the Dataset API by adding support for tags and additional metadata on dataset uploads, and introduces a new `upload_dataset_parts` method for adding parts to existing datasets. The PR also includes broad formatting cleanup, primarily reorganizing imports to follow PEP 8 ordering (stdlib, third-party, local) and removing trailing whitespace.
Key Changes
- Extended `upload_dataset` method to accept optional `tags` and `additional_data` parameters
- Added new `upload_dataset_parts` method to upload parts to existing datasets, with optional replacement logic (a usage sketch follows this list)
- Reorganized imports across the codebase to follow PEP 8 conventions (stdlib → third-party → local)
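A minimal usage sketch of the extended API, assuming a configured `DatasetApi` client named `api`. The `as_files` and `replace_existing` parameters follow the test calls quoted later in this review; the dataset name/ID arguments, file paths, and tag values are illustrative, not the definitive signature:

```python
from pathlib import Path

# Extended upload_dataset: tags and additional_data are the new optional parameters.
api.upload_dataset(
    "my-dataset",
    as_files=[str(Path("data") / "input.csv")],
    tags=["simulation", "demo"],
    additional_data={"source": "pipeline"},
)

# New upload_dataset_parts: add parts to an existing dataset,
# optionally replacing parts that already exist.
api.upload_dataset_parts(
    "existing-dataset-123",
    as_files=[str(Path("data") / "extra.csv")],
    replace_existing=True,
)
```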
Reviewed changes
Copilot reviewed 65 out of 72 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| cosmotech/coal/cosmotech_api/apis/dataset.py | Extended upload_dataset with tags/additional_data parameters; added upload_dataset_parts method |
| tests/unit/coal/test_cosmotech_api/test_apis/test_dataset.py | Added comprehensive test coverage for new dataset API features |
| cosmotech/translation/coal/en-US/coal/services/dataset.yml | Added translation strings for new dataset parts operations |
| cosmotech/coal/store/__init__.py | Reorganized imports (with duplicate import issue) |
| Multiple tutorial files | Reorganized imports to follow PEP 8 conventions |
| Multiple test files | Reorganized imports, removed unused imports and variables |
| cosmotech/coal/azure/adx/ingestion.py | Consolidated typing imports into single line |
| cosmotech/coal/azure/adx/__init__.py | Reorganized exports alphabetically |
| generate_test_files.py | Removed redundant pass statement after comment |
| Documentation files | Removed trailing whitespace |
| Configuration files | Minor formatting fixes (trailing whitespace, newlines) |
```python
self.create_dataset_part(
    organization_id=self.configuration.cosmotech.organization_id,
    workspace_id=self.configuration.cosmotech.workspace_id,
    dataset_id=dataset_id,
    dataset_part_create_request=part_request,
    file=(_p_name, _p_path.open("rb").read()),
)
```
Copilot AI · Jan 8, 2026
The file handle opened by `_p_path.open("rb")` is not properly closed. This can lead to resource leaks, especially when uploading multiple files. Consider using a context manager to ensure proper file closure.
Suggested change:
```diff
-        self.create_dataset_part(
-            organization_id=self.configuration.cosmotech.organization_id,
-            workspace_id=self.configuration.cosmotech.workspace_id,
-            dataset_id=dataset_id,
-            dataset_part_create_request=part_request,
-            file=(_p_name, _p_path.open("rb").read()),
-        )
+        with _p_path.open("rb") as _f:
+            self.create_dataset_part(
+                organization_id=self.configuration.cosmotech.organization_id,
+                workspace_id=self.configuration.cosmotech.workspace_id,
+                dataset_id=dataset_id,
+                dataset_part_create_request=part_request,
+                file=(_p_name, _f.read()),
+            )
```
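A further simplification worth considering, sketched under the assumption that `_p_path` is a `pathlib.Path` (its `.open("rb")` usage above suggests it is): since the bytes are read eagerly anyway, `Path.read_bytes()` opens and closes the file internally, making the explicit context manager unnecessary:

```python
# Path.read_bytes() handles open/close itself, so no handle can leak.
self.create_dataset_part(
    organization_id=self.configuration.cosmotech.organization_id,
    workspace_id=self.configuration.cosmotech.workspace_id,
    dataset_id=dataset_id,
    dataset_part_create_request=part_request,
    file=(_p_name, _p_path.read_bytes()),
)
```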
```python
# Re-export the Store class
from cosmotech.coal.store.store import Store

# Re-export functions from the pandas module (if available)
from cosmotech.coal.store.pandas import (
    store_dataframe,
    convert_store_table_to_dataframe as convert_store_table_to_pandas_dataframe,
)
from cosmotech.coal.store.pandas import (
    store_dataframe,
)

# Re-export functions from the pyarrow module (if available)
from cosmotech.coal.store.pyarrow import (
    store_table,
    convert_store_table_to_dataframe as convert_store_table_to_pyarrow_table,
)
from cosmotech.coal.store.pyarrow import (
    store_table,
)

# Re-export functions from the pyarrow module (if available)
```
Copilot AI · Jan 8, 2026
The imports from the `pandas` and `pyarrow` modules are duplicated. Lines 26-31 import from `pandas` and lines 32-37 import from `pyarrow`, but lines 42-45 then repeat the "Re-export" comments as if those imports belonged there. The duplicate import blocks (lines 26-31 and 29-31 for `pandas`; lines 32-34 and 35-37 for `pyarrow`) should be consolidated into a single import block per module.
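A consolidated sketch of what the re-export block could look like after deduplication (assuming the aliased conversion functions are the intended public names):

```python
# Re-export the Store class
from cosmotech.coal.store.store import Store

# Re-export functions from the pandas module (if available)
from cosmotech.coal.store.pandas import (
    store_dataframe,
    convert_store_table_to_dataframe as convert_store_table_to_pandas_dataframe,
)

# Re-export functions from the pyarrow module (if available)
from cosmotech.coal.store.pyarrow import (
    store_table,
    convert_store_table_to_dataframe as convert_store_table_to_pyarrow_table,
)
```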
```python
file1 = Path(tmpdir) / "file1.csv"
file1.write_text("data1")

result = api.upload_dataset_parts("existing-dataset-123", as_files=[str(file1)])
```
Copilot AI · Jan 8, 2026
Variable `result` is not used.
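Two ways to address this, sketched under the assumption that the test only needs to exercise the call (what `upload_dataset_parts` returns is not shown in this hunk):

```python
# Option 1: keep the binding and assert on it, so result is actually used.
result = api.upload_dataset_parts("existing-dataset-123", as_files=[str(file1)])
assert result is not None

# Option 2: drop the binding if only the call's side effects are under test.
api.upload_dataset_parts("existing-dataset-123", as_files=[str(file1)])
```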
```python
file1 = Path(tmpdir) / "file1.csv"
file1.write_text("data1")

result = api.upload_dataset_parts("existing-dataset-123", as_files=[str(file1)])
```
Copilot AI · Jan 8, 2026
Variable `result` is not used.
```python
file1 = Path(tmpdir) / "file1.csv"
file1.write_text("data1")

result = api.upload_dataset_parts("existing-dataset-123", as_files=[str(file1)], replace_existing=True)
```
Copilot AI · Jan 8, 2026
Variable `result` is not used.
```diff
 from azure.storage.blob import BlobServiceClient, ContainerClient

-from cosmotech.coal.azure.blob import dump_store_to_azure, VALID_TYPES
+from cosmotech.coal.azure.blob import VALID_TYPES, dump_store_to_azure
```
Copilot AI · Jan 8, 2026
Import of `VALID_TYPES` is not used.
```diff
-import pyarrow as pa
 import pandas as pd
+import pyarrow as pa
```
Copilot AI · Jan 8, 2026
Import of `pa` is not used.
```diff
-import pyarrow as pa
 import pandas as pd
+import pyarrow as pa
 import pytest
```
Copilot AI · Jan 8, 2026
Import of `pytest` is not used.
```python
from unittest.mock import MagicMock, patch

import pyarrow as pa
import pytest
```
Copilot AI · Jan 8, 2026
Import of `pytest` is not used.
```python
from unittest.mock import MagicMock

import pyarrow as pa
import pytest
```
Copilot AI · Jan 8, 2026
Import of `pytest` is not used.
No description provided.