Conversation

@etnikatcosmotech
No description provided.

Copilot AI left a comment

Pull request overview

This pull request extends the Dataset API functionality by adding support for tags and additional metadata to dataset uploads, and introduces a new upload_dataset_parts method for adding parts to existing datasets. The PR also includes comprehensive code formatting improvements, primarily reorganizing imports to follow PEP 8 standards (stdlib, third-party, local) and removing trailing whitespace.

Key Changes

  • Extended upload_dataset method to accept optional tags and additional_data parameters
  • Added new upload_dataset_parts method to upload parts to existing datasets with optional replacement logic
  • Reorganized imports across the codebase to follow PEP 8 conventions (stdlib → third-party → local)

Reviewed changes

Copilot reviewed 65 out of 72 changed files in this pull request and generated 17 comments.

File | Description
cosmotech/coal/cosmotech_api/apis/dataset.py | Extended upload_dataset with tags/additional_data parameters; added upload_dataset_parts method
tests/unit/coal/test_cosmotech_api/test_apis/test_dataset.py | Added comprehensive test coverage for new dataset API features
cosmotech/translation/coal/en-US/coal/services/dataset.yml | Added translation strings for new dataset parts operations
cosmotech/coal/store/__init__.py | Reorganized imports (with duplicate import issue)
Multiple tutorial files | Reorganized imports to follow PEP 8 conventions
Multiple test files | Reorganized imports, removed unused imports and variables
cosmotech/coal/azure/adx/ingestion.py | Consolidated typing imports into single line
cosmotech/coal/azure/adx/__init__.py | Reorganized exports alphabetically
generate_test_files.py | Removed redundant pass statement after comment
Documentation files | Removed trailing whitespace
Configuration files | Minor formatting fixes (trailing whitespace, newlines)

Comment on lines +180 to +186
self.create_dataset_part(
    organization_id=self.configuration.cosmotech.organization_id,
    workspace_id=self.configuration.cosmotech.workspace_id,
    dataset_id=dataset_id,
    dataset_part_create_request=part_request,
    file=(_p_name, _p_path.open("rb").read()),
)
Copilot AI Jan 8, 2026

The file handle opened by _p_path.open("rb") is not properly closed. This can lead to resource leaks, especially when uploading multiple files. Consider using a context manager to ensure proper file closure.

Suggested change
- self.create_dataset_part(
-     organization_id=self.configuration.cosmotech.organization_id,
-     workspace_id=self.configuration.cosmotech.workspace_id,
-     dataset_id=dataset_id,
-     dataset_part_create_request=part_request,
-     file=(_p_name, _p_path.open("rb").read()),
- )
+ with _p_path.open("rb") as _f:
+     self.create_dataset_part(
+         organization_id=self.configuration.cosmotech.organization_id,
+         workspace_id=self.configuration.cosmotech.workspace_id,
+         dataset_id=dataset_id,
+         dataset_part_create_request=part_request,
+         file=(_p_name, _f.read()),
+     )
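
As a standalone illustration of the point made in this comment, the sketch below reads a file's bytes with the handle closed as soon as the read finishes. `read_part` is a hypothetical helper, not part of the reviewed code, and the temporary file only mirrors the `file1.csv` fixture used in the tests. Unlike the suggested change, it closes the file before any upload call would run, which is a variation rather than the reviewer's exact proposal.

```python
import tempfile
from pathlib import Path

# Hypothetical helper: return (file name, contents) without leaking the file handle.
def read_part(path: Path) -> tuple[str, bytes]:
    with path.open("rb") as handle:
        data = handle.read()
    # The context manager has closed the handle at this point,
    # and would have done so even if read() had raised.
    assert handle.closed
    return path.name, data

with tempfile.TemporaryDirectory() as tmpdir:
    part = Path(tmpdir) / "file1.csv"
    part.write_text("data1")
    name, payload = read_part(part)
    # Path.read_bytes() opens, reads, and closes the file in a single call.
    same_payload = part.read_bytes()
```

Either form keeps the number of open handles constant when many parts are uploaded in a loop.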

Copilot uses AI. Check for mistakes.
Comment on lines 26 to +45
from cosmotech.coal.store.pandas import (
store_dataframe,
convert_store_table_to_dataframe as convert_store_table_to_pandas_dataframe,
)

# Re-export functions from the pyarrow module (if available)

from cosmotech.coal.store.pandas import (
store_dataframe,
)
from cosmotech.coal.store.pyarrow import (
store_table,
convert_store_table_to_dataframe as convert_store_table_to_pyarrow_table,
)
from cosmotech.coal.store.pyarrow import (
store_table,
)

# Re-export the Store class
from cosmotech.coal.store.store import Store

# Re-export functions from the pandas module (if available)


# Re-export functions from the pyarrow module (if available)
Copilot AI Jan 8, 2026

The imports from the pandas and pyarrow modules are duplicated: each module is imported in two separate blocks within lines 26-37, while the "Re-export" comments on lines 42-45 are left with no imports beneath them. Consolidate each duplicated pair into a single import block.
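
One way to spot this kind of duplication mechanically is to parse the module with the standard `ast` module and count repeated `from ... import ...` names. The sketch below is illustrative only: `duplicate_from_imports` is a hypothetical helper, and `SOURCE` is a re-indented copy of the import blocks quoted above. The source is parsed, never executed, so the `cosmotech` package does not need to be installed.

```python
import ast
from collections import Counter

# Import blocks as quoted in the review comment (indentation restored).
SOURCE = '''
from cosmotech.coal.store.pandas import (
    store_dataframe,
    convert_store_table_to_dataframe as convert_store_table_to_pandas_dataframe,
)
from cosmotech.coal.store.pandas import (
    store_dataframe,
)
from cosmotech.coal.store.pyarrow import (
    store_table,
    convert_store_table_to_dataframe as convert_store_table_to_pyarrow_table,
)
from cosmotech.coal.store.pyarrow import (
    store_table,
)
'''

# Hypothetical helper: list (module, name) pairs imported more than once.
def duplicate_from_imports(source: str) -> list[tuple[str, str]]:
    counts = Counter(
        (node.module or "", alias.name)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ImportFrom)
        for alias in node.names
    )
    return sorted(key for key, seen in counts.items() if seen > 1)

duplicates = duplicate_from_imports(SOURCE)
```

Here `duplicates` contains `store_dataframe` from the pandas module and `store_table` from the pyarrow module, the two names the comment flags.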

file1 = Path(tmpdir) / "file1.csv"
file1.write_text("data1")

result = api.upload_dataset_parts("existing-dataset-123", as_files=[str(file1)])
Copilot AI Jan 8, 2026

Variable result is not used.
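
Two common ways to resolve an unused `result` in a test are sketched below. The `api` object is a `MagicMock` stand-in and its return value is purely hypothetical, since the real return value of `upload_dataset_parts` is not shown in this review.

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for the API object under test; the return value is invented
# for illustration and does not reflect the real upload_dataset_parts contract.
api = MagicMock()
api.upload_dataset_parts.return_value = {"id": "existing-dataset-123"}

# Option 1: keep the assignment and assert on the returned value.
result = api.upload_dataset_parts("existing-dataset-123", as_files=["file1.csv"])
assert result == {"id": "existing-dataset-123"}

# Option 2: drop the assignment and verify the call itself.
api.upload_dataset_parts("existing-dataset-123", as_files=["file1.csv"], replace_existing=True)
api.upload_dataset_parts.assert_called_with(
    "existing-dataset-123", as_files=["file1.csv"], replace_existing=True
)
```

Which option fits depends on whether the test is meant to check the return value or only the side effects of the call.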

file1 = Path(tmpdir) / "file1.csv"
file1.write_text("data1")

result = api.upload_dataset_parts("existing-dataset-123", as_files=[str(file1)])
Copilot AI Jan 8, 2026

Variable result is not used.

file1 = Path(tmpdir) / "file1.csv"
file1.write_text("data1")

result = api.upload_dataset_parts("existing-dataset-123", as_files=[str(file1)], replace_existing=True)
Copilot AI Jan 8, 2026

Variable result is not used.

from azure.storage.blob import BlobServiceClient, ContainerClient

from cosmotech.coal.azure.blob import dump_store_to_azure, VALID_TYPES
from cosmotech.coal.azure.blob import VALID_TYPES, dump_store_to_azure
Copilot AI Jan 8, 2026

Import of 'VALID_TYPES' is not used.


import pyarrow as pa
import pandas as pd
import pyarrow as pa
Copilot AI Jan 8, 2026

Import of 'pa' is not used.

import pyarrow as pa
import pandas as pd
import pyarrow as pa
import pytest
Copilot AI Jan 8, 2026

Import of 'pytest' is not used.

from unittest.mock import MagicMock, patch

import pyarrow as pa
import pytest
Copilot AI Jan 8, 2026

Import of 'pytest' is not used.

from unittest.mock import MagicMock

import pyarrow as pa
import pytest
Copilot AI Jan 8, 2026

Import of 'pytest' is not used.

2 participants