COM: Column profiling stats in upload #247

Open
meghanagottapu wants to merge 2 commits into c2siorg:main from meghanagottapu:feature/column_profiling

@meghanagottapu

Description

After uploading a CSV, users currently see a raw data table with no statistical context. To understand the shape of their data (how many nulls a column has, what its range is, and what its most common values are), they have to export the file and open it in a separate tool.

This PR adds automatic column profiling to the upload flow and to GET /projects/{id}. No new endpoint is added: the profile is embedded in the existing ProjectResponse, so the frontend receives it in the same request that loads the data table, with no additional round trip.
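
For reviewers who want a concrete picture, here is a rough sketch of the idea, not the code in this PR. It computes null counts, numeric range, and most common values per column, then attaches the result to the response object. The `ColumnProfile` fields, the `profile` attribute, the `NULL_TOKENS` set, and the helper names are all assumptions made for illustration; the `ProjectResponse` here is a simplified stand-in for the real model.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: which strings count as "null" is a guess, not taken from the PR.
NULL_TOKENS = {"", "null", "none", "nan", "na"}


@dataclass
class ColumnProfile:  # hypothetical shape of one column's stats
    name: str
    null_count: int
    minimum: Optional[float]
    maximum: Optional[float]
    top_values: List[Tuple[str, int]]


@dataclass
class ProjectResponse:  # simplified stand-in for the existing response model
    id: int
    rows: List[dict]
    profile: List[ColumnProfile] = field(default_factory=list)


def _to_float(value: str) -> Optional[float]:
    """Return the value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def profile_columns(rows: List[dict], top_n: int = 3) -> List[ColumnProfile]:
    """Compute per-column null count, numeric range, and most common values."""
    if not rows:
        return []
    profiles = []
    for name in rows[0].keys():
        values = [(row.get(name) or "").strip() for row in rows]
        present = [v for v in values if v.lower() not in NULL_TOKENS]
        numbers = [_to_float(v) for v in present]
        # Report a range only when every non-null value parses as a number.
        numeric = bool(present) and all(n is not None for n in numbers)
        profiles.append(ColumnProfile(
            name=name,
            null_count=len(values) - len(present),
            minimum=min(numbers) if numeric else None,
            maximum=max(numbers) if numeric else None,
            top_values=Counter(present).most_common(top_n),
        ))
    return profiles


def build_project_response(project_id: int, csv_text: str) -> ProjectResponse:
    """Parse the CSV once and embed the profile next to the table rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return ProjectResponse(id=project_id, rows=rows, profile=profile_columns(rows))
```

Because the profile travels inside the same object as the rows, a client that already calls GET /projects/{id} would pick it up without any new request logic.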

Fixes #246

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

How Has This Been Tested?

Two new test files add 50 tests in total across unit and integration-style layers: tests/test_profiling_service.py (27 unit tests) and tests/test_profile_endpoint.py (23 service-layer tests).

  • Existing tests pass
  • New tests added
  • Manual testing

Screenshots (if applicable)

image

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review
  • I have added/updated documentation as needed
  • My changes generate no new warnings
  • Tests pass locally

@github-actions

github-actions bot commented Mar 30, 2026

PR Review

Squash

Your PR has 2 commits. Please squash into a single commit.

How to fix

git fetch origin
git rebase -i origin/main   # mark all but first commit as "squash"
git push --force-with-lease

This comment updates automatically on each push.

Development

Successfully merging this pull request may close these issues.

[Feature] Automatic data profiling gives instant insight into column statistics on upload