
Unify applied transform semantics: main table, working CSV, and GET reload (filter / sort / query / pivot) #238

@RachanaB5

Problem

For some operations (filter, sort, advQueryFilter, pivotTables), POST /projects/{id}/transform returns updated columns / rows with should_save == false, so the response reflects the transform while the working copy CSV may stay unchanged. Different UI flows also update the main table inconsistently: some operations push results into the grid via onTransform / context, while others only show a preview or partial UI.

After a reload, GET /projects/get/{project_id} reads from disk, so users can see data that does not match what they thought they had applied, and CSV export (from the working file) can disagree with what they last saw in the app. None of this is documented as an intentional rule today.

Example (reproducible)

  1. Upload a CSV with a column you can filter on (e.g. status with values active / inactive).
  2. Open Filter, apply e.g. status = active.
  3. Observe the UI:
    • Depending on the code path, the filtered result may appear only in a preview while the main table still shows all rows, or the main table itself may reflect the filter. Either way, the experience is inconsistent across transforms.
  4. Refresh the browser (full reload).
  5. GET /projects/get/{id} (what the app loads after refresh) returns the on-disk working copy — typically the full, unfiltered dataset for this class of operations.
  6. Export CSV — file matches the unfiltered working copy, not the last filtered view.
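
The divergence in the steps above can be reproduced in a few lines without the app. This is a minimal sketch, not the project's backend code: `WORKING_CSV`, `load_rows`, and `transform_filter` are hypothetical stand-ins, and only the `should_save` flag comes from the real endpoint.

```python
import csv
import io

# Hypothetical stand-in for the on-disk working copy.
WORKING_CSV = "id,status\n1,active\n2,inactive\n3,active\n"


def load_rows(csv_text):
    """Parse CSV text into a list of dict rows (what a GET reload would hydrate)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform_filter(csv_text, column, value, should_save):
    """Return (response_rows, working_csv_after) for an equality filter.

    Assumption: when should_save is False the working copy is left untouched,
    which is the behavior this issue describes.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    kept = [row for row in reader if row[column] == value]
    if not should_save:
        return kept, csv_text
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return kept, out.getvalue()


# Steps 2 to 5: apply the filter without saving, then "reload" from disk.
response_rows, disk_after = transform_filter(
    WORKING_CSV, "status", "active", should_save=False
)
reloaded_rows = load_rows(disk_after)
```

Here `response_rows` holds only the active rows, while `reloaded_rows` (and any export built from `disk_after`) still holds the full dataset, which is the mismatch users hit after a refresh.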

Expected (product decision, pick one and enforce):

  • Persist: Filter/sort/query/pivot (or a defined subset) are written to the working copy and survive reload and export; or
  • Ephemeral: They never touch disk, but the UI clearly states they are temporary and reload resets to disk state with no surprise.

Right now the behavior sits between the two options, which is confusing and hard to combine with checkpoints/logs (#224, #166, #49).
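
One way to make the chosen option explicit is a single table that both the backend and the UI copy read from. The sketch below is illustrative only: `Persistence`, `CONTRACT`, `should_save_for`, and `ui_notice` are hypothetical names, and the `EPHEMERAL` values are placeholders, not a proposed decision.

```python
from enum import Enum


class Persistence(Enum):
    PERSIST = "persist"      # written to the working copy; survives reload and export
    EPHEMERAL = "ephemeral"  # never touches disk; reload resets to the saved dataset


# Operation names come from this issue; the policy values are placeholders.
CONTRACT = {
    "filter": Persistence.EPHEMERAL,
    "sort": Persistence.EPHEMERAL,
    "advQueryFilter": Persistence.EPHEMERAL,
    "pivotTables": Persistence.EPHEMERAL,
}


def should_save_for(operation_type):
    """Derive should_save from the contract instead of deciding per call site."""
    try:
        policy = CONTRACT[operation_type]
    except KeyError:
        raise ValueError(f"no contract defined for {operation_type!r}") from None
    return policy is Persistence.PERSIST


def ui_notice(operation_type):
    """Message the UI could show so the behavior is never a surprise."""
    if should_save_for(operation_type):
        return "Saved to the working copy."
    return "Temporary view: reloading resets to the saved dataset."
```

With something like this, an operation that has no entry fails loudly instead of silently picking a persistence behavior.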

Suspected cause

  • Transform results are applied inconsistently between frontend routes (preview-only vs updating shared table state).
  • should_save is not a single, user-visible contract: some ops skip persisting to the working CSV while the UI does not always say so.
  • No single source of truth: the “current dataset” is split between in-memory / preview state, GET from disk, and export, so the three can diverge after the same user action.

Proposal

Agree on one explicit contract (persist vs ephemeral vs hybrid with e.g. “Commit to dataset”), document it, then align backend (should_save, logging) and frontend (always update shared state + messaging) and add tests.

Goals

  • Main table behavior is consistent after Apply across filter, sort, advanced query, pivot (and matches the chosen contract).
  • Reload / GET and export match that contract and are documented.
  • PRs reference related checkpoint work where needed.

Non-goals

Suggested phases

  1. Audit: map each operation_type → should_save → which components call onTransform / updateData.
  2. Maintainer/user-visible decision: persist vs ephemeral (short doc in README or CONTRIBUTING).
  3. Frontend: one consistent path from transform response → table state (+ clear copy if ephemeral).
  4. Backend: if persisting, adjust writes/logging; add integration coverage (Integration Tests for Transform Endpoint #91).
  5. Test: e.g. apply → reload → assert rows/columns per contract.
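
The apply → reload → assert check in phase 5 could take roughly the following shape. This is a sketch against an in-memory fake, not the real integration test: `FakeProject` and `assert_reload_matches_contract` are hypothetical, and a real test would call POST /projects/{id}/transform and GET /projects/get/{project_id} instead.

```python
from dataclasses import dataclass


@dataclass
class FakeProject:
    """In-memory stand-in for the backend; `disk` plays the working CSV."""

    disk: list

    def transform_sort(self, column, should_save):
        """Return sorted rows; write them to 'disk' only when should_save is True."""
        result = sorted(self.disk, key=lambda row: row[column])
        if should_save:
            self.disk = result
        return result

    def get(self):
        """What a reload would hydrate: always the on-disk state."""
        return list(self.disk)


def assert_reload_matches_contract(project, column, persist):
    """Apply a sort, reload, and check the result against the chosen contract."""
    before = project.get()
    applied = project.transform_sort(column, should_save=persist)
    after_reload = project.get()
    # Persist: reload shows the transformed rows. Ephemeral: reload shows the original rows.
    expected = applied if persist else before
    assert after_reload == expected, "reload disagrees with the contract"
    return after_reload
```

The same helper can be parametrized over filter, advanced query, and pivot once each has an agreed policy.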

Suggested direction: Centralize transform application (one pipeline or single app-level dataset state) used by the main table, CSV export, and whatever GET reload hydrates—so they cannot silently disagree.
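
As a language-neutral illustration of that direction (written in Python here, although the real state would likely live in the frontend), a single dataset object can feed both the table and the export. `DatasetState` and its methods are hypothetical names, not existing code in this repository.

```python
import csv
import io


class DatasetState:
    """Single holder for the current dataset; table and export both read from it."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [dict(row) for row in rows]
        self._listeners = []

    def subscribe(self, listener):
        """Register a callback (e.g. the main table) to run after every transform."""
        self._listeners.append(listener)

    def apply_transform(self, columns, rows):
        """The one path from a transform response into shared state."""
        self.columns = list(columns)
        self.rows = [dict(row) for row in rows]
        for listener in self._listeners:
            listener(self)

    def export_csv(self):
        """Export from the same state the table renders, so the two cannot diverge."""
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=self.columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.rows)
        return out.getvalue()
```

Whether reload hydrates this state from disk or from a persisted transform depends on the contract chosen above; the sketch only shows the single in-app path.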

Acceptance criteria

Related issues
