Skip to content

fix(reports): respect CSV_EXPORT sep and decimal config in email reports#38616

Open
nitishagar wants to merge 2 commits intoapache:masterfrom
nitishagar:fix-36682
Open

fix(reports): respect CSV_EXPORT sep and decimal config in email reports#38616
nitishagar wants to merge 2 commits intoapache:masterfrom
nitishagar:fix-36682

Conversation

@nitishagar
Copy link
Contributor

@nitishagar nitishagar commented Mar 13, 2026

User description

SUMMARY

When CSV_EXPORT config uses a non-default delimiter (e.g. sep=";") and decimal separator (e.g. decimal=","), GUI CSV exports work correctly because they go through query_context_processor.py which passes **current_app.config["CSV_EXPORT"] to df_to_escaped_csv(). However, CSV email reports take a different path through apply_client_processing() in client_processing.py, which called pd.read_csv(StringIO(data)) without reading sep/decimal from the config.

This caused pandas to misparse the semicolon-delimited CSV (treating the whole row as one column), which then broke the pivot/table post-processor and returned an HTTP 500.

Fix: Read sep and decimal from current_app.config["CSV_EXPORT"] and pass them to pd.read_csv(), defaulting to "," and "." respectively when not configured.

Fixes #36682

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — backend-only fix.

TESTING INSTRUCTIONS

  1. Configure CSV_EXPORT = {"sep": ";", "decimal": ","} in superset_config.py
  2. Create a chart and schedule a CSV email report
  3. Before fix: report fails with HTTP 500 (pandas misparsed the semicolon-delimited data)
  4. After fix: report processes correctly and sends valid CSV email

Unit test added:

pytest tests/unit_tests/charts/test_client_processing.py::test_apply_client_processing_csv_format_custom_delimiter -v

ADDITIONAL INFORMATION


CodeAnt-AI Description

Respect CSV_EXPORT delimiter and decimal settings for scheduled CSV email reports

What Changed

  • CSV email report processing now reads the configured CSV delimiter and decimal separator so semicolon-delimited files and comma-as-decimal values parse correctly
  • The processed CSV columns and numeric values are preserved in email reports instead of being merged into a single column or misparsed
  • Added a unit test that verifies semicolon-delimited CSV with comma decimals is parsed into separate columns

Impact

✅ Fewer scheduled CSV email failures (no HTTP 500 from misparsed CSV)
✅ Correct column parsing in CSV email attachments
✅ Correct numeric parsing when decimal separator is configured

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

@bito-code-review
Copy link
Contributor

bito-code-review bot commented Mar 13, 2026

Code Review Agent Run #c83433

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 3db389b..3db389b
    • superset/charts/client_processing.py
    • tests/unit_tests/charts/test_client_processing.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@dosubot dosubot bot added the data:csv Related to import/export of CSVs label Mar 13, 2026
@codeant-ai-for-open-source codeant-ai-for-open-source bot added the size:M This PR changes 30-99 lines, ignoring generated files label Mar 13, 2026
@codeant-ai-for-open-source
Copy link
Contributor

Sequence Diagram

This PR updates CSV email report processing to use the configured CSV export separator and decimal settings. It ensures semicolon-delimited and comma-decimal data is parsed correctly before post-processing, preventing report failures.

sequenceDiagram
    participant ReportJob
    participant ClientProcessing
    participant AppConfig
    participant Pandas
    participant PostProcessor
    participant EmailService

    ReportJob->>ClientProcessing: Process query result in CSV format
    ClientProcessing->>AppConfig: Read CSV export sep and decimal
    AppConfig-->>ClientProcessing: Return configured or default values
    ClientProcessing->>Pandas: Parse CSV using sep and decimal
    ClientProcessing->>PostProcessor: Run client side table processing
    PostProcessor-->>ReportJob: Return processed CSV output
    ReportJob->>EmailService: Send email report with CSV attachment
Loading

Generated by CodeAnt AI

Comment on lines +2827 to +2832
output_data = processed_result["queries"][0]["data"]
lines = output_data.strip().split("\n")
# Should have header + 2 data rows, with correct column parsing
assert len(lines) == 3
# name and value should be separate columns, not merged into one
assert processed_result["queries"][0]["colnames"] == ["name", "value"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This test does not actually verify that the configured delimiter/decimal are preserved in the generated CSV output, so it can pass even when email CSV exports are still emitted with default comma and dot formatting. Strengthen the assertions to validate the serialized CSV content and numeric typing of the parsed value column. [logic error]

Severity Level: Major ⚠️
- ❌ Scheduled CSV emails may ignore configured separator/decimal output.
- ⚠️ Current unit test misses CSV serialization regressions.
Suggested change
output_data = processed_result["queries"][0]["data"]
lines = output_data.strip().split("\n")
# Should have header + 2 data rows, with correct column parsing
assert len(lines) == 3
# name and value should be separate columns, not merged into one
assert processed_result["queries"][0]["colnames"] == ["name", "value"]
output_data = processed_result["queries"][0]["data"]
lines = output_data.strip().split("\n")
# Should have header + 2 data rows, with correct column parsing
assert len(lines) == 3
# name and value should be separate columns, not merged into one
assert processed_result["queries"][0]["colnames"] == ["name", "value"]
# value should be parsed as numeric using decimal="," config
assert processed_result["queries"][0]["coltypes"] == [
GenericDataType.STRING,
GenericDataType.NUMERIC,
]
# serialized CSV should honor configured separator/decimal for email exports
assert lines[0] == "name;value"
assert "foo;1,5" in lines[1]
assert "bar;2,0" in lines[2]
Steps of Reproduction ✅
1. Configure a report with non-default CSV settings
(`CSV_EXPORT={"sep":";","decimal":","}`), then trigger scheduled CSV generation through
`ReportScheduleStateMachine._get_csv_data()` in
`superset/commands/report/execute.py:439-443`, which calls `ChartDataRestApi.get_data`
with `type=post_processed` at `execute.py:226-230`.

2. In chart response handling, `_send_chart_response()` applies post-processing when
`result_type == POST_PROCESSED` (`superset/charts/data/api.py:404-405`), invoking
`apply_client_processing()`.

3. In `apply_client_processing()`, CSV is parsed with configured `sep`/`decimal`
(`superset/charts/client_processing.py:348-356`) but serialized back using default
`processed_df.to_csv(...)` without config (`client_processing.py:397`), yielding comma/dot
formatted output.

4. Run the current unit test `test_apply_client_processing_csv_format_custom_delimiter`
(`tests/unit_tests/charts/test_client_processing.py:2794-2832`): it only asserts row count
and `colnames` (`2827-2832`), so it still passes even when output CSV formatting is wrong.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/charts/test_client_processing.py
**Line:** 2827:2832
**Comment:**
	*Logic Error: This test does not actually verify that the configured delimiter/decimal are preserved in the generated CSV output, so it can pass even when email CSV exports are still emitted with default comma and dot formatting. Strengthen the assertions to validate the serialized CSV content and numeric typing of the parsed `value` column.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

# reports to avoid unwanted conversions
# This allows users to control which values should be treated as null/NA
na_values = current_app.config["REPORTS_CSV_NA_NAMES"]
csv_export_config = current_app.config.get("CSV_EXPORT", {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The CSV parser now respects configured delimiter/decimal, but the CSV writer still uses pandas defaults, so processed email report output is re-serialized with , and . instead of the configured format. This breaks the intended contract for localized CSV exports and can corrupt round-tripping for semicolon/comma-decimal setups. Reuse the same sep and decimal values when calling to_csv. [logic error]

Severity Level: Major ⚠️
- ⚠️ Scheduled CSV reports ignore configured output separators.
- ⚠️ Locale decimal formatting changes in emailed CSV files.
- ⚠️ Behavior diverges from GUI CSV export formatting.

When CSV_EXPORT config uses a non-default delimiter (e.g. sep=";") and
decimal separator (e.g. decimal=","), GUI exports worked correctly but
CSV email reports failed with HTTP 500 because apply_client_processing()
called pd.read_csv() without the sep/decimal params, causing pandas to
misparse semicolon-delimited CSV.

Fixes apache#36682
@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.39%. Comparing base (4a9db24) to head (b162f2b).
⚠️ Report is 24 commits behind head on master.

Files with missing lines Patch % Lines
superset/charts/client_processing.py 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #38616      +/-   ##
==========================================
- Coverage   65.01%   64.39%   -0.62%     
==========================================
  Files        1817     2529     +712     
  Lines       72318   128956   +56638     
  Branches    23032    29719    +6687     
==========================================
+ Hits        47016    83042   +36026     
- Misses      25302    44469   +19167     
- Partials        0     1445    +1445     
Flag Coverage Δ
hive 40.75% <0.00%> (?)
mysql 61.88% <0.00%> (?)
postgres 61.95% <0.00%> (?)
presto 40.77% <0.00%> (?)
python 63.58% <0.00%> (?)
sqlite 61.58% <0.00%> (?)
unit 100.00% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…d CSV output

The previous fix only passed sep and decimal to pd.read_csv() (parsing the
input CSV), but to_csv() was still called with pandas defaults, so email report
attachments were still serialized with ',' and '.' regardless of config.

Pass the same sep and decimal values to to_csv() so the round-trip is complete:
configured separator and decimal are preserved in the output CSV sent as email
attachment.

Also strengthen the test to assert that the output CSV uses the configured
separator and decimal, not just that column names were parsed correctly.

Addresses review feedback on apache#38616.
@bito-code-review
Copy link
Contributor

bito-code-review bot commented Mar 13, 2026

Code Review Agent Run #f1c406

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 6492ffa..b162f2b
    • superset/charts/client_processing.py
    • tests/unit_tests/charts/test_client_processing.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data:csv Related to import/export of CSVs size/M size:M This PR changes 30-99 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CSV email reports fail with HTTP 500 when using non-default CSV delimiter (GUI CSV export works)

1 participant