fix(reports): respect CSV_EXPORT sep and decimal config in email reports#38616
fix(reports): respect CSV_EXPORT sep and decimal config in email reports#38616nitishagar wants to merge 2 commits intoapache:masterfrom
Conversation
Code Review Agent Run #c83433Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Sequence DiagramThis PR updates CSV email report processing to use the configured CSV export separator and decimal settings. It ensures semicolon-delimited and comma-decimal data is parsed correctly before post-processing, preventing report failures. sequenceDiagram
participant ReportJob
participant ClientProcessing
participant AppConfig
participant Pandas
participant PostProcessor
participant EmailService
ReportJob->>ClientProcessing: Process query result in CSV format
ClientProcessing->>AppConfig: Read CSV export sep and decimal
AppConfig-->>ClientProcessing: Return configured or default values
ClientProcessing->>Pandas: Parse CSV using sep and decimal
ClientProcessing->>PostProcessor: Run client side table processing
PostProcessor-->>ReportJob: Return processed CSV output
ReportJob->>EmailService: Send email report with CSV attachment
Generated by CodeAnt AI |
| output_data = processed_result["queries"][0]["data"] | ||
| lines = output_data.strip().split("\n") | ||
| # Should have header + 2 data rows, with correct column parsing | ||
| assert len(lines) == 3 | ||
| # name and value should be separate columns, not merged into one | ||
| assert processed_result["queries"][0]["colnames"] == ["name", "value"] |
There was a problem hiding this comment.
Suggestion: This test does not actually verify that the configured delimiter/decimal are preserved in the generated CSV output, so it can pass even when email CSV exports are still emitted with default comma and dot formatting. Strengthen the assertions to validate the serialized CSV content and numeric typing of the parsed value column. [logic error]
Severity Level: Major ⚠️
- ❌ Scheduled CSV emails may ignore configured separator/decimal output.
- ⚠️ Current unit test misses CSV serialization regressions.| output_data = processed_result["queries"][0]["data"] | |
| lines = output_data.strip().split("\n") | |
| # Should have header + 2 data rows, with correct column parsing | |
| assert len(lines) == 3 | |
| # name and value should be separate columns, not merged into one | |
| assert processed_result["queries"][0]["colnames"] == ["name", "value"] | |
| output_data = processed_result["queries"][0]["data"] | |
| lines = output_data.strip().split("\n") | |
| # Should have header + 2 data rows, with correct column parsing | |
| assert len(lines) == 3 | |
| # name and value should be separate columns, not merged into one | |
| assert processed_result["queries"][0]["colnames"] == ["name", "value"] | |
| # value should be parsed as numeric using decimal="," config | |
| assert processed_result["queries"][0]["coltypes"] == [ | |
| GenericDataType.STRING, | |
| GenericDataType.NUMERIC, | |
| ] | |
| # serialized CSV should honor configured separator/decimal for email exports | |
| assert lines[0] == "name;value" | |
| assert "foo;1,5" in lines[1] | |
| assert "bar;2,0" in lines[2] |
Steps of Reproduction ✅
1. Configure a report with non-default CSV settings
(`CSV_EXPORT={"sep":";","decimal":","}`), then trigger scheduled CSV generation through
`ReportScheduleStateMachine._get_csv_data()` in
`superset/commands/report/execute.py:439-443`, which calls `ChartDataRestApi.get_data`
with `type=post_processed` at `execute.py:226-230`.
2. In chart response handling, `_send_chart_response()` applies post-processing when
`result_type == POST_PROCESSED` (`superset/charts/data/api.py:404-405`), invoking
`apply_client_processing()`.
3. In `apply_client_processing()`, CSV is parsed with configured `sep`/`decimal`
(`superset/charts/client_processing.py:348-356`) but serialized back using default
`processed_df.to_csv(...)` without config (`client_processing.py:397`), yielding comma/dot
formatted output.
4. Run the current unit test `test_apply_client_processing_csv_format_custom_delimiter`
(`tests/unit_tests/charts/test_client_processing.py:2794-2832`): it only asserts row count
and `colnames` (`2827-2832`), so it still passes even when output CSV formatting is wrong.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/charts/test_client_processing.py
**Line:** 2827:2832
**Comment:**
*Logic Error: This test does not actually verify that the configured delimiter/decimal are preserved in the generated CSV output, so it can pass even when email CSV exports are still emitted with default comma and dot formatting. Strengthen the assertions to validate the serialized CSV content and numeric typing of the parsed `value` column.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| # reports to avoid unwanted conversions | ||
| # This allows users to control which values should be treated as null/NA | ||
| na_values = current_app.config["REPORTS_CSV_NA_NAMES"] | ||
| csv_export_config = current_app.config.get("CSV_EXPORT", {}) |
There was a problem hiding this comment.
Suggestion: The CSV parser now respects configured delimiter/decimal, but the CSV writer still uses pandas defaults, so processed email report output is re-serialized with , and . instead of the configured format. This breaks the intended contract for localized CSV exports and can corrupt round-tripping for semicolon/comma-decimal setups. Reuse the same sep and decimal values when calling to_csv. [logic error]
Severity Level: Major ⚠️
- ⚠️ Scheduled CSV reports ignore configured output separators.
- ⚠️ Locale decimal formatting changes in emailed CSV files.
- ⚠️ Behavior diverges from GUI CSV export formatting.When CSV_EXPORT config uses a non-default delimiter (e.g. sep=";") and decimal separator (e.g. decimal=","), GUI exports worked correctly but CSV email reports failed with HTTP 500 because apply_client_processing() called pd.read_csv() without the sep/decimal params, causing pandas to misparse semicolon-delimited CSV. Fixes apache#36682
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #38616 +/- ##
==========================================
- Coverage 65.01% 64.39% -0.62%
==========================================
Files 1817 2529 +712
Lines 72318 128956 +56638
Branches 23032 29719 +6687
==========================================
+ Hits 47016 83042 +36026
- Misses 25302 44469 +19167
- Partials 0 1445 +1445
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…d CSV output The previous fix only passed sep and decimal to pd.read_csv() (parsing the input CSV), but to_csv() was still called with pandas defaults, so email report attachments were still serialized with ',' and '.' regardless of config. Pass the same sep and decimal values to to_csv() so the round-trip is complete: configured separator and decimal are preserved in the output CSV sent as email attachment. Also strengthen the test to assert that the output CSV uses the configured separator and decimal, not just that column names were parsed correctly. Addresses review feedback on apache#38616.
Code Review Agent Run #f1c406Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
User description
SUMMARY
When
CSV_EXPORTconfig uses a non-default delimiter (e.g.sep=";") and decimal separator (e.g.decimal=","), GUI CSV exports work correctly because they go throughquery_context_processor.pywhich passes**current_app.config["CSV_EXPORT"]todf_to_escaped_csv(). However, CSV email reports take a different path throughapply_client_processing()inclient_processing.py, which calledpd.read_csv(StringIO(data))without readingsep/decimalfrom the config.This caused pandas to misparse the semicolon-delimited CSV (treating the whole row as one column), which then broke the pivot/table post-processor and returned an HTTP 500.
Fix: Read
sepanddecimalfromcurrent_app.config["CSV_EXPORT"]and pass them topd.read_csv(), defaulting to","and"."respectively when not configured.Fixes #36682
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A — backend-only fix.
TESTING INSTRUCTIONS
CSV_EXPORT = {"sep": ";", "decimal": ","}insuperset_config.pyUnit test added:
ADDITIONAL INFORMATION
CodeAnt-AI Description
Respect CSV_EXPORT delimiter and decimal settings for scheduled CSV email reports
What Changed
Impact
✅ Fewer scheduled CSV email failures (no HTTP 500 from misparsed CSV)✅ Correct column parsing in CSV email attachments✅ Correct numeric parsing when decimal separator is configured💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.