fix(bigquery): limit result set size to prevent browser memory crashes #38588
Conversation
Implement memory-aware progressive fetching in BigQuery's fetch_data method. Large result sets (950+ MB) previously crashed Chrome by loading everything into memory at once. The fix samples an initial batch to estimate row size, then fetches only as many rows as fit within the BQ_FETCH_MAX_MB config limit (default 200 MB). A warning toast is shown to users when results are truncated. This is always-on with no feature flag -- operators control the budget via the BQ_FETCH_MAX_MB config constant. Originally by @ethan-l-geotab in #36387.

Co-authored-by: ethan-l-geotab <ethanliong@geotab.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
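As a rough illustration of that flow, here is a minimal sketch of the budgeted fetch, assuming a DB-API 2.0 cursor; the helper name, sample size, truncation check, and warning text are illustrative, not the PR's actual code:

```python
import sys

BQ_FETCH_MAX_MB = 200  # operator-configurable budget (default per this PR)


def fetch_within_budget(cursor, sample_size=100, max_mb=BQ_FETCH_MAX_MB):
    """Fetch rows progressively, stopping once the memory budget is reached.

    `cursor` is any DB-API 2.0 cursor; `sample_size` and the size heuristic
    are illustrative choices, not the values used in the PR.
    """
    budget_bytes = max_mb * 1024 * 1024

    # Sample an initial batch to estimate the per-row memory footprint.
    first_batch = cursor.fetchmany(sample_size)
    if not first_batch:
        return [], None

    first_batch_bytes = sys.getsizeof(str(first_batch))
    bytes_per_row = max(first_batch_bytes // len(first_batch), 1)

    # Fetch only as many additional rows as fit within the remaining budget.
    remaining_rows = max((budget_bytes - first_batch_bytes) // bytes_per_row, 0)
    data = list(first_batch) + list(cursor.fetchmany(remaining_rows))

    # If the cursor still has rows left, the result was truncated.
    warning = None
    if cursor.fetchone() is not None:
        warning = (
            f"Result set truncated to roughly {max_mb} MB; "
            "increase BQ_FETCH_MAX_MB to fetch more rows."
        )
    return data, warning
```

The str()-based size estimate mirrors the excerpt Bito flags further down; see that thread for the trade-offs of that heuristic.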
Code Review Agent Run #e5147f
Actionable Suggestions - 3
- superset/db_engine_specs/bigquery.py - 2
  - Inaccurate memory estimation · Line 335-335
  - Avoid blind exception catch · Line 367-367
- superset-frontend/packages/superset-ui-core/src/query/types/QueryResponse.ts - 1
  - Backend schema missing warning field · Line 80-80
Review Details
- Files reviewed - 7 · Commit Range: 1773531..1773531
  - superset-frontend/packages/superset-ui-core/src/query/types/QueryResponse.ts
  - superset-frontend/src/components/Chart/chartAction.ts
  - superset/common/query_context_processor.py
  - superset/config.py
  - superset/db_engine_specs/bigquery.py
  - tests/unit_tests/common/test_query_context_processor.py
  - tests/unit_tests/db_engine_specs/test_bigquery.py
- Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- Eslint (Linter) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
    first_batch = [r.values() for r in first_batch]

    # Estimate how many rows fit in the memory budget
    first_batch_bytes = sys.getsizeof(str(first_batch))
The memory size estimation uses sys.getsizeof(str(first_batch)), but str() creates a string representation that can be significantly larger than the actual in-memory size of the list object. This leads to overestimating memory usage and potentially fetching fewer rows than the budget allows. Use sys.getsizeof(first_batch) for correct memory budgeting.
Code suggestion
Check the AI-generated fix before applying
    - first_batch_bytes = sys.getsizeof(str(first_batch))
    + first_batch_bytes = sys.getsizeof(first_batch)
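For context on what each call actually measures, a small illustrative comparison (the batch contents are made up and exact byte counts vary by platform and Python version):

```python
import sys

# Stand-in for a small fetched batch of rows.
batch = [("alice", 1, 3.14), ("bob", 2, 2.71)] * 50

# str() serializes every value, so this tends to overestimate the row payload.
print(sys.getsizeof(str(batch)))   # size of the string representation

# getsizeof on the list counts only the list object and its pointer array,
# not the tuples or values inside it, so this tends to underestimate.
print(sys.getsizeof(batch))        # size of the list container alone
```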
Code Review Run #e5147f
    g.bq_memory_limited_row_count = len(data)
    return data

    except Exception: # pylint: disable=broad-except
Replace the broad except Exception: at line 367 with specific exception types that are expected from cursor operations. This improves error handling clarity and prevents masking unexpected errors.
Code suggestion
Check the AI-generated fix before applying
    - except Exception: # pylint: disable=broad-except
    + except (DatabaseError, OperationalError, Exception): # pylint: disable=broad-except
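If the author wants to narrow the catch, one possible shape, assuming the PEP 249-style exceptions exposed by google.cloud.bigquery.dbapi; the function names here are illustrative stand-ins, not the PR's actual signatures:

```python
from google.cloud.bigquery.dbapi import DatabaseError, OperationalError


def fetch_data_with_fallback(cursor, budgeted_fetch, fallback_fetch, limit=None):
    """Run the budgeted fetch, falling back only on expected cursor failures.

    `budgeted_fetch` stands in for the memory-aware path and `fallback_fetch`
    for BaseEngineSpec.fetch_data; both are illustrative assumptions.
    """
    try:
        return budgeted_fetch(cursor, limit)
    except (DatabaseError, OperationalError):
        # Expected DB-API failures fall back to the default fetch path;
        # anything else propagates rather than being silently masked.
        return fallback_fetch(cursor, limit)
```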
Code Review Run #e5147f
    // TODO(hainenber): define proper type for below attributes
    rejected_filters?: any[];
    applied_filters?: any[];
    warning?: string | null;
The added warning field matches the backend response data, but the corresponding response schema lacks it, which may cause validation issues. Add warning = fields.String(allow_none=True) to ChartDataResponseResult.
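For illustration, the suggested schema change might look roughly like this, assuming the marshmallow-based ChartDataResponseResult schema; field placement and the description text are made up:

```python
from marshmallow import Schema, fields


class ChartDataResponseResult(Schema):
    # ... existing fields omitted ...

    # Mirrors the optional `warning` field added on the TypeScript side so
    # that truncation notices survive response validation.
    warning = fields.String(
        allow_none=True,
        metadata={"description": "Warning message about truncated results"},
    )
```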
Code Review Run #e5147f
SUMMARY
Adopts and improves the fix from #36387 (originally by @ethan-l-geotab). Fixes #36385.
BigQuery queries returning huge result sets (950+ MB) crash Chrome by loading everything into browser memory at once. This PR implements memory-aware progressive fetching in BigQueryEngineSpec.fetch_data:

- Samples an initial batch to estimate row size, then fetches only as many rows as fit within the BQ_FETCH_MAX_MB config limit (default 200 MB)
- Propagates a truncation warning via Flask g to the query context processor, which adds it to the response payload (see the sketch after this list)
- Falls back to the base BaseEngineSpec.fetch_data implementation on errors
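A rough sketch of that propagation path; apart from g.bq_memory_limited_row_count (visible in the diff excerpt above), the function names, payload key, and message text are illustrative assumptions, and both calls need an active Flask application context:

```python
from flask import g


# Engine-spec side: record how many rows survived the memory budget.
def record_truncation(row_count: int) -> None:
    g.bq_memory_limited_row_count = row_count


# Query-context-processor side: turn the flag into a response-level warning
# that the frontend can surface as a toast via the `warning` field.
def attach_warning(payload: dict) -> dict:
    limited_rows = getattr(g, "bq_memory_limited_row_count", None)
    if limited_rows is not None:
        payload["warning"] = (
            f"Results were truncated to {limited_rows} rows to stay within "
            "the BQ_FETCH_MAX_MB memory budget."
        )
    return payload
```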
Key differences from the original PR (#36387):

- No BQ_MEMORY_LIMIT_FETCH feature flag -- the fix is always-on
- The BQ_FETCH_MAX_MB config constant (default 200 MB) is the only operator-level knob
- The warning toast is wired into chartAction.ts (was .js in the original PR)

Files changed:
- superset/db_engine_specs/bigquery.py -- Memory-aware progressive fetch implementation
- superset/common/query_context_processor.py -- Warning propagation via Flask g
- superset/config.py -- BQ_FETCH_MAX_MB = 200 config constant
- superset-frontend/src/components/Chart/chartAction.ts -- Warning toast display
- superset-frontend/packages/superset-ui-core/src/query/types/QueryResponse.ts -- warning field on ChartDataResponseResult
- tests/unit_tests/db_engine_specs/test_bigquery.py -- 5 new test cases
- tests/unit_tests/common/test_query_context_processor.py -- 2 new test cases

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Before: Chrome crashes or becomes unresponsive when BigQuery returns 950+ MB of data.
After: Results are truncated to fit within the configured memory budget, and a warning toast informs the user.
TESTING INSTRUCTIONS
- Set BQ_FETCH_MAX_MB to a smaller value (e.g., 10) in superset_config.py to test truncation with smaller datasets
- Run pytest tests/unit_tests/db_engine_specs/test_bigquery.py -k test_fetch_data -v
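For example, a superset_config.py override along these lines (the value 10 is just the small budget suggested above):

```python
# superset_config.py -- shrink the BigQuery fetch budget to exercise truncation
# with modest datasets; 200 (MB) is the shipped default.
BQ_FETCH_MAX_MB = 10
```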
ADDITIONAL INFORMATION

Co-authored-by: ethan-l-geotab <ethanliong@geotab.com>