fix(bigquery): limit result set size to prevent browser memory crashes #38588

Open

rusackas wants to merge 1 commit into master from adopt-pr-36387-bq-memory-limit
Conversation

@rusackas (Member)

SUMMARY

Adopts and improves the fix from #36387 (originally by @ethan-l-geotab). Fixes #36385.

BigQuery queries returning huge result sets (950+ MB) crash Chrome by loading everything into browser memory at once. This PR implements memory-aware progressive fetching in BigQueryEngineSpec.fetch_data:

  1. Progressive fetch: Samples an initial batch (1000 rows) to estimate row size, then extrapolates how many total rows fit within the BQ_FETCH_MAX_MB config limit (default 200 MB)
  2. Warning propagation: When results are truncated, a warning is passed through Flask g to the query context processor, which adds it to the response payload
  3. Frontend toast: The chart action handler displays a warning toast to the user when results were truncated
  4. Graceful fallback: On any error in the progressive fetch, falls back to the parent BaseEngineSpec.fetch_data implementation
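The four steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function names, the `FakeCursor`-style DB-API cursor, and the `max_mb` parameter are all assumptions for the sketch; the real logic lives in `BigQueryEngineSpec.fetch_data`.

```python
import sys

SAMPLE_ROWS = 1000
BQ_FETCH_MAX_MB = 200  # assumed default budget (MB), configurable by operators


def fetch_data_progressive(cursor, max_mb=BQ_FETCH_MAX_MB):
    """Return (rows, truncated): fetch only as many rows as fit in max_mb."""
    # Step 1: sample an initial batch to estimate per-row size.
    first_batch = cursor.fetchmany(SAMPLE_ROWS)
    if not first_batch:
        return [], False
    bytes_per_row = max(sys.getsizeof(str(first_batch)) / len(first_batch), 1)

    # Extrapolate how many total rows fit within the memory budget.
    max_rows = int(max_mb * 1024 * 1024 / bytes_per_row)

    data = list(first_batch)
    if len(data) < max_rows:
        data.extend(cursor.fetchmany(max_rows - len(data)))

    # Steps 2-3: detect truncation; Superset propagates the resulting warning
    # via Flask's `g` to the query context processor and on to a frontend toast.
    truncated = bool(cursor.fetchmany(1))
    return data, truncated


def fetch_data(cursor):
    # Step 4: on any error, fall back to a plain fetch (in Superset, the
    # parent BaseEngineSpec.fetch_data implementation).
    try:
        return fetch_data_progressive(cursor)
    except Exception:  # pylint: disable=broad-except
        return cursor.fetchall(), False
```

Note that the row-size estimate is an extrapolation from the sample, so the actual memory used can deviate from the budget when row sizes vary widely across the result set.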

Key differences from the original PR (#36387):

  • Removed the BQ_MEMORY_LIMIT_FETCH feature flag -- the fix is always-on
  • The BQ_FETCH_MAX_MB config constant (default 200 MB) is the only operator-level knob
  • Added comprehensive unit tests for the new BigQuery fetch logic and warning propagation
  • Applied the frontend change to chartAction.ts, which was renamed from .js since the original PR

Files changed:

  • superset/db_engine_specs/bigquery.py -- Memory-aware progressive fetch implementation
  • superset/common/query_context_processor.py -- Warning propagation via Flask g
  • superset/config.py -- BQ_FETCH_MAX_MB = 200 config constant
  • superset-frontend/src/components/Chart/chartAction.ts -- Warning toast display
  • superset-frontend/packages/superset-ui-core/src/query/types/QueryResponse.ts -- warning field on ChartDataResponseResult
  • tests/unit_tests/db_engine_specs/test_bigquery.py -- 5 new test cases
  • tests/unit_tests/common/test_query_context_processor.py -- 2 new test cases

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before: Chrome crashes or becomes unresponsive when BigQuery returns 950+ MB of data.

After: Results are truncated to fit within the configured memory budget, and a warning toast informs the user.

TESTING INSTRUCTIONS

  1. Configure a BigQuery datasource in Superset
  2. Run a query that returns a very large result set (millions of rows)
  3. Verify that results are truncated and a warning toast appears
  4. Set BQ_FETCH_MAX_MB to a smaller value (e.g., 10) in superset_config.py to test truncation with smaller datasets
  5. Verify that queries returning small result sets work normally without any warning
  6. Run the new unit tests: pytest tests/unit_tests/db_engine_specs/test_bigquery.py -k test_fetch_data -v
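Step 4 above is a one-line override. A minimal superset_config.py fragment (10 MB is chosen here only to make truncation easy to trigger with small test datasets):

```python
# superset_config.py -- lower the BigQuery fetch budget to exercise truncation
BQ_FETCH_MAX_MB = 10
```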

ADDITIONAL INFORMATION

Co-authored-by: ethan-l-geotab <ethanliong@geotab.com>

Implement memory-aware progressive fetching in BigQuery's fetch_data
method. Large result sets (950+ MB) previously crashed Chrome by loading
everything into memory at once. The fix samples an initial batch to
estimate row size, then fetches only as many rows as fit within the
BQ_FETCH_MAX_MB config limit (default 200 MB). A warning toast is shown
to users when results are truncated.

This is always-on with no feature flag -- operators control the budget
via the BQ_FETCH_MAX_MB config constant.

Originally by @ethan-l-geotab in #36387.

Co-authored-by: ethan-l-geotab <ethanliong@geotab.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dosubot dosubot bot added the data:connect:googlebigquery Related to BigQuery label Mar 11, 2026
@bito-code-review bot (Contributor) left a comment
Code Review Agent Run #e5147f

Actionable Suggestions - 3
  • superset/db_engine_specs/bigquery.py - 2
  • superset-frontend/packages/superset-ui-core/src/query/types/QueryResponse.ts - 1
    • Backend schema missing warning field · Line 80-80
Review Details
  • Files reviewed - 7 · Commit Range: 1773531..1773531
    • superset-frontend/packages/superset-ui-core/src/query/types/QueryResponse.ts
    • superset-frontend/src/components/Chart/chartAction.ts
    • superset/common/query_context_processor.py
    • superset/config.py
    • superset/db_engine_specs/bigquery.py
    • tests/unit_tests/common/test_query_context_processor.py
    • tests/unit_tests/db_engine_specs/test_bigquery.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • Eslint (Linter) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


first_batch = [r.values() for r in first_batch]

# Estimate how many rows fit in the memory budget
first_batch_bytes = sys.getsizeof(str(first_batch))
Inaccurate memory estimation

The memory size estimation uses sys.getsizeof(str(first_batch)), but str() creates a string representation that can be significantly larger than the actual in-memory size of the list object. This leads to overestimating memory usage and potentially fetching fewer rows than the budget allows. Use sys.getsizeof(first_batch) for correct memory budgeting.

Code suggestion
Check the AI-generated fix before applying
Suggested change
first_batch_bytes = sys.getsizeof(str(first_batch))
first_batch_bytes = sys.getsizeof(first_batch)
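For context on the two estimators: `sys.getsizeof` on a list measures only the list object's own allocation (its pointer array), ignoring the rows it contains, while the `str()`-based variant grows with the data's textual representation. Neither is a true deep measurement; a quick illustration (exact byte counts are CPython-specific):

```python
import sys

# 100 single-column rows, each holding a 1000-character string.
batch = [("x" * 1000,) for _ in range(100)]

# Shallow: list container only (~1 KB here, regardless of row contents).
shallow = sys.getsizeof(batch)

# repr-based: scales with the data (~100 KB here).
repr_based = sys.getsizeof(str(batch))

# A deep estimate must walk the rows and their values explicitly.
deep = shallow + sum(
    sys.getsizeof(row) + sum(sys.getsizeof(v) for v in row) for row in batch
)

print(shallow, repr_based, deep)
```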

g.bq_memory_limited_row_count = len(data)
return data

except Exception: # pylint: disable=broad-except
Avoid blind exception catch

Replace the broad except Exception: at line 367 with specific exception types that are expected from cursor operations. This improves error handling clarity and prevents masking unexpected errors.

Code suggestion
Suggested change
except Exception: # pylint: disable=broad-except
except (DatabaseError, OperationalError, Exception): # pylint: disable=broad-except


// TODO(hainenber): define proper type for below attributes
rejected_filters?: any[];
applied_filters?: any[];
warning?: string | null;

Backend schema missing warning field

The added warning field matches backend response data, but the schema lacks it, potentially causing validation issues. Add warning = fields.String(allow_none=True) to ChartDataResponseResult.



Development

Successfully merging this pull request may close these issues.

Large chart data doesn't return a helpful error message
