ci: Speedup Notebook execution #551
Conversation
…ation, reducing code duplication while maintaining the same functionality (data preparation, color resolution from components, PlotResult wrapping).
… area(), and duration_curve() methods in both DatasetPlotAccessor and DataArrayPlotAccessor
2. scatter() method - Plots two variables against each other with x and y parameters
3. pie() method - Creates pie charts from aggregated (scalar) dataset values, e.g. ds.sum('time').fxplot.pie()
4. duration_curve() method - Sorts values along the time dimension in descending order, with optional normalize parameter for percentage x-axis
5. CONFIG.Plotting.default_line_shape - New config option (default 'hv') that controls the default line shape for line(), area(), and duration_curve() methods (usage of these additions is sketched below)
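A hedged usage sketch of the additions above; the synthetic Dataset and the exact call signatures are assumptions, while the method and config names come from this PR:

```python
import numpy as np
import xarray as xr
import flixopt  # assumption: importing flixopt registers the .fxplot accessor

rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        'flow_rate': ('time', rng.random(24)),
        'charge_state': ('time', rng.random(24)),
    },
    coords={'time': np.arange(24)},
)

# assumed access path for CONFIG; 'hv' is the documented default
flixopt.CONFIG.Plotting.default_line_shape = 'linear'

ds.fxplot.scatter(x='flow_rate', y='charge_state')  # two variables against each other
ds.sum('time').fxplot.pie()                         # shares of aggregated (scalar) values
ds.fxplot.duration_curve(normalize=True)            # descending sort, percentage x-axis
```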
1. X-axis is now determined first using CONFIG.Plotting.x_dim_priority
2. Facets are resolved from remaining dimensions (x-axis excluded)
x_dim_priority expanded:
x_dim_priority = ('time', 'duration', 'duration_pct', 'period', 'scenario', 'cluster')
- Time-like dims first, then common grouping dims as fallback
- variable stays excluded (it's used for color, not x-axis)
_get_x_dim() refactored:
- Now takes dims: list[str] instead of a DataFrame
- More versatile: works with any list of dimension names (see the sketch below)
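A minimal sketch of this selection logic, assuming the priority tuple above; the real flixopt helpers may differ in naming and edge cases:

```python
X_DIM_PRIORITY = ('time', 'duration', 'duration_pct', 'period', 'scenario', 'cluster')

def _get_x_dim(dims: list[str], priority: tuple[str, ...] = X_DIM_PRIORITY) -> str:
    """Pick the x-axis dimension from a plain list of dimension names."""
    for candidate in priority:
        if candidate in dims:
            return candidate
    if dims:
        return dims[0]   # assumption: fall back to the first remaining dim
    return 'variable'    # scalar data (no dims): one entry per variable

def _get_facets(dims: list[str], x_dim: str) -> list[str]:
    # Facets are resolved from the remaining dimensions, x-axis excluded.
    return [d for d in dims if d != x_dim]

def _get_color(x_dim: str) -> str | None:
    # Skip color='variable' when x='variable' to avoid double encoding.
    return None if x_dim == 'variable' else 'variable'
```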
- Add `x` parameter to bar/stacked_bar/line/area for explicit x-axis control
- Add CONFIG.Plotting.x_dim_priority for auto x-axis selection
Default: ('time', 'duration', 'duration_pct', 'period', 'scenario', 'cluster')
- X-axis determined first, facets resolved from remaining dimensions
- Refactor _get_x_column -> _get_x_dim (takes dim list, more versatile)
- Support scalar data (no dims) by using 'variable' as x-axis
- Skip color='variable' when x='variable' to avoid double encoding
- Fix _dataset_to_long_df to use dims (not just coords) as id_vars
- Ensure px_kwargs properly overrides all defaults (color, facets, etc.)
…wargs} so the user can override
2. scatter unused colors - Removed the unused parameter
3. to_duration_curve sorting - Changed [::-1] to np.flip(..., axis=time_axis) for correct multi-dimensional handling
4. DataArrayPlotAccessor.heatmap - Same kwarg merge fix (both fixes sketched below)
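Hedged sketches of the kwarg-merge and flip fixes; the variable names are illustrative, not the exact flixopt internals:

```python
import numpy as np

# Kwarg merge: user-supplied px_kwargs wins because it is spread last.
defaults = {'color': 'variable', 'line_shape': 'hv'}
px_kwargs = {'color': 'scenario'}
merged = {**defaults, **px_kwargs}   # {'color': 'scenario', 'line_shape': 'hv'}

# Duration-curve sorting: [::-1] reverses only the first axis, which is wrong
# once the array has extra dimensions; np.flip targets the time axis directly.
values = np.arange(12.0).reshape(4, 3)   # (time, scenario)
time_axis = 0
descending = np.flip(np.sort(values, axis=time_axis), axis=time_axis)
```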
…ork-v2+plotting
Conflicts:
- docs/notebooks/08a-aggregation.ipynb
- docs/notebooks/08b-rolling-horizon.ipynb
- docs/notebooks/08c-clustering.ipynb
- docs/notebooks/08c2-clustering-storage-modes.ipynb
- docs/notebooks/08d-clustering-multiperiod.ipynb
- docs/notebooks/08e-clustering-internals.ipynb
.github/workflows/docs.yaml
1. Notebook caching - Caches executed notebooks using a hash of the notebooks + source code
2. Parallel execution - Runs jupyter execute via xargs -P 4 (4 notebooks in parallel; a rough Python equivalent is sketched below)
3. Skip mkdocs-jupyter execution - Sets MKDOCS_JUPYTER_EXECUTE=false since notebooks are pre-executed
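A hedged Python equivalent of the parallel step, for intuition only; CI actually uses find | xargs -P 4 as shown in the review below:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import subprocess

def execute(nb: Path) -> None:
    # `jupyter execute --inplace` runs the notebook and saves outputs back.
    subprocess.run(['jupyter', 'execute', '--inplace', str(nb)], check=True)

notebooks = sorted(Path('docs/notebooks').rglob('*.ipynb'))
with ThreadPoolExecutor(max_workers=4) as pool:  # 4 at a time, like xargs -P 4
    list(pool.map(execute, notebooks))
```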
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch; check the settings in the CodeRabbit UI.
📝 Walkthrough
The documentation build workflow now caches executed notebooks and executes them in parallel when the cache misses. The notebook execution flag in the mkdocs configuration is delegated to an environment variable, allowing the CI pipeline to control execution timing rather than always executing.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
… it, causing the NetCDF write failure in CI
…egate-rework-v2+plotting+speedup-ci
…ork-v2+plotting+speedup-ci
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/docs.yaml (2)
74-79: Consider explicitly aligning notebook execution with mkdocs-jupyter configuration.
The parallel execution finds all .ipynb files recursively, which may include notebooks that shouldn't be processed. While the current setup appears safe based on the nav structure, explicitly aligning with mkdocs-jupyter's ignore patterns would make the behavior more maintainable and prevent accidental execution of draft or test notebooks.
🔎 Optional improvement to align with ignore patterns
```diff
 - name: Execute notebooks in parallel
   if: steps.notebook-cache.outputs.cache-hit != 'true'
   run: |
     # Execute all notebooks in parallel (4 at a time)
-    find docs/notebooks -name '*.ipynb' -print0 | \
+    find docs/notebooks -name '*.ipynb' ! -path '*/data/*' -print0 | \
       xargs -0 -P 4 -I {} jupyter execute --inplace {}
```

This explicitly excludes the data/ subdirectory, matching the mkdocs-jupyter ignore pattern.
121-148: Notebook preparation is correctly duplicated for job isolation.
The deploy job independently implements the same caching and parallel execution logic as the build job. While this creates some duplication, it ensures the deploy job can run independently and maintains workflow clarity.
If you anticipate further changes to the notebook execution logic, consider extracting it into a composite action to reduce maintenance overhead.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/workflows/docs.yaml
- mkdocs.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.14)
- GitHub Check: Build documentation
🔇 Additional comments (3)
mkdocs.yml (1)
233-233: Good pattern for CI-controlled execution.
The environment variable delegation with a sensible default (true) preserves local development behavior while allowing CI to optimize by pre-executing notebooks in parallel and then skipping re-execution during the build step.
.github/workflows/docs.yaml (2)
67-72: Caching strategy is well-designed.
The cache path correctly targets the notebooks directory, and the key appropriately includes both notebook content and source code changes, ensuring cache invalidation when either changes.
81-84: Correct integration with pre-execution strategy.
Setting MKDOCS_JUPYTER_EXECUTE="false" properly prevents mkdocs from re-executing the notebooks that were already executed in parallel, optimizing the build time.
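A minimal sketch of the env-var pattern, assuming the flag defaults to true when unset; the reading logic is illustrative, not the plugin's actual code:

```python
import os

def should_execute_notebooks() -> bool:
    # CI sets MKDOCS_JUPYTER_EXECUTE=false after pre-executing notebooks
    # in parallel; local builds leave the variable unset and keep the default.
    return os.environ.get('MKDOCS_JUPYTER_EXECUTE', 'true').lower() != 'false'

print(should_execute_notebooks())  # True unless the CI pipeline opts out
```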
```yaml
- name: Get notebook cache key
  id: notebook-cache-key
  run: |
    # Hash notebooks + flixopt source code
    HASH=$(find docs/notebooks -name '*.ipynb' -exec cat {} \; | cat - <(find flixopt -name '*.py' -exec cat {} \;) | sha256sum | cut -d' ' -f1)
    echo "hash=$HASH" >> $GITHUB_OUTPUT
```
Sort files before hashing to ensure stable cache keys.
The cache key computation concatenates file contents without sorting, which means filesystem iteration order affects the hash. This can lead to different cache keys for identical content across runs, reducing cache effectiveness.
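For intuition, the same fix as a hedged Python sketch: sorting the file paths makes the key independent of filesystem iteration order (the function name is illustrative):

```python
import hashlib
from pathlib import Path

def cache_key(*sources: tuple[str, str]) -> str:
    h = hashlib.sha256()
    for root, pattern in sources:
        for path in sorted(Path(root).rglob(pattern)):  # sorted -> stable order
            h.update(path.read_bytes())
    return h.hexdigest()

# Same inputs as the workflow step: notebooks plus flixopt sources.
print(cache_key(('docs/notebooks', '*.ipynb'), ('flixopt', '*.py')))
```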
🔎 Proposed fix to stabilize cache keys
```diff
   - name: Get notebook cache key
     id: notebook-cache-key
     run: |
       # Hash notebooks + flixopt source code
-      HASH=$(find docs/notebooks -name '*.ipynb' -exec cat {} \; | cat - <(find flixopt -name '*.py' -exec cat {} \;) | sha256sum | cut -d' ' -f1)
+      HASH=$(find docs/notebooks -name '*.ipynb' | sort | xargs cat | cat - <(find flixopt -name '*.py' | sort | xargs cat) | sha256sum | cut -d' ' -f1)
       echo "hash=$HASH" >> $GITHUB_OUTPUT
```
🤖 Prompt for AI Agents
.github/workflows/docs.yaml around lines 60 to 65: the hash is computed by
concatenating notebook and flixopt file contents in filesystem order, which is
non-deterministic; change the pipeline to list files first, sort the filenames
deterministically (e.g., using find to print paths and sort), then cat the files
in that sorted order (use a null-safe approach if filenames may contain special
characters) so the same set of files always yields the same hash.
… keys across runs
Improve CI workflow for faster notebook execution in documentation builds. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>