Skip to content

Add nbstripout pre-commit hook to reduce noisy notebook diffs#3585

Open
Prathamesh8989 wants to merge 5 commits intopytorch:masterfrom
Prathamesh8989:fix-precommit-config
Open

Add nbstripout pre-commit hook to reduce noisy notebook diffs#3585
Prathamesh8989 wants to merge 5 commits intopytorch:masterfrom
Prathamesh8989:fix-precommit-config

Conversation

@Prathamesh8989
Copy link

Fixes #3578

Description:

This PR adds the nbstripout pre-commit hook to automatically strip Jupyter notebook outputs and unnecessary metadata before committing.

This helps reduce noisy diffs caused by execution counts, cell outputs, and metadata changes during notebook refactoring.

The hook is configured in .pre-commit-config.yaml using:

- repo: https://github.com/kynan/nbstripout
  rev: 0.7.1
  hooks:
    - id: nbstripout
      types: [jupyter]

This ensures cleaner version control history and improves reviewability of notebook changes.

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@omkar-334
Copy link
Contributor

hi @Prathamesh8989 , thanks for this. a few suggestions -

  1. can you revert the .gitignore changes?
  2. can you compare nbstripout and np-clean hooks, just for reference? try running the pre-commit after you change it, check how the notebooks are getting affected. you can put a screenshot of the diff here.

@Prathamesh8989
Copy link
Author

Hi @omkar-334,

Thanks for the suggestions! Here’s what I’ve done:

  1. Reverted .gitignore changes
    As requested, the .gitignore modifications have been reverted.

  2. Comparison of nbstripout and nb-clean hooks
    I temporarily replaced nbstripout with nb-clean in the pre-commit configuration and ran pre-commit --all-files on all notebooks.
    nb-clean removes additional notebook metadata compared to nbstripout, as shown in the attached diff screenshot.

    ignite1

    ignite2

Please let me know which approach you prefer or if any further changes are needed.

@@ -1,4 +1,5 @@
exclude: "^conda.recipe|assets|docs"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this extra line

@omkar-334
Copy link
Contributor

@vfdev-5 i think we can start with this. we can add nbqa (lint/formatting for notebooks) when we add ruff.

Comment on lines 33 to 38

- repo: https://github.com/kynan/nbstripout
rev: 0.7.1
hooks:
- id: nbstripout
types: [jupyter]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- repo: https://github.com/kynan/nbstripout
rev: 0.7.1
hooks:
- id: nbstripout
types: [jupyter]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove it from last hook

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
types_or: [python, pyi, jupyter]

hooks:
- id: prettier
exclude_types: ["python", "jupyter", "shell", "gitignore"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- repo: https://github.com/kynan/nbstripout
rev: 0.7.1
hooks:
- id: nbstripout
types: [jupyter]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order matters, the notebooks should be cleaned before formatting.

@@ -29,3 +30,9 @@
rev: 24.10.0
hooks:
- id: black
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- id: black
- id: black
types_or: [python, pyi, jupyter]

@aaishwarymishra
Copy link
Contributor

@vfdev-5 we could use ruff-format instead of using black separately whats your thoughts?

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Feb 26, 2026

@vfdev-5 we could use ruff-format instead of using black separately whats your thoughts?

We can try that as well. The last time it was the question between black and ruff differences in formatting rules. It may happen that ruff would require reformat the whole codebase. Maybe this can be ok, let's see how many files would be changed by ruff format

@Prathamesh8989
Copy link
Author

Hi all, thanks for the detailed feedback and suggestions .

Here’s a summary of the latest updates:

  • Reverted the .gitignore changes as requested.
  • Compared nbstripout and nb-clean locally by running pre-commit --all-files and reviewing the notebook diffs.
  • Kept nbstripout as discussed.
  • Ensured hook ordering so that notebooks are cleaned before formatting.
  • Updated ruff-check and black to include jupyter in types_or.
  • Removed the extra blank line in .pre-commit-config.yaml.
  • Rebased on the latest branch and resolved all feedback.
  • Ran pre-commit locally to ensure hooks pass cleanly.

Current configuration

  • Runs nbstripout first for notebook cleaning.
  • Runs ruff-check on python, pyi, and jupyter.
  • Runs black on python, pyi, and jupyter.
  • Keeps prettier exclusions intact.

Workflows are now pending maintainer approval. Please let me know if any further adjustments are needed.

Thanks again for the guidance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add pre-commit hook for Jupyter notebooks to reduce noisy diffs

4 participants