Add Kedro comparison section with working examples#1282

Open
yarikoptic wants to merge 2 commits into datalad-handbook:main from yarikoptic:enh-kedro

Conversation


@yarikoptic commented on Feb 7, 2026

Rendered version shortcut: https://datalad-handbook--1282.org.readthedocs.build/en/1282/beyond_basics/101-185-kedro.html

Add comprehensive comparison between DataLad/YODA and Kedro, a popular Python framework for data engineering pipelines. The section covers:

  • Philosophy and focus differences
  • Project setup comparison
  • Data versioning approaches (Kedro timestamp-based vs git-annex)
  • Modularity patterns (modular pipelines vs subdatasets)
  • Pipeline execution and provenance tracking
  • Configuration management
  • When to use which tool
  • How to use them together (walkthrough example)
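
As a quick illustration of the versioning contrast from the list above (a rough sketch: the timestamped path follows the layout Kedro uses for datasets declared with `versioned: true`, and the file names are made up):

```console
# Kedro: a versioned dataset is written under a timestamped subdirectory
data/07_model_output/predictions.csv/2026-02-07T12.00.00.000Z/predictions.csv

# DataLad: the same file is annexed and versioned in place by git-annex
$ datalad save -m "update predictions" data/predictions.csv
```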

Key improvements applied to make examples work with current versions:

  • Update kedro_init_version from 0.19.0 to 1.2.0 (required for compatibility)
  • Add required catalog.yml file for Kedro 1.x
  • Enhance demo pipeline to write output.txt for visible provenance tracking
  • Add .gitignore for Python cache files (__pycache__/, *.pyc)
  • Include KEDRO_DISABLE_TELEMETRY option for cleaner output
  • Add test script (kedro-examples-test.sh) validating all examples
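
The ignore rules mentioned above are small enough to sketch; a minimal version with exactly the two entries from the bullet, created from plain shell:

```shell
# Create a .gitignore covering Python bytecode caches,
# so Kedro's generated __pycache__ directories stay out of git
printf '%s\n' '__pycache__/' '*.pyc' > .gitignore
cat .gitignore
```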

All examples tested and working with datalad 1.3.1 and kedro 1.2.0.

This primarily came out of my repeatedly running into Kedro, which made me want a comparison similar to the one we have for DVC. It could potentially be trimmed (e.g., the trailing section) or abandoned altogether.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
.. code-block:: console

   # Optional: disable telemetry for cleaner output
   $ export KEDRO_DISABLE_TELEMETRY=true

- Replace manual file creation with `kedro new` in the integration
  walkthrough, per Kedro team recommendation
- Add admonition clarifying DataLad handles versioning (don't use
  Kedro's `versioned: true` alongside it)
- Reorder steps: `kedro new` -> `datalad create --force` -> pipeline -> run
- Drop standalone Kedro test (TEST 4) that only tested manual setup
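
In console form, the reordered walkthrough looks roughly like this (a sketch assuming Kedro 1.x defaults and a made-up project name `my-project`; not the handbook's exact commands):

```console
$ kedro new --name my-project      # scaffold the Kedro project first
$ cd my-project
$ datalad create --force .         # turn it into a DataLad dataset in place
$ datalad save -m "Initial Kedro project scaffold"
$ datalad run -m "Run pipeline" "kedro run"   # capture run provenance
```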

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
