[KEP 6] Structured Schemas in Kedro: Core Dependency vs Optional Backend #5420
-
@deepyaman, @noklam, @datajoely - please let us know what you think of the above approach.
-
I personally like Pydantic a lot. Not only is it powerful and well-maintained, but it's also very popular, so you'll find many users/contributors are familiar with it. Bias aside, I think the tradeoffs of having it as a dependency are worth it.
-
What extra benefit do we get if we enforce Pydantic at core?
This sounds perfectly fine to me; they are two different concepts anyway.
As noted, it's fairly trivial to convert a dataclass to Pydantic when needed, so this doesn't seem like much effort.
I am not sure about this; I am still unclear how these schemas will be used/consumed. Is this going to be used extensively across core, i.e. Dataset will become a subclass of
-
-1 Pydantic is a great, widely adopted library, but it's not literally everywhere. Notably, no major dataframe library strictly depends on Pydantic, so I don't agree that everybody is already using Pydantic. Kedro-Viz, while useful, is ultimately an optional plugin (and it doesn't get bundled in all Kedro deployment situations). One of Kedro's greatest strengths is that it can be used with pretty much any Python framework, since it's so generally unopinionated in that regard. There's still a lot of code out there that uses attrs, cattrs, whatever. Last but not least, I don't see anything fundamentally requiring Pydantic at the core in Kedro right now (it's mostly plugin and plugin-like functionality), so forcing Pydantic is unnecessary. (Also, Pydantic v1/v2 compatibility was a big thing; I think we should look at what the impact on users could have looked like if we had historically depended on Pydantic, and what that might mean for the impact of a future Pydantic upgrade.)
Edit: I mean -1 Option A, +1 Option B. I'm good with Pydantic as an optional dependency.
-
+1 for Option B: Pydantic as an Optional Backend with stdlib contracts in core.
Rationale:
Thank you
-
+1 for Option B: Pydantic as an Optional Backend. A few key arguments:
1. The codebase already implements Option B. The existing ModelFactory and
2. Release cadence and maintenance burden. Making Pydantic a core dependency ties Kedro's release cycle to Pydantic's. Every Pydantic release, especially a breaking major like v1→v2, becomes a Kedro core problem. It also increases the risk of dependency conflicts with user projects that may pin a different Pydantic version.
3. Kedro's deployment surface is wide. Kedro runs on Databricks, SageMaker, Vertex AI, and in corporate environments with locked dependency trees. A hard Pydantic requirement is a constraint pushed onto all users, even those who never use validation or server features.
4. Reversibility. Promoting Pydantic from optional to core is a minor version change: users who already use it see no difference. Demoting it from core back to optional is a breaking change. This asymmetry strongly favors starting conservative (Option B) and escalating only if clear evidence emerges.
5. Architectural consistency. Kedro is already a layered system (core, plugins, datasets, viz, server). Having the schema boundary follow the same layering feels natural - core uses dataclasses;
-
Hey all, @rashidakanchwala directed me to this ticket since I've worked a lot with Pydantic on Vizro (it's one of our two core dependencies) and have some (increasingly rusty) familiarity with Kedro... I'm too far out of the loop and don't understand enough of what's already going on here to make an educated vote, but I just want to raise some general points and questions:
-
[KEP 6] Structured Schemas in Kedro: Core Dependency vs Optional Backend
Summary
This KEP proposes a unified architectural direction for structured schemas in Kedro.
As structured validation and serialization expand across:
Kedro must decide whether:
This KEP presents both architectural options, evaluates trade-offs, and proposes a direction for decision.
Context
Pydantic is increasingly being introduced in multiple areas of Kedro:
kedro[server]
The current ModelFactory already supports:
As usage expands, Kedro risks implicitly coupling core modules to Pydantic without an explicit architectural decision.
Rather than framing this as:
This KEP reframes the discussion as:
Problem Statement
Kedro needs a consistent approach to:
The key architectural question is:
This decision impacts:
Architectural Options
Option A — Pydantic as a Core Dependency
Description
Kedro core directly depends on Pydantic.
Structured schemas across parameters, inspection, and server layers are represented using pydantic.BaseModel as the canonical abstraction.
Pydantic becomes a required dependency of kedro.
What This Means
kedro installs Pydantic by default
BaseModel
Advantages
1. Single Schema System
2. Built-In Validation & Serialization
3. Ecosystem Alignment
4. Reduced Boilerplate
Risks / Costs
1. Hard Dependency
2. Architectural Coupling
3. Version Sensitivity
4. Shift in Design Philosophy
Option B — Pydantic as an Optional Backend (Decoupled Core)
Description
Kedro core defines structured schema contracts using stdlib dataclasses.
Pydantic is supported via kedro[pydantic] as an optional backend and via kedro[server] at the HTTP boundary.
Core depends only on schema contracts, not on Pydantic APIs.
Proposed Layering
kedro[pydantic]
kedro[server]
Example: Core Contract
Core serialization:
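A minimal sketch of what a core contract and its serialization could look like under Option B, using only the standard library. The names NodeSchema and to_dict are hypothetical illustrations and are not taken from Kedro's codebase; the only point is that dataclasses.asdict yields a plain-dict form without any third-party dependency.

```python
from dataclasses import asdict, dataclass, field


# Hypothetical core contract (an assumption for illustration):
# a frozen stdlib dataclass with no Pydantic import anywhere.
@dataclass(frozen=True)
class NodeSchema:
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def to_dict(schema: NodeSchema) -> dict:
    """Serialize a core contract to plain Python types (JSON-ready)."""
    return asdict(schema)


node = NodeSchema(name="split_data", inputs=["raw"], outputs=["train", "test"])
payload = to_dict(node)
```

Because the contract is an ordinary dataclass, any layer above core (server, plugins) could consume payload without core knowing which validation library, if any, is installed.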
API Layer (Server)
An adapter layer converts between core contracts and API transport models.
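One way such an adapter could be arranged is sketched below, as an illustration rather than Kedro's actual implementation. DatasetSchema and from_payload are hypothetical names. The sketch assumes Pydantic v2's TypeAdapter when Pydantic is installed, and otherwise falls back to plain dataclass construction, which performs no type validation.

```python
from dataclasses import dataclass
from typing import Any

try:
    # Optional backend: only importable when Pydantic v2 is installed.
    from pydantic import TypeAdapter
except ImportError:
    TypeAdapter = None


# Hypothetical core contract (an assumption for illustration).
@dataclass
class DatasetSchema:
    name: str
    type: str
    versioned: bool = False


def from_payload(payload: dict[str, Any]) -> DatasetSchema:
    """Build the core contract from input arriving at the API boundary."""
    if TypeAdapter is not None:
        # Pydantic checks field types and returns a DatasetSchema instance.
        return TypeAdapter(DatasetSchema).validate_python(payload)
    # Fallback without Pydantic: plain construction, no type validation.
    return DatasetSchema(**payload)


schema = from_payload({"name": "cars", "type": "pandas.CSVDataset"})
```

Either branch returns the same stdlib dataclass, so callers in core never see a Pydantic type; the trade-off is that validation strength then depends on whether the optional backend is present.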
Note: Pydantic v2 supports validating standard dataclasses (e.g., via TypeAdapter), which may reduce duplication while preserving separation.
Advantages
1. Loose Coupling
2. Optional Richness
3. Clear Separation of Concerns
4. Lower Baseline Dependency Footprint
Risks / Costs
1. Duplication
2. Adapter Complexity
3. Mixed Mental Model
Comparison
Dependency Considerations
This decision materially affects Kedro’s dependency model.
Key considerations:
If Pydantic is already deeply embedded across core features, Option A may simplify the architecture.
If maintaining a minimal, library-agnostic core is a guiding principle, Option B preserves that boundary.
Trade-Off
Option A optimizes for simplicity and uniformity.
Option B optimizes for decoupling and flexibility.
Neither approach is strictly superior; the choice depends on Kedro’s long-term architectural philosophy.
Proposed Decision
Approve a formal architectural stance on structured schemas in Kedro by selecting either:
I am leaning toward Option B (Optional Backend) because it:
However, consensus is required on whether Kedro prefers: