[KEP 6] Structured Schemas in Kedro: Core Dependency vs Optional Backend #5420

rashidakanchwala · 2026-03-04T10:36:41Z

rashidakanchwala
Mar 4, 2026
Maintainer

[KEP 6] Structured Schemas in Kedro: Core Dependency vs Optional Backend

Summary

This KEP proposes a unified architectural direction for structured schemas in Kedro.

As structured validation and serialization expand across parameter validation, inspection snapshots, and REST API models, Kedro must decide whether:

Pydantic becomes a required core dependency, or
Pydantic remains an optional backend, with Kedro core depending only on stdlib schema contracts.

This KEP presents both architectural options, evaluates trade-offs, and proposes a direction for decision.

Context

Pydantic is increasingly being introduced in multiple areas of Kedro:

Parameter validation
Inspection snapshot models
REST API models in kedro[server]

The current ModelFactory already supports:

Pydantic models
Dataclasses
Raw values (no type hints)
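
To make the three supported cases concrete, here is a minimal sketch of how such a dispatch could look. It is not Kedro's actual ModelFactory code: the `instantiate` helper and the `TrainingParams` dataclass are hypothetical names used only for illustration, and the Pydantic branch assumes Pydantic v2 (`model_validate`).

```python
from dataclasses import dataclass, is_dataclass
from typing import Any

try:  # Pydantic is optional; everything below must work without it
    from pydantic import BaseModel
except ImportError:
    BaseModel = None


def instantiate(type_hint: Any, raw: Any) -> Any:
    """Hypothetical helper: turn raw parameters into a structured value."""
    if type_hint is None:
        # No type hint: pass the raw value through unchanged.
        return raw
    if (
        BaseModel is not None
        and isinstance(type_hint, type)
        and issubclass(type_hint, BaseModel)
    ):
        # Pydantic model (assumes Pydantic v2): validate and coerce.
        return type_hint.model_validate(raw)
    if isinstance(type_hint, type) and is_dataclass(type_hint):
        # Stdlib dataclass: plain construction, no runtime type checks.
        return type_hint(**raw)
    return raw


@dataclass(frozen=True)
class TrainingParams:  # illustrative only, not a Kedro class
    learning_rate: float
    epochs: int = 10


params = instantiate(TrainingParams, {"learning_rate": 0.1})
```

The guarded import is what keeps Pydantic optional in this sketch: the dataclass and raw-value paths never touch it.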

As usage expands, Kedro risks implicitly coupling core modules to Pydantic without an explicit architectural decision.

Rather than framing this as:

“Should Kedro use Pydantic?”

this KEP reframes the discussion as:

What is Kedro’s architectural stance on structured schemas?

Problem Statement

Kedro needs a consistent approach to:

Structured parameter validation
Structured inspection snapshots
REST API serialization

The key architectural question is:

Should Kedro core directly depend on Pydantic, or should Pydantic be an optional backend implementing schema contracts?

This decision impacts:

Dependency footprint
Architectural coupling
Contributor experience
Long-term flexibility

Architectural Options

Option A — Pydantic as a Core Dependency

Description

Kedro core directly depends on Pydantic.

Structured schemas across parameters, inspection, and server layers are represented using pydantic.BaseModel as the canonical abstraction.

Pydantic becomes a required dependency of kedro.

What This Means

kedro installs Pydantic by default
Core domain models may be implemented as BaseModel
Validation and serialization behavior are standardized
Server models naturally align with core models

Advantages

1. Single Schema System

One abstraction across core and server
No duplication between dataclasses and API models
Simplified contributor mental model

2. Built-In Validation & Serialization

Rich validation features
Native JSON schema generation
Strong typing and error handling

3. Ecosystem Alignment

Pydantic is widely adopted
Aligns naturally with FastAPI
Reduces adapter layers

4. Reduced Boilerplate

No need to maintain separate core and transport models

Risks / Costs

1. Hard Dependency

Increases Kedro’s baseline dependency footprint
All users inherit Pydantic whether they need it or not

2. Architectural Coupling

Core becomes tied to a specific third-party validation framework
Replacing or abstracting later becomes difficult

3. Version Sensitivity

Major changes in Pydantic (e.g., v1 → v2) directly affect core
Potential downstream ecosystem impact

4. Shift in Design Philosophy

Moves Kedro core away from stdlib-first design
Schema abstraction becomes framework-specific

Option B — Pydantic as an Optional Backend (Decoupled Core)

Description

Kedro core defines structured schema contracts using stdlib dataclasses.

Pydantic is supported via kedro[pydantic] as an optional backend and via kedro[server] at the HTTP boundary.

Core depends only on schema contracts, not on Pydantic APIs.

Proposed Layering

| Layer | Dependency |
| --- | --- |
| Kedro core | stdlib only |
| `kedro[pydantic]` | Pydantic backend |
| `kedro[server]` | FastAPI + Pydantic |

Example: Core Contract

from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetSnapshot:
    name: str
    type: str
    filepath: str | None = None

Core serialization:

from dataclasses import asdict
import json

# `snapshot` is an instance of the DatasetSnapshot contract defined above
json.dumps(asdict(snapshot))

API Layer (Server)

from pydantic import BaseModel

class DatasetSnapshotResponse(BaseModel):
    name: str
    type: str
    filepath: str | None = None

An adapter layer converts between core contracts and API transport models.

Note: Pydantic v2 supports validating standard dataclasses (e.g., via TypeAdapter), which may reduce duplication while preserving separation.

Advantages

1. Loose Coupling

Core remains independent of third-party schema libraries
Preserves architectural flexibility

2. Optional Richness

Users opt into Pydantic when they need advanced validation
Plain Kedro remains lightweight

3. Clear Separation of Concerns

Core: domain contracts
Backend: validation engine
Server: transport layer

4. Lower Baseline Dependency Footprint

Keeps Kedro minimal for non-server users

Risks / Costs

1. Duplication

Dataclass contracts in core
Pydantic models at API boundary

2. Adapter Complexity

Conversion logic required
Potential drift between contract and transport model

3. Mixed Mental Model

Contributors must understand both dataclasses and Pydantic

Comparison

| Dimension | Option A: Pydantic Core | Option B: Optional Backend |
| --- | --- | --- |
| Baseline dependencies | Higher | Lower |
| Architectural coupling | Tight | Loose |
| Duplication | Minimal | Some |
| Simplicity | Single system | Layered system |
| Flexibility | Lower | Higher |
| Server alignment | Native | Adapter required |

Dependency Considerations

This decision materially affects Kedro’s dependency model.

Key considerations:

Does Kedro aim to remain stdlib-first at its core?
Is increasing the baseline dependency footprint acceptable?
How sensitive is Kedro to third-party breaking changes?
Should validation richness be mandatory or opt-in?

If Pydantic is already deeply embedded across core features, Option A may simplify the architecture.

If maintaining a minimal, library-agnostic core is a guiding principle, Option B preserves that boundary.

Trade-Off

Option A optimizes for simplicity and uniformity.
Option B optimizes for decoupling and flexibility.

Neither approach is strictly superior; the choice depends on Kedro’s long-term architectural philosophy.

Proposed Decision

Approve a formal architectural stance on structured schemas in Kedro by selecting either:

Option A — Pydantic as a Core Dependency, or
Option B — Pydantic as an Optional Backend with stdlib contracts in core

I am leaning toward Option B (Optional Backend) because it:

Preserves architectural independence
Avoids tight coupling to a single validation framework
Keeps Kedro lightweight for non-server users
Maintains flexibility for future evolution

However, consensus is required on whether Kedro prefers:

A unified, opinionated schema system (Option A), or
A layered, decoupled architecture (Option B).

rashidakanchwala · 2026-03-05T14:52:05Z

rashidakanchwala
Mar 5, 2026
Maintainer Author

@deepyaman, @noklam, @datajoely - Please do let us know what you think of the above approach.

1 reply

datajoely Mar 9, 2026

we're already using pydantic in viz, it's in every LLM library, just adopt it

lrcouto · 2026-03-12T15:47:21Z

lrcouto
Mar 12, 2026
Collaborator

I personally like Pydantic a lot. Not only is it powerful and well-maintained, but it's also very popular, so you'll find many users/contributors are familiar with it. Bias aside, I think the tradeoffs of having it as a dependency are worth it.

0 replies

noklam · 2026-03-12T17:13:37Z

noklam
Mar 12, 2026
Collaborator

What extra benefit do we get if we enforce Pydantic at core?

Risks / Costs

Duplication
Dataclass contracts in core
Pydantic models at API boundary

This sounds perfectly fine to me; they are two different concepts anyway.

Adapter Complexity

As noted, it's fairly trivial to convert a dataclass to Pydantic when needed, so this doesn't seem like much effort.

Mixed Mental Model
Contributors must understand both dataclasses and Pydantic

I am not sure about this; I am still unclear on how these schemas will be used/consumed. Are they going to be used extensively across core, i.e. will Dataset become a subclass of BaseModel?

0 replies

deepyaman · 2026-03-13T13:02:19Z

deepyaman
Mar 13, 2026
Collaborator

-1 Pydantic is a great, widely-adopted library, but it's not literally everywhere. Notably, no major dataframe library strictly depends on Pydantic, so I don't agree that everybody is already using Pydantic. Kedro-Viz, while useful, is ultimately an optional plugin (and it doesn't get bundled in all Kedro deployment situations).

One of Kedro's greatest strengths is that it can be used with pretty much any Python framework, since it's so generally unopinionated in that regard. There's still a lot of code out there that uses attrs, cattrs, whatever. Last but not least, I don’t see anything fundamentally requiring Pydantic at the core in Kedro right now--mostly plugin and plugin-like functionality--so forcing Pydantic is unnecessary.

(Also, Pydantic v1/v2 compatibility was a big thing; I think we should look at what the impact on users would have been if we had historically depended on Pydantic, and what that might mean for the impact of a future Pydantic upgrade.)

Edit: I mean -1 Option A, +1 Option B—I'm good with Pydantic as an optional dependency.

0 replies

ravi-kumar-pilla · 2026-03-13T21:14:18Z

ravi-kumar-pilla
Mar 13, 2026
Collaborator

+1 for Option B — Pydantic as an Optional Backend with stdlib contracts in core

Rationale:

Reversibility: Promoting Pydantic from optional to core is a minor version change; demoting it requires a major breaking release.
The server use case: This is an API layer and it should be treated as an additional capability not coupled with core
Version fragility: Pydantic's v1→v2 break caused some pain. As a core dep, every future breaking change becomes Kedro's migration problem and every user's upgrade blocker, including those who never use validation or server features.
Kedro's deployment surface is wide: Core runs on Databricks, SageMaker, Vertex AI, and corporate stacks with pinned deps. A hard pydantic requirement is a constraint pushed onto all users, even those who don't need it.
The adapter cost is bounded: Preventing drift via import-linter rules is far cheaper than managing Pydantic version compat across the entire user base.

Thank you

0 replies

ElenaKhaustova · 2026-03-16T12:25:27Z

ElenaKhaustova
Mar 16, 2026
Collaborator

+1 for Option B — Pydantic as an Optional Backend.

A few key arguments:

1. The codebase already implements Option B. The existing ModelFactory and validation/utils.py use guarded try: import pydantic blocks, supporting both Pydantic models and stdlib dataclasses. The adapter cost is empirically small and there's no compelling reason to regress from this design for now.

2. Release cadence and maintenance burden. Making Pydantic a core dependency ties Kedro's release cycle to Pydantic's. Every Pydantic release — especially breaking majors like v1→v2 — becomes a Kedro core problem. It also increases the risk of dependency conflicts with user projects that may pin a different Pydantic version.

3. Kedro's deployment surface is wide. Kedro runs on Databricks, SageMaker, Vertex AI, and in corporate environments with locked dependency trees. A hard Pydantic requirement is a constraint pushed onto all users, even those who never use validation or server features.

4. Reversibility. Promoting Pydantic from optional to core is a minor version change — users who already use it see no difference. Demoting it from core back to optional is a breaking change. This asymmetry strongly favors starting conservative (Option B) and escalating only if clear evidence emerges.

5. Architectural consistency. Kedro is already a layered system (core, plugins, datasets, viz, server). Having the schema boundary follow the same layering feels natural - core uses dataclasses; kedro[server] uses Pydantic.

0 replies

antonymilne · 2026-03-16T17:58:15Z

antonymilne
Mar 16, 2026

Hey all, @rashidakanchwala directed me to this ticket since I've worked a lot with pydantic on vizro (it's one of our two core dependencies) and have some (increasingly rusty) familiarity with kedro...

I'm too far out of the loop and don't have enough understanding of everything that's already going on here to make an educated vote, but just want to make some general points and questions:

I don't think the pain of migrating between pydantic v1 and v2 is relevant. On Vizro we use pydantic very heavily and rely on some of its intricacies so this was a big deal and quite painful (like it was for FastAPI for example), but for most "normal" consumers of the library it really wasn't. As far as I understand the way that kedro might consume pydantic given here, it wouldn't have been a big deal for kedro users. Besides, I don't think there is likely to ever be such a big breaking release again on pydantic (just like they said there wouldn't be after Python 2 -> 3). v3 is possibly going to come out this year but will explicitly be a much easier upgrade than v2. So I honestly don't think there will be much ongoing maintenance burden caused by pydantic upgrades.
What's actually the problem with a "hard pydantic requirement" being pushed onto all users, even if they don't need it? While it's nice to keep things clean and minimal, what practical problems are there with having an extra, possibly unnecessary package in the dependencies? With the prevalence of uv, installation of packages is hugely faster than it used to be, so speed of installation is a much less compelling argument than in the past. Are there likely to be dependency conflicts between pydantic and other dependencies? Not impossible for sure, but really it sounds very unlikely to me given that pydantic is so widely used.
Does kedro actually have an ambition to become stdlib only? That seems to be suggested by @rashidakanchwala's proposed layering, but kedro has lots of dependencies at the moment? So unless there's really a key aim to become stdlib-only or have an absolutely minimal set of dependencies, I don't see any particular incentive to avoid adding another dependency.
What's actually the use case of pydantic here compared to using dataclasses in kedro core? @deepyaman says "I don’t see anything fundamentally requiring Pydantic at the core in Kedro right now". Is it Implement parameter validation framework #5313? Since [KEP 5] Kedro Inspection API #5405 appears to only involve dataclasses now?
Using dataclass/pydantic/cattrs almost interchangeably across a codebase does add some mental load to developers. So I do think there's potentially an advantage in aligning here, regardless of which that library is. e.g. on Vizro pydantic is a core part of the package, but we sometimes use it instead of dataclasses even when a dataclass would do the job just because everyone on the team is now used to writing pydantic models rather than dataclasses.
Since it was mentioned above: in my experience TypeAdapter works very well for doing pydantic validation on non-pydantic types. So I think this could work well if you wanted to define types using dataclasses but still be able to optionally turn on/off "strict runtime pydantic validation".

1 reply

ravi-kumar-pilla Mar 16, 2026
Collaborator

Using dataclass/pydantic/cattrs almost interchangeably across a codebase does add some mental load to developers. So I do think there's potentially an advantage in aligning here, regardless of which that library is. e.g. on Vizro pydantic is a core part of the package, but we sometimes use it instead of dataclasses even when a dataclass would do the job just because everyone on the team is now used to writing pydantic models rather than dataclasses.

I like this argument. This might be more about developer experience (based on how kedro-viz shifted from a combination of dataclasses and pydantic to pydantic entirely, although kedro-viz still does not use pydantic as extensively as Vizro does) and about setting a standard across the framework. I am not completely familiar with Vizro's codebase, but given the way kedro is shaping up (i.e., the layered approach that @ElenaKhaustova mentioned above), I think pydantic is better off as an optional dependency, though we should definitely align at some point on whether to make pydantic a default in future.
