feat(datasets): Shorten `pyproject.toml` extra names for `langfuse`, `opik`, and `langchain` datasets by ElenaKhaustova · Pull Request #1365 · kedro-org/kedro-plugins

ElenaKhaustova · 2026-03-31T19:39:33Z

Description

Development notes

Removes the redundant package-family prefix from dataset-specific extras (e.g. langfuse-langfusepromptdataset → langfuse-promptdataset), making install commands shorter and more consistent with other extras.
Meta-extras (langfuse, opik, langchain) are updated to reference the new names. The installed dependencies are unchanged.
Updates all references in the langfuse and opik READMEs to match the new names.
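The renaming rule described in these notes can be sketched as a small helper. This is only an illustration of the naming convention, not code from the PR; the function name `shorten_extra` is hypothetical, and the only old/new pair taken from the description is `langfuse-langfusepromptdataset` → `langfuse-promptdataset`.

```python
def shorten_extra(extra: str) -> str:
    """Drop a repeated package-family prefix from a dataset-specific extra.

    Hypothetical helper: it only illustrates the naming rule from the notes.
    """
    family, sep, dataset = extra.partition("-")
    # Only strip when the dataset part starts with the family name
    # and something is left over after removing it.
    if sep and dataset.startswith(family) and len(dataset) > len(family):
        return f"{family}-{dataset[len(family):]}"
    # Meta-extras (no "-") and names without a repeated prefix are unchanged.
    return extra


print(shorten_extra("langfuse-langfusepromptdataset"))
print(shorten_extra("langfuse"))
```

Meta-extras such as `langfuse` pass through untouched, which matches the note that they only change what they reference, not their own names.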

Test plan

pip install kedro-datasets[langfuse-promptdataset] resolves correctly
pip install kedro-datasets[langfuse] still installs all langfuse deps
CI passes (no functional code changes)
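One way to check the first two items after an install is to read the extras that the installed distribution advertises. The sketch below uses only the standard library; the helper names `provided_extras` and `missing_extras` are hypothetical, and the exact spelling of extras in the metadata may be normalized by the build backend, so treat the result as a hint rather than a definitive check.

```python
from importlib.metadata import PackageNotFoundError, metadata


def provided_extras(dist_name: str) -> set[str]:
    """Return the extras declared by an installed distribution (empty if absent)."""
    try:
        # "Provides-Extra" is the core-metadata field that lists declared extras.
        return set(metadata(dist_name).get_all("Provides-Extra") or [])
    except PackageNotFoundError:
        return set()


def missing_extras(dist_name: str, expected: list[str]) -> list[str]:
    """List the expected extras that the installed distribution does not declare."""
    return sorted(set(expected) - provided_extras(dist_name))


# Example: both the new dataset-specific extra and the meta-extra should be declared.
print(missing_extras("kedro-datasets", ["langfuse-promptdataset", "langfuse"]))
```

An empty list means every expected extra is declared; it does not by itself prove that the dependencies behind each extra resolve.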

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Updated jsonschema/kedro-catalog-X.XX.json if necessary
Added a description of this change in the relevant RELEASE.md file
Added tests to cover my changes
Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)


ravi-kumar-pilla

LGTM

SajidAlamQB

Thank you @ElenaKhaustova!

merelcht

Much better 👍

ankatiyar · 2026-04-01T10:08:49Z

The shorter names look good but I'm worried it strays from the convention we have set for the dependencies - the dataset after all is called langchain.LangchainPromptDataset etc. The dependency names are too long but they follow the same standard across all datasets.
These datasets are all experimental of course so maybe this is okay.
When they graduate, we could also consider renaming opik.OpikPromptDataset and opik.OpikTraceDataset to opik.PromptDataset and opik.TraceDataset, but that might be too disruptive for users who have already adopted them. 🤔 If the choice is between renaming datasets and having long dependency names, it would be better to stick with long dependency names in my opinion.
I'm not opposed to this change but just checking if we'll be setting a precedent where the dataset and dependency name might be slightly different and users have to double check in the pyproject.toml here to make sure which one it is!

ElenaKhaustova · 2026-04-01T10:53:36Z

> The shorter names look good but I'm worried it strays from the convention we have set for the dependencies - the dataset after all is called langchain.LangchainPromptDataset etc. The dependency names are too long but they follow the same standard across all datasets.

Yes, that is a valid point, and I think we should also rename langchain.LangchainPromptDataset to langchain.PromptDataset. The problem is that if we go ahead with this for the already-released datasets, we will probably need to add the short name as an alias with a deprecation warning on the old name, to give users a migration window. For example, from kedro_datasets_experimental.langfuse import PromptDataset works, and LangfusePromptDataset still works but logs a deprecation warning.
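A minimal sketch of what such an alias could look like, assuming a subclass that warns on construction. The classes below are hypothetical stand-ins rather than the real kedro dataset classes, and the mechanism the repository would actually use for deprecated names may differ.

```python
import warnings


class PromptDataset:
    """Hypothetical stand-in for the renamed dataset class."""

    def __init__(self, prompt_name: str):
        self.prompt_name = prompt_name


class LangfusePromptDataset(PromptDataset):
    """Old name kept as a deprecated alias during the migration window."""

    def __init__(self, *args, **kwargs):
        # stacklevel=2 points the warning at the caller, not at this alias.
        warnings.warn(
            "LangfusePromptDataset is deprecated; use PromptDataset instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# The old name still works, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dataset = LangfusePromptDataset("greeting")

print(type(dataset).__name__, len(caught))
```

Because the alias subclasses the new class, existing `isinstance` checks against `PromptDataset` keep working while users migrate.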

And I see the following pros for renaming:

shorter names for dependencies
langfuse.EvaluationDataset is arguably clearer than langfuse.LangfuseEvaluationDataset since the package already tells you it's Langfuse.
It also matches how the core LangChain datasets work — langchain.ChatOpenAIDataset, not langchain.LangChainChatOpenAIDataset.

So we probably should either rename both dependencies and datasets or leave them as is. My question: do you think it's worth it? @merelcht, @ankatiyar

ankatiyar · 2026-04-01T11:29:27Z

Ideally I would also like the dataset names to be langfuse.PromptDataset and langfuse.TraceDataset (similarly for opik etc.), so that the dependency names could also be short and follow the convention. We have precedent for this as well with the various CSVDataset/JSONDataset/ParquetDataset classes (pandas and dask). We also have some datasets that repeat the package name (svmlight.SVMLightDataset, netcdf.NetCDFDataset).

Since these are experimental datasets, we could update the names (with or without a deprecation warning, but for user experience aliasing might be good). We would also have to update the projects in kedro-academy, the starter and maybe blog posts? We can do it now (while they're still experimental and maybe not many people use them in the wild) or when we consider graduating them (people will have to migrate anyway, so we could rename the datasets at the same time).
However, if we're not updating the names, maybe we could stick with the longer dependencies for now? Will defer to @merelcht's judgement

merelcht · 2026-04-01T13:39:18Z

I was thinking about this too, thanks for raising it @ankatiyar. I think we should take advantage of these being experimental and just do the rename without a transition period. Normally I would definitely be against that, but the whole point of these being experimental is that we allow slightly less solid datasets to be released, and therefore breaking changes can happen while polishing the datasets between releases.

As an extra check we can have a look at telemetry to see if many people are using these datasets and make a decision based on that if we see adoption is already high.

ElenaKhaustova · 2026-04-01T15:28:15Z

We had a discussion with @merelcht and decided to do the full renaming, but hold it until #1364 is completed

ElenaKhaustova and others added 30 commits March 16, 2026 15:16

Moved dataset from the academy repo

89acfcb

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Adapted dataset to the repo

0b48501

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Updated requirements

784ab23

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Added unit tests

b856455

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Extended readme

0fafdcb

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Added docs

463e422

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Updated nav

928740f

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Updated release notes

ef24144

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Restored alphabetical order

9d06d46

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Converted method to static

bd1d182

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Fixed linter

ee71498

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Fixed ruff

816799b

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Fixed version normalisation for python3.10

ed4e516

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Updated sync modes

543ee70

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Updated unit tests

c738257

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Updated readme

674e21b

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Fixed test on Windows

fb130c4

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Fixed links in the docs

29a01ee

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Merge branch 'main' into feat/langfuse-evaluation-dataset

631f020

Merge branch 'main' into feat/langfuse-evaluation-dataset

999517b

Moved validation upper

d5daca2

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Fixed validation order

c34af87

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Added unit tests checking the updated logic

6924ce3

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Merge branch 'feat/langfuse-evaluation-dataset' into feat/langfuse-ev…

5c9ae45

…aluation-dataset-bu

Merge branch 'main' into feat/langfuse-evaluation-dataset

3befc54

Merge branch 'main' into feat/langfuse-evaluation-dataset

3f042e5

Signed-off-by: ElenaKhaustova <157851531+ElenaKhaustova@users.noreply.github.com>

Clarified docstrings

f884b97

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Added table with datasets to readme

fb207c5

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

Merge branch 'main' into feat/langfuse-evaluation-dataset

e92b393

Updated extras names

66b8920

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

ElenaKhaustova marked this pull request as ready for review March 31, 2026 19:39

Updated release notes

9c9e54b

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

ElenaKhaustova marked this pull request as draft March 31, 2026 19:43

ElenaKhaustova mentioned this pull request Mar 31, 2026

feat(datasets): Add LangfuseEvaluationDataset to experimental datasets #1347

Merged

6 tasks

ElenaKhaustova marked this pull request as ready for review March 31, 2026 19:44

ElenaKhaustova requested review from SajidAlamQB, ankatiyar, merelcht and ravi-kumar-pilla and removed request for SajidAlamQB and merelcht March 31, 2026 19:44

ravi-kumar-pilla approved these changes Mar 31, 2026


ElenaKhaustova requested a review from lrcouto March 31, 2026 19:55

SajidAlamQB approved these changes Apr 1, 2026


merelcht approved these changes Apr 1, 2026


ElenaKhaustova self-assigned this Apr 1, 2026

ElenaKhaustova added this to Kedro 🔶 Apr 1, 2026

ElenaKhaustova moved this to In Review in Kedro 🔶 Apr 1, 2026

ElenaKhaustova mentioned this pull request Apr 1, 2026

Release kedro-datasets 9.3.0 #1350

Closed

Base automatically changed from feat/langfuse-evaluation-dataset to main April 1, 2026 10:08

Merge branch 'main' into feat/rename-genai-extras

361dd46

ElenaKhaustova marked this pull request as draft April 1, 2026 15:28

ElenaKhaustova wants to merge 32 commits into main from feat/rename-genai-extras

5 participants
