Text2SQL Use case Evaluation #1837

LStromann · 2025-11-26T15:00:52Z

LStromann
Nov 26, 2025
Collaborator

Hello Cognee community,

I’m working on a use case involving an AI Agent that performs Text-to-SQL with additional analytical steps. I first tried a custom multi-agent solution, but it struggles with producing consistently correct SQL for more complex queries.

I’m now evaluating Cognee, and I want to confirm whether my understanding is correct.

My setup:

- Large database: 400+ tables
- Complex relational schema (many foreign keys)
- ~5 TB of data

Based on the Cognee docs, it seems the recommended approach is to:

1. Load the database into a Knowledge Graph, where:
   - Rows become nodes
   - Tables become node types
   - Foreign keys become edges

2. Let Cognee embed the structured data and relationships into a vector store.

3. Use Cognee's retrieval and reasoning stack to improve grounding, so the agent can better understand table relationships, constraints, and the dataset's structure, hopefully enabling more accurate Text-to-SQL generation than my current bespoke agentic pipeline.
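The mapping described above (rows as nodes, tables as node types, foreign keys as edges) can be sketched with nothing but Python's standard `sqlite3` module. This is an illustration of the idea only, not Cognee's migration code: the function name `relational_to_graph` and the toy two-table schema are assumptions, and the sketch assumes single-column foreign keys that point at the referenced table's primary key.

```python
import sqlite3

def relational_to_graph(conn):
    """Build (nodes, edges) from every user table in a SQLite database.

    Hypothetical helper for illustration; assumes single-column foreign
    keys that reference the primary key of the target table.
    """
    conn.row_factory = sqlite3.Row
    tables = [r["name"] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'")]
    nodes, edges = {}, []
    for table in tables:
        # Primary-key columns (ordered by their position in the key) identify a row.
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        pk_cols = [c["name"] for c in sorted(info, key=lambda c: c["pk"]) if c["pk"]]
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            # Each row becomes a node; the table name is its node type.
            node_id = (table, tuple(row[c] for c in pk_cols))
            nodes[node_id] = {"type": table, "properties": dict(row)}
            # Each non-null foreign key becomes an edge to the referenced row.
            for fk in fks:
                value = row[fk["from"]]
                if value is not None:
                    edges.append((node_id, fk["from"], (fk["table"], (value,))))
    return nodes, edges

# Toy schema loosely modeled on two Chinook tables (illustrative data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT,
                    ArtistId INTEGER REFERENCES Artist(ArtistId));
INSERT INTO Artist VALUES (1, 'AC/DC');
INSERT INTO Album VALUES (10, 'Let There Be Rock', 1);
""")
nodes, edges = relational_to_graph(conn)
print(len(nodes), "nodes,", len(edges), "edges")
```

Note that the node count grows with the number of rows, not just the number of tables, which is why data volume matters for a migration of this kind.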

Is this the right mental model? And in a dataset of this size/complexity, is this the intended or recommended way to get high-accuracy query generation? How will the cost scale with the large dataset size that I have?

Looking forward to hearing from you.

https://www.cognee.ai/blog/deep-dives/relational-database-to-knowledge-graph-cognee-dlt

This discussion was automatically pulled from Discord.

LStromann · 2025-11-26T15:01:12Z

LStromann
Nov 26, 2025
Collaborator Author

Your mental model is correct. We have an example that migrates a small subset of an SQLite relational database; you can run it and see the results:

https://github.com/topoteretes/cognee/blob/main/examples/python/relational_database_migration_example.py

It uses a subset of this example relational database:

https://github.com/lerocha/chinook-database

It represents a music store. I'd recommend trying this example first. Note, however, that we are not doing Text-to-SQL generation: we migrate the whole SQL database into a graph database, embed the resulting nodes and edges into a vector database, and later query the graph and vector store for information. This migration process takes some time and memory.

Also, a single graph database instance can't handle ~5 TB of data, so you will most likely need to split the data in a meaningful way to work within the graph database's limits. If you don't need all the data, you can also specify what gets migrated by customizing the input schema.

LStromann · 2025-11-26T15:01:31Z

LStromann
Nov 26, 2025
Collaborator Author

Thanks!

How does this implementation handle data syncs? What happens if we add more data or add new tables?

What about cost? Is it sensitive to the size of the data, or does it scale only with the number of tables and foreign keys?

LStromann · 2025-11-26T15:01:57Z

LStromann
Nov 26, 2025
Collaborator Author

It's sensitive to the number of elements in the database.

LStromann · 2025-11-26T15:02:31Z

LStromann
Nov 26, 2025
Collaborator Author

This is a graph visualization of a migrated 1 MB relational database.

LStromann · 2025-11-26T15:03:08Z

LStromann
Nov 26, 2025
Collaborator Author

You can reduce the number of elements in the graph by excluding tables and columns that are not relevant, and you can shrink each element's footprint by reducing the vector embedding size (every element is represented in an n-dimensional vector space).
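As a rough illustration of excluding irrelevant tables and columns before migration, the sketch below prunes a schema description against an allowlist. The `prune_schema` helper, the dictionary layout, and the table names are all hypothetical; this is not Cognee's schema configuration format, only a picture of the filtering idea.

```python
def prune_schema(schema, keep):
    """Return only the tables and columns named in the allowlist.

    schema: {table_name: [column, ...]} describing the source database.
    keep:   {table_name: [column, ...] or None}; None keeps every column.
    Both structures are hypothetical, chosen for illustration.
    """
    pruned = {}
    for table, columns in schema.items():
        if table not in keep:
            continue  # the whole table is excluded from migration
        wanted = keep[table]
        pruned[table] = [c for c in columns if wanted is None or c in wanted]
    return pruned

# Illustrative schema with one table that is irrelevant to the use case.
schema = {
    "Artist": ["ArtistId", "Name"],
    "Album": ["AlbumId", "Title", "ArtistId", "InternalNotes"],
    "AuditLog": ["LogId", "Payload", "CreatedAt"],
}
keep = {"Artist": None, "Album": ["AlbumId", "Title", "ArtistId"]}

pruned = prune_schema(schema, keep)
print(pruned)
```

Every table or column dropped here is one fewer element to store and embed later.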

LStromann · 2025-11-26T15:03:53Z

LStromann
Nov 26, 2025
Collaborator Author

The only migration costs are memory and embedding calls (which can be handled by a local model).

LStromann · 2025-11-26T15:04:43Z

LStromann
Nov 26, 2025
Collaborator Author

So the actual LLM calls only happen at search time.

LStromann · 2025-11-26T15:05:37Z

LStromann
Nov 26, 2025
Collaborator Author

Memory can really bloat, though, if everything is migrated. In the worst case, with a 3072-dimension vector representation of each node and edge and a full migration of all elements, the result can be 100x the size of the original database.

But this can be drastically reduced by limiting the vector embedding size to, for example, 768 dimensions, and by specifying the schema so that not every row and column in the database is migrated (or by migrating only database rows without columns, though this may cost some accuracy). Depending on these settings, the graph + vector database size can be lower than the original database size.
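To make the effect of the embedding dimension concrete, here is a back-of-the-envelope estimate of raw vector storage. It assumes 4-byte (float32) values and ignores index overhead and the graph store itself, so real numbers will differ; the element count of one million is a made-up figure for illustration.

```python
def embedding_bytes(num_elements, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 values by default."""
    return num_elements * dimensions * bytes_per_value

# Hypothetical migration with one million embedded nodes and edges.
num_elements = 1_000_000

for dims in (3072, 768):
    size_gb = embedding_bytes(num_elements, dims) / 1e9
    print(f"{dims} dimensions: {size_gb:.2f} GB of raw vectors")

# A single 3072-dimension vector takes 12,288 bytes; a 768-dimension one
# takes 3,072 bytes, so the smaller embedding cuts vector storage by 4x.
ratio = embedding_bytes(1, 3072) // embedding_bytes(1, 768)
```

Because a single short database row may occupy only a few dozen bytes, a multi-kilobyte vector per element is enough to explain how the migrated store can end up far larger than the source.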

LStromann · 2025-11-26T23:05:07Z

LStromann
Nov 26, 2025
Collaborator Author

Thanks for this, that's clear.
