Replies: 9 comments
-
|
Your mental model is correct. We have an example built on a small subset of an SQLite relational database that you can try to see the results. It uses a subset of this example relational database, which represents a music store: https://github.com/lerocha/chinook-database. I'd recommend trying it out with this example first. Note, though, that we are not doing Text-to-SQL generation: we migrate the whole SQL database to a graph database, embed the nodes and edges into a vector database, and later query the graph+vector store for information. This migration process takes some time and memory. Also, a graph database can't handle ~5 TB of data in one instance, so you will most likely need to split the data in a meaningful way to accommodate the graph database's limitations. If you don't need all the data, you can also specify what should be migrated by customizing the input schema. |
-
|
Thanks, <@508682252595626015>. How does this implementation handle data syncs? What happens if we add more data or add new tables? What about cost? Is it sensitive to the size of the data, or does it scale only with the number of tables and foreign keys? |
-
|
It's sensitive to the number of elements in the database: |
-
|
This is a graph visualization of a migrated 1 MB relational database. |
-
|
You can reduce the number of elements in the graph by excluding tables and columns that are not relevant, and you can reduce memory further by lowering the vector embedding size (every element is represented as a vector in an n-dimensional space). |
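To get a rough feel for how much excluding tables shrinks the input, the rows and columns per table can be counted before migrating. This is only a sketch using Python's standard `sqlite3` module, not Cognee's API: the `count_elements` helper and the tiny in-memory demo tables are hypothetical, and how rows and columns map to graph nodes and edges depends on the migration settings.

```python
import sqlite3


def count_elements(conn, exclude_tables=frozenset()):
    """Return {table: (row_count, column_count)} for every user table
    that is not listed in exclude_tables."""
    # Materialize the table list first so the cursor is not reused below.
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    counts = {}
    for table in tables:
        if table in exclude_tables:
            continue
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        rows = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        cols = len(conn.execute(f"PRAGMA table_info({quoted})").fetchall())
        counts[table] = (rows, cols)
    return counts


# Hypothetical demo data, loosely modelled on a music store.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Album (
        AlbumId INTEGER PRIMARY KEY,
        Title TEXT,
        ArtistId INTEGER REFERENCES Artist(ArtistId)
    );
    INSERT INTO Artist VALUES (1, 'AC/DC'), (2, 'Accept');
    INSERT INTO Album VALUES
        (1, 'For Those About To Rock', 1),
        (2, 'Balls to the Wall', 2),
        (3, 'Restless and Wild', 2);
    """
)

for table, (rows, cols) in count_elements(conn, exclude_tables={"Album"}).items():
    print(f"{table}: {rows} rows x {cols} columns")
```

Running the same count with and without an exclusion set gives a quick before/after comparison of how many rows and cells would be fed into the migration.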
-
|
The only migration costs are memory and embedding calls (which can be handled by a local LLM). |
-
|
So the actual LLM calls are made only at search time. |
-
|
Memory can really bloat if everything is migrated, though. In the worst case, with a 3072-dimension vector representation of each node and edge and all elements fully migrated, the result can be 100x the size of the original database. This can be drastically reduced by limiting the vector embedding size to, for example, 768 dimensions, and by specifying a schema so that not every row and column in the database is migrated (or by migrating only database rows without columns, though this may come with somewhat lower accuracy). Depending on these settings, the graph+vector db size can be lower than the original database size. |
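As a back-of-the-envelope illustration of why the embedding dimension matters, the raw vector storage can be estimated as elements × dimensions × bytes per value. This sketch assumes 4-byte float32 values and ignores vector index overhead, graph storage, and metadata; the `embedding_bytes` helper and the element count of one million are hypothetical and not taken from Cognee.

```python
def embedding_bytes(num_elements, dims, bytes_per_value=4):
    """Raw size in bytes of one embedding per element.

    Assumes dense vectors with a fixed width per value (float32 by default);
    real stores add index and metadata overhead on top of this.
    """
    return num_elements * dims * bytes_per_value


# Hypothetical graph with one million nodes and edges in total.
elements = 1_000_000

large = embedding_bytes(elements, 3072)  # 12,288,000,000 bytes
small = embedding_bytes(elements, 768)   # 3,072,000,000 bytes

print(f"3072 dims: {large / 1e9:.3f} GB")
print(f" 768 dims: {small / 1e9:.3f} GB")
print(f"reduction: {large / small:.0f}x")
```

Under these assumptions, dropping from 3072 to 768 dimensions cuts raw vector storage by a factor of four, and migrating fewer elements reduces it proportionally on top of that.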
-
|
Thanks for this, <@508682252595626015>. Clear. |
-
Hello Cognee community,
I’m working on a use case involving an AI Agent that performs Text-to-SQL with additional analytical steps. I first tried a custom multi-agent solution, but it struggles with producing consistently correct SQL for more complex queries.
I’m now evaluating Cognee, and I want to confirm whether my understanding is correct.
My setup:
Based on the Cognee docs, it seems the recommended approach is to:
Load the database into a Knowledge Graph, where:
Let Cognee embed the structured data + relationships into a vector store.
Use Cognee’s retrieval + reasoning stack to improve grounding, so the agent can better understand table relationships, constraints, and the dataset’s structure—hopefully enabling more accurate Text-to-SQL generation compared to my current bespoke agentic pipeline.
Is this the right mental model? And in a dataset of this size/complexity, is this the intended or recommended way to get high-accuracy query generation? How will the cost scale with the large dataset size that I have?
Looking forward to hearing from you.
https://www.cognee.ai/blog/deep-dives/relational-database-to-knowledge-graph-cognee-dlt
This discussion was automatically pulled from Discord.