This is a modular, object-oriented experimental framework for Text-to-Graph Query Generation (Text2Cypher/GQL). The system handles the entire pipeline: transforming natural language questions into graph queries, executing them, and evaluating the results with multiple metrics.
This project supports integrating Qwen and other Large Language Models (LLMs) for prediction, connects to TuGraph for Execution Accuracy verification, and utilizes external tools for comprehensive metric analysis.
- Full Pipeline Control: Supports flexible toggling between the Prediction Phase (LLM query generation) and the Evaluation Phase (metrics calculation).
- LLM Integration: Built-in interface compatible with the OpenAI SDK format (the default configuration targets AliCloud DashScope/Qwen).
- Database Adaptability: Connects to TuGraph using the Bolt protocol (compatible with the Neo4j Python Driver).
- Multi-Dimensional Evaluation:
  - Execution Accuracy (EA): Compares query results returned by the database.
  - Google BLEU: Text-level N-gram similarity.
  - External Metrics: Integration with external tools to calculate Grammar Correctness and Structural Similarity.
- Clean OOP Architecture: Abstract base classes (Drivers) make it easy to extend the system with new models or database adapters; see the sketch below.
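The sketch below illustrates what these abstract driver interfaces might look like. The class and method names (DBDriver, Text2GraphSystem, execute, predict) are assumptions for illustration only; the actual definitions live in driver/.

Python
# A minimal sketch of the extension points, with hypothetical names; the real
# interfaces in driver/ may differ.
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class DBDriver(ABC):
    """Abstract database adapter; concrete adapters (e.g., for TuGraph) go in impl/db_driver/."""

    @abstractmethod
    def execute(self, database: str, query: str) -> List[Dict[str, Any]]:
        """Run a graph query against the named database and return its rows as dictionaries."""


class Text2GraphSystem(ABC):
    """Abstract generation system; concrete LLM-backed systems go in impl/text2graph_system/."""

    @abstractmethod
    def predict(self, question: str, schema: str) -> str:
        """Generate a Cypher/GQL query for a natural language question given the schema."""

Extending the system with a new model or database backend then amounts to adding a subclass under impl/.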
gql-generation-driver/
├─ driver/ # Abstract Interface Definitions
├─ example_data/ # Sample Input Data (JSON)
├─ experiment/ # Configuration Files
├─ impl/ # Core Implementation Layer
│ ├─ db_driver/ # Database Adapters
│ ├─ evaluation/ # Evaluation Metrics Implementation
│ └─ text2graph_system/ # Generation System Implementation
├─ tools/ # External evaluation tools (e.g., dbgpt-hub-gql)
├─ output/ # Prediction Results Output Directory
├─ evaluation_detail/ # Detailed Evaluation Report Directory
├─ requirements.txt # Project dependency list
└─ run_pipeline.py # Program Main Entry Point
Ensure your Python version is >= 3.8. Run the following command in the project root directory:
Bash
pip install -r requirements.txt

To ensure Python can correctly resolve module imports within the project structure, set the PYTHONPATH before execution.
PowerShell (Windows):
PowerShell
$env:PYTHONPATH="."

CMD (Windows):
DOS
set PYTHONPATH=.
Bash/Zsh (Linux/macOS):
Bash
export PYTHONPATH=.

The main configuration file is located at experiment/test_config.json. Key fields are described below:
JSON
{
"pipeline": {
"run_prediction": true, // true: Calls LLM to generate queries; false: Loads existing results
"run_evaluation": true // true: Runs metrics calculation
},
"data": {
"input_path": "example_data/dataset.json",
"output_path": "output/prediction_result.json"
},
"prediction": {
"api_key": "sk-xxxxxx", // Your LLM API Key
"base_url": "https://dashscope...", // Model API Endpoint
"model": "qwen-plus", // Model name
"schema_path": "data/schema.json", // Path to the Graph Database Schema file
"max_workers": 5, // Concurrency level for API calls
"level_fields": [ // Defines the mapping for different query complexity levels
["initial_nl", "initial_query"],
["level_1", "level_1_query"]
]
},
"evaluation": {
"db_uri": "bolt://localhost:7687", // TuGraph/Neo4j Connection URI
"db_user": "admin",
"db_pass": "password",
"dbgpt_root": "tools/dbgpt-hub-gql" // Path to the external evaluation script root
}
}

Run the pipeline with the default configuration:

Bash
python run_pipeline.py

Use the --config argument to point to an alternative configuration file:

Bash
python run_pipeline.py --config experiment/debug_config.json
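Note that the inline // comments in the configuration example above are annotations for this document; the file on disk must be valid JSON without comments. The hypothetical sketch below shows how run_pipeline.py might consume the pipeline toggles and the prediction settings; the field names match the configuration above, while the surrounding code structure and the OpenAI-compatible client usage are assumptions for illustration.

Python
# Hypothetical outline of how the entry point might consume the configuration;
# the real run_pipeline.py may be organized differently.
import argparse
import json

from openai import OpenAI


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="experiment/test_config.json")
    args = parser.parse_args()

    with open(args.config, encoding="utf-8") as f:
        config = json.load(f)

    if config["pipeline"]["run_prediction"]:
        pred_cfg = config["prediction"]
        # OpenAI-SDK-compatible client; DashScope/Qwen is reached through base_url.
        client = OpenAI(api_key=pred_cfg["api_key"], base_url=pred_cfg["base_url"])
        # For each (question_field, query_field) pair in pred_cfg["level_fields"],
        # a chat completion call like the following would generate the query:
        # client.chat.completions.create(model=pred_cfg["model"], messages=[...])
        print("Prediction phase with model:", pred_cfg["model"])

    if config["pipeline"]["run_evaluation"]:
        # Execute gold and predicted queries against TuGraph and compute the metrics.
        print("Evaluation phase against:", config["evaluation"]["db_uri"])


if __name__ == "__main__":
    main()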
The sample input file, geography_5_csv_files_08051006_corpus_seeds.json, uses Cypher as the default graph query language for the standard answer fields. The accompanying .csv files and the import_config.json file are formatted specifically for batch import into TuGraph DB.
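Once the sample graph has been imported into TuGraph, the Bolt connection that Execution Accuracy later relies on can be sanity-checked with the Neo4j Python driver. A minimal sketch, assuming the connection settings from the evaluation config and a graph named "geography" (the graph name is a guess based on the sample file name):

Python
# Connectivity sanity check over Bolt using the neo4j Python driver.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("admin", "password"))
# The "database" argument is assumed to select the imported TuGraph graph.
with driver.session(database="geography") as session:
    record = session.run("MATCH (n) RETURN count(n) AS node_count").single()
    print("Nodes in graph:", record["node_count"])
driver.close()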
Each data entry includes information such as the database name, the original question, the layered reasoning questions, and optional external knowledge. The data format example is as follows:
{
"id": "Unique Identifier",
"database": "Database Name",
"initial_question": "Original Natural Language Question",
"initial_gql": "Correct Cypher/GQL corresponding to the original natural language question",
"level_1": "Level 1 (Coarse-grained Reasoning) Question",
"level_2": "Level 2 (Structured Reasoning) Question",
"level_3": "Level 3 (Sub-goal Planning) Question",
"level_4": "Level 4 (Final Reasoning) Question",
"external_knowledge": "Knowledge depending on sources outside the database (e.g., encyclopedia, common facts, can be empty)",
"difficulty": "Question Difficulty (easy / medium / hard)",
"source": "Data Source"
}

The model generates a predicted query statement for each data entry and writes it to the path defined by data.output_path (e.g., output/). The format of the output file is as follows:
{
"id": "Unique Identifier",
"database": "Database Name",
"initial_question": "Original Natural Language Question",
"initial_gql": "Correct Cypher/GQL corresponding to the original natural language question",
"level_1": "Level 1 (Coarse-grained Reasoning) Question",
"level_2": "Level 2 (Structured Reasoning) Question",
"level_3": "Level 3 (Sub-goal Planning) Question",
"level_4": "Level 4 (Final Reasoning) Question",
"external_knowledge": "Knowledge depending on sources outside the database (e.g., encyclopedia, common facts, can be empty)",
"difficulty": "Question Difficulty (easy / medium / hard)",
"source": "Data Source",
"initial_query": "Model Predicted Query Statement",
"level_1_query": "Model Predicted Query Statement",
"level_2_query": "Model Predicted Query Statement",
"level_3_query": "Model Predicted Query Statement",
"level_4_query": "Model Predicted Query Statement"
}

The evaluation will generate the following metrics for each layer's prediction:
- EA (Execution Accuracy)
- Grammar (Grammatical Validity)
- Similarity (Structural Similarity)
- Google BLEU (Textual Similarity; see the sketch after this list)
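The Google BLEU score can be reproduced roughly with NLTK's GLEU implementation. A minimal sketch, assuming simple whitespace tokenization (the framework's own tokenization may differ) and made-up gold/predicted queries:

Python
# Rough Google BLEU (GLEU) computation with NLTK over whitespace tokens.
from nltk.translate.gleu_score import sentence_gleu

gold = "MATCH (c:City)-[:LOCATED_IN]->(s:State) RETURN c.name"
pred = "MATCH (c:City)-[:LOCATED_IN]->(s:State) RETURN c.name LIMIT 10"

score = sentence_gleu([gold.split()], pred.split())
print(round(score, 3))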
The results are saved to evaluation_detail/execution_results/, with one JSON file per layer (for example, level_1_results.json).
The file structure is as follows:
[
{
"instance_id": "Unique Identifier",
"gold_query": "Standard Answer Query Statement",
"pred_query": "Model Predicted Query Statement",
"cleaned_pred": "Cleaned Version (of the Predicted Query)",
"metrics": {
"accuracy": 1,
"grammar": 1,
"google_bleu": "0.633",
"similarity": 0.9212
},
"gold_result": [{...}] // Execution result of the Gold Query
"pred_result": [{...}] // Execution result of the Model Prediction
}
]
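As a usage example, a per-layer report can be post-processed into average scores. The sketch below reads one report and averages the four metrics; the file location and field names follow the example above, while the aggregation itself is not part of the framework.

Python
# Illustrative post-processing: average the per-instance metrics of one layer.
import json

with open("evaluation_detail/execution_results/level_1_results.json", encoding="utf-8") as f:
    results = json.load(f)

totals = {"accuracy": 0.0, "grammar": 0.0, "google_bleu": 0.0, "similarity": 0.0}
for entry in results:
    for name in totals:
        # google_bleu may be stored as a string, so convert every value explicitly.
        totals[name] += float(entry["metrics"][name])

for name, total in totals.items():
    print(f"{name}: {total / len(results):.4f}")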