ToolMaster is a framework that shifts tool learning from static imitation to a trial-and-execution paradigm, enabling Large Language Models (LLMs) to actively master tools. It trains agents to first conduct autonomous tool trials during a "trial phase" to accumulate experiential knowledge. This experience informs the subsequent "execution phase," where the model performs planning and solving while explicitly employing self-correction to rectify errors based on environmental feedback. By leveraging Supervised Fine-Tuning (SFT) on teacher-synthesized trajectories—encompassing both tool trials and self-correction behaviors—followed by Reinforcement Learning (RL) to coordinate these phases, ToolMaster empowers agents to dynamically adapt to unfamiliar tools, significantly enhancing generalization and robustness.
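For intuition, the two phases can be pictured as the control loop below. This is a schematic sketch only; function names such as `propose_trial` and `self_correct` are hypothetical and do not correspond to the repository's actual API.

```python
# Schematic of the trial-and-execution paradigm. Illustrative only: the
# method names below are hypothetical, not ToolMaster's real interface.

def solve(task, tools, llm, env, max_steps: int = 10):
    # Trial phase: probe each tool to accumulate experiential knowledge.
    experience = []
    for tool in tools:
        trial = llm.propose_trial(task, tool)          # self-chosen probe call
        experience.append((trial, env.execute(trial)))

    # Execution phase: plan, act, and self-correct on environment feedback.
    plan = llm.plan(task, experience)
    for _ in range(max_steps):
        action = llm.next_action(task, plan, experience)
        feedback = env.execute(action)
        if feedback.is_error:                          # self-correction step
            action = llm.self_correct(action, feedback)
            feedback = env.execute(action)
        experience.append((action, feedback))
        if feedback.is_final:
            return feedback.answer
    return None
```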
## 1. Set up ToolMaster
First, clone the repository and install the dependencies.
```bash
git clone
cd ToolMaster
conda create --name toolmaster python=3.11
conda activate toolmaster
pip install -e ./
```

## 2. Set up API Server (StableToolBench)
Then, prepare the environment for running the tool server.
```bash
conda create --name api_server python=3.9
conda activate api_server
pip install -r requirements_server.txt
```

To initialize the StableToolBench tool server, you need to download the tool environment dataset, which is available on HuggingFace.
To use the server, you will also need a ToolBench key. You can apply for one via this form.
After downloading, extract and organize the data into the following directory structure:
```
├── data
│   └── stabletoolbench
│       └── ToolEnv2404
│           └── tools
```

## 3. Data Synthesis

Note: This step is optional. You can skip the manual synthesis by directly using our pre-processed SFT dataset from `data/for_sft`.
If you prefer to synthesize the data from scratch, follow the pipeline below:
### Step 1: Start the API Server

Launch the tool server to enable environment interaction. For efficiency, we employ MirrorAPI-Cache.
```bash
# [Terminal 1] Serve the MirrorAPI model
conda activate toolmaster
vllm serve 'path-to-MirrorAPI-Cache-model' --api-key EMPTY --port 12345 --served-model-name MirrorAPI-Cache --gpu-memory-utilization 0.8 --tensor-parallel-size 1

# [Terminal 2] Start the API wrapper
conda activate api_server
python ./src/api_server/main_mirrorapi_cache.py
```
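Before proceeding, you may want to confirm the vLLM endpoint is reachable. A minimal check, assuming the `vllm serve` command above is running on localhost with the settings shown:

```python
# Sanity-check the vLLM OpenAI-compatible endpoint started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12345/v1", api_key="EMPTY")
print([m.id for m in client.models.list()])  # should include "MirrorAPI-Cache"
```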
### Step 2: Run Data Synthesis & Filtering

Use the teacher model to generate tool-use trajectories and filter them to ensure quality.
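As a rough illustration, a quality filter of this kind might keep only trajectories that actually exercise tools and end in a verified answer. The logic below is hypothetical; the actual criteria live in `scripts/conduct_data.sh` and the code it invokes, which you then run:

```python
# Hypothetical trajectory filter (illustrative only; not the repository's
# actual filtering rules). Assumes a simple dict-based trajectory record.

def keep_trajectory(traj: dict) -> bool:
    calls = traj.get("tool_calls", [])
    if not calls:                                   # must actually use tools
        return False
    if any(c.get("status") == "error" and not c.get("corrected") for c in calls):
        return False                                # drop uncorrected failures
    return traj.get("final_answer_correct", False)  # answer must pass verification
```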
```bash
bash scripts/conduct_data.sh
```

## 4. SFT Training

Note: If you do not want to train the model from scratch, you may skip this section and proceed directly to Evaluation.
First, download the backbone model Qwen2.5-7B-Instruct from HuggingFace.
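One way to fetch the backbone, assuming `huggingface_hub` is installed; the local path is an example, so check the training script for the path it actually expects:

```python
# Download Qwen2.5-7B-Instruct from the HuggingFace Hub.
# The local_dir below is an assumption, not a path the scripts require.
from huggingface_hub import snapshot_download

snapshot_download("Qwen/Qwen2.5-7B-Instruct", local_dir="models/Qwen2.5-7B-Instruct")
```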
Then, fine-tune the model using the curated dataset to establish baseline tool-use capabilities:

```bash
bash scripts/train/train_sft.sh
```

## 5. RL Training

We use Group Relative Policy Optimization (GRPO) for the reinforcement learning stage, implemented on top of the verifiers framework.
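For intuition, GRPO scores each sampled rollout against the other rollouts in its group rather than against a learned value function. A schematic of the group-relative advantage (a sketch, not the training code in `scripts/train_rl.sh`):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantages: normalize each rollout's reward by the
    mean/std of its group, so no value network is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 rollouts sampled for the same prompt
print(group_relative_advantages(np.array([1.0, 0.0, 0.5, 1.0])))
```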
This process involves multiple interacting components, so we recommend opening separate terminals for each step.
### Step 1: Start the API Server
Launch the simulated tool environment using MirrorAPI-Cache. This provides a cost-effective and stable alternative to real-time API calls.
Note: This requires two sub-processes: the Model Server (vLLM) and the API Wrapper.
```bash
# [Terminal 1] Serve the MirrorAPI model
conda activate toolmaster
vllm serve 'path-to-MirrorAPI-Cache-model' \
    --api-key EMPTY \
    --port 12345 \
    --served-model-name MirrorAPI-Cache \
    --gpu-memory-utilization 0.8 \
    --tensor-parallel-size 1

# [Terminal 2] Start the API wrapper
conda activate api_server
python ./src/api_server/main_mirrorapi_cache.py
```

### Step 2: Start the Inference Server
Initialize the inference backend to generate rollouts (trajectories) during training.
```bash
# [Terminal 3]
conda activate toolmaster
bash scripts/inference_serve.sh
```

### Step 3: Start RL Training
Once all servers are running, initiate the GRPO training process.
```bash
# [Terminal 4]
conda activate toolmaster
bash scripts/train_rl.sh
```

## 6. Evaluation

You can directly download our pre-trained models from HuggingFace.
### ToolHop

Evaluate the model's generalization in multi-hop reasoning scenarios, i.e., queries that require chaining several tool calls, where the output of one call feeds the next.
```bash
conda activate toolmaster

# 1. Run inference to generate trajectories
bash scripts/inference_ToolHop.sh

# 2. Compute evaluation metrics
bash scripts/eval_ToolHop.sh
```

### TMDB

Evaluate the model's generalization in tool-rich environments (a movie database).
Prerequisite: You need a valid TMDB API Key to run this benchmark. You can apply for one here.
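A quick way to confirm your key works before running the benchmark; the `TMDB_API_KEY` environment variable name here is an assumption, so check `scripts/inference_TMDB.sh` for the name it actually reads:

```python
# Verify a TMDB API key against the public TMDB API.
# The env var name is an assumption, not one the scripts are known to use.
import os
import requests

key = os.environ.get("TMDB_API_KEY", "")
r = requests.get("https://api.themoviedb.org/3/movie/550", params={"api_key": key})
print(r.status_code)  # 200 means the key is valid; 401 means it is not
```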
```bash
conda activate toolmaster

# 1. Run inference
bash scripts/inference_TMDB.sh

# 2. Compute metrics
bash scripts/eval_TMDB.sh
```

### StableToolBench

For StableToolBench, you must launch a dedicated API server environment (simulating the GPT-4o-based environment) before running inference.
Prerequisite: To access the server, you need a valid ToolBench Key. You can apply for one using this form.
### Step 1: Launch the API Server
```bash
# [Terminal 1] Start the environment server
conda activate api_server
python ./src/api_server/main.py
```

### Step 2: Inference & Evaluation

Once the server is running, execute the evaluation scripts in a separate terminal.
```bash
# [Terminal 2]
conda activate toolmaster

# Run inference
bash scripts/inference_StableToolBench.sh

# Compute metrics
bash scripts/eval_StableToolBench.sh
```
## 7. Contact

If you have questions, suggestions, or bug reports, please email:
gaoxingjie@mails.neu.edu.cn
