Implementation for the paper PLAY2PROMPT.
- Install the requirements in `requirements.txt` (e.g. `pip install -r requirements.txt`).
- Update submodules: `git submodule update --init`
- Set up and install the dataset and evaluation code (see `bfcl` for an example). You also need to sign up and set the required API keys (e.g. `.env` in `bfcl/berkeley-function-call-leaderboard`).
- Create a script for loading the dataset and an API wrapper (for interacting with the APIs). See `bfcl_api.py` for an example.
- Create a script for evaluating generated examples/descriptions: write the newly generated examples/descriptions to files, run the evaluation script, then read back the scores and results/errors. See `bfcl_eval.py` for an example; a minimal sketch of this flow follows this list.
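The evaluation step can be pictured roughly as below. This is a minimal sketch only: the `eval_script.py` CLI, its flags, and the `scores.json` output file are assumptions for illustration, not the actual `bfcl_eval.py` interface.

```python
# Minimal sketch of the write -> evaluate -> read-scores flow described above.
# ASSUMPTIONS: the eval CLI ("eval_script.py"), its flags, and the
# "scores.json" output are hypothetical, not the actual bfcl_eval.py interface.
import json
import subprocess
from pathlib import Path


def write_generated(examples: list[dict], out_path: Path) -> None:
    """Write newly generated examples/descriptions as JSON lines."""
    with out_path.open("w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")


def evaluate(data_dir: Path, generated: Path) -> dict:
    """Run the dataset's evaluation script, then read back scores/errors."""
    subprocess.run(
        ["python", "eval_script.py", "--data", str(generated)],  # hypothetical CLI
        cwd=data_dir,
        check=True,
    )
    with (data_dir / "scores.json").open() as f:  # hypothetical output file
        return json.load(f)
```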
This project was developed with IBM-hosted LLMs (mainly LLaMA models); to use them, set/export the environment variables `LLM_API_KEY` and `RITS_ENDPOINT`. To use vLLM or other API services, modify `rits.py`.
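For instance, in your shell (both values below are placeholders):

```bash
export LLM_API_KEY="<your-api-key>"
export RITS_ENDPOINT="<your-rits-endpoint-url>"
```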
For BFCL, we also need to set the following (at the top of the run script, since `$0` is the script's own path):

```bash
SCRIPT_PATH=$(realpath "$0")
CUR_DIR=$(dirname "${SCRIPT_PATH}")
DATA_DIR=${CUR_DIR}/bfcl/berkeley-function-call-leaderboard
```
For example generation:
```bash
python main.py \
    --method example \
    --data_dir "${DATA_DIR}" \
    --tmp_dir "${DATA_DIR}/data_tmp" \
    --max_eval_threads 2 \
    --search_num_workers 5 \
    --gen_model_id meta-llama/Llama-3.1-8B-Instruct \
    --tool_model_id meta-llama/Llama-3.1-8B-Instruct \
    --save_dir outputs/generated_examples \
    --batch_size 10 \
    --expand_num 3 \
    --top_k 10 \
    --max_iterations 3 \
    --num_init_loop 50 \
    --num_feedback_steps 2 \
    --num_refine_steps 3 \
    --score_eval_weight 0.0 \
    --max_score 3.0 \
    --check_valid \
    --early_stop \
    "$@"
```
For documentation optimization:
```bash
python main.py \
    --method description \
    --data_dir "${DATA_DIR}" \
    --tmp_dir "${DATA_DIR}/data_tmp" \
    --gen_model_id meta-llama/Llama-3.1-8B-Instruct \
    --tool_model_id meta-llama/Llama-3.1-8B-Instruct \
    --examples_dir outputs/generated_examples \
    --save_dir outputs/generated_descriptions \
    --num_examples_for_desc 10 \
    --batch_size 5 \
    --expand_num 5 \
    --max_iterations 3 \
    --top_k 3 \
    --max_score 100 \
    --early_stop \
    "$@"
```
Overview of the main files:

- `main.py`: main optimization script
- `beam_search`: beam search framework
- `example_method.py`: defines a single search step for tool-use example optimization
- `description_method`: defines a single search step for description optimization
- `*_api.py`: dataset loading and dataset API wrapper
- `*_eval.py`: defines the pipeline for evaluating performance on generated data for a dataset
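As a rough mental model of how these pieces fit together, here is a hypothetical sketch of the search loop; the names and signatures below are illustrative only (not the repository's actual `beam_search` API), with `--expand_num`, `--top_k`, and `--max_iterations` from the commands above mapping onto its parameters.

```python
# Hypothetical sketch of how a single-step method (example_method /
# description_method) plugs into the beam_search framework. Function names
# and signatures are illustrative, not the repository's actual API.
from typing import Callable

Candidate = dict  # a generated tool-use example or description plus metadata


def beam_search(
    init: list[Candidate],
    expand: Callable[[Candidate], list[Candidate]],  # one search step: propose expand_num children
    score: Callable[[Candidate], float],             # evaluation score (see *_eval.py)
    top_k: int,
    max_iterations: int,
) -> list[Candidate]:
    beam = sorted(init, key=score, reverse=True)[:top_k]
    for _ in range(max_iterations):
        children = [child for cand in beam for child in expand(cand)]
        # Keep the top_k best of parents + children each iteration.
        beam = sorted(beam + children, key=score, reverse=True)[:top_k]
    return beam
```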
If you find our work helpful, please cite us as:
```bibtex
@inproceedings{fang-etal-2025-play2prompt,
    title = "{PLAY}2{PROMPT}: Zero-shot Tool Instruction Optimization for {LLM} Agents via Tool Play",
    author = "Fang, Wei and
      Zhang, Yang and
      Qian, Kaizhi and
      Glass, James R. and
      Zhu, Yada",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1347/",
    pages = "26274--26290",
    ISBN = "979-8-89176-256-5",
}
```