Collaborative Effort Scaling

This repository contains the code and analysis for "Completion ≠ Collaboration: Scaling Collaborative Effort with Agents", a framework that captures how an agent's utility grows with increasing user involvement and reveals critical gaps in current agents' ability to sustain engagement and scaffold user understanding.

📝 Read the paper | 🌐 Visit the project page

Overview

Current evaluations of agents remain centered around one-shot task completion, failing to account for the inherently iterative and collaborative nature of many real-world problems where human goals are often underspecified and evolve. We introduce collaborative effort scaling, a framework that captures how an agent's utility grows with increasing user involvement.

About Collaborative Gym

This work builds on Collaborative Gym (Co-Gym), a framework for enabling and evaluating human-agent collaboration in shared workspaces. Co-Gym provides the infrastructure for both agents and humans to exchange messages and take actions collaboratively on tasks like travel planning, literature surveys, and tabular analysis.

For details on the Co-Gym framework, environment setup, and agent development, please see README-cogym.md.

Setup

# Create conda environment
conda create -n cogym python=3.11
conda activate cogym

# Install dependencies
pip install -r requirements.txt

# Set up API keys
cp secrets.example.toml secrets.toml
# Edit secrets.toml with your API keys

# (Optional) Build Docker image for Jupyter execution
cd docker
docker build -f Dockerfile_cpu -t cogym-jupyter-cpu-image .
cd ..  # Return to the repository root; the commands below use paths relative to it

# (macOS users) You may need to point Docker at the user-level socket
export DOCKER_HOST=unix:///Users/$USER/.docker/run/docker.sock
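
Before launching an experiment, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of the repository: the helper names `check_cmd`, `check_file`, and `preflight` are hypothetical, and the only things it looks for are the tools and files named in the setup commands (`conda`, `docker`, `requirements.txt`, `secrets.toml`). It does not validate the contents of `secrets.toml`.

```shell
# Hypothetical preflight helpers; run from the repository root.

# Report whether a command is available on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Report whether a required file exists.
check_file() {
  if [ -f "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check everything the setup steps rely on; returns non-zero if anything is missing.
preflight() {
  status=0
  check_cmd conda || status=1
  check_cmd docker || status=1   # only needed for the optional Jupyter image
  check_file requirements.txt || status=1
  check_file secrets.toml || status=1
  return "$status"
}

preflight || echo "Setup looks incomplete; revisit the steps above."
```

If Docker is installed but the optional image has not been built, `docker image inspect cogym-jupyter-cpu-image` exits with a non-zero status, which offers a quick way to check that step separately.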

Running Experiments

# Run the simulation
bash scripts/travel_planning_claude40.sh

# Run the progress evaluation on each run's results
python -m collaborative_gym.eval.progress_eval_v2 --result-dir workdir/travel_planning/travel_planning_basic_coagent_claude40/results --task travel_planning
python -m collaborative_gym.eval.progress_eval_v2 --result-dir workdir/travel_planning/travel_planning_coagent_with_situational_planning_claude40/results --task travel_planning

# Obtain simulated user ratings over the collaboration process
python -m collaborative_gym.eval.likert_score --result-dir workdir/travel_planning/travel_planning_basic_coagent_claude40/results --task travel_planning
python -m collaborative_gym.eval.likert_score --result-dir workdir/travel_planning/travel_planning_coagent_with_situational_planning_claude40/results --task travel_planning

# (Optional) Move the run directories into the final_results folder
mkdir -p final_results
mv workdir/travel_planning/travel_planning_basic_coagent_claude40 final_results/travel_planning_one_stage_co_agent_claude40
mv workdir/travel_planning/travel_planning_coagent_with_situational_planning_claude40 final_results/travel_planning_two_stage_co_agent_claude40
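
The four evaluation commands above follow one pattern: each evaluator module is run once per run directory. As a rough sketch of that pattern (the function name `print_eval_commands` is hypothetical; the module names, run names, and flags are copied from the commands above), the loop below prints the commands instead of executing them, so they can be reviewed first.

```shell
# Print the four evaluation commands (2 evaluator modules x 2 run directories).
print_eval_commands() {
  task="travel_planning"
  for module in progress_eval_v2 likert_score; do
    for run_name in travel_planning_basic_coagent_claude40 \
                    travel_planning_coagent_with_situational_planning_claude40; do
      echo "python -m collaborative_gym.eval.${module} --result-dir workdir/${task}/${run_name}/results --task ${task}"
    done
  done
}

print_eval_commands
```

To execute the commands rather than print them, the output can be piped to a shell, for example `print_eval_commands | bash`. Run this before moving the run directories, since the paths point into `workdir/`.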

Analysis

Please refer to paper-analysis.ipynb for the analysis and plotting code.

Citation

If you use this work, please cite both the collaborative effort scaling paper and the Collaborative Gym framework:

@article{shen2025designing,
  title={Completion $\neq$ Collaboration: Scaling Collaborative Effort with Agents},
  author={Shen, Shannon Zejiang and Chen, Valerie and Gu, Ken and Ross, Alexis and Ma, Zixian and Gu, Alex and Si, Chenglei and Ross, Jillian and Shen, Jocelyn J and Chi, Wayne and Peng, Andi and Talwalkar, Ameet and Wu, Tongshuang and Sontag, David},
  journal={arXiv preprint arXiv:2510.25744},
  year={2025}
}

@misc{shao2025collaborativegym,
  title={Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration}, 
  author={Yijia Shao and Vinay Samuel and Yucheng Jiang and John Yang and Diyi Yang},
  year={2025},
  eprint={2412.15701},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2412.15701}, 
}
