init commit avalon by HenryCai11 · Pull Request #60 · THUDM/AgentBench

HenryCai11 · 2023-10-22T10:11:20Z

AvalonBench

Quick Start

Start the task server and the assigner

Start the task server (3 is the number of workers)

python -m src.start_task -a --start avalon-dev-single 3

Start the assigner

python -m src.assigner --config ./configs/assignments/test_avalon.yaml

Customize configurations and data

You can modify the file configs/tasks/avalon.yaml to configure the agent list. A config file looks like this:

default:
  module: "src.server.tasks.avalon.AvalonBench"
  parameters:
    num_players: 5
    discussion: False

avalon-dev-naive:
  parameters:
    name: "AvalonBench-dev-naive"
    data_file: "data/avalon/dev.json"
    agent_list: ["naive", "naive", "naive", "naive", "naive"]

avalon-dev-single:
  parameters:
    name: "AvalonBench-dev-single"
    data_file: "data/avalon/dev.json"
    agent_list: ["llm", "naive", "naive", "naive", "naive"]

Here, "naive" stands for a naive bot and "llm" for the LLM agent. Each agent plays the role at the same index in the data file (described below).

Note: There should be only one "llm" in the `agent_list`.
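The constraints on `agent_list` described above can be checked before launching a run. The following is a minimal sketch, not part of the PR: the function name `check_agent_list`, the set `KNOWN_AGENTS`, and the way `default` parameters are merged into a task are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Agent types that appear in the example config; any other value is
# treated as unknown here (an assumption of this sketch).
KNOWN_AGENTS = {"llm", "naive"}


def check_agent_list(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one task's parameters."""
    problems: List[str] = []
    agents = params.get("agent_list", [])
    num_players = params.get("num_players", 5)

    # One agent per player seat.
    if len(agents) != num_players:
        problems.append(f"expected {num_players} agents, got {len(agents)}")

    # The note above allows only one "llm" entry.
    if agents.count("llm") > 1:
        problems.append('more than one "llm" in agent_list')

    unknown = sorted(set(agents) - KNOWN_AGENTS)
    if unknown:
        problems.append(f"unknown agent types: {unknown}")
    return problems


# Parameters copied from the config above; merging "default" into each
# task with a plain dict update is an assumption about the loader.
default_params = {"num_players": 5, "discussion": False}
single = {
    **default_params,
    "name": "AvalonBench-dev-single",
    "data_file": "data/avalon/dev.json",
    "agent_list": ["llm", "naive", "naive", "naive", "naive"],
}

print(check_agent_list(single))  # an empty list means no problems were found
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a config at once.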

You can also add data to data/avalon/dev.json. Note: currently only the 5-player game setting is supported, which includes 1 Merlin, 2 Servants, 1 Minion, and 1 Assassin. A data item looks like this:

 {
     "num_players": 5,
     "quest_leader": 0,
     "role_names": ["Assassin", "Servant", "Servant", "Merlin", "Minion"]
 }

where quest_leader is the id of the initial quest leader in this game. You can change the game setup by setting quest_leader to a number from 0 to 4 and by permuting role_names.
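To make the index correspondence between `agent_list` and `role_names` concrete, here is a small sketch that validates a data item against the 5-player setting and pairs each agent with its role. The helpers `validate_item` and `assign_roles` are hypothetical names introduced for illustration; they are not functions from the AgentBench code base.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Role composition of the supported 5-player setting, as stated above.
EXPECTED_ROLES = Counter({"Merlin": 1, "Servant": 2, "Minion": 1, "Assassin": 1})


def validate_item(item: Dict[str, Any]) -> None:
    """Raise ValueError if the data item does not fit the 5-player setting."""
    if item["num_players"] != 5:
        raise ValueError("only the 5-player setting is supported")
    if Counter(item["role_names"]) != EXPECTED_ROLES:
        raise ValueError("role_names must be a permutation of the 5-player roles")
    if not 0 <= item["quest_leader"] <= 4:
        raise ValueError("quest_leader must be between 0 and 4")


def assign_roles(agent_list: List[str], item: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pair the agent at index i with the role at index i."""
    validate_item(item)
    return list(zip(agent_list, item["role_names"]))


item = {
    "num_players": 5,
    "quest_leader": 0,
    "role_names": ["Assassin", "Servant", "Servant", "Merlin", "Minion"],
}
agents = ["llm", "naive", "naive", "naive", "naive"]

for player_id, (agent, role) in enumerate(assign_roles(agents, item)):
    print(player_id, agent, role)
```

With the example item and the `avalon-dev-single` agent list, the "llm" agent at index 0 is paired with the Assassin role; permuting `role_names` changes which role it plays.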

Naive experiment

You can also start a naive experiment using:

python -m src.start_task -a --start avalon-dev-naive 3

where all the agents are naive bots. For details of the naive strategies, please refer to the paper.

Prompts

All the prompts are maintained in src/server/tasks/avalon/prompt.py. You can find the respective prompts in src/server/tasks/avalon/agents/llm_with_discussion.py and src/server/tasks/avalon/wrapper.py.

Results

Results of single-setting games

{
    "total": 20,
    "validation": {
        "running": 0.0,
        "completed": 0.95,
        "agent context limit": 0.0,
        "agent validation failed": 0.05,
        "agent invalid action": 0.0,
        "task limit reached": 0.0,
        "unknown": 0.0,
        "task error": 0.0,
        "average_history_length": 11.0,
        "max_history_length": 14,
        "min_history_length": 2
    },
    "custom": {
        "Win rate of Player 0": 0.15,
        "Avg deduction acc of Player 0": 0.5399999999999998,
        "Valid number of games": 19
    }
}
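The values under "validation" look like fractions of "total" (0.95 × 20 = 19 matches "Valid number of games"). Under that assumption, which is inferred from the numbers and not stated in the PR, the fractions can be turned back into game counts. The function name `fractions_to_counts` is hypothetical.

```python
from typing import Any, Dict


def fractions_to_counts(result: Dict[str, Any]) -> Dict[str, int]:
    """Convert per-status fractions into game counts.

    Assumes every "validation" entry except the *_history_length
    statistics is a fraction of result["total"].
    """
    total = result["total"]
    return {
        status: round(fraction * total)
        for status, fraction in result["validation"].items()
        if not status.endswith("_history_length")
    }


# Values copied from the results above.
result = {
    "total": 20,
    "validation": {
        "running": 0.0,
        "completed": 0.95,
        "agent context limit": 0.0,
        "agent validation failed": 0.05,
        "agent invalid action": 0.0,
        "task limit reached": 0.0,
        "unknown": 0.0,
        "task error": 0.0,
        "average_history_length": 11.0,
        "max_history_length": 14,
        "min_history_length": 2,
    },
    "custom": {"Valid number of games": 19},
}

counts = fractions_to_counts(result)
print(counts["completed"], counts["agent validation failed"])
```

Rounding guards against floating-point noise in the reported fractions.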

HenryCai11 added 18 commits October 22, 2023 18:10

init commit avalon

691c670

update config

03fc956

refactor classes for config and environment; add comments and typings

a409d6d

base class for agents

8b3a912

move initialization of confidential configs from AgentConfig to AvalonGameEnvironment

4c0d9b3

rename AgentConfig to AvalonBasicConfig

4aff8e9

rename AgentConfig to AvalonBasicConfig

779457a

add mission id

69e2d38

class NaiveAgent inherits class Agent; add typings and comments

f9b70ac

add data and configs

6af67c8

refactor and implement naive strategies

f2b2ebd

add a mod to prevent extreme case in observe_mission

86130cf

1. refactor the code 2. add typings 3. add SampleStatus.AGENT_VALIDATION_FAILED

98c611a

update README

9fb74ff

1. add configs 2. polish prompts

9254f71

add data and polish prompts

7ce7342

1. support concurrency; 2. add detailed exception handling

ab72a21

set concurrency

1fac876

zhc7 approved these changes Nov 7, 2023

Longin-Yu approved these changes Nov 7, 2023

Longin-Yu merged commit adc728e into THUDM:main Nov 7, 2023

Xiao9905 mentioned this pull request Nov 13, 2024

pull request #171

Open

Longin-Yu merged 18 commits into THUDM:main from HenryCai11:main
