This is the repository for the paper "Decompose-ToM: Enhancing Theory of Mind Reasoning in Large Language Models through Simulation and Task Decomposition".
Install dependencies
Navigate to the code directory and install the required Python packages:
pip install -r requirements.txt

There are two main evaluation scripts:
python evaluate_hitom.py [--category CATEGORY] [--model MODEL] [--model_type {openai,gemini,local}] [--parallel_execution] [--random_example] [--method {cot,baseline,simtom,decompose}] [--num_problems N] [--num_parallel N]

Key arguments:

- --category: Category to evaluate (default: all)
- --model: Model name (default: gpt-4o)
- --model_type: Model type (openai, gemini, or local; default: openai)
- --parallel_execution: Enable parallel execution
- --random_example: Evaluate a single random example
- --method: Evaluation method (cot, baseline, simtom, decompose; default: baseline)
- --num_problems: Number of problems to evaluate (default: 0 = all)
- --num_parallel: Number of threads for parallel execution (default: all CPU cores)
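
For scripted runs, the invocation above can be assembled programmatically. The following is a minimal sketch, assuming it is run from the code directory. `build_hitom_command` is a hypothetical helper, not part of the repository, and it uses only the flags documented above.

```python
import shlex
import sys


def build_hitom_command(method="baseline", model="gpt-4o", model_type="openai",
                        category=None, num_problems=0, num_parallel=None):
    """Build an argv list for evaluate_hitom.py from the documented flags.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    cmd = [sys.executable, "evaluate_hitom.py",
           "--model", model,
           "--model_type", model_type,
           "--method", method,
           "--num_problems", str(num_problems)]
    if category is not None:
        # Omitting --category keeps the documented default (all categories).
        cmd += ["--category", category]
    if num_parallel is not None:
        # Parallel runs pass the switch together with an explicit thread count.
        cmd += ["--parallel_execution", "--num_parallel", str(num_parallel)]
    return cmd


if __name__ == "__main__":
    cmd = build_hitom_command(method="decompose", num_problems=10, num_parallel=4)
    print(shlex.join(cmd))
    # To actually launch the evaluation: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string means the command can be passed to `subprocess.run` without shell quoting concerns.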
python evaluate_fantom.py [--model MODEL] [--model_type {openai,gemini,local}] --method {baseline,cot,simtom,decompose} [--num_problems N] [--context {short,full}] [--parallel_execution] [--num_parallel N]

Key arguments:

- --file: Path to the dataset JSONL file (default: ../data/fantomtom.jsonl)
- --model: Model name (default: gpt-4o)
- --model_type: Model type (openai, gemini, or local; default: openai)
- --method: Evaluation method (required: baseline, cot, simtom, decompose)
- --num_problems: Number of problems to evaluate (default: 0 = all)
- --context: Context type (short or full; default: short)
- --parallel_execution: Enable parallel execution
- --num_parallel: Number of threads for parallel execution (default: all CPU cores)
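
Because --method is required and --context takes two values, comparing configurations means running the script several times. The sketch below generates one command per (method, context) pair. `fantom_sweep` is a hypothetical helper, not part of the repository, and it assumes the commands are run from the code directory.

```python
import itertools
import sys

METHODS = ["baseline", "cot", "simtom", "decompose"]  # choices documented for --method
CONTEXTS = ["short", "full"]                          # choices documented for --context


def fantom_sweep(model="gpt-4o", model_type="openai", num_problems=0):
    """Yield one argv list per (method, context) pair for evaluate_fantom.py.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    for method, context in itertools.product(METHODS, CONTEXTS):
        yield [sys.executable, "evaluate_fantom.py",
               "--model", model,
               "--model_type", model_type,
               "--method", method,  # required by the script
               "--context", context,
               "--num_problems", str(num_problems)]


if __name__ == "__main__":
    for cmd in fantom_sweep(num_problems=20):
        print(" ".join(cmd))
        # To actually run each configuration: subprocess.run(cmd, check=True)
```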
- The scripts use OpenAI and Google Gemini APIs. Make sure to set your API keys as environment variables:
  - OPENAI_API_KEY for OpenAI
  - GEMINI_API_KEY for Gemini (Google Generative AI)
- You can change model settings in llm_utils.py or via script arguments.
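
The API-key requirement noted above can be checked before launching a long evaluation. This is a minimal sketch: `missing_api_key` is a hypothetical helper, not part of the repository, and it assumes the local model type needs no API key.

```python
import os

# Environment variable expected for each --model_type value.
# Assumption: the "local" model type needs no API key.
REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "local": None,
}


def missing_api_key(model_type, env=None):
    """Return the name of the unset key for model_type, or None if nothing is missing."""
    env = os.environ if env is None else env
    key = REQUIRED_KEY[model_type]  # raises KeyError for an unknown model type
    if key is not None and not env.get(key):
        # The variable is absent or set to an empty string.
        return key
    return None


if __name__ == "__main__":
    for model_type in REQUIRED_KEY:
        missing = missing_api_key(model_type)
        status = f"missing {missing}" if missing else "ok"
        print(f"{model_type}: {status}")
```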
- evaluate_hitom.py / evaluate_fantom.py: Main evaluation scripts
- llm_utils.py: Language model utility functions
- new_decompose.py: Core ToM system logic
- simtom/, prompts/: Supporting modules and prompt templates
The TheoryOfMindSystem class implements the Decompose-ToM method and can be configured to run on different datasets.