Add path-scoped Copilot instructions for task YAML authoring #3614

@dgenio

## Why

lm_eval/tasks/ is by far the largest code surface in this repo — hundreds of YAML-configured evaluation tasks across dozens of subdirectories. Most external contributions are new task additions. Path-scoped Copilot instructions for this directory will guide agents (and code review) with task-specific schema rules, naming conventions, and testing requirements without bloating the repo-wide instructions.

## Scope / Proposed changes

- New file: `.github/instructions/tasks.instructions.md` (~100 lines)
- New directory: `.github/instructions/` (if it doesn't exist)

## Proposed contents

---
applyTo: "lm_eval/tasks/**"
---

# Task YAML Authoring Instructions

These instructions apply when creating or modifying files in `lm_eval/tasks/`.

## Canonical Reference

For the full task authoring guide, see `docs/new_task_guide.md`.
For the full task configuration reference, see `docs/task_guide.md`.

## Directory Structure

Each task lives in its own subdirectory under `lm_eval/tasks/`:

```
lm_eval/tasks/<dataset_name>/
├── <task_name>.yaml          # Task configuration (required)
├── _default_template.yaml    # Shared defaults via include (optional)
├── utils.py                  # Custom functions for process_docs, doc_to_text, etc. (optional)
└── README.md                 # Task documentation with paper citation (required for new tasks)
```


## Required YAML Fields

Every task YAML must include:

| Field | Description |
|-------|-------------|
| `task` | Unique task name (snake_case) |
| `dataset_path` | HuggingFace dataset name or path |
| `output_type` | One of: `multiple_choice`, `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
| `doc_to_text` | Jinja2 template or function for model input |
| `doc_to_target` | Jinja2 template or function for expected output |
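
As a rough illustration of the table above, the sketch below flags missing required fields in a task config that has already been parsed into a dict. `REQUIRED_FIELDS`, `VALID_OUTPUT_TYPES`, and `check_task_config` are hypothetical names for this example and are not part of `lm_eval`; the dataset name is a placeholder.

```python
# Hypothetical helper, not part of lm_eval: mirrors the required-fields table.
REQUIRED_FIELDS = ("task", "dataset_path", "output_type", "doc_to_text", "doc_to_target")
VALID_OUTPUT_TYPES = {
    "multiple_choice",
    "generate_until",
    "loglikelihood",
    "loglikelihood_rolling",
}


def check_task_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in config]
    output_type = config.get("output_type")
    if output_type is not None and output_type not in VALID_OUTPUT_TYPES:
        problems.append(f"unknown output_type: {output_type}")
    return problems


example = {
    "task": "my_task",
    "dataset_path": "my_org/my_dataset",  # placeholder dataset name
    "output_type": "generate_until",
    "doc_to_text": "Question: {{question}}\nAnswer:",
}
print(check_task_config(example))
```

The example config omits `doc_to_target`, so the check reports that one field as missing.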

## Common Optional Fields

| Field | Default | Notes |
|-------|---------|-------|
| `dataset_name` | `null` | HF dataset config name |
| `test_split` | — | Split to evaluate on |
| `validation_split` | — | Split for validation |
| `fewshot_split` | — | Split for few-shot examples |
| `num_fewshot` | `0` | Number of few-shot examples |
| `metric_list` | — | List of metrics to compute |
| `filter_list` | — | Output post-processing filters |

## Naming Conventions

- Task names: `snake_case`, matching the dataset or paper name
- Subdirectory name: matches the dataset/benchmark name
- YAML filenames: match the task name (e.g., `gsm8k.yaml` for task `gsm8k`)
- Group YAML: name group definition files `_<group_name>.yaml` (leading underscore)
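
The conventions above could be checked mechanically along these lines. This is only a sketch: the `SNAKE_CASE` pattern and the `naming_problems` helper are assumptions made for illustration, not an existing `lm_eval` utility, and the repo may accept names that this pattern rejects.

```python
import re
from pathlib import Path

# Assumed pattern for this sketch: lowercase letters and digits joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")


def naming_problems(task_name: str, yaml_path: str) -> list[str]:
    """Return naming issues for one task YAML; an empty list means none were found."""
    problems = []
    if not SNAKE_CASE.match(task_name):
        problems.append(f"task name is not snake_case: {task_name}")
    path = Path(yaml_path)
    # Files with a leading underscore (groups, templates) are exempt from the filename match.
    if not path.stem.startswith("_") and path.stem != task_name:
        problems.append(f"filename {path.name} does not match task {task_name}")
    return problems


print(naming_problems("gsm8k", "lm_eval/tasks/gsm8k/gsm8k.yaml"))
```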

## Jinja2 Templates

- Use `{{variable}}` for dataset column references
- For multiple choice: `doc_to_choice` must return a list of answer strings
- For generative tasks: set `generation_kwargs` (temperature, max_gen_toks, until)
- Multiline templates: use YAML literal block scalar `|` or folded block scalar `>`
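
When a template becomes unwieldy, the function alternative listed for `doc_to_text` and `doc_to_target` can be sketched as below. The column names `question`, `choices`, and `answer` are assumptions about a hypothetical dataset, and the sketch assumes `answer` holds the index of the correct choice.

```python
# Hypothetical utils.py functions; the column names are assumptions, not a fixed schema.
def doc_to_text(doc: dict) -> str:
    """Build the prompt shown to the model for one document."""
    return f"Question: {doc['question']}\nAnswer:"


def doc_to_choice(doc: dict) -> list[str]:
    """Return the answer options as a list of strings."""
    return [str(choice) for choice in doc["choices"]]


def doc_to_target(doc: dict) -> int:
    """Return the index of the correct option (assumes 'answer' stores that index)."""
    return int(doc["answer"])


sample = {"question": "What is 2 + 2?", "choices": [3, 4], "answer": 1}
print(doc_to_text(sample))
print(doc_to_choice(sample), doc_to_target(sample))
```

Such functions would be referenced from the YAML with the same `!function utils.<name>` form used for `process_docs`.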

## Testing

- Run `pytest tests/test_tasks.py -x -s -vv` to validate all task configs load correctly
- The CI workflow `new_tasks.yml` automatically runs task tests when `lm_eval/tasks/**` changes
- If possible, validate with a small model: `lm-eval run --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks <your_task> --limit 10`

## Process Docs Function

If using `process_docs` to preprocess data, define the function in a `utils.py` file in the same directory:

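```python
def transform_fn(doc):
    """Placeholder per-document transform; replace with task-specific logic."""
    return doc


def process_docs(dataset):
    """Preprocess the dataset before evaluation and return the modified dataset."""
    return dataset.map(transform_fn)
```

Reference it in the YAML as:

```yaml
process_docs: !function utils.process_docs
```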

## Common Patterns

- **Shared defaults**: Use `include` to inherit from a parent YAML: `include: _default_template.yaml`
- **Task groups**: Create a group YAML that lists its subtasks as a list under the `task` key
- **Custom metrics**: Define metric functions in `utils.py` and reference them via `!function`
- **Answer extraction**: Use filters (`regex`, `take_first`) to extract answers from model output
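
To make the answer-extraction pattern concrete, here is a plain-Python approximation of what a regex step followed by a take-first step does to a list of model responses. It is a sketch of the idea only, not `lm_eval`'s filter implementation; `extract_first_match` and the `[invalid]` fallback string are hypothetical choices for this example.

```python
import re


def extract_first_match(responses: list[str], pattern: str, fallback: str = "[invalid]") -> str:
    """Apply a regex to every response, then keep only the first response's result."""
    compiled = re.compile(pattern)
    extracted = []
    for response in responses:
        match = compiled.search(response)
        if match is None:
            extracted.append(fallback)
        else:
            # Use the first capture group when the pattern defines one, else the whole match.
            extracted.append(match.group(1) if compiled.groups else match.group(0))
    return extracted[0] if extracted else fallback


print(extract_first_match(["The answer is 42.", "The answer is 7."], r"answer is (-?\d+)"))
```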

## Labels to apply

- **Base**: `agent-readiness`
- **Priority**: `priority:high`
- **Area**: `documentation`

## Depends on

- #3610 (label creation)
- #3611 (.github/copilot-instructions.md must exist; this file supplements it)

## Related existing issues

None directly, though many open issues involve task configuration problems (e.g., #2552, #2479).

## Acceptance criteria

- [ ] `.github/instructions/tasks.instructions.md` exists and is ≤120 lines
- [ ] YAML frontmatter has `applyTo: "lm_eval/tasks/**"`
- [ ] No `excludeAgent` field (both coding agent and code review should see these instructions)
- [ ] All referenced paths and patterns exist in the repo
- [ ] Guidance links to `docs/new_task_guide.md` and `docs/task_guide.md` rather than duplicating them
- [ ] File passes pymarkdown lint (excluding YAML frontmatter)
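
The first three criteria lend themselves to an automated check. The sketch below works on the file's text and covers only the line limit, the `applyTo` line, and the absence of `excludeAgent`; `acceptance_problems` is a hypothetical helper, and its simple line-based frontmatter parsing is an assumption rather than a full YAML parse.

```python
def acceptance_problems(text: str, max_lines: int = 120) -> list[str]:
    """Check an instructions file's text against three of the criteria above."""
    problems = []
    stripped = [line.strip() for line in text.splitlines()]
    if len(stripped) > max_lines:
        problems.append(f"{len(stripped)} lines exceeds the limit of {max_lines}")
    # Frontmatter is whatever sits between a leading '---' and the next '---'.
    frontmatter = []
    if stripped[:1] == ["---"] and "---" in stripped[1:]:
        frontmatter = stripped[1 : stripped.index("---", 1)]
    if 'applyTo: "lm_eval/tasks/**"' not in frontmatter:
        problems.append("frontmatter lacks the expected applyTo line")
    if any(line.startswith("excludeAgent") for line in frontmatter):
        problems.append("frontmatter sets excludeAgent")
    return problems


good = '---\napplyTo: "lm_eval/tasks/**"\n---\n\n# Task YAML Authoring Instructions\n'
print(acceptance_problems(good))
```

In practice the text would come from reading `.github/instructions/tasks.instructions.md`; the remaining criteria (path existence, links, lint) need repo access and are left out here.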

## Avoid drift/duplication notes

- This file covers **task-specific** guidance only. General repo conventions are in `.github/copilot-instructions.md`.
- Canonical task authoring procedures live in `docs/new_task_guide.md` — this file summarizes key rules and links there.
- If task YAML schema changes, update both this file and `docs/task_guide.md`.

## References

- [GitHub Docs: Path-specific custom instructions](https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot#creating-path-specific-custom-instructions)
- Example syntax for `applyTo` frontmatter:
  ```yaml
  ---
  applyTo: "lm_eval/tasks/**"
  ---
  ```
