
Add behavioral evals for the memory subagent #22805

@SandyTao520

Description


The memory subagent (from #22716) needs evals to confirm that it behaves as designed and doesn't regress as we iterate. The design doc calls out several specific behaviors we need to lock in:

  • Adding memories – agent saves relevant facts when the user states a preference or the session produces something worth remembering.
  • Removing stale memories – when the agent touches a memory file, it notices and cleans up entries that are no longer true.
  • De-duping – agent doesn't create a second entry for something it already knows. It consolidates instead.
  • Organizing + TOC updates – GEMINI.md stays clean and structured as a table of contents; the agent updates it when memories change.
  • Routing – user preferences go to ~/.gemini/, project knowledge goes to .gemini/. The doc specifically flags this as needing evals.
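As a rough illustration of how two of these behaviors (de-duping and routing) could be graded, here is a minimal TypeScript sketch. It assumes memories are stored as `- ` bullet lines in a GEMINI.md-style file and that the eval harness can report which path the subagent wrote to. `findDuplicateEntries`, `isRoutedCorrectly`, `expectedMemoryRoot`, and the `Scope` type are hypothetical names, not part of the existing evals/ helpers.

```typescript
// Hypothetical grading helpers for memory-subagent evals. Names, file layout,
// and the "- " bullet format for memory entries are assumptions.

type Scope = "user" | "project";

// Normalize a memory line so trivially different spellings of the same fact
// ("- Prefers tabs." vs "-  prefers   tabs") compare equal.
function normalizeEntry(line: string): string {
  return line
    .replace(/^\s*[-*]\s+/, "") // drop the bullet marker
    .toLowerCase()
    .replace(/[.\s]+$/, "") // drop trailing periods/whitespace
    .replace(/\s+/g, " ") // collapse internal whitespace
    .trim();
}

// De-duping: return normalized entries that appear more than once.
function findDuplicateEntries(memoryFile: string): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const line of memoryFile.split("\n")) {
    if (!/^\s*[-*]\s+/.test(line)) continue; // only bullet lines count as memories
    const key = normalizeEntry(line);
    if (seen.has(key)) dupes.add(key);
    seen.add(key);
  }
  return Array.from(dupes);
}

// Routing: user preferences belong under ~/.gemini/, project knowledge under
// the project's .gemini/. homeDir and projectDir are passed in (rather than
// read from the environment) so the check stays pure and testable.
function expectedMemoryRoot(scope: Scope, homeDir: string, projectDir: string): string {
  return scope === "user" ? `${homeDir}/.gemini/` : `${projectDir}/.gemini/`;
}

function isRoutedCorrectly(
  writtenPath: string,
  scope: Scope,
  homeDir: string,
  projectDir: string,
): boolean {
  return writtenPath.startsWith(expectedMemoryRoot(scope, homeDir, projectDir));
}
```

The normalization is deliberately shallow (bullet marker, case, whitespace, trailing period); catching paraphrased duplicates would likely need a model-graded check on top of this.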

These should live in evals/ alongside the existing behavioral evals, with at least one eval per category.
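One way to enforce the one-eval-per-category requirement is a small coverage check over the eval registry. This is a hypothetical sketch: the `EvalCase` shape, the category identifiers, and `missingCategories` are assumptions for illustration, not existing code in evals/.

```typescript
// Hypothetical eval registry. Category identifiers mirror the five behaviors
// listed above; the EvalCase shape is an assumption.
type Category = "add" | "remove-stale" | "dedupe" | "organize-toc" | "routing";

const ALL_CATEGORIES: readonly Category[] = [
  "add",
  "remove-stale",
  "dedupe",
  "organize-toc",
  "routing",
];

interface EvalCase {
  name: string;
  category: Category;
}

// Return the categories that have no eval yet, in declaration order,
// so a CI check can fail until every category is covered.
function missingCategories(cases: readonly EvalCase[]): Category[] {
  const covered = new Set<Category>(cases.map((c) => c.category));
  return ALL_CATEGORIES.filter((cat) => !covered.has(cat));
}
```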

Metadata

Labels

  • area/agent – Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality
  • workstream-rollup – Label used to tag epics and features that are associated with one of the three primary workstreams
  • 🔒 maintainer only – ⛔ Do not contribute. Internal roadmap item.