
bug: basename(cwd) project detection causes data fragmentation in monorepos #1256

@ACT900

Summary

claude-mem uses basename(process.cwd()) to determine the project name for observations, sessions, and ChromaDB embeddings. In monorepo or multi-directory projects, this causes observations to be silently split across multiple project names, making them invisible to startup context injection and project-scoped queries.

Reproduction

  1. Start Claude Code from /home/user/projects/my-app → project = my-app
  2. During the session, a hook fires while CWD is /home/user/projects/my-app/apps/web (e.g., after cd apps/web && pnpm tsc) → project = web
  3. Observations from step 2 are tagged as project web, not my-app
  4. Next session started from /home/user/projects/my-app → startup context injection queries WHERE project = 'my-app' and misses all web observations
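The divergence between steps 1 and 2 follows directly from how path.basename behaves. A minimal sketch, where projectFromCwd is a hypothetical stand-in for the detection logic (not the actual claude-mem function):

```javascript
const path = require('path');

// Hypothetical stand-in for basename-based project detection.
// path.posix is used so the result is the same on every platform.
function projectFromCwd(cwd) {
  return path.posix.basename(cwd) || 'unknown-project';
}

const atRoot = projectFromCwd('/home/user/projects/my-app');
const inSubdir = projectFromCwd('/home/user/projects/my-app/apps/web');

console.log(atRoot);   // my-app
console.log(inSubdir); // web
```

Both directories belong to the same repository, yet they yield two different project names.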

Impact on our system

We discovered 442 observations and 2,819 ChromaDB embeddings fragmented across wrong project names:

| Project | Observations | Actual Source |
|---|---|---|
| my-app | 12,233 | ✅ Correct |
| web | 395 | ❌ Should be my-app (from apps/web/ subdirectory) |
| my-app-old | 47 | ❌ Should be my-app (old directory name for same project) |

These observations were completely invisible to:

  • Startup context injection (session-start hook)
  • Any search using project parameter
  • The timeline view filtered by project

Root cause

In worker-service.cjs, the fp() function:

function fp(t) {
  if (!t || t.trim() === "") return "unknown-project";
  let e = path.basename(t);
  if (e === "") return "unknown-project";
  return e;
}

This is called during session-init with the CWD from hook input. There's a separate HC() function that does git rev-parse --show-toplevel, but it's not used for the project name stored on observations.

Suggested fixes (in order of preference)

1. Environment variable override (simplest, most flexible)

Add CLAUDE_MEM_PROJECT to settings.json. If set, always use it as the project name regardless of CWD.

{
  "CLAUDE_MEM_PROJECT": "my-app"
}
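One way the override could be wired in, as a rough sketch only. It assumes settings is the already-parsed settings.json object; resolveProject is a hypothetical name, not an existing claude-mem function:

```javascript
const path = require('path');

// Hypothetical resolver. `settings` is assumed to be the parsed settings.json.
function resolveProject(cwd, settings = {}) {
  const raw = settings.CLAUDE_MEM_PROJECT;
  // A non-empty string override wins regardless of the current directory.
  if (typeof raw === 'string' && raw.trim() !== '') return raw.trim();
  // Otherwise fall back to the existing basename behaviour.
  if (!cwd || cwd.trim() === '') return 'unknown-project';
  return path.basename(cwd) || 'unknown-project';
}

const overridden = resolveProject('/home/user/projects/my-app/apps/web', {
  CLAUDE_MEM_PROJECT: 'my-app',
});
const fallback = resolveProject('/home/user/projects/my-app/apps/web');

console.log(overridden); // my-app
console.log(fallback);   // web
```

Keeping the basename fallback means users who never set the override see no change in behaviour.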

2. Git root detection for project name

Replace basename(cwd) with basename(git_root) when inside a git repository:

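const { execSync } = require('child_process');

function fp(t) {
  if (!t || t.trim() === "") return "unknown-project";
  try {
    // Resolve the repository root so every subdirectory maps to the same project.
    const gitRoot = execSync('git rev-parse --show-toplevel', {
      cwd: t,
      encoding: 'utf-8',
      stdio: ['ignore', 'pipe', 'ignore'], // keep git's stderr out of hook output
    }).trim();
    return path.basename(gitRoot) || "unknown-project";
  } catch {
    // Not a git repository (or git is unavailable): fall back to the CWD basename.
    return path.basename(t) || "unknown-project";
  }
}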

3. Sticky project name per session

Once a session is initialized with a project name, persist it for the entire session. Don't re-derive the project from CWD on subsequent hook calls.
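A minimal sketch of the sticky behaviour, assuming each hook call carries a session identifier. The in-memory Map and the name projectForSession are illustrative assumptions; a real implementation would more likely read the project already stored for the session than rely on process memory:

```javascript
const path = require('path');

// Hypothetical cache: session ID -> project name fixed at session init.
const sessionProjects = new Map();

function projectForSession(sessionId, cwd) {
  // Derive the project only on the first call for this session.
  if (!sessionProjects.has(sessionId)) {
    sessionProjects.set(sessionId, path.basename(cwd) || 'unknown-project');
  }
  // Later calls ignore cwd and return the value chosen at init.
  return sessionProjects.get(sessionId);
}

const first = projectForSession('s1', '/home/user/projects/my-app');
const later = projectForSession('s1', '/home/user/projects/my-app/apps/web');

console.log(first, later); // my-app my-app
```

This only helps when the session starts at the project root; a session started inside apps/web would still be recorded as web.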

Workaround

We manually merged the fragmented data:

-- SQLite
UPDATE observations SET project = 'my-app' WHERE project IN ('web', 'my-app-old');
UPDATE sdk_sessions SET project = 'my-app' WHERE project IN ('web', 'my-app-old');
UPDATE session_summaries SET project = 'my-app' WHERE project IN ('web', 'my-app-old');

-- ChromaDB (chroma.sqlite3)
UPDATE embedding_metadata SET string_value = 'my-app' 
WHERE key = 'project' AND string_value IN ('web', 'my-app-old');

This is not sustainable long-term as the split recurs whenever hooks fire from a subdirectory.

Environment
