# interpretability

Here are 14 public repositories matching this topic...

Responsible AI Toolbox is a suite of tools, user interfaces, and libraries for exploring and assessing models and data, enabling a better understanding of AI systems. They empower developers and stakeholders of AI systems to develop and monitor AI more responsibly and to take better data-driven actions.

  • Updated Feb 6, 2026
  • TypeScript

This repository represents the transition from behavioral safety to Neural Forensics. It provides the infrastructure to detect, audit, and mitigate high-order AI risks (such as Latent Deception, Sycophancy-Masking, and Synthetic Intimacy) directly at the mechanistic activation layer.

  • Updated Jan 12, 2026
  • TypeScript
putman-visual-sim

Deterministic visual proof-of-concept for the PUTMAN Model: graph activation, rigidity pruning, beam reconstruction, recursive updates, and shift metric (Δ).

  • Updated Feb 23, 2026
  • TypeScript
