
ref(seer): Add random 50% rollout for context engine in start_run#110574

Merged
Mihir-Mavalankar merged 1 commit into master from mihir/ref/context-engine-random-rollout
Mar 12, 2026


@Mihir-Mavalankar
Contributor

@Mihir-Mavalankar Mihir-Mavalankar commented Mar 12, 2026

  • Gate context engine enablement behind a random coin flip for runs in the Sentry org when the feature flag is enabled.
  • continue_run always just checks the feature flag, because for continue runs Seer already has the condition current_state.is_context_engine_enabled and self.request.is_context_engine_enabled. So if start_run sets the context engine flag to True, this condition is always true; otherwise it is false.
  • The rollout rate is currently set to 0. It will be set to 0.5 via options automator.
  • Options automator PR needs to be merged first though: https://github.com/getsentry/sentry-options-automator/pull/6797
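The start_run gating described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the function name should_enable_context_engine and its parameters are hypothetical, has_feature_flag stands in for the org-level feature flag check, and rollout_rate stands in for the value of the seer.explorer.context-engine-rollout option.

```python
import random
from typing import Optional


def should_enable_context_engine(
    has_feature_flag: bool,
    rollout_rate: float,
    rng: Optional[random.Random] = None,
) -> bool:
    """Hypothetical helper: decide once, at start_run, whether a run gets the context engine.

    has_feature_flag stands in for the org-level feature flag check;
    rollout_rate stands in for the seer.explorer.context-engine-rollout option.
    """
    if not has_feature_flag:
        # Orgs without the flag never get the context engine.
        return False
    # Clamp so a misconfigured option cannot fall outside [0, 1].
    rate = min(max(rollout_rate, 0.0), 1.0)
    # random() returns a float in [0.0, 1.0), so a rate of 0.0 never enables
    # the engine and a rate of 1.0 always does.
    source = rng if rng is not None else random
    return source.random() < rate


if __name__ == "__main__":
    # Seeded generator so the simulated split is reproducible.
    demo_rng = random.Random(0)
    runs = [should_enable_context_engine(True, 0.5, demo_rng) for _ in range(10_000)]
    print(f"enabled in {sum(runs) / len(runs):.1%} of runs")
```

Flipping the coin once at start_run and persisting the result keeps each conversation in a single arm for its whole lifetime, so users cannot tell which arm a given chat is in.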

@Mihir-Mavalankar Mihir-Mavalankar self-assigned this Mar 12, 2026
@Mihir-Mavalankar Mihir-Mavalankar requested a review from a team as a code owner March 12, 2026 20:15
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 12, 2026
Member

@JoshFerge JoshFerge left a comment


what's the idea behind the random rollout?

Member

@shruthilayaj shruthilayaj left a comment


why did we remove self.actor from the flag check?

@Mihir-Mavalankar
Contributor Author

> what's the idea behind the random rollout?

  • My idea is that users in Sentry should use Explorer as-is, without knowing whether a particular chat had the context engine or not. That way they don't bias the results we collect, for example by asking harder questions when they know the context engine is on.
  • Following your suggestion in the meeting, Shruthi and I decided to make a frontend feature flag toggle just for our team, so only our team can run experiments and manually toggle it on and off. That PR is coming soon but needs more work because of the frontend component.

@Mihir-Mavalankar
Contributor Author

> why did we remove self.actor from the flag check?

Since the feature flag check is only org bound now, we can skip actor. Keeping it in won't break anything but is not needed.

@shruthilayaj
Member

>> why did we remove self.actor from the flag check?

> Since the feature flag check is only org bound now, we can skip actor. Keeping it in won't break anything but is not needed.

I've disabled the flag for myself when testing, but I guess it's fine if we have the override 🤷‍♀️
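The trade-off in this exchange can be illustrated with a toy flag store. This is purely hypothetical: ToyFlagStore, enabled_orgs, and user_overrides are invented for illustration and are not Sentry's feature flag API. The sketch assumes that disabling the flag "for myself" is a per-user override that is only consulted when an actor is passed to the check.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyFlagStore:
    """Hypothetical flag store: org-level enablement plus per-user overrides."""

    enabled_orgs: Set[str] = field(default_factory=set)
    user_overrides: Dict[str, bool] = field(default_factory=dict)

    def has(self, org: str, actor: Optional[str] = None) -> bool:
        # A per-user override wins, but only when the caller passes an actor.
        if actor is not None and actor in self.user_overrides:
            return self.user_overrides[actor]
        # Otherwise fall back to the org-level decision.
        return org in self.enabled_orgs


store = ToyFlagStore(enabled_orgs={"sentry"}, user_overrides={"shruthi": False})
print(store.has("sentry", actor="shruthi"))  # False: the per-user override applies
print(store.has("sentry"))  # True: actor dropped, so only the org-level result counts
```

With the actor dropped, an org-bound check returns the same answer for everyone in the org, so any personal opt-out has to come from a separate mechanism.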

Gate context engine enablement behind a configurable rollout rate
in start_run for orgs with the feature flag. The rate is controlled
by the seer.explorer.context-engine-rollout option (default 0.0).
continue_run always passes True since Seer ANDs it with the persisted
value from start_run.

Co-Authored-By: Claude Sonnet 4 <noreply@example.com>
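The commit message's reasoning, that continue_run can always send True because Seer ANDs it with the value persisted at start_run, can be checked with a small truth table. The RunState class and context_engine_active function below are hypothetical stand-ins; only the boolean condition is taken from the PR description.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class RunState:
    """Hypothetical stand-in for Seer's persisted run state."""

    # Set once from the start_run request (i.e. the coin flip result).
    is_context_engine_enabled: bool


def context_engine_active(current_state: RunState, request_flag: bool) -> bool:
    # Same shape as the condition quoted in the PR description:
    # current_state.is_context_engine_enabled and self.request.is_context_engine_enabled
    return current_state.is_context_engine_enabled and request_flag


# Print every combination of persisted value and value sent by the caller.
for persisted, sent in product([False, True], repeat=2):
    active = context_engine_active(RunState(persisted), sent)
    print(f"persisted={persisted!s:5} sent={sent!s:5} -> active={active}")
```

When continue_run sends True, the result always equals the persisted start_run value, so a run that lost the coin flip stays without the context engine on every continuation.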
@JoshFerge
Member

> My idea is that users in Sentry should use Explorer as-is, without knowing whether a particular chat had the context engine or not. That way they don't bias the results we collect, for example by asking harder questions when they know the context engine is on.

do we have enough internal usage to create statistically significant findings from this? why can't we just have evals for this instead?

@Mihir-Mavalankar Mihir-Mavalankar merged commit 9c2ca2c into master Mar 12, 2026
59 checks passed
@Mihir-Mavalankar Mihir-Mavalankar deleted the mihir/ref/context-engine-random-rollout branch March 12, 2026 21:03
@Mihir-Mavalankar
Contributor Author

>> My idea is that users in Sentry should use Explorer as-is, without knowing whether a particular chat had the context engine or not. That way they don't bias the results we collect, for example by asking harder questions when they know the context engine is on.

> do we have enough internal usage to create statistically significant findings from this? why can't we just have evals for this instead?

  • We do have some evals here just for the context engine. Shruthi added these, and we plan to add more. Evals have their limitations too, though, and I think they are mostly useful for catching glaring regressions.
  • While the sample size from just the Sentry org is small, it is still larger than the eval dataset. I am also hoping to roll this out to the early adopter orgs (with the random rollout), and I think then we might have a sizable enough dataset.
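The statistical-significance question can be made concrete with a standard sample-size estimate for comparing two proportions (a two-sided two-proportion z-test, using the normal approximation, with a 50/50 split). This is a rough sketch: it assumes each run yields a binary outcome such as positive feedback, and the baseline and target rates in the example are hypothetical placeholders, not measured numbers.

```python
import math
from statistics import NormalDist


def runs_per_arm(
    p_control: float, p_treatment: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Approximate runs needed in EACH arm to detect p_control vs p_treatment.

    Uses the normal-approximation formula for a two-sided two-proportion z-test
    (no continuity correction).
    """
    if p_control == p_treatment:
        raise ValueError("rates must differ to have a detectable effect")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)  # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2  # pooled rate under the null hypothesis
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    ) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)


# Hypothetical example: detecting a lift from a 60% to a 70% positive-feedback rate.
print(runs_per_arm(0.60, 0.70))
```

With these placeholder rates the estimate comes out at a few hundred runs per arm. The requirement grows roughly with the inverse square of the difference, so halving the expected lift roughly quadruples the number of runs needed, which is one way to judge whether internal traffic alone is enough.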


Labels

Scope: Backend Automatically applied to PRs that change backend components


3 participants