feat(integrations): Add GitHub repository platform detection #109699

Merged
jaydgoss merged 10 commits into master from jaygoss/vdy-15-platform-detection-core
Mar 12, 2026

Conversation

@jaydgoss jaydgoss commented Mar 2, 2026

Summary

  • Add get_languages() method to GitHub API client to fetch repository language statistics
  • Create platform_detection module with language-to-platform mapping, framework detection from manifest files (package.json, requirements.txt, pyproject.toml, Pipfile, Gemfile, composer.json, build.gradle, pom.xml, go.mod), and confidence scoring
  • Add REST endpoint GET /api/0/organizations/{org}/repos/{repo_id}/platforms/ to expose detected platforms
  • 18 base language mappings, 23 framework detection rules, 24 ignored languages
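
For illustration, the language-mapping and confidence-scoring step could be sketched as below. The input shape mirrors what GitHub's Languages API returns (language name to byte count). The table entries, the ignored set, the thresholds, and the function name `map_languages` are all assumptions made for this sketch, not the values used in the PR.

```python
from __future__ import annotations

# Hypothetical tables: the PR has 18 mappings and 24 ignored languages;
# only a handful of illustrative entries are shown here.
LANGUAGE_TO_PLATFORM = {
    "Python": "python",
    "JavaScript": "javascript",
    "TypeScript": "javascript",
    "Go": "go",
    "Ruby": "ruby",
}
IGNORED_LANGUAGES = {"HTML", "CSS", "Shell", "Dockerfile", "Makefile"}


def map_languages(languages: dict[str, int]) -> list[dict[str, object]]:
    """Turn language byte counts into ranked platform candidates."""
    byte_totals: dict[str, int] = {}
    for language, byte_count in languages.items():
        if language in IGNORED_LANGUAGES:
            continue
        platform = LANGUAGE_TO_PLATFORM.get(language)
        if platform is None:
            continue
        byte_totals[platform] = byte_totals.get(platform, 0) + byte_count

    total = sum(byte_totals.values())
    if total == 0:
        return []

    results = []
    for platform, byte_count in byte_totals.items():
        share = byte_count / total
        # Assumed thresholds, chosen only to make the example concrete.
        confidence = "high" if share >= 0.5 else "medium" if share >= 0.2 else "low"
        results.append(
            {"platform": platform, "bytes": byte_count, "confidence": confidence}
        )
    return sorted(results, key=lambda r: r["bytes"], reverse=True)
```

Ignored languages are dropped before the shares are computed, so a repo dominated by HTML or CSS does not dilute the confidence of its actual runtime language.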

This is the foundation for automatic platform detection from GitHub repositories to streamline onboarding. Part 1 of 3.

Stack:

  • PR 1 (this): Core detection + API endpoint
  • PR 2: Composable framework definitions refactor
  • PR 3: Expanded coverage (~80 platforms)

Test plan

  • 44 unit tests covering language mapping, manifest parsing, framework detection, supersession, and edge cases
  • 7 integration tests covering endpoint success/error cases, IDOR prevention, and auth requirements

@jaydgoss jaydgoss requested review from a team as code owners March 2, 2026 19:39
linear bot commented Mar 2, 2026

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 2, 2026
github-actions bot commented Mar 2, 2026

🚨 Warning: This pull request contains Frontend and Backend changes!

It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently.

Have questions? Please ask in the #discuss-dev-infra channel.

@jaydgoss jaydgoss marked this pull request as draft March 2, 2026 20:14
@jaydgoss jaydgoss force-pushed the jaygoss/vdy-15-platform-detection-core branch from 513ed5d to 41d47b7 Compare March 2, 2026 23:11
@jaydgoss jaydgoss force-pushed the jaygoss/vdy-15-platform-detection-core branch from 4b8dc2a to 0d660b0 Compare March 3, 2026 17:48
else:
    # Text-based manifest files: requirements.txt, Gemfile,
    # pyproject.toml, build.gradle, pom.xml, go.mod
    content_lower = content.lower()

Note: This substring matching is loose for text-based manifests. For example, "echo" in the Go dependency map will match anywhere the word appears in go.mod, not just as a module path.

This is addressed in PR 3 of this stack (#109701), which replaces the text-based detection with the composable FrameworkDef system and uses full Go module paths like github.com/labstack/echo.
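
To make the false positive concrete, here is a small sketch comparing the loose substring check with a stricter check against required module paths. The helper names and the regular expression are assumptions for illustration. This is not the `FrameworkDef` implementation from PR 3, and the parser is deliberately simplified (for example, it does not distinguish `replace` blocks).

```python
import re

GO_MOD = """\
module example.com/echo-server

go 1.22

require (
    github.com/gin-gonic/gin v1.10.0
    golang.org/x/text v0.14.0 // indirect
)
"""

# Matches "<module path> v<version>" lines, with or without a leading
# "require" keyword (single-line form).
_REQUIRE_LINE = re.compile(
    r"^[ \t]*(?:require[ \t]+)?([\w.\-/~]+)[ \t]+v\S+", re.MULTILINE
)


def loose_match(content: str, needle: str) -> bool:
    """Substring check, as in the text-based detection discussed above."""
    return needle in content.lower()


def required_modules(content: str) -> set[str]:
    """Collect module paths that appear on require lines."""
    return {m.group(1).lower() for m in _REQUIRE_LINE.finditer(content)}


def strict_match(content: str, module_path: str) -> bool:
    """True if the module, or a versioned subpath such as /v4, is required."""
    return any(
        mod == module_path or mod.startswith(module_path + "/")
        for mod in required_modules(content)
    )
```

With this `go.mod`, `loose_match(GO_MOD, "echo")` is true only because the module is named `echo-server`, while `strict_match(GO_MOD, "github.com/labstack/echo")` is false.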

detectors = FRAMEWORK_DETECTORS.get(base_platform, [])
detected: list[str] = []

for manifest_file, dependency_map in detectors:

Note: This loop makes a separate GitHub API call per manifest file per language (_get_repo_file_content inside the loop). For a repo with Python + JavaScript, that's up to 4 manifest fetches (requirements.txt, pyproject.toml, Pipfile, package.json).

PR 2 in this stack (#109700) eliminates this by fetching the root directory listing in a single API call, collecting all needed file paths upfront, and batch-fetching content.
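
A rough sketch of the batching idea, under stated assumptions: `RepoClient`, `list_root`, `get_file`, and `fetch_manifests` are hypothetical names, not the GitHub client API used in the PR. The listing is fetched once, and content is requested only for manifests that actually exist.

```python
from typing import Protocol

MANIFESTS_BY_PLATFORM = {
    "python": ["requirements.txt", "pyproject.toml", "Pipfile"],
    "javascript": ["package.json"],
}


class RepoClient(Protocol):
    """Hypothetical interface standing in for the real API client."""

    def list_root(self) -> list[str]: ...
    def get_file(self, path: str) -> str: ...


def fetch_manifests(client: RepoClient, platforms: list[str]) -> dict[str, str]:
    """Collect the needed paths upfront, then fetch only the ones present."""
    wanted = {
        path
        for platform in platforms
        for path in MANIFESTS_BY_PLATFORM.get(platform, [])
    }
    present = wanted & set(client.list_root())  # one listing call
    return {path: client.get_file(path) for path in sorted(present)}


class FakeClient:
    """In-memory client that counts calls, for demonstration only."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files
        self.calls = 0

    def list_root(self) -> list[str]:
        self.calls += 1
        return list(self.files)

    def get_file(self, path: str) -> str:
        self.calls += 1
        return self.files[path]
```

For a repo containing only `package.json` and `pyproject.toml`, this makes three calls (one listing, two fetches) and never requests files that would return a 404.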

@jaydgoss jaydgoss marked this pull request as ready for review March 4, 2026 16:20
@jaydgoss jaydgoss requested a review from a team March 4, 2026 16:21
jaydgoss added a commit that referenced this pull request Mar 9, 2026
Register `organizations:integrations-github-platform-detection` flagpole
flag to gate the new platform detection endpoint behind a controlled
rollout.

This is the base of a 4-PR stack for GitHub platform detection:
- **PR 0 (this):** Feature flag registration
- [PR 1](#109699): Core detection endpoint
- [PR 2](#109700): Composable framework definitions
- [PR 3](#109701): Expand to 97% picker coverage

Co-authored-by: Claude <noreply@anthropic.com>
Base automatically changed from feat/platform-detection-feature-flag to master March 9, 2026 23:28
@jaydgoss jaydgoss force-pushed the jaygoss/vdy-15-platform-detection-core branch from 3e914eb to 4b5fa36 Compare March 9, 2026 23:41
@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

github-actions bot commented Mar 10, 2026

Backend Test Failures

Failures on 5b7ace0 in this run:

tests/sentry/profiles/test_task.py::DeobfuscationViaSymbolicator::test_basic_resolving
tests/sentry/profiles/test_task.py:627: in test_basic_resolving
    assert android_profile["profile"]["methods"] == [
E   AssertionError: assert [{'class_name...oolean', ...}] == [{'class_name...oolean', ...}]
E     
E     At index 0 diff: {'class_name': 'org.slf4j.helpers.Util$ClassContextSecurityManager', 'name': 'getClassContext', 'signature': '()', 'source_file': 'Util.java', 'source_line': 67, 'data': {'deobfuscation_status': 'deobfuscated'}} != {'data': {'deobfuscation_status': 'deobfuscated'}, 'name': 'getClassContext', 'class_name': 'org.slf4j.helpers.Util$ClassContextSecurityManager', 'signature': '()', 'source_file': 'Something.java', 'source_line': 67}
E     
E     Full diff:
E       [
E           {
E               'class_name': 'org.slf4j.helpers.Util$ClassContextSecurityManager',
E               'data': {
E                   'deobfuscation_status': 'deobfuscated',
E               },
E               'name': 'getClassContext',
E               'signature': '()',
E     -         'source_file': 'Something.java',
E     ?                         ^^^^ - ^^
E     +         'source_file': 'Util.java',
E     ?                         ^  ^
E               'source_line': 67,
E           },
E           {
E               'class_name': 'org.slf4j.helpers.Util$ClassContextSecurityManager',
E               'data': {
E                   'deobfuscation_status': 'deobfuscated',
E               },
E               'name': 'getExtraClassContext',
E               'signature': '(): boolean',
E     -         'source_file': 'Else.java',
E     ?                         ^ --
E     +         'source_file': 'Util.java',
E     ?                         ^^^
E               'source_line': 69,
E           },
E       ]
tests/sentry/profiles/test_task.py::DeobfuscationViaSymbolicator::test_inline_resolving
tests/sentry/profiles/test_task.py:683: in test_inline_resolving
    assert android_profile["profile"]["methods"] == [
E   AssertionError: assert [{'class_name...andler', ...}] == [{'class_name...andler', ...}]
E     
E     At index 0 diff: {'class_name': 'io.sentry.sample.-$$Lambda$r3Avcbztes2hicEObh02jjhQqd4', 'name': 'onClick', 'signature': '()', 'source_file': '-.java', 'source_line': 2, 'data': {'deobfuscation_status': 'deobfuscated'}} != {'class_name': 'io.sentry.sample.-$$Lambda$r3Avcbztes2hicEObh02jjhQqd4', 'data': {'deobfuscation_status': 'deobfuscated'}, 'name': 'onClick', 'signature': '()', 'source_file': None, 'source_line': 2}
E     
E     Full diff:
E       [
E           {
E               'class_name': 'io.sentry.sample.-$$Lambda$r3Avcbztes2hicEObh02jjhQqd4',
E               'data': {
E                   'deobfuscation_status': 'deobfuscated',
E               },
E               'name': 'onClick',
E               'signature': '()',
E     -         'source_file': None,
E     ?                        ^^^^
E     +         'source_file': '-.java',
E     ?                        ^^^^^^^^
E               'source_line': 2,
E           },
E           {
E               'class_name': 'io.sentry.sample.MainActivity',
E               'data': {
E                   'deobfuscation_status': 'deobfuscated',
E               },
E               'inline_frames': [
E                   {
E                       'class_name': 'io.sentry.sample.MainActivity',
E                       'data': {
E                           'deobfuscation_status': 'deobfuscated',
E                       },
E                       'name': 'onClickHandler',
E                       'signature': '()',
E                       'source_file': 'MainActivity.java',
E                       'source_line': 40,
E                   },
E                   {
E                       'class_name': 'io.sentry.sample.MainActivity',
E                       'data': {
E                           'deobfuscation_status': 'deobfuscated',
E                       },
E                       'name': 'foo',
E                       'signature': '()',
E                       'source_file': 'MainActivity.java',
E                       'source_line': 44,
E                   },
E                   {
E                       'class_name': 'io.sentry.sample.MainActivity',
E                       'data': {
... (14 more lines)

jaydgoss and others added 10 commits March 10, 2026 16:36
Add a platform detection pipeline that maps GitHub repository languages
to Sentry platform IDs for onboarding. Given a repo, the system calls
GitHub's Languages API, maps results to Sentry platforms, and refines
via manifest file inspection (package.json, requirements.txt, etc.)
to detect specific frameworks like Django, Next.js, or Rails.

- Add get_languages() to GitHubBaseClient
- Create platform_detection module with language mapping, framework
  detection, and main detect_platforms() orchestrator
- Add GET /api/0/organizations/{org}/repos/{repo_id}/platforms/ endpoint
- 51 tests covering unit logic and API integration

Refs VDY-15
Co-Authored-By: Claude <noreply@anthropic.com>
…tent

Catch KeyError (missing "content" key), ValueError (invalid base64 via
binascii.Error), and UnicodeDecodeError (binary file content) in addition
to ApiError. These can occur when the GitHub API returns unexpected
response shapes or binary file content.

Co-Authored-By: Claude <noreply@anthropic.com>
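
A minimal sketch of the decoding path this commit hardens, assuming the response is a dict with a base64-encoded `content` field. The function name `decode_file_content` is hypothetical, and the real helper also handles `ApiError` from the client, which is omitted here.

```python
from __future__ import annotations

import base64


def decode_file_content(response: dict) -> str | None:
    """Return the decoded file text, or None if the response is unusable."""
    try:
        raw = response["content"]  # KeyError if the key is missing
        # b64decode raises binascii.Error (a ValueError subclass) on
        # malformed input such as incorrect padding.
        decoded = base64.b64decode(raw)
        # Binary content raises UnicodeDecodeError, which is also a
        # ValueError subclass; it is listed explicitly for readability.
        return decoded.decode("utf-8")
    except (KeyError, ValueError, UnicodeDecodeError):
        return None
```

Returning `None` lets the caller treat an unreadable manifest the same way as a missing one and fall back to the base platform.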
Co-Authored-By: Claude <noreply@anthropic.com>
…point base class

Move repository resolution from the platforms endpoint into a reusable
base class. Use IntegrationProviderSlug.GITHUB constant instead of
hardcoded "github" string for provider comparison.

Co-Authored-By: Claude <noreply@anthropic.com>
The substring check `IntegrationProviderSlug.GITHUB not in repo.provider`
incorrectly allowed GitHub Enterprise repos through since "github" is a
substring of "integrations:github_enterprise". Switch to exact equality.
Also assert full response shapes in endpoint tests.

Co-Authored-By: Claude <noreply@anthropic.com>
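
The difference between the two checks can be shown in a few lines. This sketch assumes provider strings follow the `integrations:<slug>` form quoted in the commit message. The constant and function names are hypothetical stand-ins for `IntegrationProviderSlug.GITHUB` and the endpoint code.

```python
GITHUB_SLUG = "github"  # stand-in for IntegrationProviderSlug.GITHUB


def is_github_loose(provider: str) -> bool:
    """Substring check: also true for GitHub Enterprise providers."""
    return GITHUB_SLUG in provider


def is_github_exact(provider: str) -> bool:
    """Exact equality: true only for the plain GitHub integration."""
    return provider == f"integrations:{GITHUB_SLUG}"
```

`is_github_loose("integrations:github_enterprise")` returns true, which is the bug described above; `is_github_exact` rejects it.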
Check organizations:integrations-github-platform-detection before
serving platform detection results. Returns 404 when the flag is
disabled to hide the endpoint from orgs not in the rollout.

Co-Authored-By: Claude <noreply@anthropic.com>
GitHub contents API returns a JSON array (not dict) when a path
resolves to a directory. Subscripting a list with a string key raises
TypeError, which was not caught, causing a 500 instead of graceful
fallback to None.

Co-Authored-By: Claude <noreply@anthropic.com>
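
For reference, a small sketch of the failure mode. It uses an `isinstance` guard instead of catching `TypeError`, which is a swapped-in alternative and not necessarily how the commit handles it. The function name `extract_content` is hypothetical.

```python
from __future__ import annotations


def extract_content(payload: object) -> str | None:
    """Return the raw `content` field, or None for directory listings."""
    # A directory path yields a JSON array; payload["content"] on a list
    # would raise TypeError, so non-dict payloads are rejected first.
    if not isinstance(payload, dict):
        return None
    content = payload.get("content")
    return content if isinstance(content, str) else None
```

Either approach gives the same result for the caller: a directory at a manifest path is treated as "no manifest" instead of surfacing as a 500.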
@jaydgoss jaydgoss merged commit cdb1a1c into master Mar 12, 2026
61 checks passed
@jaydgoss jaydgoss deleted the jaygoss/vdy-15-platform-detection-core branch March 12, 2026 16:06
jaydgoss added a commit that referenced this pull request Mar 12, 2026
…k definitions (#109700)

## Summary

- Refactor flat framework detection into composable `FrameworkDef` /
`DetectorRule` system
- Add three signal types: `path` (config file existence),
`match_content` (regex on file content), `match_package` (dependency
lookup in parsed manifests)
- Add `every` (AND) and `some` (OR) rule composition
- Add priority ranking (`sort` field) and supersession (e.g. Next.js
supersedes React)
- Refactor file content fetching into a single batch pass to minimize
API calls

No behavior change for existing detections, but the architecture now
supports easy addition of new frameworks as data-only entries.
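
As a rough illustration of how `every`/`some` composition, priority ranking, and supersession can fit together, here is a self-contained sketch. The classes `Rule` and `Framework` are simplified, hypothetical stand-ins for `DetectorRule` and `FrameworkDef`; the `match_content` signal is omitted, and the convention that a lower `sort` value ranks first is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    path: str | None = None            # config file must exist
    match_package: str | None = None   # dependency must be declared
    every: tuple[Rule, ...] = ()       # AND over sub-rules
    some: tuple[Rule, ...] = ()        # OR over sub-rules


@dataclass(frozen=True)
class Framework:
    platform: str
    rule: Rule
    sort: int = 0                      # assumed: lower value ranks first
    supersedes: tuple[str, ...] = ()


def matches(rule: Rule, files: set[str], packages: set[str]) -> bool:
    """Evaluate a rule tree against the repo's files and packages."""
    if rule.path is not None and rule.path not in files:
        return False
    if rule.match_package is not None and rule.match_package not in packages:
        return False
    if rule.every and not all(matches(r, files, packages) for r in rule.every):
        return False
    if rule.some and not any(matches(r, files, packages) for r in rule.some):
        return False
    return True


def detect(frameworks: list[Framework], files: set[str], packages: set[str]) -> list[str]:
    """Return matching platforms, dropping superseded ones, in rank order."""
    hits = [fw for fw in frameworks if matches(fw.rule, files, packages)]
    superseded = {name for fw in hits for name in fw.supersedes}
    ranked = sorted(
        (fw for fw in hits if fw.platform not in superseded),
        key=lambda fw: fw.sort,
    )
    return [fw.platform for fw in ranked]


FRAMEWORKS = [
    Framework("javascript-react", Rule(match_package="react"), sort=20),
    Framework(
        "javascript-nextjs",
        Rule(some=(Rule(path="next.config.js"), Rule(match_package="next"))),
        sort=10,
        supersedes=("javascript-react",),
    ),
]
```

Because each framework is plain data, adding a new one means appending an entry to the list; the matching and ranking code stays untouched.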

**Stack:**
- [PR 1](#109699): Core detection + API endpoint
- **PR 2 (this):** Composable framework definitions refactor
- [PR 3](#109701): Expanded coverage (97% of picker platforms)

---------

Co-authored-by: Claude <noreply@anthropic.com>
jaydgoss added a commit that referenced this pull request Mar 12, 2026
…109701)

Expand the composable framework detection system to cover 97/100 (97%)
of selectable platforms in the picker, up from the ~36 platforms in PR
2.

**Infrastructure additions:**
- `match_ext` + `match_content` combo rules -- find files by extension,
then search content (needed for .csproj inspection where filenames vary)
- `_NON_SELECTABLE_PLATFORMS` filter -- platforms detected internally
for ranking (e.g. preventing WordPress from being misidentified as
Symfony) but not shown to users because they lack onboarding docs or map
to other platforms (perl, php-wordpress, swift)
- Dual `base_platform` entries for Android (java + kotlin) so
Kotlin-first projects are correctly detected
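
A sketch of the first two additions, with assumptions flagged: the function names and the `<UseWPF>` pattern are illustrative and not taken from the PR, while the three non-selectable platform IDs are the ones listed above.

```python
import re

# Platform IDs named in the PR description as detected but not shown to users.
NON_SELECTABLE_PLATFORMS = {"perl", "php-wordpress", "swift"}


def match_ext_content(files: dict[str, str], ext: str, pattern: str) -> bool:
    """Find files by extension, then search their content with a regex."""
    regex = re.compile(pattern)
    return any(
        path.endswith(ext) and regex.search(body) is not None
        for path, body in files.items()
    )


def selectable(platforms: list[str]) -> list[str]:
    """Drop platforms that are used for ranking only."""
    return [p for p in platforms if p not in NON_SELECTABLE_PLATFORMS]
```

The extension-first lookup matters for `.csproj` files because their names follow the project name, so a fixed path such as `package.json` cannot be used.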

**61 new framework definitions:**
- **JavaScript (22):** astro, gatsby, sveltekit, solidstart, solid,
ember, tanstackstart-react, react-router, react-native, electron,
capacitor, ionic, cordova, node, nestjs, fastify, connect, hapi,
awslambda, gcpfunctions, azurefunctions, cloudflare-workers,
cloudflare-pages
- **Python (13):** aiohttp, bottle, falcon, pyramid, quart, sanic,
tryton, chalice, asgi, wsgi, awslambda, gcpfunctions, rq
- **Go (4):** fasthttp, iris, negroni
- **Java (2):** log4j2, logback
- **Ruby (1):** rack
- **PHP (2):** wordpress (non-selectable), symfony
- **Dart (1):** flutter
- **Swift (1):** apple-macos
- **Native (1):** native-qt
- **.NET (7):** maui, wpf, winforms, xamarin, aspnet, awslambda,
gcpfunctions
- **Mobile/Gaming (5):** unity, android, dotnet-aspnetcore, unreal,
godot
- **Other (3):** bun, deno, PowerShell (base platform)

**Only 3 platforms remain undetectable:** `go-http` (stdlib net/http, no
manifest signal), `minidump` (crash dump format, not a project type),
and `python-serverless` (too generic, overlaps with
awslambda/gcpfunctions).

**Note on `go-http`:** Plain Go repos now intentionally return `go` as
the base platform instead of `go-http`. There is no reliable way to
distinguish a net/http project from any other Go project without a
framework dependency, so the generic `go` platform is the correct
fallback.

**Stack:**
- [PR 1](#109699): Core detection + API endpoint
- [PR 2](#109700): Composable framework definitions refactor
- **PR 3 (this):** Expanded coverage (97% of picker platforms)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Labels

Scope: Backend (automatically applied to PRs that change backend components)
Scope: Frontend (automatically applied to PRs that change frontend components)

2 participants