Skip to content

Improve AI release notes generation for OpenSearch#6055

Merged
gaiksaya merged 1 commit intoopensearch-project:mainfrom
andrross:release-notes-update
Mar 30, 2026
Merged

Improve AI release notes generation for OpenSearch#6055
gaiksaya merged 1 commit intoopensearch-project:mainfrom
andrross:release-notes-update

Conversation

@andrross
Copy link
Copy Markdown
Member

  • Fetch PR body from GitHub API and include it (truncated to 2000 chars) in the commit data sent to the LLM for better categorization and filtering decisions.

  • Add OpenSearch-specific prompt that filters non-user-facing changes (test additions, CI action bumps, test fixture deps, internal refactoring, release machinery) before categorization. Removes Infrastructure/Documentation/Refactoring categories that contradicted the exclusion policy. Only Breaking Changes, Features, Enhancements, Bug Fixes, and Maintenance categories are used.

  • Add borderline calls section: the LLM appends an HTML comment documenting debatable inclusion/exclusion/categorization decisions. This is extracted from the LLM output and written to a separate -borderline.md file.

  • Update Jenkinsfile to include borderline calls in the PR description when creating or updating release notes PRs. Also fix the existing PR path to update the PR body via gh pr edit.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@andrross andrross force-pushed the release-notes-update branch from 14ebe61 to 8257f38 Compare March 27, 2026 23:51
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.61%. Comparing base (2ec4925) to head (22600f1).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6055      +/-   ##
==========================================
+ Coverage   96.58%   96.61%   +0.02%     
==========================================
  Files         405      405              
  Lines       18758    18851      +93     
==========================================
+ Hits        18118    18212      +94     
+ Misses        640      639       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andrross andrross force-pushed the release-notes-update branch from 8257f38 to 0409941 Compare March 28, 2026 00:15
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@andrross andrross force-pushed the release-notes-update branch from 0409941 to eb682cd Compare March 28, 2026 01:29
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 28, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 0d5720e.

PathLineSeverityDescription
src/release_notes_workflow/release_notes.py155mediumPR body content fetched from external GitHub API (arbitrary user-controlled text) is passed directly into LLM prompts without sanitization. A malicious PR author could craft a PR description containing prompt injection payloads to manipulate LLM output, potentially causing the release notes generator to emit unexpected or misleading content into official release documentation.
jenkins/release-workflows/release-notes-generate.jenkinsfile163lowLLM-generated borderline calls content is read from a local file via `$(cat "$BORDERLINE_FILE")` and embedded directly into the GitHub PR body without sanitization. While the content is produced by an internal LLM, if the LLM were manipulated (e.g., via prompt injection from PR bodies), the resulting file content could include markdown or HTML that affects PR rendering in unexpected ways. Low severity as the content is passed as a string argument to `gh pr`, not executed as shell.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

Copy link
Copy Markdown
Member

@gaiksaya gaiksaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall with minor comments. If this works for opensearch maybe we can extend to other components as well.
Adding @rishabh6788 for another review.

Comment thread src/llms/prompts.py
Comment on lines +162 to +166
* "breaking change" or "breaking" → Breaking Changes
* "feature" or "feat" → Features
* "enhancement" or "improve" → Enhancements
* "bug" or "fix" or "bugfix" → Bug Fixes
* "maintenance" or "version" or "support" → Maintenance
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The categories also include infrastructure and documentation changes as per guidelines.
https://github.com/opensearch-project/opensearch-plugins/blob/main/RELEASE_NOTES.md#categories
I agree they do no provide any value. @peterzhuamazon maybe we also need to exclude them from consolidated ones.

Comment thread src/llms/prompts.py
Comment thread src/llms/prompts.py Outdated
- Example: `([#456](https://github.com/opensearch-project/OpenSearch/pull/456))`

6. **Important Notes:**
- If you cannot determine the appropriate category from labels OR content analysis, place the entry in "Maintenance"
Copy link
Copy Markdown
Member

@gaiksaya gaiksaya Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the consolidation workflow where we combine all non-core components' release notes into one, we place the uncategorized data in non-compliant section? Would doing the same here avoid confusing between actual maintenance?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably a good idea.

The reason I hesitated is that anytime the LLM creates an "unknown" or "non-compliant" section it means manual work will be required. Probably not the end of the world if it truly has no idea how to categorize.

@andrross andrross force-pushed the release-notes-update branch from eb682cd to 2bc6cd2 Compare March 30, 2026 19:03
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@andrross andrross force-pushed the release-notes-update branch from 2bc6cd2 to 0d5720e Compare March 30, 2026 19:49
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

Comment thread src/llms/prompts.py Outdated
* Maintenance: Routine upkeep such as dependency updates that ship in the distribution.
- Do not use "Infrastructure", "Documentation", or "Refactoring" as categories. Changes that would
belong to those categories should have been excluded by the filtering step. If a PR survived filtering
but does not fit any of the above categories, place it in "Maintenance".
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
but does not fit any of the above categories, place it in "Maintenance".
but does not fit any of the above categories, place it in "Unknown".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think I should strike the last sentence. I find it confusing given the instruction at the end.

- Fetch PR body from GitHub API and include it (truncated to 2000 chars)
  in the commit data sent to the LLM for better categorization and
  filtering decisions.

- Add OpenSearch-specific prompt that filters non-user-facing changes
  (test additions, CI action bumps, test fixture deps, internal
  refactoring, release machinery) before categorization. Removes
  Infrastructure/Documentation/Refactoring categories that contradicted
  the exclusion policy. Only Breaking Changes, Features, Enhancements,
  Bug Fixes, and Maintenance categories are used.

- Add borderline calls section: the LLM appends an HTML comment
  documenting debatable inclusion/exclusion/categorization decisions.
  This is extracted from the LLM output and written to a separate
  -borderline.md file.

- Update Jenkinsfile to include borderline calls in the PR description
  when creating or updating release notes PRs. Also fix the existing PR
  path to update the PR body via gh pr edit.

Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross andrross force-pushed the release-notes-update branch from 0d5720e to 22600f1 Compare March 30, 2026 21:17
@github-actions
Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fetch and expose PR body in commit data

Relevant files:

  • src/git/git_commit_processor.py
  • tests/tests_git/test_git_commit_processor.py

Sub-PR theme: OpenSearch-specific prompt and borderline calls extraction

Relevant files:

  • src/llms/prompts.py
  • src/release_notes_workflow/release_notes.py
  • tests/tests_release_notes_workflow/test_release_notes.py

Sub-PR theme: Jenkinsfile update to include borderline calls in PR description

Relevant files:

  • jenkins/release-workflows/release-notes-generate.jenkinsfile
  • tests/jenkins/jenkinsjob-regression-files/release-workflows/release-notes-generate.jenkinsfile.txt

⚡ Recommended focus areas for review

Hardcoded Component Name

The OpenSearch-specific prompt is selected by checking component.name == 'OpenSearch' (exact string match). This is fragile — if the component name differs in casing or has a suffix, the wrong prompt will be used silently. Consider making this configurable or using a case-insensitive comparison.

commit_prompt = AI_RELEASE_NOTES_PROMPT_COMMIT_OPENSEARCH if component.name == 'OpenSearch' else AI_RELEASE_NOTES_PROMPT_COMMIT
Borderline File Naming

The borderline filename is derived by replacing .md with -borderline.md in the original filename. If the filename contains .md in a path component or multiple .md occurrences, str.replace will replace the first occurrence, potentially producing an unexpected path. Using removesuffix or os.path.splitext would be safer.

borderline_filename = filename.replace('.md', '-borderline.md')
with open(os.path.join(os.getcwd(), 'release-notes', borderline_filename), 'w') as f:
PR Body Injection Risk

The PR_BODY variable is constructed by appending the raw content of the borderline file (via $(cat "$BORDERLINE_FILE")), which is LLM-generated text. This content is then passed directly to gh pr edit --body "$PR_BODY" and gh pr create --body "$PR_BODY". If the LLM output contains shell metacharacters or special sequences, this could cause unexpected behavior. Consider writing the body to a temp file and using --body-file instead.

                                                                            BORDERLINE_FILE="$WORKSPACE/release-notes/${filename.replace('.md', '-borderline.md')}"
                                                                            PR_BODY="Add release notes for ${version}"
                                                                            if [ -f "\$BORDERLINE_FILE" ]; then
                                                                                PR_BODY="\${PR_BODY}

## Borderline Calls
\$(cat "\$BORDERLINE_FILE")"
                                                                            fi
                                                                            EXISTING_PR=\$(gh pr list --head release-chores/release-notes-${version} --state open --json number --jq '.[0].number')
                                                                            if [ -n "\$EXISTING_PR" ]; then
                                                                                echo "Updating existing PR #\${EXISTING_PR} for component \${REPO_NAME} with new release notes."
                                                                                gh pr edit "\$EXISTING_PR" --body "\$PR_BODY"
                                                                            else
                                                                                gh pr create --title '[AUTO] Add release notes for ${version}' --body "\$PR_BODY" -H release-chores/release-notes-${version} -B ${ref}
Inconsistent Filtering Rule

The new OpenSearch prompt (step 1) says "When uncertain whether a change is user-facing, include it", but step 2 says "Do not use Infrastructure, Documentation, or Refactoring as categories. Changes that would belong to those categories should have been excluded by the filtering step." This creates a contradiction: uncertain changes are included but then have no valid category to land in, potentially forcing them into "Unknown". The instructions should clarify what category to use for borderline infrastructure/docs/refactoring changes that are included.

- Do not use "Infrastructure", "Documentation", or "Refactoring" as categories. Changes that would
  belong to those categories should have been excluded by the filtering step.

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Use case-insensitive component name comparison

The component name comparison component.name == 'OpenSearch' is case-sensitive and
fragile. If the component name is stored as opensearch, OPENSEARCH, or any other
casing variant, the wrong prompt will be used silently. Use a case-insensitive
comparison to be safe.

src/release_notes_workflow/release_notes.py [159]

-commit_prompt = AI_RELEASE_NOTES_PROMPT_COMMIT_OPENSEARCH if component.name == 'OpenSearch' else AI_RELEASE_NOTES_PROMPT_COMMIT
+commit_prompt = AI_RELEASE_NOTES_PROMPT_COMMIT_OPENSEARCH if component.name.lower() == 'opensearch' else AI_RELEASE_NOTES_PROMPT_COMMIT
Suggestion importance[1-10]: 5

__

Why: The case-sensitive comparison component.name == 'OpenSearch' could silently use the wrong prompt if the component name casing differs. A .lower() comparison is a simple and effective fix for this potential issue.

Low
General
Make borderline calls regex more robust

The regex pattern requires a newline immediately after BORDERLINE_CALLS, but the LLM
might output the comment with or without trailing whitespace before the newline.
More critically, the closing --> could appear with leading whitespace or on the same
line as content. Consider making the pattern more robust to handle variations in
whitespace around the closing delimiter.

src/release_notes_workflow/release_notes.py [37]

-match = re.search(r'<!-- BORDERLINE_CALLS\s*\n(.*?)-->', raw, re.DOTALL)
+match = re.search(r'<!--\s*BORDERLINE_CALLS\s*\n(.*?)\s*-->', raw, re.DOTALL)
Suggestion importance[1-10]: 4

__

Why: The suggestion improves robustness by allowing optional whitespace before BORDERLINE_CALLS, but the current pattern already handles trailing whitespace after it with \s*\n. The improvement is minor since the LLM output format is controlled by the prompt.

Low
Ensure test exercises the correct code path

The test test_generate_writes_borderline_file relies on mock_isfile.return_value =
True to trigger the changelog path, but the generate method also uses
get_commits_with_labels for the commit path. Without mocking the commits processor
and its return value, the test may not exercise the code path that writes the
borderline file, causing the assertion len(write_calls) == 2 to fail. Ensure the
mock setup covers the actual code path being tested.

tests/tests_release_notes_workflow/test_release_notes.py [361-362]

+# The test should mock the commits processor to ensure the commit-based path is exercised,
+# or set mock_isfile.return_value = False and mock the commits processor with borderline content.
+# Alternatively, verify the test actually reaches the release_notes_raw assignment by asserting
+# mock_ai_generator.generate_release_notes was called before checking write_calls.
+mock_ai_generator.generate_release_notes.assert_called_once()
 write_calls = [c for c in mock_file_open.call_args_list if len(c[0]) > 1 and c[0][1] == 'w']
 self.assertEqual(len(write_calls), 2)
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid concern about test setup, but the improved_code doesn't actually fix the mock setup issue — it just adds an assertion before the existing check. The core concern about whether the code path is properly exercised requires more substantial test changes.

Low
Safely construct borderline filename with splitext

If filename does not end with .md (e.g., it has no extension or a different
extension), filename.replace('.md', '-borderline.md') will silently produce an
incorrect filename or fail to replace anything, resulting in a borderline file with
the same name as the release notes file. Use a more reliable method to construct the
borderline filename.

src/release_notes_workflow/release_notes.py [174-177]

 if borderline_calls:
-    borderline_filename = filename.replace('.md', '-borderline.md')
+    base, ext = os.path.splitext(filename)
+    borderline_filename = f"{base}-borderline{ext or '.md'}"
     with open(os.path.join(os.getcwd(), 'release-notes', borderline_filename), 'w') as f:
         f.write(borderline_calls)
Suggestion importance[1-10]: 3

__

Why: While using os.path.splitext is more robust, in practice the filename variable is always expected to end with .md based on the codebase conventions, making this an edge case with minimal practical impact.

Low

@gaiksaya gaiksaya merged commit 5b70446 into opensearch-project:main Mar 30, 2026
17 checks passed
@github-project-automation github-project-automation Bot moved this from 👀 In Review to ✅ Done in Engineering Effectiveness Board Mar 30, 2026
@andrross andrross deleted the release-notes-update branch March 30, 2026 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

3 participants