Skip to content

Update datafusion specific jvmopts to config/jvm.options#6187

Merged
rishabh6788 merged 3 commits into
opensearch-project:mainfrom
peterzhuamazon:update-datafusion-jvmopts
May 6, 2026
Merged

Update datafusion specific jvmopts to config/jvm.options#6187
rishabh6788 merged 3 commits into
opensearch-project:mainfrom
peterzhuamazon:update-datafusion-jvmopts

Conversation

@peterzhuamazon
Copy link
Copy Markdown
Member

Description

Update datafusion specific jvmopts to config/jvm.options

Issues Resolved

#5810

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 0f62943.

PathLineSeverityDescription
scripts/components/OpenSearch-DataFusion/jvmopts.txt2medium--enable-native-access=ALL-UNNAMED grants native memory access to ALL unnamed modules, not only org.opensearch.nativebridge. The named module is also listed, making the ALL-UNNAMED grant broader than necessary and expanding the JVM attack surface for any unnamed code running in the same process.
scripts/components/OpenSearch-DataFusion/jvmopts.txt8medium--add-opens=java.base/java.nio=ALL-UNNAMED opens core JVM internals (java.nio) to every unnamed module in the JVM, allowing reflective access to normally encapsulated NIO internals from arbitrary code. Scoping this to only the required named modules (e.g., org.apache.arrow.memory.core) would be significantly safer.
scripts/components/OpenSearch-DataFusion/jvmopts.txt4lowThe combination of -Dio.netty.noUnsafe=false and -Dio.netty.tryUnsafe=true explicitly opts into Netty's sun.misc.Unsafe code paths. While standard for high-performance native memory use in Arrow/DataFusion, these flags disable a meaningful safety layer and should be documented as intentional.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 2 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR Reviewer Guide 🔍

(Review updated until commit d95234c)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Idempotency Risk

The cat ./jvmopts.txt >> "$OUTPUT/config/jvm.options" command appends to the file every time the script runs. If the install script is executed multiple times, the JVM options will be duplicated in jvm.options, potentially causing configuration issues or JVM startup failures.

cat ./jvmopts.txt >> "$OUTPUT/config/jvm.options"
Debug Leftover

The ls "$OUTPUT/lib/rust" line appears to be a debug/diagnostic statement that was likely added for troubleshooting and may not be intended for production use. Consider removing it or replacing it with a more meaningful log message.

ls "$OUTPUT/lib/rust"
Security Concern

The option -Dio.netty.noUnsafe=false combined with -Dio.netty.tryUnsafe=true explicitly enables unsafe memory access in Netty. This should be validated as intentional and necessary, as it bypasses certain safety checks and could introduce security or stability risks.

-Dio.netty.noUnsafe=false
-Dio.netty.tryUnsafe=true

@peterzhuamazon peterzhuamazon added the skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. label May 6, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR Code Suggestions ✨

Latest suggestions up to d95234c

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Use script-relative path for robustness

The script uses a relative path ./jvmopts.txt which depends on the working directory
at runtime. If the script is executed from a different directory, the file won't be
found. Use the script's own directory as the base path instead.

scripts/components/OpenSearch-DataFusion/install.sh [127]

-cat ./jvmopts.txt >> "$OUTPUT/config/jvm.options"
+cat "$(dirname "$0")/jvmopts.txt" >> "$OUTPUT/config/jvm.options"
Suggestion importance[1-10]: 6

__

Why: Using $(dirname "$0") instead of ./ makes the script more robust when called from different working directories. This is a valid and common bash best practice.

Low
General
Validate target file exists before appending

There is no check that "$OUTPUT/config/jvm.options" exists before appending to it.
If the file or directory doesn't exist, the append will silently create a new file
or fail. Add a guard to verify the target file exists before appending.

scripts/components/OpenSearch-DataFusion/install.sh [127]

-cat ./jvmopts.txt >> "$OUTPUT/config/jvm.options"
+if [ ! -f "$OUTPUT/config/jvm.options" ]; then
+  echo "ERROR: $OUTPUT/config/jvm.options not found, cannot update JVM options."
+  exit 1
+fi
+cat "$(dirname "$0")/jvmopts.txt" >> "$OUTPUT/config/jvm.options"
Suggestion importance[1-10]: 4

__

Why: While checking for the existence of jvm.options before appending is a reasonable defensive measure, the >> operator in bash will create the file if it doesn't exist rather than silently failing, making the error scenario less critical. The guard adds safety but is not strictly necessary.

Low

Previous suggestions

Suggestions up to commit 0f62943
CategorySuggestion                                                                                                                                    Impact
Possible issue
Use script-relative path for robustness

The script uses a relative path ./jvmopts.txt which depends on the working directory
when the script is executed. If the script is run from a different directory, the
file won't be found. Use the script's directory as the base path instead.

scripts/components/OpenSearch-DataFusion/install.sh [127]

-cat ./jvmopts.txt >> "$OUTPUT/config/jvm.options"
+cat "$(dirname "$0")/jvmopts.txt" >> "$OUTPUT/config/jvm.options"
Suggestion importance[1-10]: 6

__

Why: Using ./jvmopts.txt is fragile if the script is called from a different working directory. Using $(dirname "$0")/jvmopts.txt ensures the path is always relative to the script's location, improving robustness.

Low
General
Validate target file exists before appending

The script does not verify that $OUTPUT/config/jvm.options exists before appending
to it. If the file or directory doesn't exist, the append operation will silently
create a new file or fail. Add a check to ensure the target file exists before
appending.

scripts/components/OpenSearch-DataFusion/install.sh [127]

-cat ./jvmopts.txt >> "$OUTPUT/config/jvm.options"
+if [ ! -f "$OUTPUT/config/jvm.options" ]; then
+  echo "ERROR: $OUTPUT/config/jvm.options not found, cannot append JVM options."
+  exit 1
+fi
+cat "$(dirname "$0")/jvmopts.txt" >> "$OUTPUT/config/jvm.options"
Suggestion importance[1-10]: 4

__

Why: While checking for the existence of $OUTPUT/config/jvm.options before appending adds safety, the >> operator in bash will create the file if it doesn't exist, so this is a minor defensive improvement rather than a critical fix. The suggestion also combines the path fix from suggestion 1, which conflates two concerns.

Low
Suggestions up to commit 0f62943
CategorySuggestion                                                                                                                                    Impact
Possible issue
Use script-relative path for robustness

The script uses a relative path ./jvmopts.txt which depends on the current working
directory when the script is executed. If the script is run from a different
directory, the file won't be found. Use the script's directory as the base path
instead.

scripts/components/OpenSearch-DataFusion/install.sh [127]

-cat ./jvmopts.txt >> "$OUTPUT/config/jvm.options"
+cat "$(dirname "$0")/jvmopts.txt" >> "$OUTPUT/config/jvm.options"
Suggestion importance[1-10]: 6

__

Why: Using ./jvmopts.txt is fragile if the script is called from a different working directory. Using $(dirname "$0")/jvmopts.txt makes the path relative to the script's location, which is more robust.

Low
General
Validate target file exists before appending

There is no check that "$OUTPUT/config/jvm.options" exists before appending to it.
If the file or directory doesn't exist, the append operation may silently create a
new file or fail. Add a guard to verify the target file exists before appending.

scripts/components/OpenSearch-DataFusion/install.sh [127]

-cat ./jvmopts.txt >> "$OUTPUT/config/jvm.options"
+if [ ! -f "$OUTPUT/config/jvm.options" ]; then
+  echo "ERROR: $OUTPUT/config/jvm.options not found, cannot append JVM options."
+  exit 1
+fi
+cat "$(dirname "$0")/jvmopts.txt" >> "$OUTPUT/config/jvm.options"
Suggestion importance[1-10]: 4

__

Why: While checking for the existence of $OUTPUT/config/jvm.options before appending adds safety, the >> operator in bash will create the file if it doesn't exist, so this is a minor defensive improvement rather than a critical fix. The improved_code also incorporates the path fix from suggestion 1, mixing two concerns.

Low

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit 0f62943

@peterzhuamazon
Copy link
Copy Markdown
Member Author

Still verifying with Rishabh on the amount to be added for datafusion only.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.62%. Comparing base (b1690e3) to head (d95234c).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6187   +/-   ##
=======================================
  Coverage   96.62%   96.62%           
=======================================
  Files         405      405           
  Lines       19039    19039           
=======================================
  Hits        18397    18397           
  Misses        642      642           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit d95234c

@rishabh6788 rishabh6788 merged commit 7b64cfd into opensearch-project:main May 6, 2026
17 checks passed
@github-project-automation github-project-automation Bot moved this from 👀 In Review to ✅ Done in Engineering Effectiveness Board May 6, 2026
@peterzhuamazon peterzhuamazon deleted the update-datafusion-jvmopts branch May 6, 2026 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Build Libraries & Interfaces cicd distinguished-contributor enhancement New Enhancement integtest skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. test

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

2 participants