
[POC] [Security Manager Replacement] Native Java Agent (dynamic code rewriting, must be low overhead)#16731

Closed
reta wants to merge 1 commit into opensearch-project:main from reta:issue-16633

@reta
Contributor

@reta reta commented Nov 27, 2024

Description

Explore the native Java Agent (dynamic code rewriting, must be low overhead).

How does it work:

  • the application (OpenSearch) and the agent share a common bootstrap module
  • the application (OpenSearch) is run with the agent
  • the application (OpenSearch) uses the bootstrap module to apply security policies
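The flow above relies on a standard java.lang.instrument agent. As a minimal sketch only (the class name AgentSketch and its body are hypothetical; the POC's real entry point is org.opensearch.javaagent.Agent, as named in its manifest), the JVM calls premain before main() when started with -javaagent, and agentmain when the agent is attached to a running JVM:

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent entry point for illustration; not the POC's Agent class.
// In a real agent jar this class would normally be declared public.
final class AgentSketch {
    // Captured so transformers could be (re)installed later; null until the JVM calls premain.
    private static volatile Instrumentation instrumentation;

    private AgentSketch() {}

    /** Invoked by the JVM before main() when started with -javaagent:<jar> (Premain-Class). */
    public static void premain(String agentArgs, Instrumentation inst) {
        initAgent(inst);
    }

    /** Invoked when the agent is attached to an already running JVM (Agent-Class). */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        initAgent(inst);
    }

    private static void initAgent(Instrumentation inst) {
        instrumentation = inst;
        // A real agent would register ClassFileTransformers here to rewrite sensitive
        // JDK methods (e.g. socket or file operations) so they consult the policy that
        // the application publishes through the shared bootstrap module.
    }

    static Instrumentation instrumentation() {
        return instrumentation;
    }
}
```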

Example:

The sample security.policy (unchanged from before):

grant codeBase "${codebase.opensearch-core}" {
   permission java.net.SocketPermission "localhost", "connect";
};
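To see what the sample grant covers, the sketch below builds the same permission with the JDK's own java.net.SocketPermission, outside of any Security Manager (SamplePolicySketch is a hypothetical name). A host without a port range covers all ports, and "connect" implicitly includes "resolve", so a connect check against localhost:9200 is implied while a "listen" check is not:

```java
import java.net.SocketPermission;
import java.security.PermissionCollection;

// Hypothetical helper mirroring the sample grant:
//   permission java.net.SocketPermission "localhost", "connect";
final class SamplePolicySketch {
    private SamplePolicySketch() {}

    static PermissionCollection sampleGrant() {
        SocketPermission granted = new SocketPermission("localhost", "connect");
        // SocketPermission provides its own collection type that merges actions correctly.
        PermissionCollection perms = granted.newPermissionCollection();
        perms.add(granted);
        perms.setReadOnly();
        return perms;
    }
}
```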

The application (OpenSearch) is run with the agent:

-javaagent:agent-3.0.0-SNAPSHOT.jar

The application (OpenSearch) applies the security policy to the agent:

final Policy policy = new PolicyFile("/security.policy");
AgentPolicy.setPolicy(policy);

Running with JDK 24-ea+31-3600:

[2025-01-22T11:58:11,913][INFO ][o.o.n.Node               ] [host] version[3.0.0-SNAPSHOT], pid[101497], build[tar/7cf6a66e74d8352cf42d60c50b97e46a2aa8866c/2025-01-21T18:11:00.731851515Z], OS[Linux/6.11.0-13-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/24-ea/24-ea+31-3600]
[2025-01-22T11:58:11,916][INFO ][o.o.n.Node               ] [host] JVM home [/home/user/jdk-24], using bundled JDK/JRE [false]
[2025-01-22T11:58:11,916][INFO ][o.o.n.Node               ] [host] JVM arguments [-Xshare:auto, -Dopensearch.networkaddress.cache.ttl=60, -Dopensearch.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,CLDR, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/opensearch-12632241661790883371, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, --add-modules=jdk.incubator.vector, -Djava.util.concurrent.ForkJoinPool.common.threadFactory=org.opensearch.secure_sm.SecuredForkJoinWorkerThreadFactory, -javaagent:agent/opensearch-agent-3.0.0-SNAPSHOT.jar, -XX:MaxDirectMemorySize=536870912, -Dopensearch.path.home=/home/user/opensearch-3.0.0-jdk24, -Dopensearch.path.conf=/home/user/opensearch-3.0.0-jdk24/config, -Dopensearch.distribution.type=tar, -Dopensearch.bundled_jdk=true]
[2025-01-22T11:58:11,916][WARN ][o.o.n.Node               ] [host] version [3.0.0-SNAPSHOT] is a pre-release version of OpenSearch and is not suitable for production
[2025-01-22T11:58:11,967][WARN ][o.a.l.i.v.VectorizationProvider] [host] You are running with Java 23 or later. To make full use of the Vector API, please update Apache Lucene.
[2025-01-22T11:58:12,347][INFO ][o.o.i.r.ReindexModulePlugin] [host] ReindexPlugin reloadSPI called
[2025-01-22T11:58:12,348][INFO ][o.o.i.r.ReindexModulePlugin] [host] Unable to find any implementation for RemoteReindexExtension


Related Issues

Closes #16633

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the enhancement Enhancement or improvement to existing feature or request label Nov 27, 2024
@github-actions
Contributor

❌ Gradle check result for 6b73ddf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kumargu
Contributor

kumargu commented Nov 27, 2024

Thanks @reta, this is really interesting, and such quick progress.

On a side note, it would be useful to add a small intro snippet on how the agent would work overall.

@reta
Contributor Author

reta commented Nov 27, 2024

thanks @reta this is really interesting and such a quick progress.

Thanks @kumargu

On a side note, it would be useful to add a small intro snippet how the agent would work overall.

Absolutely, I have updated the description (but will push it further once we get the JDK-21 baseline with #16366, which would simplify the API usage a lot)

@reta reta force-pushed the issue-16633 branch 2 times, most recently from 9858717 to ea045b0 December 16, 2024 18:58
"Can-Retransform-Classes": "true",
"Agent-Class": "org.opensearch.javaagent.Agent",
"Premain-Class": "org.opensearch.javaagent.Agent",
"Boot-Class-Path": 'byte-buddy-1.15.10.jar opensearch-agent-bootstrap-3.0.0-SNAPSHOT.jar'
Contributor Author


opensearch-agent-bootstrap is shared between the OpenSearch service and the agent (so the Policy instance can be propagated)
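The reason the bootstrap module goes on Boot-Class-Path: JDK classes such as java.net.Socket are defined by the bootstrap class loader and cannot see classes on the application class path, so any holder that rewritten JDK code reads must itself be loaded by the bootstrap loader, which gives the server and the agent one shared copy. A minimal sketch of such a holder follows; SharedPolicySketch is hypothetical and only stands in for the POC's AgentPolicy, and the BiPredicate policy type, the set-once semantics and the allow-when-unset behavior are all assumptions:

```java
import java.security.Permission;
import java.security.ProtectionDomain;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiPredicate;

// Hypothetical stand-in for AgentPolicy. Only meaningful if this class is loaded once,
// by the bootstrap class loader, so server code and rewritten JDK code share the slot.
final class SharedPolicySketch {
    // Assumption: the policy is published once at startup and never replaced.
    private static final AtomicReference<BiPredicate<ProtectionDomain, Permission>> POLICY =
        new AtomicReference<>();

    private SharedPolicySketch() {}

    static void setPolicy(BiPredicate<ProtectionDomain, Permission> policy) {
        Objects.requireNonNull(policy, "policy");
        if (!POLICY.compareAndSet(null, policy)) {
            throw new IllegalStateException("policy has already been set");
        }
    }

    /** Called from intercepted code. Assumption: nothing is enforced until a policy is set. */
    static boolean implies(ProtectionDomain domain, Permission permission) {
        BiPredicate<ProtectionDomain, Permission> policy = POLICY.get();
        return policy == null || policy.test(domain, permission);
    }
}
```

Whether an unset policy should fail open (as sketched) or fail closed is a design choice the sketch does not settle.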

@github-actions
Contributor

❌ Gradle check result for ea045b0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jan 16, 2025
@github-actions
Contributor

❌ Gradle check result for 58a227c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Jan 17, 2025
@github-actions
Contributor

❌ Gradle check result for 5e20fde: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 4688fd1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 930e6ef: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kumargu
Contributor

kumargu commented Jan 28, 2025

@reta is it feasible for the agent to coexist with SM enabled in 3.0, meaning both SM and Agent will enforce socket restrictions?

@reta
Contributor Author

reta commented Jan 28, 2025

@reta is it feasible for the agent to coexist with SM enabled in 3.0, meaning both SM and Agent will enforce socket restrictions?

@kumargu I think it is feasible in theory but should not be necessary in practice. Could you share your thoughts on why we may need that?

@kumargu
Contributor

kumargu commented Jan 28, 2025

@reta is it feasible for the agent to coexist with SM enabled in 3.0, meaning both SM and Agent will enforce socket restrictions?

@kumargu I think it is feasible in theory but should not be necessary in practice, could you share your thoughts why we may need that?

I was thinking we could bring in JSM replacements in 3.0 while JSM remains enabled in 3.0 (because we'd still be on JDK-21 in 3.0). Having the alternatives coexist for some time would give us confidence and enough community feedback before we decide to remove JSM in some 3.x release or in 4.0.

(note: the next LTS, JDK-25, will be available in Sep 2025)

@reta
Contributor Author

reta commented Jan 28, 2025

Having the alternatives coexist for sometime will give us confidence and enough community feedback before we decide to remove it in some 3.x or 4.0.

I think we would only target the most critical APIs with the Java Agent (we just cannot match SM's coverage); however, we should be able to run the Java Agent on JDK-21 at least.

@kumargu
Contributor

kumargu commented Jan 28, 2025

Having the alternatives coexist for sometime will give us confidence and enough community feedback before we decide to remove it in some 3.x or 4.0.

I think we would only target a most critical APIs by Java Agent (we just cannot much it to SM), however we should be able to run Java Agent on JDK-21 at least.

100% agree. Maybe just the Socket interceptor for now, since we are seeing problems with defining the port ranges in PR #17107.
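A socket interceptor of the kind discussed here could look roughly like the sketch below. It is an illustration under assumptions, not the POC's code: checkConnect would be invoked from code injected into the JDK's connect path (that wiring is omitted), the policy is a hypothetical BiPredicate, and every distinct ProtectionDomain on the current call stack must hold the SocketPermission, which loosely mirrors the old AccessController stack check without any doPrivileged support:

```java
import java.lang.StackWalker.Option;
import java.net.InetSocketAddress;
import java.net.SocketPermission;
import java.security.Permission;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

// Hypothetical interceptor body; names and the policy type are assumptions.
final class SocketInterceptorSketch {
    private SocketInterceptorSketch() {}

    static void checkConnect(InetSocketAddress remote, BiPredicate<ProtectionDomain, Permission> policy) {
        // getHostString() avoids a reverse DNS lookup; IPv6 literals need brackets in the name.
        String host = remote.getHostString();
        if (host.indexOf(':') >= 0) {
            host = "[" + host + "]";
        }
        Permission permission = new SocketPermission(host + ":" + remote.getPort(), "connect");

        // Collect the protection domain of every class currently on the call stack.
        List<ProtectionDomain> domains = StackWalker.getInstance(Option.RETAIN_CLASS_REFERENCE)
            .walk(frames -> frames
                .map(frame -> frame.getDeclaringClass().getProtectionDomain())
                .distinct()
                .collect(Collectors.toList()));

        // Deny if any caller's domain lacks the permission.
        for (ProtectionDomain domain : domains) {
            if (!policy.test(domain, permission)) {
                throw new SecurityException("Denied access: " + permission);
            }
        }
    }
}
```

An explicit port range in the grant (for example "localhost:9200-9300") would then be evaluated by SocketPermission itself rather than by custom parsing.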

@github-actions
Contributor

github-actions bot commented Apr 7, 2025

❌ Gradle check result for c7e3022: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kumargu
Contributor

kumargu commented Apr 7, 2025

fyi @reta the policy parser PR #17753 is merged.

@reta
Contributor Author

reta commented Apr 7, 2025

fyi @reta the policy parser PR #17753 is merged.

Sorry @kumargu, I was late on that one; we have an issue with it: https://github.com/opensearch-project/OpenSearch/pull/17753/files#r2031916820

@kumargu
Contributor

kumargu commented Apr 7, 2025

fyi @reta the policy parser PR #17753 is merged.

Sorry @kumargu , was late on that one, we have an issue with it https://github.com/opensearch-project/OpenSearch/pull/17753/files#r2031916820

ack. I'll raise a fix in a new PR tomorrow.

@cwperks
Member

cwperks commented Apr 7, 2025

@kumargu @reta I raised a PR to replace the dependency on :test:framework and to use junit directly: #17821

@kumargu
Contributor

kumargu commented Apr 7, 2025

Oh thanks. I will cancel mine #17820.

@github-actions
Contributor

github-actions bot commented Apr 8, 2025

❌ Gradle check result for 1ecaa7a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

github-actions bot commented Apr 8, 2025

❌ Gradle check result for c836a03: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

github-actions bot commented Apr 8, 2025

❌ Gradle check result for 689934e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kumargu
Contributor

kumargu commented Apr 8, 2025

❌ Gradle check result for 689934e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Hopefully these are not failing due to the new parser.

Alternatively, should we try to simplify the FileInterceptor so that it does not handle newByteChannel? We can add newByteChannel back iteratively.
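Part of what makes newByteChannel harder to cover than a plain read is that the required permission depends on the set of OpenOptions passed in. A hypothetical helper (not part of the POC) that maps those options to java.io.FilePermission actions could follow the documented Files.newByteChannel rules: without WRITE or APPEND the file is opened for reading, CREATE and TRUNCATE_EXISTING only take effect when writing, and DELETE_ON_CLOSE is treated as needing delete access, as the former Security Manager checks did to our understanding:

```java
import java.io.FilePermission;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for an interceptor around newByteChannel; names are illustrative.
final class ByteChannelPermissionSketch {
    private ByteChannelPermissionSketch() {}

    /** Derives FilePermission actions from the options given to newByteChannel. */
    static String actionsFor(Set<? extends OpenOption> options) {
        boolean write = options.contains(StandardOpenOption.WRITE)
            || options.contains(StandardOpenOption.APPEND);
        Set<String> actions = new LinkedHashSet<>();
        // With neither WRITE nor APPEND the channel is opened for reading.
        if (options.contains(StandardOpenOption.READ) || !write) {
            actions.add("read");
        }
        // CREATE, CREATE_NEW and TRUNCATE_EXISTING only matter when writing.
        if (write) {
            actions.add("write");
        }
        if (options.contains(StandardOpenOption.DELETE_ON_CLOSE)) {
            actions.add("delete");
        }
        return String.join(",", actions);
    }

    static FilePermission permissionFor(Path path, Set<? extends OpenOption> options) {
        return new FilePermission(path.toString(), actionsFor(options));
    }
}
```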

@reta
Contributor Author

reta commented Apr 8, 2025

Hopefully these are not failing due to the new parser.

No, please see #17852

Alternately should we try to simplify the FileInterceptor to not use newByteChannel. we can add back the newByteChannel iteratively.

Is there an issue with newByteChannel?

@github-actions
Contributor

github-actions bot commented Apr 8, 2025

❕ Gradle check result for 8739577: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

if (task != null) {
if (BuildParams.runtimeJavaVersion > JavaVersion.VERSION_17) {
if (BuildParams.runtimeJavaVersion > JavaVersion.VERSION_17 && BuildParams.runtimeJavaVersion <= JavaVersion.VERSION_23) {
task.jvmArgs += ["-Djava.security.manager=allow"]
Member


Is -Djava.security.manager=allow needed anymore now that the System.setSecurityManager calls have been removed?

Contributor Author

@reta reta Apr 9, 2025


No, not anymore :-) I am keeping this pull request as a POC to cross-check any other work

Contributor


@andrross: @cwperks has started looking at cleaning up -Djava.security.manager across all codebases in core. We think it's not a lot of changes, and we can do the clean-up (in a day or two) once the final PR from this POC is merged.

Contributor Author


@andrross: @cwperks started taking a look at clean-ups of the -Djava.security.manager across all code bases in core. We think its not a lot of changes and we can do the clean-up (in a day or two) once the final PR from this POC is merged.

Thanks @kumargu, I should have the clean pull request to core ready tomorrow; I am doing some final tests with JDK-24

…ing, must be low overhead)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Andriy Redko <drreta@gmail.com>
@github-actions
Contributor

github-actions bot commented Apr 9, 2025

❌ Gradle check result for ffaf1cd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

github-actions bot commented Apr 9, 2025

❌ Gradle check result for ffaf1cd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Contributor Author

reta commented Apr 9, 2025

Closing POC in favor of #17861


Labels

enhancement Enhancement or improvement to existing feature or request skip-changelog


Development

Successfully merging this pull request may close these issues.

[POC] [Security Manager Replacement] Native Java Agent (dynamic code rewriting, must be low overhead)

8 participants