Initial push of crypto directory#3
Conversation
Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
|
thanks @asonje . will start taking a look. |
…ion. Signed-off-by: Shubham <shubhmkr@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
6e76c92 to
4b98c6c
Compare
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
7833802 to
a31c9b5
Compare
|
My observations were similar while testing via a opensearch-bechmark. so we are on same page. thanks! |
|
fyi: there is memory leak currently in the panama_v2 branch. I am trying to fix it up. Its is probably because I'm somewhere not closing the arena properly. Hopefully I'll be able to fix it today. |
|
When the plugin needs to load a OpenSSL library (.so/.dylib) is it the plan for on-premise OpenSearch to: Advantages/Disadvantages: Maybe we align its behavoiur with other native plugins like knn, ml etc Wdyt? |
And a typo i think: EgarDecryptedCryptoMMapDirectory -> EagerDecryptedCryptoMMapDirectory |
Signed-off-by: Gulshan <kumargu@amazon.com> So far the best performing batch decryption Signed-off-by: Gulshan <kumargu@amazon.com> Use on-fly decryption for small file extensions Signed-off-by: Gulshan <kumargu@amazon.com> Small and Large file routing Signed-off-by: Gulshan <kumargu@amazon.com> Lazy decryption Signed-off-by: Gulshan <kumargu@amazon.com> Seems to work well Signed-off-by: Gulshan <kumargu@amazon.com> works in 814 seconds Signed-off-by: EC2 Default User <ec2-user@ip-172-31-0-33.ec2.internal> tim files to lazy SUCCESS (took 965 seconds) Signed-off-by: EC2 Default User <ec2-user@ip-172-31-0-33.ec2.internal> Per file based routing Signed-off-by: EC2 Default User <ec2-user@ip-172-31-0-33.ec2.internal> Best performer so far median: 315451 finishes in 910 Signed-off-by: EC2 Default User <ec2-user@ip-172-31-0-33.ec2.internal> Excellent performer Signed-off-by: EC2 Default User <ec2-user@ip-172-31-0-33.ec2.internal> Finishes a bit faster than batch size of 2L-4L Signed-off-by: EC2 Default User <ec2-user@ip-172-31-0-33.ec2.internal> QOL Signed-off-by: Gulshan <kumargu@amazon.com> Route CFS to lazy Signed-off-by: EC2 Default User <ec2-user@ip-172-31-0-33.ec2.internal> More Qol Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
Signed-off-by: Gulshan <kumargu@amazon.com>
|
(benchmarking results on https_logs) Baseline i.e no crytpo------------------------------------------------------
Using Index level Enc in this repo
LUKS based (transparent) encryption
|
|
@salyh looks like opensearch-project/OpenSearch#18085 is ready. I will rebase by Opensearch core against it and try out replacing OpenSSL parts with SunJCE. |
The issue with the Java standard Cipher architecture is that it can get hard to do a real in-place decryption. Had some problems with passing native/direct ByteBuffers into the doFinal() or update() methods. Maybe you have more luck :-) |
| } | ||
|
|
||
| @SuppressWarnings("unused") | ||
| private static void decryptAndProtectPageByPage( |
There was a problem hiding this comment.
@kumargu decryptAndProtectPageByPage() this is yet unused
There was a problem hiding this comment.
Is decryptAndProtect() faster?
There was a problem hiding this comment.
yes decryptAndProtect is slightly faster because it can decrypt contiguous range of pages in one-shot. I still need to figure out a proper error handling for it.
But decryptAndProtectPageByPage is much safer as unsetting the bits is much easier on an exception.
| MemorySegment dst = arena.allocate(length); | ||
| MemorySegment outLen = arena.allocate(ValueLayout.JAVA_INT); | ||
|
|
||
| rc = (int) EVP_EncryptUpdate.invoke(ctx, dst, outLen, src, (int) length); |
There was a problem hiding this comment.
@kumargu (int) length may lead to overflow. There are a lot of potentially unsafe casts to int in this file and in others. I suggest whenever a cast is inevitable to use a method which checks for boundaries
There was a problem hiding this comment.
It is also adviseable to use invokeExact() for performance, because we know the exact signature and no adaption is required.
There was a problem hiding this comment.
I think the casting to int is fine here, because we know that the decoding buffer is always smaller, it is just nowere enforced. IMHO the length parameter of the decryptInto method should be int from the beginning - and if not it should be checked around line 236 (0<=x<=Integer.MAX_VALUE).
Of course for such accesses the check should not be done on every access in tight loops. Therefore in the original MMapDirectory we have many try-catch blocks catching IllegalArgument, ArrayIndexOutOfBounds and NullPointerException because we do not check on every access and for the rare case that someone changes to another segment while reading we do the logic in the catch block (also to check if Input was already closed).
There was a problem hiding this comment.
IMHO the length parameter of the decryptInto method should be int from the beginning - and if not it should be checked around line 236 (0<=x<=Integer.MAX_VALUE)
yeah agreed. this is a strong enforcement.
@kumargu Watch out for this netty/netty#15324 (comment) openjdk/jdk#25324 when dealing with ByteBuffers derived from MemorySegments in doFinal()/update() in Java <25 |
we are already doing it here. Am i missing something? |
Depends on how the MemorySegment is created/scoped. The issue arise when its a shared one. Global and confined seems working. |
|
Ok. I will check with shared arena again, currently we are using confined which works. Shared arena hasn't be super performant (i need to figure out why), but AFAIR, i had worked for me. |
and i just noticed you copy the bytes first to byte[] instead of using the bytebuffer directly in update(bytebuffer, bytebuffer) |
|
This PR will be still open for comments but I think it's time to merge this so that others can start contributing on top of it and address comments.Next week a member from my team will add CI flows and figure out how we can run Opensearch core tests with enc directory. I will create issues from comments which are already not addressed in this PR and we will work upon them in parralel. But it's time to get this merged because of its growing size of changes. I have few more ideas on MMAP directory, so I am gonna restrain from unit tests for an another week and see how it shapes eventually. |
| * - READONCE is just DEFAULT with SEQUENTIAL | ||
| */ | ||
|
|
||
| public static int getMAdviseFlags(IOContext context, String fileName) { |
There was a problem hiding this comment.
Are we extending Lucene's IOContext here?
There was a problem hiding this comment.
yes. but we will also talk to Lucene on the best way to consume Lucene IO context, given MADVISE_RANDOM had some problems recenlty.
There was a problem hiding this comment.
I think the madvise has no real value here, because you consume using a copy-on-write private segment. I'd remove that completely.
If you move to your own, hand written buffer cache as suggested in other issue, then you can implement readahead and madvise on your own.
Description
This pull request adds index level encryption features to OpenSearch based on the issue #3469. Each OpenSearch index is individually encrypted based on user provided encryption keys. A new cryptofs store type index.store.type is introduced which instantiates a CryptoDirectory that encrypts and decrypts files as they are written and read respectively
Related Issues
Resolves opensearch-project/OpenSearch#3469
Related to opensearch-project/OpenSearch#12902
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.