Implement Bulk Deletes for GCS Repository #41368
original-brownbear merged 12 commits into elastic:master from original-brownbear:gcs-batch-delete
Conversation
original-brownbear commented Apr 19, 2019
- Just like Add Bulk Delete Api to BlobStore #40322 for AWS
- We already had a bulk delete API but weren't using it from the blob container implementation; now we are (see the sketch below)
- Made the bulk delete API also compliant with our interface, which only suppresses errors about non-existent blobs, by stat-ing failed deletes (I didn't use any bulk stat action here since having to stat should be the exception anyway, and it would make the error handling a lot more complex)
- Fixed the bulk delete API to limit its batch size to 100, in line with GCS recommendations
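For illustration, a minimal sketch of the second point, assuming the google-cloud-storage Java client; the class and field names here are made up for the example and are not the actual PR code. `Storage#delete(Iterable<BlobId>)` is the SDK's bulk-delete call:

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;

import java.util.List;
import java.util.stream.Collectors;

// Illustrative container that routes bulk deletes through the client's
// batched delete instead of issuing one request per blob.
class GcsBulkDeleteSketch {
    private final Storage client;
    private final String bucketName;

    GcsBulkDeleteSketch(Storage client, String bucketName) {
        this.client = client;
        this.bucketName = bucketName;
    }

    List<Boolean> deleteBlobs(List<String> blobNames) {
        final List<BlobId> blobIds = blobNames.stream()
            .map(name -> BlobId.of(bucketName, name))
            .collect(Collectors.toList());
        // One Boolean per blob: true if deleted, false if missing, failed,
        // or access was denied -- an ambiguity the review below digs into.
        return client.delete(blobIds);
    }
}
```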
Pinging @elastic/es-distributed
ywelsch left a comment
AFAICS, Google's SDK already implements the max batch size (see HttpStorageRpc.DefaultRpcBatch), so there is no need to implement it manually.
@ywelsch you're good with this one too? :)
ywelsch left a comment
AFAICS, what we would like to implement is the deleteBlobsIgnoringIfNotExists method. The behavior of the List<Boolean> delete(Iterable<BlobId> blobIds) method in the Storage class of Google's SDK equates failure and a missing file:

```java
 * @return an immutable list of booleans. If a blob has been deleted the corresponding item in the
 *     list is {@code true}. If a blob was not found, deletion failed or access to the resource
 *     was denied the corresponding item is {@code false}.
```

We want the distinction between the two, and we should not throw an exception if the file is just missing. This can be achieved by using the lower-level batch() method, building the batch yourself, and then plugging in different error handling.
We should also have tests that check that this behavior is correctly implemented by any deleteBlobsIgnoringIfNotExists method.
On a related note, we should delete the "delete" method from the BlobStore interface and replace its single use by a call on a BlobContainer.
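A rough sketch of what that lower-level approach could look like; the method and variable names are illustrative, while StorageBatch and BatchResult.Callback are the SDK types being referred to above:

```java
import com.google.cloud.BatchResult;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageBatch;
import com.google.cloud.storage.StorageException;

import java.net.HttpURLConnection;
import java.util.ArrayList;
import java.util.List;

class BatchDeleteSketch {
    // Deletes the given blobs, treating 404 as "already gone" rather than a failure.
    static List<StorageException> deleteIgnoringIfNotExists(Storage client, List<BlobId> blobIds) {
        final List<StorageException> failures = new ArrayList<>();
        final StorageBatch batch = client.batch();
        for (BlobId blobId : blobIds) {
            batch.delete(blobId).notify(new BatchResult.Callback<Boolean, StorageException>() {
                @Override
                public void success(Boolean result) {
                    // blob deleted
                }

                @Override
                public void error(StorageException exception) {
                    // a missing blob is fine for deleteBlobsIgnoringIfNotExists;
                    // anything else is a genuine failure worth reporting
                    if (exception.getCode() != HttpURLConnection.HTTP_NOT_FOUND) {
                        failures.add(exception);
                    }
                }
            });
        }
        batch.submit(); // callbacks fire as the batch response is processed
        return failures;
    }
}
```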
Jenkins run elasticsearch-ci/1
Jenkins run elasticsearch-ci/1
ywelsch left a comment
We should also have tests that check that this behavior is correctly implemented by any deleteBlobsIgnoringIfNotExists method.
Can you also address that one? In particular, I'm thinking about ESBlobStoreContainerTestCase
```java
protected String buildKey(String blobName) {

@Override
public void deleteBlobsIgnoringIfNotExists(List<String> blobNames) throws IOException {
    blobStore.deleteBlobs(blobNames.stream().map(this::buildKey).collect(Collectors.toList()));
```
The method on the blob store should now also be renamed to deleteBlobsIgnoringIfNotExists.
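I.e., the shape of the suggested rename (a sketch only; BlobStoreSketch stands in for the real BlobStore interface):

```java
import java.io.IOException;
import java.util.List;

interface BlobStoreSketch {
    // was: deleteBlobs(List<String> blobNames) -- renamed to match the
    // container-level contract of ignoring missing blobs
    void deleteBlobsIgnoringIfNotExists(List<String> blobNames) throws IOException;
}
```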
```java
@Override
public void error(StorageException exception) {
    if (exception.getCode() == 404) {
```
Let's use the constant here: HttpURLConnection.HTTP_NOT_FOUND.
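I.e., the same callback fragment as in the diff above, with the magic number swapped for the named constant (java.net.HttpURLConnection; `results` is the list from the surrounding code):

```java
@Override
public void error(StorageException exception) {
    // HttpURLConnection.HTTP_NOT_FOUND == 404: the blob was already gone
    if (exception.getCode() == HttpURLConnection.HTTP_NOT_FOUND) {
        results.add(Boolean.TRUE);
    } else {
        results.add(Boolean.FALSE);
    }
}
```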
```java
if (exception.getCode() == 404) {
    results.add(Boolean.TRUE);
} else {
    results.add(Boolean.FALSE);
```
I would prefer to have this consistent with the S3 implementation, which collects the exceptions and adds them as suppressed.
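A sketch of the S3-style pattern being suggested (class and method names are illustrative):

```java
import com.google.cloud.storage.StorageException;

import java.io.IOException;
import java.util.List;

// Instead of mapping each failure to Boolean.FALSE, collect the
// StorageExceptions and attach them as suppressed to one IOException.
final class SuppressedFailuresSketch {
    static void rethrow(List<StorageException> failures) throws IOException {
        if (failures.isEmpty() == false) {
            final IOException exception = new IOException("Failed to delete blobs");
            failures.forEach(exception::addSuppressed);
            throw exception;
        }
    }
}
```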
```diff
 }
 if (failed) {
-    throw new IOException("Failed to delete all [" + blobIdsToDelete.size() + "] blobs");
+    throw new IOException("Failed to delete the following blobs " + blobIdsToDelete);
```
Perhaps just "Failed to delete blobs " + blobIdsToDelete.
```java
    callback.success(true);
    return null;
}).when(result).notify(any(BatchResult.Callback.class));
when(batch.delete(any(BlobId.class), anyVararg())).thenAnswer(invocation -> {
```
Can we somehow check that no other method is called on this mock?
Yup, done :)
Oh sorry, failed to answer here. We already have that test in place IMO: https://github.com/elastic/elasticsearch/blob/master/test/framework/src/main/java/org/elasticsearch/repositories/ESBlobStoreContainerTestCase.java#L140
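For reference, one Mockito-level way to assert this against the mocked `batch` from the diff above would be verifyNoMoreInteractions (a sketch; exactly which calls get verified depends on the test):

```java
import static org.mockito.Matchers.any;
import static org.mockito.Matchers.anyVararg;
import static org.mockito.Mockito.atLeastOnce;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.verifyNoMoreInteractions;

// after exercising the code under test:
verify(batch, atLeastOnce()).delete(any(BlobId.class), anyVararg());
verify(batch).submit();
// fails the test if any method other than the verified ones was called
verifyNoMoreInteractions(batch);
```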
```java
    return ioe.get();
});
if (e != null) {
    throw e;
```
This now throws the StorageException, and not an IOException like S3's BlobContainer. Let's use one pattern (the one we had for S3), not two different ones.
Jenkins run elasticsearch-ci/packaging-sample
```java
    return ioe.get();
});
if (e != null) {
    throw new IOException(e);
```
Also add a message, just as for S3BlobContainer? I.e. "Exception when deleting blobs [" + blobNames + "]".
I consciously didn't do that here, since this is a little different from the AWS case: we always submit the full batch of deletes, because the sub-batches are internal to the client (unlike S3, where we split up the batch ourselves).
So I expect us to always get an exception for every failed blob, making listing them again a little redundant (plus we're catching these exceptions upstream anyway and logging the blobs that we tried to delete).
Maybe rather remove the listing from S3 as well? (I didn't realize it at the time, but it seems completely redundant when we always log the list of blobs upstream anyway.)
It may be nicer to collect the files that failed deletion here than at the caller site. It allows narrowing the list down to the deletions that actually failed (i.e. instead of just blobNames, we can filter it down to those that actually experienced a failure). A similar thing can be done for S3.
Makes sense. Maybe keep it all in the exception, though, and simply make sure that the blob names are all in the exception?
I don't like manually collecting the list of failed deletes without their exceptions, tbh; it doesn't really give us any information. We want to know why a delete failed. The fact that it failed we can already see by listing the stale blobs later on :P
Then we could simply remove the blobs list from all the upstream logging and be done?
=> How about keeping things as they are here and adjusting the S3 implementation and upstream logging in a subsequent PR?
OK to adapt call sites in a follow-up PR, but let's at least add the message with all the blob names in this PR here.
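Putting the thread together, a sketch of the agreed shape; the names here are hypothetical, and the follow-up commit below does this for real:

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.StorageException;

import java.io.IOException;
import java.util.Map;

// Name only the blobs whose delete actually failed in the message, and keep
// the underlying StorageExceptions attached as suppressed causes.
final class FailedDeletesSketch {
    static void failIfAnyFailed(Map<BlobId, StorageException> failed) throws IOException {
        if (failed.isEmpty() == false) {
            final IOException exception =
                new IOException("Exception when deleting blobs [" + failed.keySet() + "]");
            failed.values().forEach(exception::addSuppressed);
            throw exception;
        }
    }
}
```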
Thanks @ywelsch, @andrershov!
Implement Bulk Deletes for GCS Repository (#41368)
- Just like #40322 for AWS
- We already had a bulk delete API but weren't using it from the blob container implementation; now we are
- Made the bulk delete API also compliant with our interface, which only suppresses errors about non-existent blobs, by stat-ing failed deletes (I didn't use any bulk stat action here since having to stat should be the exception anyway, and it would make the error handling a lot more complex)
- Fixed the bulk delete API to limit its batch size to 100, in line with GCS recommendations

Backport of #41368
Cleanup Bulk Delete Exception Logging
- Follow-up to #41368
- Collect all failed blob deletes and add them to the exception message
- Remove logging of blob name list from caller exception logging