[HUDI-845] Added locking capability to allow multiple writers #2374
n3nash merged 2 commits into apache:master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2374 +/- ##
============================================
- Coverage 51.87% 9.52% -42.35%
+ Complexity 3556 48 -3508
============================================
Files 465 53 -412
Lines 22165 1963 -20202
Branches 2357 235 -2122
============================================
- Hits 11498 187 -11311
+ Misses 9667 1763 -7904
+ Partials 1000 13 -987
Flags with carried forward coverage won't be shown.
(force-pushed from f2fc458 to 592e4bd)
vinothchandar
left a comment
Some high-level comments. At a high level, this looks promising to me.
What's the reason to have this in hudi-common? Locking is only used by writing, correct?
We need to pass the same configs to HiveMetastoreBasedLockProvider, which is in hudi-hive-sync, as well as ZookeeperBasedLockProvider, which is in hudi-client. This object keeps the config reading in a single class.
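To illustrate the point, a shared config object in hudi-common might look like this minimal sketch: both providers read their settings through the same class instead of parsing properties independently. The class shape and key names here are hypothetical, not Hudi's actual API.

```java
import java.util.Properties;

// Hypothetical sketch of a lock config shared by ZookeeperBasedLockProvider
// (hudi-client) and HiveMetastoreBasedLockProvider (hudi-hive-sync).
// Key names are illustrative only.
class LockConfiguration {
    static final String LOCK_PROVIDER_KEY = "hoodie.write.lock.provider";
    static final String LOCK_WAIT_TIME_MS_KEY = "hoodie.write.lock.wait_time_ms";

    private final Properties props;

    LockConfiguration(Properties props) {
        this.props = props;
    }

    String getLockProviderClass() {
        return props.getProperty(LOCK_PROVIDER_KEY);
    }

    long getLockWaitTimeMs() {
        // Falling back to one default here means every provider
        // agrees on the same value.
        return Long.parseLong(props.getProperty(LOCK_WAIT_TIME_MS_KEY, "60000"));
    }
}
```

Because both providers depend only on this one class, the config-reading logic stays in one place, which is the argument made above for keeping it in hudi-common.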
(force-pushed from c2f054f to 98c3c7d)
@vinothchandar Addressed most of the code comments and replied to some that need clarification/resolution. Two high-level items are still missing: 1. a new test class for multi-writer, 2. testing with the test-suite / a real job. It's ready for final review in the meantime.
(force-pushed from 21792c6 to 7f975c1)
(force-pushed from ccc5f9a to 15157b3)
@vinothchandar This PR is ready; concurrency control has been added for writers and table services. Documentation will follow after this PR is merged.
(force-pushed from 7ababea to 8b8d594)
@n3nash can you please rebase and re-push?
vinothchandar
left a comment
Can we move the scheduling of compaction and clustering into the writeClient itself, protected in the critical section?
I can review the lock implementations here more line-by-line, but this is the main gap that I saw.
should archival be here too?
It should be, but each of these table services now follows the steps schedule -> inflight -> complete. Archival doesn't do that right now, and we never run archival async (unlike clean), so I haven't added it here. I have refactored the clean actions to do this. Filed a ticket for async archival -> https://issues.apache.org/jira/browse/HUDI-1576
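The schedule -> inflight -> complete flow described above can be sketched as a small state machine. Class and state names here are illustrative, not Hudi's actual ones; the point is that each transition is only legal from the immediately preceding state, which is what lets a concurrent writer reason about partial progress.

```java
// Hypothetical sketch of the lifecycle every table service
// (clean, compaction, clustering) is refactored to follow.
class TableServiceLifecycle {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    private State state = State.REQUESTED;

    State getState() {
        return state;
    }

    // Only a REQUESTED service may move to INFLIGHT.
    void markInflight() {
        if (state != State.REQUESTED) {
            throw new IllegalStateException("Expected REQUESTED, was " + state);
        }
        state = State.INFLIGHT;
    }

    // Only an INFLIGHT service may complete.
    void markCompleted() {
        if (state != State.INFLIGHT) {
            throw new IllegalStateException("Expected INFLIGHT, was " + state);
        }
        state = State.COMPLETED;
    }
}
```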
We should check somewhere that the user cannot turn on both async and inline, if we are adding an explicit config.
Right now, this config is harmless; it just assumes the role of autoClean.
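The check requested above could be sketched like this; the class and parameter names are hypothetical, but the idea is to fail fast at config-build time rather than let inline and async cleaning race each other.

```java
// Hypothetical validation sketch: a table service must not be
// enabled both inline and async in the same writer config.
class CleanConfigValidator {
    static void validate(boolean inlineCleanEnabled, boolean asyncCleanEnabled) {
        if (inlineCleanEnabled && asyncCleanEnabled) {
            throw new IllegalArgumentException(
                "Cleaning cannot be enabled both inline and async; pick one");
        }
    }
}
```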
This does not read easily. Rename to inlineCleaningEnabled or shouldCleanInline() or something like that? (Same wherever applicable.)
I kept the same naming convention as before (e.g. isInlineCompaction), but have now refactored all of them to inline<..>Enabled.
rename to initIfNeeded consistently
refactored to bootstrapMetadata
we can also rename this method if needed
refactored to bootstrapMetadata
(force-pushed from e93d599 to 340efd2)
@vinothchandar Addressed your comments and replied to the other ones, PTAL. I've also refactored the code to make the flow simpler.
(force-pushed from 47403db to 99b341e)
@vinothchandar Addressed your feedback comments and the build is ready. Some more things to note:
There are some warnings (I will add these to the docs when I send the PR for documentation).
But we should be able to allow backfills using Spark SQL, which don't really mess with checkpoints? We cannot ignore this scenario. Can you look into how we can provide best practices?
(force-pushed from 1886bbd to 01577fd)
@vinothchandar It's possible to allow backfills using spark-sql, but there are some corner cases. Consider the following:
I've made the following changes:
Yes, I am going to add documentation on best practices / things to watch out for in the other PR I opened for documentation. I will do that after resolving any further comments and landing this PR in the next couple of days. NOTE: the test may be failing because of some Thread.sleep-related code that I'm trying to remove. Will update tomorrow.
(force-pushed from 4e029d3 to 9278889)
1. Added LockProvider API for pluggable lock methodologies
2. Added Resolution Strategy API to allow for pluggable conflict resolution
3. Added TableService client API to schedule table services
4. Added Transaction Manager for wrapping actions within transactions
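As a rough sketch of how items 1 and 4 above fit together: a pluggable LockProvider guards a critical section, and a TransactionManager wraps an arbitrary action in acquire/release. All signatures here are illustrative, not Hudi's actual interfaces; real providers would be backed by ZooKeeper or the Hive metastore, while this one is in-process.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical pluggable lock abstraction.
interface LockProvider {
    boolean tryLock(long time, TimeUnit unit);
    void unlock();
}

// In-process provider, for the sketch only.
class InProcessLockProvider implements LockProvider {
    private final ReentrantLock lock = new ReentrantLock();

    public boolean tryLock(long time, TimeUnit unit) {
        try {
            return lock.tryLock(time, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public void unlock() {
        lock.unlock();
    }
}

// Wraps an action in a lock-protected critical section.
class TransactionManager {
    private final LockProvider provider;

    TransactionManager(LockProvider provider) {
        this.provider = provider;
    }

    // Runs the action only while holding the lock, and always
    // releases it, even if the action throws.
    void runInTransaction(Runnable action) {
        if (!provider.tryLock(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("Could not acquire table lock");
        }
        try {
            action.run();
        } finally {
            provider.unlock();
        }
    }
}
```

The try/finally is the important part of the design: a writer that fails mid-commit must not leave the table lock held, or every other writer would block until the lock times out.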
(force-pushed from 8c092be to bbb02ab)
@vinothchandar Build succeeds locally and should pass on Jenkins (will check tomorrow morning); ready for review.
vinothchandar
left a comment
Just went over the feedback from my last review.
- Have you tested the spark bundle end to end? I think so, based on #2660. Just making sure.
- Can we also add the curator deps to the flink and utilities bundles?
Pending these, once CI is happy, we can land and do the follow-ups.
@vinothchandar I've added curator to the utilities bundle. In this PR, we don't support locking for the flink and java clients, so I have not added it to the flink bundle. If you have another reason to add curator to the flink bundle, let me know.
cc @yanghua @danny0405 @leesf to take note for the flink/java support follow-up.
Here is the issue for the follow-up -> https://issues.apache.org/jira/browse/HUDI-1698. I'll work with you offline @danny0405 @yanghua @leesf.
Thanks for the heads-up; I will take up the follow-up work.
This diff -> #2359 is a pre-requisite for landing this PR.