Description
Is your feature request related to a problem? Please describe
Currently we don’t have any directory implementation that can interact with both local and remote repositories. We propose creating a new directory implementation in which data is backed by a remote store and not all data needs to be stored locally. This directory will behave as a local directory when complete files are present on disk, but can fall back to on-demand fetch (block-level or non-block-level) from the remote store when data is not present locally.
Describe the solution you'd like
- How will the user be able to create a Composite Directory for an index?
We will add a new type to the index.store.type setting, compositefs, to indicate that this index will use a composite directory.
- What will the Composite Directory look like?
Here’s what the Class Diagram for Composite Directory will look like:
Our Composite Directory will have an FSDirectory instance (localDirectory), a FileCache instance, and a RemoteStoreFileTracker implementation. Most of the file tracking, such as adding files to the tracker and checking whether they are present locally or in the remote store, is handled by the RemoteStoreFileTracker implementation, CompositeDirectoryRemoteStoreFileTracker. Fetching files that are not available locally will be handled in the fetchBlob function, which simply fetches the required files from the remote store (in block or non-block format). fetchBlob will be called from the fetchBlock implementation of OnDemandCompositeBlockIndexInput; all logic related to block-level fetch lives in that class.
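To make the block-level fetch concrete, here is a minimal sketch of the offset arithmetic a fetchBlock implementation would need: which block covers a given file position, and which byte range of the remote file that block corresponds to. The class name, the method names, and the 8 MiB power-of-two block size are assumptions for illustration only; the proposal does not specify them.

```java
// Hypothetical helper, not part of the proposal: maps file positions to blocks.
class BlockMath {
    // Assumed block size of 8 MiB, expressed as a shift so the arithmetic stays cheap.
    static final int BLOCK_SIZE_SHIFT = 23;
    static final long BLOCK_SIZE = 1L << BLOCK_SIZE_SHIFT;

    // Index of the block that contains the given file position.
    static int blockIndex(long pos) {
        return (int) (pos >>> BLOCK_SIZE_SHIFT);
    }

    // Offset in the remote file at which the given block starts.
    static long blockStart(int blockId) {
        return (long) blockId << BLOCK_SIZE_SHIFT;
    }

    // Number of bytes to fetch for the block; the last block of a file may be shorter.
    static long blockLength(int blockId, long fileLength) {
        return Math.min(BLOCK_SIZE, fileLength - blockStart(blockId));
    }
}
```

With this arithmetic, a read at position pos would only need the range starting at blockStart(blockIndex(pos)) with length blockLength(blockIndex(pos), fileLength), rather than the whole file.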
More details on when the state of a file changes and how reads and writes are handled are given below.
- When will the states of a file change?
Any file in the Composite Directory goes through the following changes:
- Whenever a file is created (createOutput method of the directory), we add it to our FileTracker in the DISK state, indicating that the file is currently present only locally.
- As soon as the file is uploaded to the remote store, we add the file to our FileCache and change its state to CACHE, indicating that the file is present in the cache.
- If the file is evicted from the FileCache, we change its state to REMOTE_ONLY, indicating that the file is not present locally and needs to be fetched from the remote store before use.
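The three transitions above can be sketched as a small state tracker. This is only an illustration: FileStateTracker and its method names are hypothetical and stand in for whatever CompositeDirectoryRemoteStoreFileTracker ends up exposing; only the state names DISK, CACHE and REMOTE_ONLY come from the proposal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// State names are taken from the proposal.
enum FileState { DISK, CACHE, REMOTE_ONLY }

// Hypothetical tracker; the real tracker API is not specified in the proposal.
class FileStateTracker {
    private final Map<String, FileState> states = new ConcurrentHashMap<>();

    // createOutput: a newly created file exists only on local disk.
    void onCreated(String name) {
        states.put(name, FileState.DISK);
    }

    // After upload to the remote store, the file is added to the FileCache.
    void onUploaded(String name) {
        transition(name, FileState.DISK, FileState.CACHE);
    }

    // After eviction from the FileCache, the file is only available remotely.
    void onEvicted(String name) {
        transition(name, FileState.CACHE, FileState.REMOTE_ONLY);
    }

    // Returns null for files the tracker has never seen.
    FileState stateOf(String name) {
        return states.get(name);
    }

    private void transition(String name, FileState from, FileState to) {
        // replace(key, expected, updated) is atomic, so concurrent callers cannot skip a state.
        if (!states.replace(name, from, to)) {
            throw new IllegalStateException(name + " is not in state " + from);
        }
    }
}
```

Rejecting out-of-order transitions is a design choice of this sketch, not something the proposal requires; it simply makes an unexpected sequence (for example, evicting a file that was never cached) fail loudly.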
- How will reads be handled in Composite Directory?
Whenever a file read is requested (openInput), we will first check the state of the file in our FileTracker. If the file state is:
- DISK → we read it from the local directory
- CACHE → we read it from the fileCache
- REMOTE_ONLY → we fetch the file from the remote store (block or non-block), store it in the fileCache, and change the state of the file to CACHE so that it can be read from the fileCache
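The read routing above could look roughly like the following sketch. To keep it self-contained, the local directory, file cache and remote store are modelled as in-memory maps of file name to bytes, and openInput returns the bytes directly; in the actual proposal these would be an FSDirectory, a FileCache and a remote directory, and openInput would return an IndexInput. CompositeReadSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the read path; all stores are plain maps.
class CompositeReadSketch {
    enum FileState { DISK, CACHE, REMOTE_ONLY }

    final Map<String, FileState> tracker = new HashMap<>();
    final Map<String, byte[]> localDirectory = new HashMap<>();
    final Map<String, byte[]> fileCache = new HashMap<>();
    final Map<String, byte[]> remoteStore = new HashMap<>();

    byte[] openInput(String name) {
        FileState state = tracker.get(name);
        if (state == null) {
            throw new IllegalArgumentException("unknown file: " + name);
        }
        switch (state) {
            case DISK:
                // Complete file is still on local disk.
                return localDirectory.get(name);
            case CACHE:
                return fileCache.get(name);
            case REMOTE_ONLY:
                // Fetch on demand, populate the cache, and record the new state.
                byte[] data = remoteStore.get(name);
                fileCache.put(name, data);
                tracker.put(name, FileState.CACHE);
                return data;
            default:
                throw new AssertionError("unhandled state: " + state);
        }
    }
}
```

After the first read of a REMOTE_ONLY file, later reads of the same file take the CACHE branch until the file is evicted again.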
- How will writes be handled in Composite Directory?
Whenever a file write is requested (createOutput), we will fall back to the localDirectory to write the file. Since our IndexShard object already has a remote store object containing a remote directory, writes to the remote directory are handled through that object alone. Our CompositeDirectory will have a function, afterSyncToRemote (called in RemoteStoreRefreshListener after the segments are uploaded), which will take care of writing the files to the cache once they have been uploaded to the remote store.
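A minimal sketch of this write path, again using in-memory maps in place of the real directories and cache. The createOutput and afterSyncToRemote names come from the proposal, but their signatures here are assumptions, as is the choice to leave the local copy in place after caching (the proposal does not say what happens to it). CompositeWriteSketch is a hypothetical name.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the write path; all stores are plain maps.
class CompositeWriteSketch {
    enum FileState { DISK, CACHE, REMOTE_ONLY }

    final Map<String, FileState> tracker = new HashMap<>();
    final Map<String, byte[]> localDirectory = new HashMap<>();
    final Map<String, byte[]> fileCache = new HashMap<>();

    // Writes always go to the local directory; the upload to the remote store
    // is driven elsewhere (by the shard's existing remote store), not here.
    void createOutput(String name, byte[] data) {
        localDirectory.put(name, data);
        tracker.put(name, FileState.DISK);
    }

    // Assumed signature: called with the names of files that were just uploaded.
    void afterSyncToRemote(Collection<String> uploadedFiles) {
        for (String name : uploadedFiles) {
            byte[] data = localDirectory.get(name);
            if (data == null) {
                continue; // not written through this directory; nothing to cache
            }
            fileCache.put(name, data);
            tracker.put(name, FileState.CACHE);
        }
    }
}
```

Keeping the upload outside the directory mirrors the proposal: the directory only reacts to the afterSyncToRemote callback and never writes to the remote store itself.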
Looking forward to review comments and discussion on this.
Related component
Storage:Remote
Describe alternatives you've considered
No response
Additional context
No response
