Feature Request
Scenario: We have a large number of records stored in an AWS S3 bucket, and we would like to get them into Cassandra.
Problem: dsbulk does not support reading from s3:// URLs, so we tried using `aws s3 cp <url> - | dsbulk load <...>` (note the `-`, which makes `aws s3 cp` write to stdout), but this approach was cumbersome and only allowed loading one file at a time. (Due to the properties of the system writing these files, there is one record per file; combining them is not feasible.) Since we intend to load a very large number of files, we need a more efficient solution.
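For concreteness, the single-file workaround looks roughly like the sketch below. The bucket, keyspace, and table names are hypothetical, and this is an illustration of the pain point rather than our exact commands:

```shell
# Hypothetical bucket/keyspace/table names, for illustration only.
# Each S3 object holds exactly one record, so every record costs a
# separate `aws s3 cp` download plus a full `dsbulk load` invocation
# (including JVM startup) -- prohibitively slow at large file counts.
for key in $(aws s3 ls s3://my-bucket/records/ | awk '{print $4}'); do
  aws s3 cp "s3://my-bucket/records/$key" - \
    | dsbulk load -k my_keyspace -t my_table -header false
done
```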
Proposed Solution: Extend dsbulk to read from s3:// URLs, allowing us to dump a large number of S3 URLs into the urlfile and thereby restore the "bulk" to dsbulk.
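With native s3:// support, the whole load could become a single dsbulk run. A sketch of the intended usage, assuming dsbulk's existing `-urlfile` shortcut for `--connector.csv.urlfile` and hypothetical bucket/keyspace/table names:

```shell
# Build a urlfile with one s3:// URL per line (names are hypothetical).
aws s3 ls s3://my-bucket/records/ \
  | awk '{print "s3://my-bucket/records/" $4}' > records.urlfile

# One dsbulk invocation then loads every listed file,
# instead of one invocation per record.
dsbulk load -urlfile records.urlfile -k my_keyspace -t my_table -header false
```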
Out-of-Scope: Since I do not have a write-to-S3 use case, only reading from S3 needs to be considered for this feature.