
Support reading records from AWS S3. #398

@DavidTaylorEmberex

Description


Feature Request

Scenario: We have a large number of records stored in an AWS S3 bucket, and we would like to get them into Cassandra.

Problem: dsbulk does not support reading from `s3://` URLs, so we tried using `aws s3 cp <url> | dsbulk load <...>`, but this approach was cumbersome and only allowed loading a single file at a time. (Due to the properties of the system writing these files, there is one record per file; combining them is not feasible.) Since we intend to load a very large number of files, we need a more efficient solution.

Proposed Solution: Upgrade dsbulk to be able to read from s3:// URLs, allowing us to dump a large number of filenames into the urlfile, thereby restoring the "bulk" to dsbulk.
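To illustrate, here is a sketch of how this might look once `s3://` support exists. The urlfile already accepts one URL per line; the bucket, key, keyspace, and table names below are all hypothetical placeholders, and the exact connector option follows dsbulk's existing urlfile setting:

```shell
# urls.txt -- hypothetical urlfile, one S3 object (i.e. one record) per line:
#   s3://my-bucket/records/record-000001.json
#   s3://my-bucket/records/record-000002.json
#   ...

# Hypothetical invocation: dsbulk resolves each s3:// URL itself,
# so thousands of single-record files can be loaded in one run.
dsbulk load -k my_keyspace -t my_table --connector.json.urlfile urls.txt
```

With this, generating the urlfile (e.g. from `aws s3 ls`) becomes the only per-file step, instead of a separate `aws s3 cp` pipe per record.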

Out-of-Scope: Since I do not have a write-to-S3 scenario, only reading from S3 needs to be considered for this feature.

