
Support reading records from AWS S3. #398

@DavidTaylorEmberex

Description


Feature Request

Scenario: We have a large number of records stored in an AWS S3 bucket, and we would like to get them into Cassandra.

Problem: dsbulk does not support reading from `s3://` URLs, so we tried using `aws s3 cp <url> | dsbulk load <...>`, but this approach was cumbersome and only allowed loading a single file at a time. (Due to the properties of the system writing these files, there is one record per file; combining them is not feasible.) Since we intend to load a very large number of files, we need a more efficient solution.

Proposed Solution: Upgrade dsbulk to be able to read from s3:// URLs, allowing us to dump a large number of filenames into the urlfile, thereby restoring the "bulk" to dsbulk.
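To illustrate, here is a sketch of how this might look once `s3://` support exists. The urlfile already accepts one URL per line; the bucket, key, keyspace, and table names below are all hypothetical placeholders, and the exact connector option follows dsbulk's existing urlfile setting:

```shell
# urls.txt -- hypothetical urlfile, one S3 object (i.e. one record) per line:
#   s3://my-bucket/records/record-000001.json
#   s3://my-bucket/records/record-000002.json
#   ...

# Hypothetical invocation: dsbulk resolves each s3:// URL itself,
# so thousands of single-record files can be loaded in one run.
dsbulk load -k my_keyspace -t my_table --connector.json.urlfile urls.txt
```

With this, generating the urlfile (e.g. from `aws s3 ls`) becomes the only per-file step, instead of a separate `aws s3 cp` pipe per record.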

Out-of-Scope: Since I do not have a write-to-S3 scenario, only reading from S3 needs to be considered for this feature.

