docs: guide for multiple bindings on file source connectors #2750
New guide covering how to capture from multiple paths/folders within a single file source capture using flowctl. Tested end-to-end with SFTP and Google Drive connectors. Also fixes Google Drive connector docs: replaces incorrect file_id/path binding properties with stream, adds folderUrl endpoint property, and documents the /u/N/ URL format caveat. Cross-reference tip boxes added to S3, GCS, SFTP, and Google Drive connector pages.
aeluce
left a comment
Useful addition, thanks!
I added some comments on minor considerations, but it'd be fine to merge without additional changes.
# Capture Multiple Paths with File Source Connectors

File source connectors like Amazon S3, Google Cloud Storage, SFTP, Google Drive, Azure Blob Storage, Dropbox, and HTTP File all support capturing from multiple paths within a single capture task. However, the Estuary web app only creates a single binding during initial setup. To add additional paths, you can use flowctl to manually configure extra bindings.
Since you mention Dropbox here, it looks like the Dropbox docs currently say: "This connector is designed for files located in a specific Dropbox folder."
Would it be helpful to update this language to make it clear that multiple folders can be configured?
Good catch — updated the Dropbox docs. Changed "This connector is designed for files located in a specific Dropbox folder" and "captures the data within the specified Dropbox folder into a single Estuary collection" to reflect that multiple folders are supported. Also added a tip box cross-referencing the new guide, matching the other connector pages.
While I was at it, added the same cross-reference tip boxes to Azure Blob Storage and HTTP File connector docs too — those were the remaining file source connectors that didn't have them.
### 3. Add a new binding

In the `bindings` array, add a new entry with a different `stream` value but the same `target` collection:
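
The excerpt above stops at the colon, so here is a minimal sketch of what such a `bindings` array might look like. The `stream` paths and the collection name are hypothetical placeholders, and a given connector's `resource` may carry other fields not shown here.

```yaml
bindings:
  # Binding created by the web app during initial setup (hypothetical values).
  - resource:
      stream: /data/folder-a
    target: acmeCo/files/all-files
  # Manually added binding: a different stream, the same target collection.
  - resource:
      stream: /data/folder-b
    target: acmeCo/files/all-files
```

Both bindings write to one collection, which corresponds to Option A (same collection) discussed in this PR.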
Would the Advanced Specification Editor work for this use case as well? We could briefly mention it at the beginning of Option A.
Good call — yes it does! I tested it with Google Drive: edited a capture via the Advanced Spec Editor, added a second binding with a different folder ID and the same target collection, and it saved and published successfully.
Added a paragraph at the top of Option A mentioning the Advanced Spec Editor as a lighter-weight alternative to flowctl. It also notes that this only works for Option A (same collection) — Option B requires creating new collections, which must be done via flowctl.
| **`/folderUrl`** | Folder URL | URL of the Google Drive folder to capture from. Must be `https://drive.google.com/drive/folders/FOLDER_ID`. If your URL contains `/u/0/` or `/u/1/` (from Google's account switcher), remove that segment. | string | Required |
| **`/credentials`** | Credentials | Google OAuth2 credentials or service account JSON for authentication. | object | Required |
| `/matchKeys` | Match Keys | Filter applied to file paths under the folder. If provided, only files whose paths match this regex will be read. | string | |
| `/parser` | Parser Configuration | Configures how files are parsed (optional). | object | |
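
As an illustration of the rows above, a hypothetical endpoint config might look like the sketch below. `FOLDER_ID` and the `matchKeys` regex are placeholders, and the required `credentials` object is left out because its sub-properties are not shown in this excerpt.

```yaml
# URL as copied from a browser with the account switcher active (not accepted):
#   https://drive.google.com/drive/u/0/folders/FOLDER_ID
# Same URL with the /u/0/ segment removed:
folderUrl: https://drive.google.com/drive/folders/FOLDER_ID
# Optional: only read files whose paths match this regex (placeholder pattern).
matchKeys: '.*\.csv$'
# credentials: required; omitted from this sketch.
```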
Technically, we could include properties for the credentials and parser objects here as well (/credentials/auth_type, /parser/compression, etc).
But the parser config especially is complex enough and shared by enough different connectors that we may want to give it its own page instead sometime (no action needed for this PR).
Agreed — a dedicated parser config page would be good as a follow-up. The parser config is basically identical across all file source connectors so it makes sense to document it once rather than repeating it on every connector page. No action for this PR.
Summary
- `guides/flowctl/multiple-file-source-bindings.md`: documents how to capture from multiple paths/folders within a single file source connector using flowctl
- Google Drive connector docs: replaced incorrect `file_id`/`path` binding properties with `stream`, added missing `folderUrl` endpoint property, documented the `/u/N/` URL format caveat

Context
File source connectors all support multiple bindings at runtime, but the UI/discovery only creates a single binding. This leads to confusion — users think multiple paths require multiple captures. This guide covers the flowctl workflow for adding extra bindings, with two options (same collection vs. separate collections).
Tested
Verified end-to-end with real captures: