feat(source-stripe): expose supports_realtime_sync in catalog metadata #345

Merged
tonyxiao merged 8 commits into main from kd/realtime-sync-catalog-flag
Apr 28, 2026
Collaborator

@kdhillon-stripe kdhillon-stripe commented Apr 27, 2026

Summary

The Stripe source discover() had two independent discovery paths that each walked the OpenAPI spec separately, applied different inclusion logic, produced different output shapes, and were then joined by a fragile name-based intersection:

buildResourceRegistry(spec)            SpecParser.parse(spec)
  discoverListEndpoints()                discoverAllowedTables()
  - finds ~90 top-level listable           discoverListableResourceIds(includeNested:true)
  - filters by EXCLUDED_TABLES               - finds ~112 listable (incl. nested)
  - builds listFn/retrieveFn              discoverWebhookUpdatableResourceIds()
  - outputs: Record<name, ResourceConfig>    - finds ~63 with webhooks
                                             - intersects: ~59 resource IDs
                                             - resolves table names via aliases
                                             - outputs: ParsedResourceTable[]
         \                                /
          \                              /
           catalogFromOpenApi(tables, registry)
             iterates registry entries
             looks up table by name from a separate Map
             result: catalog streams

Neither side owned both concerns: buildResourceRegistry was the source of truth for what can be synced (it has the runtime list/retrieve functions), while SpecParser was the source of truth for what should be synced (webhook filter + schema). The two were glued together by a name-based Map.get() that could silently produce streams without schemas ("ghost tables") or quietly drop tables when the two sets disagreed.
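To make the failure mode concrete, here is a minimal sketch of a name-based join. The type shapes, the `tableName` and `columns` fields, the `joinByName` function, and the table names are all hypothetical stand-ins, not the actual source-stripe code. They only illustrate how a `Map.get()` miss can yield a ghost table on one side and a dropped table on the other.

```typescript
// Hypothetical, simplified shapes; the real ResourceConfig and ParsedResourceTable carry more fields.
type ResourceConfig = { listFn: () => Promise<unknown[]> };
type ParsedResourceTable = { tableName: string; columns: string[] };
type Stream = { name: string; columns: string[] | undefined };

// Illustrative join in the style described above: iterate the registry, look up schemas by name.
function joinByName(
  tables: ParsedResourceTable[],
  registry: Record<string, ResourceConfig>,
): Stream[] {
  const byName = new Map<string, ParsedResourceTable>();
  for (const t of tables) byName.set(t.tableName, t);
  // A miss on byName.get() is not an error here, so the stream is emitted without a schema.
  return Object.keys(registry).map((name) => ({
    name,
    columns: byName.get(name)?.columns,
  }));
}

// Made-up inputs where the two discovery paths disagree.
const registry: Record<string, ResourceConfig> = {
  customers: { listFn: async () => [] },
  only_in_registry: { listFn: async () => [] },
};
const tables: ParsedResourceTable[] = [
  { tableName: "customers", columns: ["id", "email"] },
  { tableName: "only_in_parser", columns: ["id"] },
];

const streams = joinByName(tables, registry);
// only_in_registry is emitted with columns === undefined (a ghost table);
// only_in_parser never appears in the output (silently dropped).
```

Neither mismatch raises an error in this sketch, which is the property the description above calls fragile.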

Solution

Establish a single linear pipeline where each step feeds the next, rather than two parallel paths joined at the end.

Before

flowchart TB
    spec[OpenAPI spec]

    spec --> left["buildResourceRegistry(spec)"]
    spec --> right["SpecParser.parse(spec)"]

    left --> reg["registry — runtime fns per table"]
    right --> tables["parsed.tables — schemas per table"]

    reg --> cfo["catalogFromOpenApi(tables, registry)"]
    tables --> cfo
    cfo -->|"join by name"| catalog["catalog.streams"]

Two independent walks of the spec, joined by name at the end.

After

flowchart TB
    spec[OpenAPI spec]

    spec --> ps["parser.parseSyncable(spec, {excluded})"]
    ps --> parsed["parsed.tables — syncable tables with schemas"]

    parsed --> brr["buildResourceRegistry(spec, ..., parsedTables)"]
    spec --> brr
    brr --> reg["registry — runtime fns + schema per entry"]

    reg --> cfo["catalogFromOpenApi(registry)"]
    cfo --> catalog["catalog.streams"]

Single linear flow. No parallel discovery, no name-based join.

What changed

  1. SpecParser.parseSyncable() — new method that fuses discoverSyncableTables + parse into one call. The canonical "what should be synced" decision happens once, here.

  2. parsedTable on ResourceConfig — each registry entry now carries its own parsed schema, attached at build time. The schema travels with the runtime functions instead of being looked up by name later.

  3. catalogFromOpenApi(registry) — takes one argument instead of two. No more Map.get() join. Each entry already has everything needed to emit a stream. Throws if parsedTable is missing (ghost-table guard moved here).
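The second and third changes can be sketched as follows. This is a simplified illustration under assumptions: only the `parsedTable` field name comes from the description above, while the other field names, the `catalogFromRegistrySketch` function, and the error message are hypothetical and do not reflect the real implementation.

```typescript
// Hypothetical, simplified shapes; field names other than parsedTable are assumptions.
type ParsedResourceTable = { tableName: string; columns: string[] };
type ResourceConfig = {
  listFn: () => Promise<unknown[]>;
  parsedTable?: ParsedResourceTable; // attached when the registry is built
};
type Stream = { name: string; columns: string[] };

// Sketch of a single-argument catalog builder: no lookup map, only the registry.
function catalogFromRegistrySketch(
  registry: Record<string, ResourceConfig>,
): { streams: Stream[] } {
  const streams = Object.keys(registry).map((name) => {
    const parsedTable = registry[name].parsedTable;
    if (!parsedTable) {
      // Ghost-table guard: fail loudly instead of emitting a stream without a schema.
      throw new Error(`registry entry "${name}" has no parsedTable`);
    }
    return { name, columns: parsedTable.columns };
  });
  return { streams };
}

// An entry that carries its schema produces a stream.
const catalog = catalogFromRegistrySketch({
  customers: {
    listFn: async () => [],
    parsedTable: { tableName: "customers", columns: ["id", "email"] },
  },
});

// An entry without a schema is rejected rather than emitted.
let guardMessage = "";
try {
  catalogFromRegistrySketch({ orphan: { listFn: async () => [] } });
} catch (err) {
  guardMessage = (err as Error).message;
}
```

Because the schema travels with the entry, the only way to get a schema-less stream in this sketch is to construct a registry entry without `parsedTable`, and that case throws.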

Copilot AI review requested due to automatic review settings April 27, 2026 19:43
Copilot AI left a comment

Pull request overview

Adds per-stream catalog metadata to indicate whether a Stripe table is updated via real-time webhooks, enabling destinations to distinguish webhook-backed streams from backfill-only streams.

Changes:

  • Introduces REALTIME_SYNC_TABLES as a canonical set of webhook-updated table names.
  • Sets stream.metadata.supports_realtime_sync in catalogFromOpenApi() based on REALTIME_SYNC_TABLES membership.
  • Adds unit tests covering one real-time stream (subscriptions) and one list-only stream (reporting_report_types).
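As a rough illustration of the membership check described in these changes, the sketch below sets the flag from a set lookup. The names `REALTIME_SYNC_TABLES_SKETCH`, `StreamSketch`, and `withRealtimeFlag` are hypothetical, and the set holds a single hand-written entry; the real `REALTIME_SYNC_TABLES` lives in types.ts and its contents are not reproduced here.

```typescript
// Assumption: a one-entry stand-in for the real REALTIME_SYNC_TABLES set.
const REALTIME_SYNC_TABLES_SKETCH: ReadonlySet<string> = new Set(["subscriptions"]);

// Hypothetical, simplified stream shape carrying only the new metadata flag.
type StreamSketch = {
  name: string;
  metadata: { supports_realtime_sync: boolean };
};

// Set the flag purely from set membership, as the change summary describes.
function withRealtimeFlag(name: string): StreamSketch {
  return {
    name,
    metadata: { supports_realtime_sync: REALTIME_SYNC_TABLES_SKETCH.has(name) },
  };
}

// The two cases the unit tests are said to cover.
const subscriptions = withRealtimeFlag("subscriptions");
const reportTypes = withRealtimeFlag("reporting_report_types");

// A destination could then separate webhook-backed streams from backfill-only ones.
const backfillOnly = [subscriptions, reportTypes]
  .filter((s) => !s.metadata.supports_realtime_sync)
  .map((s) => s.name);
```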

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/source-stripe/src/types.ts | Adds REALTIME_SYNC_TABLES to enumerate tables receiving webhook-driven updates. |
| packages/source-stripe/src/catalog.ts | Populates metadata.supports_realtime_sync for each discover stream. |
| packages/source-stripe/src/catalog.test.ts | Adds tests asserting the flag is true for a webhook-backed stream and false for a list-only stream. |

@kdhillon-stripe kdhillon-stripe force-pushed the kd/realtime-sync-catalog-flag branch from 026644c to c70a59b Compare April 27, 2026 20:32
kdhillon-stripe and others added 2 commits April 28, 2026 02:11
Add a `supports_realtime_sync` boolean to each stream's metadata in the
discover catalog. This lets destinations know which tables receive
real-time updates via webhooks vs. tables that are backfill-only.

A new `REALTIME_SYNC_TABLES` set in types.ts is derived from the
existing SUPPORTED_WEBHOOK_EVENTS list. Two new tests verify the flag
is true for webhook-backed tables and false for list-only tables.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
@Yostra Yostra force-pushed the kd/realtime-sync-catalog-flag branch from c70a59b to b20f633 Compare April 28, 2026 01:06
@tonyxiao tonyxiao merged commit d86a666 into main Apr 28, 2026
20 checks passed
@tonyxiao tonyxiao deleted the kd/realtime-sync-catalog-flag branch April 28, 2026 03:14
tonyxiao pushed a commit that referenced this pull request Apr 28, 2026
#345)

* feat(source-stripe): expose supports_realtime_sync in catalog metadata

Add a `supports_realtime_sync` boolean to each stream's metadata in the
discover catalog. This lets destinations know which tables receive
real-time updates via webhooks vs. tables that are backfill-only.

A new `REALTIME_SYNC_TABLES` set in types.ts is derived from the
existing SUPPORTED_WEBHOOK_EVENTS list. Two new tests verify the flag
is true for webhook-backed tables and false for list-only tables.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude

* step one, remove the fan out of discover list...

* unify discover pipeline, fuse schema into registry, single-arg catalogFromOpenApi

* use bundled version

* skip version with no webhook info

* use enabled events to discover webhook supported objects

* v2 webhooks

* catalog snap

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Yostra <straya.mark@gmail.com>