
ai/live: Remote signer implementation for tickets #3822

Merged
j0sh merged 25 commits into master from ja/remote-signer-tickets
Jan 31, 2026


Collaborator

j0sh commented Dec 5, 2025

TODO

  • PR writeup
  • Unit tests

This PR completes the remote signing feature, allowing gateways to retrieve PM tickets for Live AI (live-video-to-video) without requiring any on-chain connectivity or possession of an Ethereum signing key. See the design background for additional motivation and design detail around remote signers. Refer to #3791 for instructions on how to enable this feature.

Retrieving tickets is mostly done by implementing the LivePaymentSender interface with a new implementation, remotePaymentSender, in live_payment.go. The LivePaymentSender implementations (remote signer or non-remote) are also initialized earlier in the process, before an orchestrator is requested, and stored in the LiveParams struct. This lets the gateway send an upfront payment to the orchestrator using remote signers. Remote payment signing requests are processed in the remote_signer.go file.

When a job first starts, the gateway sends an upfront payment to the orchestrator encoded in the initial request header. To support this, the API for the remotePaymentSender also offers a standalone RequestPayment method to retrieve signed tickets without sending them. The non-remote signer does not have a clean, singular method to retrieve tickets; at some point we may codify this behind a proper interface and clean up this bit, but that can come later to avoid introducing additional concepts to an already involved PR.

Remote Signing Protocol

Refer to the design document for context behind the design of the protocol. Here is some more detail on that:

  • There are 2 bits of state: the remote signer's state, and the orchestrator's state (OrchestratorInfo ticket parameters). The remote signing protocol is stateless, and each call to sign tickets returns an updated state. The gateway is responsible for retaining both bits of state in between calls, and re-sending the state to the remote signer.
  • The remote signer's state is itself signed to prevent tampering. The OrchestratorInfo data is already signed.
  • There is a loose requirement for the gateway to store the payment response since it contains updated OrchestratorInfo data. However, this is not strictly necessary; the existing OrchestratorInfo can be reused until its parameters expire.
  • If expired OrchestratorInfo parameters are sent to the signer, the signer will respond with an internal status code of 480 ("HTTPStatusRefreshSession"), indicating the client should retrieve a fresh set of parameters using a GetOrchestratorInfo RPC request. This comes at the cost of an additional set of requests to the O and the signer, but the impact should be negligible given that Live AI payments are asynchronous and there is typically a bit of a buffer before the gateway depletes its balance with the O.
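The tamper-proofing in the second bullet can be sketched as a seal/verify pair around an opaque state blob that the gateway stores and echoes back unchanged. This sketch assumes an HMAC-SHA256 key known only to the signer, as a stand-in for whatever signature scheme the real signer uses; the SignerState fields are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/json"
	"errors"
)

// SignerState is a hypothetical shape for the state the signer returns
// after every call; the real field set may differ.
type SignerState struct {
	SessionID string `json:"sessionId"`
	Nonce     uint32 `json:"nonce"`
	Balance   string `json:"balance"` // e.g. a big.Rat in string form
}

// SignedState is what the gateway stores and re-sends verbatim.
type SignedState struct {
	State []byte `json:"state"`
	Sig   []byte `json:"sig"`
}

// sealState serializes the state and attaches a MAC over the bytes.
func sealState(key []byte, s SignerState) (SignedState, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return SignedState{}, err
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(raw)
	return SignedState{State: raw, Sig: mac.Sum(nil)}, nil
}

// openState verifies the MAC before trusting anything in the blob.
func openState(key []byte, ss SignedState) (SignerState, error) {
	mac := hmac.New(sha256.New, key)
	mac.Write(ss.State)
	if !hmac.Equal(mac.Sum(nil), ss.Sig) {
		return SignerState{}, errors.New("signer state signature mismatch")
	}
	var s SignerState
	err := json.Unmarshal(ss.State, &s)
	return s, err
}
```

Because only the signer ever verifies its own state, a symmetric MAC is sufficient for this illustration; the gateway never needs to inspect the contents.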
sequenceDiagram
    participant O as Orchestrator
    participant G as Gateway
    participant S as Signer

    %% Initial session setup
    G->>S: getOrchInfoSig()
    S-->>G: gatewaySig
    G->>O: getOrchInfo(gatewaySig)
    O-->>G: ticketParams₀

    %% First signing call (no prior signer state)
    Note over S: state is null → create fresh signer state
    G->>S: signTicket(state=null, ticketParams₀)
    S-->>G: signedTicket₀, signerState₀

    G->>O: pay(signedTicket₀)
    O-->>G: ticketParams₁

    G->>S: signTicket(signerState₀, ticketParams₁)
    S-->>G: signedTicket₁, signerState₁

    %% Subsequent calls (k = 1..N)
    loop For each k = 1..N
        Note over S: NB: ticketParamsₖ reusable between<br>calls as long as it is valid but not<br>signedTicketₖ or signerStateₖ
        G->>S: signTicket(signerStateₖ₋₁, ticketParamsₖ)
        S-->>G: signedTicketₖ, signerStateₖ

        G->>O: pay(signedTicketₖ)
        O-->>G: ticketParamsₖ₊₁
    end
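On the gateway side, the loop in the diagram combined with the 480 refresh rule might look roughly like the Go sketch below. The signResult and gatewaySession types, the injected sign and refresh functions, and the single-retry policy are assumptions made for illustration; only the status code 480 comes from the description above.

```go
package main

import "errors"

// HTTPStatusRefreshSession mirrors the internal status code described above.
const HTTPStatusRefreshSession = 480

// signResult is a hypothetical response from one signTicket call.
type signResult struct {
	Status      int
	Ticket      string
	SignerState []byte
}

type gatewaySession struct {
	signerState []byte // nil before the first call
	orchInfo    string // opaque OrchestratorInfo ticket params
	// sign and refresh are injected so the sketch stays transport-agnostic.
	sign    func(state []byte, orchInfo string) signResult
	refresh func() (string, error) // e.g. a new GetOrchestratorInfo RPC
}

// nextTicket signs one ticket, refreshing OrchestratorInfo at most once if
// the signer reports that the ticket parameters have expired.
func (g *gatewaySession) nextTicket() (string, error) {
	for attempt := 0; attempt < 2; attempt++ {
		res := g.sign(g.signerState, g.orchInfo)
		switch res.Status {
		case 200:
			g.signerState = res.SignerState // always carry the newest state forward
			return res.Ticket, nil
		case HTTPStatusRefreshSession:
			info, err := g.refresh()
			if err != nil {
				return "", err
			}
			g.orchInfo = info
		default:
			return "", errors.New("remote signer error")
		}
	}
	return "", errors.New("orchestrator info still stale after refresh")
}
```

Note that ticket params are reused across calls until the signer rejects them, while the signer state is replaced after every successful call, matching the note in the diagram.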

PM Changes

All the changes here are used only by the remote signer, so the impact on the existing code is minimal.

The Sender interface adds two new methods: a StartSessionWithNonce constructor, and a Nonce accessor. The nonce is a (mostly internal) PM construct that allows for multiple tickets to be generated using the same set of PM parameters. For ordinary signers, the Sender persists for the duration of the session, so the nonce would stay internal and be incremented as necessary. However, since remote signers are stateless, the nonce needs to be extracted and set with each signing call, and that is what we do here.
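The stateless nonce handling can be illustrated with a toy sender that is rebuilt on every call, resumed at the nonce carried in the gateway-held state, and then queried for the updated nonce afterwards. The toySender type, its method signatures, and the string "tickets" are hypothetical; they only mirror the roles of StartSessionWithNonce and Nonce as described, not the real pm.Sender interface.

```go
package main

import "fmt"

// toySender is a hypothetical stand-in for a PM sender that only models the
// per-session nonce; real tickets also carry signatures, recipient data, etc.
type toySender struct {
	nonces map[string]uint32
}

// StartSessionWithNonce resumes a session at a known nonce.
func (s *toySender) StartSessionWithNonce(sessionID string, nonce uint32) {
	s.nonces[sessionID] = nonce
}

// Nonce exposes the current nonce so it can be returned to the caller.
func (s *toySender) Nonce(sessionID string) uint32 { return s.nonces[sessionID] }

// CreateTicket consumes the next nonce for the session.
func (s *toySender) CreateTicket(sessionID string) string {
	s.nonces[sessionID]++
	return fmt.Sprintf("%s#%d", sessionID, s.nonces[sessionID])
}

// signCall models one stateless signing request: a fresh sender each time,
// with the nonce carried in from (and back out to) the gateway-held state.
func signCall(sessionID string, prevNonce uint32, numTickets int) ([]string, uint32) {
	s := &toySender{nonces: map[string]uint32{}}
	s.StartSessionWithNonce(sessionID, prevNonce)
	tickets := make([]string, 0, numTickets)
	for i := 0; i < numTickets; i++ {
		tickets = append(tickets, s.CreateTicket(sessionID))
	}
	return tickets, s.Nonce(sessionID)
}
```

For example, signCall("sess", 0, 2) yields tickets "sess#1" and "sess#2" with nonce 2, and passing that nonce into the next call continues at "sess#3" instead of reusing a nonce.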

The Balance struct has a new Reserve() method added to zero out the current balance. This addition makes the Balance more closely mirror the API of the nested AddressBalances() list. (Otherwise I would have chosen a better name than "Reserve" to zero out a balance.)

Note that the Balances object itself hides quite a bit of nested global accounting that we don't strictly need here, and it would be much neater to drop it in favor of strictly request-local accounting. However, this would make the rest of the implementation more complex, since the BroadcastSession works on the global Balances and most of the payment helper functions themselves take a BroadcastSession ... so here we go.

There is also a small change in starter.go to initialize more PM and Ethereum scaffolding (watchers etc) when the node starts up in remote signer mode.

github-actions bot added the go (Pull requests that update Go code) and AI (Issues and PR related to the AI-video branch.) labels Dec 5, 2025
Member

rickstaa commented Jan 4, 2026

@j0sh thanks for the draft pull request. I think this is very cleanly implemented, and I don’t have any direct concerns from my side.

}

// Forward payment + segment credentials to orchestrator
url := segmentInfo.sess.OrchestratorInfo.Transcoder
Member

@j0sh Would it make sense to put this logic in a shared helper since it’s used by both signers, or is it better to wait given that both signers are still in flux?

Collaborator Author

shared helper since it’s used by both signers

Maybe I am missing something really obvious, but what is this other signer?

Member

Ah, sorry: I meant the signing a gateway does when not using a remote signer. However, I do remember your goal of not interfering with existing code for now, and the code is minimal. Just something I was wondering.

Collaborator Author

Oh yes I see now.

There is additional structural similarity here: the non-remote livePaymentSender can probably be split into two pieces, similar to how the remotePaymentSender API is split between RequestPayment (to retrieve / generate the tickets) and SendPayment (to actually send the tickets). This would clean up a bit of lingering cruft around job initialization (see the PR description).

When we do this, we can also move to using a shared implementation to send HTTP payments. But yes let's hold that thought until after these changes are merged, so we can verify one thing at a time.

ManifestID string

// Number of pixels to generate a ticket for. Required if `type` is not set.
InPixels int64 `json:"inPixels"`
Member

rickstaa Jan 4, 2026

@j0sh should we move away from pixel-based pricing to a more general pricing model that also supports BYOC's per-second compute, or wait until the payment clearinghouse is validated? I see you already added a type field below to extend this to other job types later.

Collaborator Author

For what it's worth, live-video-to-video also uses per-second payments, with a fixed-size "base unit" of 720p@30fps. Avoiding any changes to the current live-video-to-video pricing model was an explicit goal here, so I would not change anything about the pricing model right now.

Different job types can be supported later on, either by adding a new field to this struct or by setting the type field to a specific value.

return
}

// Generate segment credentials with an empty segment
Member

rickstaa Jan 5, 2026

@j0sh should we move this down to where it is used?

}

// Compute required fee
fee := calculateFee(pixels, priceInfo)
Member

@j0sh similar to the comment above. I think it would be nice to make this function more general and turn it into a shared helper by changing pixels to units, so we can share logic between BYOC and live-video-to-video.

Collaborator Author

Member

rickstaa commented Jan 5, 2026

@j0sh, I reviewed this pull request and left a few minor comments. Similar to #3791, the overall approach and logic here make sense to me. With that in mind, I think this can be merged once you and @ad-astra-video are comfortable that the implementation leaves room for future extension to BYOC use cases, along with adding tests and completing E2E validation.

If we set aside BYOC batch legacy payments for now, both systems already rely on streaming-based payments. While BYOC streaming and Live Video-to-Video currently use separate payment endpoints (/ai/stream/payment vs /payment), the high-level placement and responsibilities of the payment logic appear well aligned.

Based on a quick review (and please correct me if I’m mistaken, @ad-astra-video, @j0sh), the remaining differences seem to be:

  • Session model: ManifestID/AuthToken with refresh vs job/stream token with balance reconciliation
  • Fee units: pixels/time vs job-based accounting
  • State tracking: explicit gateway nonce/balance tracking vs orchestrator-reported balance

Given this, my current recommendation would be to gradually converge the streaming paths by:

  1. Generalizing Live’s “pixels” concept into a compute-units model, allowing BYOC streaming to reuse the same fee computation logic.
  2. Extracting and sharing the payment helper logic (ticket generation, /payment submission, and session state updates) from live-video-to-video so it can also be reused by BYOC streaming.
  3. Preserving the trustless dual-tracking model used in live-video-to-video, where the gateway/signer tracks nonce and balance while the orchestrator enforces.

Although it’s still early and we haven’t had many discussions yet, I do like the stateless remote signer approach taken in this pull request. Keeping the signer lean and predictable, without persistent state or coordination requirements, feels like the right tradeoff for now. It reduces operational complexity, avoids shared databases, and preserves flexibility for gateways and orchestrators. This also aligns well with future verifiable compute work and potential dispute mechanisms, while keeping the orchestrator as the final enforcer.

This recommendation mirrors the one I made on the BYOC Streaming changes, which lists updates live-video-to-video can make to converge the two pipelines further. I don’t think all of this needs to be addressed immediately; we want to move quickly to demonstrate the live-video-to-video POC. However, if possible, it would be good to at least start with (1) so that future extensibility is kept in mind as we move forward.

Member

rickstaa left a comment

Similar to #3791, I’ve approved the technical implementation since I don’t see any major concerns outside the comments listed above. This should allow us to start testing with the INC infrastructure first. You can coordinate with @ad-astra-video on follow-up PRs to support BYOC, add tests, and perform end-to-end verification.

j0sh force-pushed the ja/remote-signer-orchinfo branch from 210bcbf to e958057 January 23, 2026 00:37
j0sh added a commit that referenced this pull request Jan 26, 2026
This PR is the first part of the remote signing feature, implementing the GetOrchestratorInfo request. See the design background for additional motivation and design detail around remote signers.

Remote signing support for Live AI (live-video-to-video) consists of two parts:

  • GetOrchestratorInfo (this PR)
  • PM ticket retrieval (ai/live: Remote signer implementation for tickets #3822)
This PR adds a new mode to go-livepeer: -remoteSigner.

The remote signer exposes an HTTP POST endpoint at /sign-orchestrator-info. It currently does only one thing: it produces a signature for use in the OrchestratorInfo gRPC call. The response includes the signer's Ethereum address as well as the signature itself.

Like the other types of mode flags (gateway, orchestrator, redeemer, etc) the remote signer cannot be combined with other modes.

The gateway adds a new -remoteSignerUrl flag which specifies the base address of the remote signer to use (host:port). When configured, the gateway pre-fetches the remote signature for OrchestratorInfo at start-up time and caches it, so subsequent calls to OrchestratorInfo do not have to incur additional remote calls.

One might note that this OrchestratorInfo signing scheme doesn't actually accomplish much: the signature is effectively static, not scoped to any one orchestrator, and never expires. This is a legacy aspect of the OrchestratorInfo flow that we have to accommodate for now; it is a vestigial corner of the codebase, but one that we've judged internally to be harmless.
Base automatically changed from ja/remote-signer-orchinfo to master January 26, 2026 17:34
j0sh force-pushed the ja/remote-signer-tickets branch from bb04b97 to 4a5c0c0 January 27, 2026 18:55

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 61.09589% with 142 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.46227%. Comparing base (778c2e1) to head (34c8da4).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| server/remote_signer.go | 79.71698% | 30 Missing and 13 partials ⚠️ |
| server/live_payment.go | 58.06452% | 35 Missing and 4 partials ⚠️ |
| server/ai_process.go | 0.00000% | 30 Missing ⚠️ |
| pm/sender.go | 0.00000% | 12 Missing ⚠️ |
| server/ai_mediaserver.go | 0.00000% | 8 Missing ⚠️ |
| pm/stub.go | 0.00000% | 6 Missing ⚠️ |
| core/accounting.go | 0.00000% | 2 Missing ⚠️ |
| cmd/livepeer/starter/starter.go | 0.00000% | 1 Missing ⚠️ |
| server/ai_live_video.go | 0.00000% | 1 Missing ⚠️ |
Additional details and impacted files


@@                 Coverage Diff                 @@
##              master       #3822         +/-   ##
===================================================
+ Coverage   32.19428%   32.46227%   +0.26799%     
===================================================
  Files            170         170                 
  Lines          41321       41673        +352     
===================================================
+ Hits           13303       13528        +225     
- Misses         27016       27124        +108     
- Partials        1002        1021         +19     
| Files with missing lines | Coverage Δ |
|---|---|
| cmd/livepeer/starter/starter.go | 22.49474% <0.00000%> (ø) |
| server/ai_live_video.go | 0.00000% <0.00000%> (ø) |
| core/accounting.go | 93.75000% <0.00000%> (-1.70455%) ⬇️ |
| pm/stub.go | 55.46218% <0.00000%> (-1.43437%) ⬇️ |
| server/ai_mediaserver.go | 6.64557% <0.00000%> (-0.04233%) ⬇️ |
| pm/sender.go | 86.79245% <0.00000%> (-11.07989%) ⬇️ |
| server/ai_process.go | 1.97300% <0.00000%> (-0.04613%) ⬇️ |
| server/live_payment.go | 54.70588% <58.06452%> (+2.14178%) ⬆️ |
| server/remote_signer.go | 61.45455% <79.71698%> (+61.45455%) ⬆️ |

... and 3 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

j0sh added 7 commits January 28, 2026 16:34
This makes the orchestrator return the expected pricing with updated
OrchestratorInfo after payment requests.

If nil caps are passed in to the remote signer, the orch does not
return a PaymentResult-wrapped OrchestratorInfo after payment
that the remote signer can then use. (This is OK; the same
OrchestratorInfo can be reused until it expires.)

Since the payment response effectively becomes optional, let's not
mandate it be wrapped in a PaymentResult. This also makes it somewhat
easier for clients to use the OrchestratorInfo data directly from
GetOrchestrator without additional protobuf shenanigans.

Enable remote signers to check orchestrator prices against configured max
price limits (-maxPricePerUnit and -maxPricePerCapability flags). This
prevents remote signers from generating payments to orchestrators that
exceed acceptable pricing thresholds.

Changes:
- Add HTTPStatusPriceExceeded (481) response code for price rejections
- Validate orchestrator price in GenerateLivePayment() after session setup
- Add test coverage for price validation and rejection scenarios

The price check supports capability-specific pricing and returns HTTP 481
when an orchestrator's price exceeds the configured maximum, allowing the
gateway to select a different orchestrator.

Usually there are constraints in place alongside the capabilities
in order to set a per-capability price. However, if no constraints
are set, there are no per-capability prices, so return the global max
price instead.

This hasn't come up in prod (and fixes a crash) so shouldn't be a
breaking change.
Collaborator Author

j0sh commented Jan 31, 2026

Couple small behavioral changes since the original draft PR:

  • Set the price when the session starts and use that initial price for the duration of the session, regardless of whether the O tries to increase the ticket value. This matches how the current code handles it. Orchs should fix the price anyway, but this behavior prevents any shenanigans.
  • Optionally pass in capabilities to the remote signer. Prepare to have your eyes glaze over: if capabilities are given, the signer embeds the capabilities in the seg creds, and the orch produces a capability-specific OrchestratorInfo update after processing a payment. This updated OrchestratorInfo can then be used for the next remote signer call. This helps avoid a refresh every now and then. If no caps are given, the orchestrator does not produce complete pricing info (this might be a bug on the orch but I'm not sure yet). In that case, the payment response can be mostly ignored and the gateway can keep reusing its original GetOrchestrator-acquired OrchestratorInfo until that needs to be refreshed via another GetOrchestrator call ... with capabilities. TBH all this smells kind of weird and is subject to change.
  • Now, because of this thing with capabilities, along with the initial price-fixing, there is even less reason to use the response from the orchestrator's payment endpoint. So remove the PaymentResult wrapper from the remote signer request and just use the OrchestratorInfo directly. This should help SDKs since they can pass along the GetOrchestrator response directly without additional protobuf munging.
  • Cap the number of tickets per call at 100. This is both a financial and an availability guard; generating millions of tickets actually locks up the server due to all the ECDSA signing. Ask me how I know. With a correctly configured node there should only be 1 or 2 tickets per remote signer call.

j0sh marked this pull request as ready for review January 31, 2026 04:20
j0sh merged commit f117b44 into master Jan 31, 2026
18 checks passed
j0sh deleted the ja/remote-signer-tickets branch January 31, 2026 04:25

Labels

AI Issues and PR related to the AI-video branch. go Pull requests that update Go code


2 participants