Handle DTS overlap and AV_NOPTS_VALUE inputs by j0sh · Pull Request #445 · livepeer/lpms

j0sh · 2026-03-06T08:52:45Z

Summary

Fix DTS overlap between segments and handle AV_NOPTS_VALUE inputs that could
cause runaway encodes with millions of output frames.

Supersedes #423 and incorporates fixes for #443. Also addresses the root cause
behind the AV_NOPTS_VALUE-related issues discussed in #440.

Problem

Three interrelated timestamp problems in the transcoding pipeline:

1. DTS overlap between segments. When segments are processed out of order
   (e.g. flip-flopping between transcoders) or non-consecutively, drift
   accumulated from adjusting timestamps around the FPS filter could produce
   non-monotonic DTS across segment boundaries. This was unpredictable and
   depended on the segment processing order.
2. AV_NOPTS_VALUE causing runaway encodes. Certain inputs contain
   misplaced SEI NAL units (after picture data instead of before), which
   causes the ffmpeg H.264 parser to omit timing information, producing frames
   with AV_NOPTS_VALUE PTS. When a segment is skipped (e.g. segment 1 → 3),
   this could have led to PTS underflows in the FPS filter, generating
   millions of duplicate frames and filling disk.
3. Unclamped timestamps reaching the muxer and encoder. The decoder may
   still produce out-of-order PTS for pathological inputs. Without clamping,
   these regressive timestamps could reach the muxer or encoder, producing
   undefined behavior.

Changes

Filtergraph reset per segment (`filter.c`, `encoder.c`, `transcoder.c`)

Close and recreate the video filtergraph at the start of each segment instead
of persisting it across segments. This eliminates the drift from adjusting
input timestamps to satisfy the FPS filter's monotonicity requirement. The
performance cost is roughly 1% from filtergraph re-creation.

Because the filter resets each segment, a large amount of special-case
timestamp handling code for inter-segment discontinuities has been removed.
Some flush logic is retained for low-fps or very short content.

Decoded PTS repair (`decoder.c`)

Add fix_video_pts() at the decode stage to normalize video frame PTS before
it reaches the filter or encoder:

- Falls back to best_effort_timestamp when PTS is AV_NOPTS_VALUE
- Synthesizes PTS from frame duration + last PTS when both are missing
- Clamps regressive PTS to be strictly monotonic
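
The three steps above can be sketched as follows. This is an illustration in Go, not the C code from `decoder.c`: the `frame` struct, the `fixVideoPTS` name, the choice to start at zero when there is no previous PTS, and the one-tick fallback for an unknown duration are all assumptions made for the example.

```go
package main

import "math"

// noPTS mirrors FFmpeg's AV_NOPTS_VALUE, which is INT64_MIN.
const noPTS int64 = math.MinInt64

// frame is a hypothetical stand-in for the AVFrame fields involved.
type frame struct {
	PTS        int64 // presentation timestamp, or noPTS
	BestEffort int64 // best_effort_timestamp, or noPTS
	Duration   int64 // frame duration in the same time base, 0 if unknown
}

// fixVideoPTS returns a repaired PTS for f, given the last PTS that was
// emitted (noPTS if no frame has been emitted yet).
func fixVideoPTS(f frame, lastPTS int64) int64 {
	pts := f.PTS

	// Step 1: fall back to the decoder's best-effort timestamp.
	if pts == noPTS {
		pts = f.BestEffort
	}

	// Step 2: both are missing, so synthesize from the last PTS + duration.
	if pts == noPTS {
		if lastPTS == noPTS {
			return 0 // assumption: nothing to build on, start at zero
		}
		step := f.Duration
		if step <= 0 {
			step = 1 // assumption: advance by one tick if duration is unknown
		}
		pts = lastPTS + step
	}

	// Step 3: clamp regressive or repeated PTS to be strictly monotonic.
	if lastPTS != noPTS && pts <= lastPTS {
		pts = lastPTS + 1
	}
	return pts
}
```

A caller would keep the returned value as `lastPTS` for the next frame, so every frame that reaches the filter or encoder has a PTS strictly greater than the one before it.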

Timestamp clamping at muxer and encoder (`encoder.c`)

- Clamp non-monotonic video DTS in the muxer independently of the existing
  DTS > PTS repair path, matching ffmpeg's own mux behavior
- Add a post-rescale tie-break guard for video PTS going into the encoder,
  handling cases where timebase conversion collapses adjacent frame PTS into
  the same encoder tick
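
A rough sketch of both guards, written in Go for illustration rather than taken from `encoder.c`. The function names, the `rational` type, and the `haveLast` flag are hypothetical. The DTS clamp follows my understanding of how the ffmpeg CLI handles non-monotonic DTS for strict muxers (raise DTS to last + 1, and pull PTS up with it when needed). The rescale helper rounds to nearest, which approximates `av_rescale_q`, and assumes positive time bases and no overflow.

```go
package main

// rational is a hypothetical stand-in for AVRational (a time base).
type rational struct{ Num, Den int64 }

// clampMuxTimestamps forces dts to be strictly greater than lastDTS.
// If pts was valid (pts >= dts) it is raised as well, so that the
// packet never ends up with pts < dts after the clamp.
func clampMuxTimestamps(lastDTS, dts, pts int64, haveLast bool) (int64, int64) {
	if !haveLast {
		return dts, pts
	}
	floor := lastDTS + 1
	if dts < floor {
		if pts >= dts && pts < floor {
			pts = floor
		}
		dts = floor
	}
	return dts, pts
}

// rescale converts ts from time base src to time base dst, rounding to
// the nearest tick (halves round away from zero).
func rescale(ts int64, src, dst rational) int64 {
	n := ts * src.Num * dst.Den
	d := src.Den * dst.Num
	if n >= 0 {
		return (n + d/2) / d
	}
	return -((-n + d/2) / d)
}

// nextEncoderPTS rescales ts into the encoder time base and breaks ties:
// if rounding collapses this frame onto the previous encoder tick (or
// behind it), the result is bumped to last + 1.
func nextEncoderPTS(ts int64, src, dst rational, last int64, haveLast bool) int64 {
	out := rescale(ts, src, dst)
	if haveLast && out <= last {
		out = last + 1
	}
	return out
}
```

For example, with a 1/90000 source time base and a 1/30 encoder time base, source PTS 1500 and 3000 both round to encoder tick 1; the tie-break moves the second frame to tick 2 instead of handing the encoder a duplicate PTS.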

SEI NAL reordering (`sei_fixup.go`)

Add a Go-side pre-processing step (FixMisplacedSEI) for H.264 mpegts inputs
that detects and reorders misplaced SEI NAL units (found after VCL NALs) to
appear before them. This restores the H.264 parser's ability to extract
timing information, eliminating the AV_NOPTS_VALUE frames at their source.
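
The core reordering idea can be sketched on a single access unit. This is not the `FixMisplacedSEI` implementation: that function works on mpegts input, while the hypothetical `reorderSEI` below assumes the caller has already split one access unit into NAL units with start codes removed. NAL type values (1 to 5 for VCL slices, 6 for SEI) come from the H.264 specification.

```go
package main

// nalSEI is the H.264 nal_unit_type for SEI messages.
const nalSEI = 6

// nalType extracts nal_unit_type, the low 5 bits of the first NAL byte.
func nalType(nal []byte) byte {
	if len(nal) == 0 {
		return 0
	}
	return nal[0] & 0x1F
}

// isVCL reports whether t is a coded slice type (1 through 5).
func isVCL(t byte) bool { return t >= 1 && t <= 5 }

// reorderSEI moves any SEI NAL unit found after the first VCL NAL unit
// to just before that VCL NAL unit. It returns the (possibly reordered)
// NAL units and whether anything changed.
func reorderSEI(nals [][]byte) ([][]byte, bool) {
	firstVCL := -1
	for i, n := range nals {
		if isVCL(nalType(n)) {
			firstVCL = i
			break
		}
	}
	if firstVCL < 0 {
		return nals, false // no picture data, nothing to fix
	}

	var late [][]byte                     // SEI found after picture data
	rest := make([][]byte, 0, len(nals)) // everything else, in order
	for i, n := range nals {
		if i > firstVCL && nalType(n) == nalSEI {
			late = append(late, n)
			continue
		}
		rest = append(rest, n)
	}
	if len(late) == 0 {
		return nals, false
	}

	// rest[:firstVCL] is untouched, since only later units were removed.
	out := make([][]byte, 0, len(nals))
	out = append(out, rest[:firstVCL]...)
	out = append(out, late...)
	out = append(out, rest[firstVCL:]...)
	return out, true
}
```

Returning a `changed` flag lets unaffected inputs pass through without being rewritten, which is the behavior a test like TestFixMisplacedSEI_NoChanges would check for.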

Runaway encode guard (`encoder.c`, `transcoder.c`)

Abort video encoding when output frames exceed 25x decoded frames
(lpms_ERR_ENC_RUNAWAY), providing a safety net against the FPS filter
exploding frame counts even after the other fixes. Image-sequence inputs
(image2) are exempted since frame expansion is expected there. (From #443)
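
A minimal sketch of the guard in Go, for illustration only. The real check lives in the C code; `checkRunaway`, `errEncRunaway`, and the `isImageSequence` flag are hypothetical names, and skipping the check before any frame has been decoded is an assumption of this example.

```go
package main

import "errors"

// errEncRunaway is a hypothetical Go-side counterpart of lpms_ERR_ENC_RUNAWAY.
var errEncRunaway = errors.New("encoder runaway: too many output frames")

// maxEncodeRatio is the 25x threshold described above.
const maxEncodeRatio = 25

// checkRunaway returns errEncRunaway when the number of encoded frames
// exceeds 25x the number of decoded frames. Image-sequence inputs are
// exempt because frame expansion is expected there.
func checkRunaway(decoded, encoded int64, isImageSequence bool) error {
	if isImageSequence {
		return nil
	}
	if decoded <= 0 {
		return nil // assumption: nothing to compare against yet
	}
	if encoded > maxEncodeRatio*decoded {
		return errEncRunaway
	}
	return nil
}
```

With 10 decoded frames, 250 encoded frames is still allowed and the 251st trips the guard.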

Known Limitation

A DTS overlap can still occur between the first and second segments when all
of the following are true:

- The first segment starts at or near PTS = 0
- B-frames are in use (producing DTS < 0)
- mpegts is the output format (cannot represent negative timestamps)

The mpegts muxer offsets all packets to compensate for negative DTS, but
subsequent segment transcodes are unaware of this offset. This is the same
behavior as FFmpeg CLI when muxdelay is set to 0, and is codified in
TestTranscoder_API_DTSOverlap. Adding a constant delay (as FFmpeg does with
its default 1.4s muxdelay) would fix this, but is deferred to avoid
accumulating delays across re-transcodes.

Test Plan

- TestTranscoder_API_DTSOverlap: verifies DTS monotonicity across
  out-of-order segments (and documents the known B-frame/mpegts edge case)
- TestTranscoder_NOPTS_SkipSegment: transcodes real-world samples with
  misplaced SEI and skipped segments that previously produced runaway output
- TestTranscoder_NOPTS_MissingSEIAndPES: verifies PTS synthesis for
  inputs where the H.264 parser cannot produce any timestamps
- TestTranscoder_EncodedFrameRunaway: triggers the 25x frame count abort
- TestFixMisplacedSEI_BrokenFiles / TestFixMisplacedSEI_NoChanges:
  unit tests for SEI reordering on affected and unaffected samples

Fix an occasional DTS overlap by closing the filtergraph after each segment and re-creating it at the beginning of the next, instead of attempting to persist the filtergraph between segments.

This overlap occurred mostly when flip-flopping segments between transcoders, or when processing non-consecutive segments within a single transcoder. It was caused by drift from adjusting input timestamps to match the fps filter's expectation of mostly consecutive timestamps, while also adjusting output timestamps to remove accumulated delay from the filter.

There is roughly a 1% performance hit on my machine from re-creating the filtergraph.

Because we are now resetting the filter after each segment, we can remove a good chunk of the special-cased timestamp handling code before and after the filtergraph, since we no longer need to handle discontinuities between segments. However, we do need to keep some filter flushing logic in order to accommodate low-fps or low-frame content.

This does change our outputs, usually by one fewer frame. Sometimes we seem to produce an *additional* frame; it is unclear why. However, as the test cases note, this actually clears up a number of long-standing oddities around the expected frame count, so it should be seen as an improvement.

---

It is important to note that while this fixes DTS overlap in a (rather unpredictable) general case, there is another overlap bug in one very specific case. These are the conditions for the bug:

1. First and second segments of the stream are being processed. This could be the same transcoder or different ones.
2. The first segment starts at or near zero pts
3. mpegts is the output format
4. B-frames are being used

What happens is we may see DTS < PTS for the very first frames in the very first segment, potentially starting with PTS = 0, DTS < 0. This is expected for B-frames. However, if mpegts is in use, it cannot take negative timestamps.

To accommodate negative DTS, the muxer will shift the first packet to DTS = 0 (so a packet with PTS = 0 ends up with PTS = -DTS) and delay (offset) the rest of the packets in the segment by the same amount. Unfortunately, subsequent transcodes will not know about this delay! This typically leads to an overlap between the first and second segments (but segments after that would be fine).

The normal way to fix this would be to add a constant delay to all segments; ffmpeg adds 1.4s to mpegts by default. However, introducing a delay right now feels a little odd, since we don't really offer any other knobs to control the timestamp (re-transcodes would accumulate the delay), and there is some concern about falling out of sync with the source segment, since we have historically tried to make timestamps follow the source as closely as possible.

So we're leaving this particular bug as-is for now. There is some commented-out code that adds this delay in case we feel that we would need it in the future. Note that FFmpeg CLI also has the exact same problem when the muxer delay is removed, so this is not an LPMS-specific issue. This is exercised in the test cases.

Example of non-monotonic DTS after encoding and after muxing:

Segment.Frame | Encoder DTS | Encoder PTS | Muxer DTS | Muxer PTS
--------------|-------------|-------------|-----------|-----------
1.1 | -20 | 0 | 0 | 20
1.2 | -10 | 10 | 10 | 30
1.3 | 0 | 20 | *20* | 40
1.4 | 10 | 30 | *30* | 50
2.1 | 20 | 40 | *20* | 40
2.2 | 30 | 50 | *30* | 50
2.3 | 40 | 60 | 40 | 60
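
The arithmetic in the table above can be reproduced with a small model, sketched here in Go. It is a simplification and not the mpegts muxer: the hypothetical `muxSegment` shifts a whole segment by the amount needed to make its smallest DTS non-negative, and each segment is muxed independently with no knowledge of the offset applied to the previous one.

```go
package main

// packet holds the two timestamps of one encoded frame.
type packet struct{ DTS, PTS int64 }

// muxSegment models a muxer that cannot represent negative timestamps:
// if any DTS in the segment is negative, every packet in the segment is
// shifted forward by the same offset so the smallest DTS becomes zero.
func muxSegment(pkts []packet) []packet {
	var offset int64
	for _, p := range pkts {
		if -p.DTS > offset {
			offset = -p.DTS
		}
	}
	out := make([]packet, len(pkts))
	for i, p := range pkts {
		out[i] = packet{DTS: p.DTS + offset, PTS: p.PTS + offset}
	}
	return out
}

// overlaps reports whether next begins at or before the last DTS of prev,
// i.e. whether DTS is non-monotonic across the segment boundary.
func overlaps(prev, next []packet) bool {
	if len(prev) == 0 || len(next) == 0 {
		return false
	}
	return next[0].DTS <= prev[len(prev)-1].DTS
}
```

Feeding in the encoder columns of the table, segment 1 is shifted by 20 and ends at muxer DTS 30, while segment 2 is left alone and starts at muxer DTS 20, so `overlaps` is true after muxing even though the encoder timestamps were monotonic.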

j0sh · 2026-03-10T04:05:11Z

Ran a test with a go-livepeer B+O with around 12 hours' worth of customer segments being sent in alternating order, checked timestamps for all of them, and things check out OK.

CI is green. Nvidia unit tests don't run in CI but pass locally.

Merging this so we can cut a release and get the fix into the wild.

j0sh force-pushed the ja/nopts-fixes branch from f759360 to 7d8c287 Compare March 9, 2026 21:20

j0sh added 6 commits March 10, 2026 01:13

Handle AV_NOPTS_VALUE inputs

6eb7e5e

Add unit tests for inputs with AV_NOPTS_VALUE

e019d82

Additional NOPTS test case

d961273

Clamp timestamps going into muxer and encoder

bdff08d

Fix up SEI after picture data

c44af72

j0sh force-pushed the ja/nopts-fixes branch from ecbd0c2 to c44af72 Compare March 10, 2026 01:19

j0sh marked this pull request as ready for review March 10, 2026 01:19

j0sh added a commit to livepeer/go-livepeer that referenced this pull request Mar 10, 2026

Update LPMS to livepeer/lpms#445

f0e556f

j0sh requested review from leszko and mjh1 March 10, 2026 04:00

j0sh merged commit c44af72 into master Mar 10, 2026
3 checks passed

j0sh deleted the ja/nopts-fixes branch March 10, 2026 04:05

j0sh mentioned this pull request Mar 10, 2026

Update LPMS to https://github.com/livepeer/lpms/pull/445 livepeer/go-livepeer#3863

Merged

j0sh added a commit to livepeer/go-livepeer that referenced this pull request Mar 10, 2026

Update LPMS to c44af72

c1b70eb

Pulls in a number of fixes and mitigations in LPMS for out-of-order timestamps, runaway encodes, etc. livepeer/lpms#445 (#3863)
