Fix an occasional DTS overlap by
closing the filtergraph after each
segment and re-creating it at the
beginning of each segment, instead
of attempting to persist the
filtergraph in between segments.
This overlap occurred mostly when
flip-flopping segments between transcoders,
or processing non-consecutive segments within
a single transcoder. This was due to drift in
adjusting input timestamps to match the fps
filter's expectation of mostly consecutive
timestamps while adjusting output timestamps
to remove accumulated delay from the filter.
There is roughly a 1% performance hit on my
machine from re-creating the filtergraph.
Because we are now resetting the filter after
each segment, we can remove a good chunk of
the special-cased timestamp handling code
before and after the filtergraph since
we no longer need to handle discontinuities
between segments.
However, we do need to keep some filter flushing
logic in order to accommodate low-fps or low-frame
content.
This does change our outputs, usually by one
fewer frame. Sometimes we seem to produce an
*additional* frame - it is unclear why. However,
as the test cases note, this actually clears up a
number of long-standing oddities around the expected
frame count, so it should be seen as an improvement.
---
It is important to note that while this fixes DTS
overlap in a (rather unpredictable) general case,
there is another overlap bug in one very specific case.
These are the conditions for the bug:
1. First and second segments of the stream are being
processed. This could be the same transcoder or
different ones.
2. The first segment starts at or near zero pts
3. mpegts is the output format
4. B-frames are being used
What happens is we may see DTS < PTS for the
very first frames in the very first segment,
potentially starting with PTS = 0, DTS < 0.
This is expected for B-frames.
However, if mpegts is in use, it cannot take negative
timestamps. To accommodate negative DTS, the muxer
will set PTS = -DTS, DTS = 0 and delay (offset) the
rest of the packets in the segment accordingly.
Unfortunately, subsequent transcodes will not know
about this delay! This typically leads to an overlap
between the first and second segments (but segments after
that would be fine).
The normal way to fix this would be to add a constant delay
to all segments - ffmpeg adds 1.4s to mpegts by default.
However, introducing a delay right now feels a little
odd since we don't really offer any other knobs to control
the timestamp (re-transcodes would accumulate the delay) and
there is some concern about falling out of sync with the
source segment since we have historically tried to make
timestamps follow the source as closely as possible.
So we're leaving this particular bug as-is for now.
There is some commented-out code that adds this delay
in case we feel that we would need it in the future.
Note that FFmpeg CLI also has the exact same problem
when the muxer delay is removed, so this is not an
LPMS-specific issue. This is exercised in the test cases.
Example of non-monotonic DTS after encoding and after muxing:
Segment.Frame | Encoder DTS | Encoder PTS | Muxer DTS | Muxer PTS
--------------|-------------|-------------|-----------|-----------
1.1 | -20 | 0 | 0 | 20
1.2 | -10 | 10 | 10 | 30
1.3 | 0 | 20 | *20* | 40
1.4 | 10 | 30 | *30* | 50
2.1 | 20 | 40 | *20* | 40
2.2 | 30 | 50 | *30* | 50
2.3 | 40 | 60 | 40 | 60
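
The table above can be reproduced with a small Python model. This is a sketch, not LPMS or FFmpeg code: `mux_mpegts` is a hypothetical stand-in that models only the behavior described here (shifting a whole segment so that its most negative DTS lands at zero). Each segment is muxed independently, as separate transcodes would do.

```python
from typing import List, Optional, Tuple

Packet = Tuple[int, int]  # (dts, pts)


def mux_mpegts(packets: List[Packet]) -> List[Packet]:
    """Hypothetical model of the muxer: if any DTS in the segment is
    negative, shift every packet so the smallest DTS becomes zero."""
    min_dts = min(dts for dts, _ in packets)
    shift = -min_dts if min_dts < 0 else 0
    return [(dts + shift, pts + shift) for dts, pts in packets]


def first_dts_regression(packets: List[Packet]) -> Optional[int]:
    """Index of the first packet whose DTS fails to increase, or None."""
    for i in range(1, len(packets)):
        if packets[i][0] <= packets[i - 1][0]:
            return i
    return None


# Encoder output from the table: segment 1 starts with negative DTS
# because of B-frames, segment 2 continues where segment 1 left off.
segment1 = [(-20, 0), (-10, 10), (0, 20), (10, 30)]
segment2 = [(20, 40), (30, 50), (40, 60)]

# Each segment is muxed on its own, so only segment 1 gets shifted.
muxed = mux_mpegts(segment1) + mux_mpegts(segment2)

print(first_dts_regression(segment1 + segment2))  # None: encoder DTS is fine
print(first_dts_regression(muxed))  # 4, i.e. frame 2.1
```

Segment 2 never sees the +20 shift applied to segment 1, which is why its first two packets land on DTS values that segment 1 has already used.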
Ran a test with a go-livepeer B+O with around 12 hours' worth of customer segments being sent in alternating order, checked timestamps for all, and things check out OK. CI is green. Nvidia unit tests don't run in CI but pass locally. Merging this so we can cut a release and get the fix into the wild.
j0sh
added a commit
to livepeer/go-livepeer
that referenced
this pull request
Mar 10, 2026
Pulls in a number of fixes and mitigations in LPMS for out-of-order timestamps, runaway encodes, etc. livepeer/lpms#445 (#3863)
Summary
Fix DTS overlap between segments and handle AV_NOPTS_VALUE inputs that could
cause runaway encodes with millions of output frames.
Supersedes #423 and incorporates fixes for #443. Also addresses the root cause
behind the AV_NOPTS_VALUE-related issues discussed in #440.
Problem
Three interrelated timestamp problems in the transcoding pipeline:
1. DTS overlap between segments. When segments are processed out of order
(e.g. flip-flopping between transcoders) or non-consecutively, drift
accumulated from adjusting timestamps around the FPS filter could produce
non-monotonic DTS across segment boundaries. This was unpredictable and
depended on the segment processing order.
2. AV_NOPTS_VALUE causing runaway encodes. Certain inputs contain
misplaced SEI NAL units (after picture data instead of before), which
causes the ffmpeg H.264 parser to omit timing information, producing frames
with AV_NOPTS_VALUE PTS. When a segment is skipped (e.g. segment 1 → 3),
this could have led to PTS underflows in the FPS filter, generating
millions of duplicate frames and filling the disk.
3. Unclamped timestamps reaching the muxer and encoder. The decoder may
still produce out-of-order PTS for pathological inputs. Without clamping,
these regressive timestamps could reach the muxer or encoder, producing
undefined behavior.
Changes
Filtergraph reset per segment (filter.c, encoder.c, transcoder.c)
Close and recreate the video filtergraph at the start of each segment instead
of persisting it across segments. This eliminates the drift from adjusting
input timestamps to satisfy the FPS filter's monotonicity requirement. The
performance cost is roughly 1% from filtergraph re-creation.
Because the filter resets each segment, a large amount of special-case
timestamp handling code for inter-segment discontinuities has been removed.
Some flush logic is retained for low-fps or very short content.
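
To illustrate why a persisted FPS filter needs near-consecutive timestamps, here is a toy Python model. It is an assumption-laden sketch, not the real filter: `ToyFpsFilter` is hypothetical and keeps only the one behavior that matters here, emitting one frame per output tick and filling any gap in the input with duplicates. It does not model the old timestamp adjustment or its drift. It only shows what that adjustment was guarding against and why a per-segment reset makes it unnecessary.

```python
from typing import List, Optional


class ToyFpsFilter:
    """Hypothetical constant-rate filter: one output frame per tick,
    with gaps in the input filled by duplicated frames."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.next_out_pts: Optional[int] = None  # state carried between frames

    def push(self, pts: int) -> List[int]:
        """Feed one input frame; return the PTS of every frame emitted."""
        if self.next_out_pts is None:
            self.next_out_pts = pts  # first frame anchors the output clock
        out = []
        while self.next_out_pts <= pts:
            out.append(self.next_out_pts)
            self.next_out_pts += self.interval
        return out


def transcode(segments: List[List[int]], interval: int,
              reset_per_segment: bool) -> List[int]:
    """Return the number of output frames produced for each segment."""
    counts = []
    filt = ToyFpsFilter(interval)
    for seg in segments:
        if reset_per_segment:
            filt = ToyFpsFilter(interval)  # close and recreate the filter
        counts.append(sum(len(filt.push(pts)) for pts in seg))
    return counts


seg1 = [0, 10, 20, 30]
seg3 = [1000, 1010, 1020, 1030]  # segment 2 was skipped

persisted = transcode([seg1, seg3], 10, reset_per_segment=False)
reset = transcode([seg1, seg3], 10, reset_per_segment=True)
print(persisted)  # [4, 100]: the gap is filled with duplicates
print(reset)      # [4, 4]
```

With a fresh filter per segment, the output clock is anchored to each segment's own first timestamp, so no cross-segment rewriting of input timestamps is needed.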
Decoded PTS repair (decoder.c)
Add fix_video_pts() at the decode stage to normalize video frame PTS before
it reaches the filter or encoder:
- [...] best_effort_timestamp when PTS is AV_NOPTS_VALUE
Timestamp clamping at muxer and encoder (encoder.c)
- [...] DTS > PTS repair path, matching ffmpeg's own mux behavior
- [...] handling cases where timebase conversion collapses adjacent frame PTS into
the same encoder tick
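
As a rough illustration of this kind of clamping, the sketch below is loosely modeled on my reading of the timestamp fix-up that FFmpeg CLI applies before muxing. It is an assumption, not the LPMS implementation: `Packet`, `clamp_timestamps` and `clamp_stream` are hypothetical names, and the actual rules in encoder.c may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    pts: int
    dts: int


def clamp_timestamps(pkt: Packet, last_dts: Optional[int]) -> Packet:
    """Repair one packet given the DTS of the previously written packet."""
    pts, dts = pkt.pts, pkt.dts

    # Repair 1: DTS must never exceed PTS. Replace both with a guess:
    # the median of (pts, dts, last_dts + 1), or pts for the first packet.
    if dts > pts:
        if last_dts is None:
            pts = dts = min(pts, dts)
        else:
            pts = dts = sorted((pts, dts, last_dts + 1))[1]

    # Repair 2: DTS must strictly increase. This also covers two frames
    # whose timestamps collapsed into the same tick after rescaling.
    if last_dts is not None and dts <= last_dts:
        floor = last_dts + 1
        pts = max(pts, floor)
        dts = floor

    return Packet(pts, dts)


def clamp_stream(packets: List[Packet]) -> List[Packet]:
    """Apply the per-packet repair across a whole stream."""
    out: List[Packet] = []
    last: Optional[int] = None
    for pkt in packets:
        fixed = clamp_timestamps(pkt, last)
        out.append(fixed)
        last = fixed.dts
    return out


stream = [
    Packet(pts=0, dts=0),
    Packet(pts=10, dts=10),
    Packet(pts=10, dts=10),  # collapsed into the same tick as the previous one
    Packet(pts=30, dts=20),
    Packet(pts=25, dts=40),  # DTS > PTS
]
fixed = clamp_stream(stream)
print([(p.pts, p.dts) for p in fixed])
# [(0, 0), (10, 10), (11, 11), (30, 20), (25, 25)]
```

After clamping, DTS is strictly increasing and no packet has DTS above PTS, which are the two properties the muxer relies on.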
SEI NAL reordering (sei_fixup.go)
Add a Go-side pre-processing step (FixMisplacedSEI) for H.264 mpegts inputs
that detects and reorders misplaced SEI NAL units (found after VCL NALs) to
appear before them. This restores the H.264 parser's ability to extract
timing information, eliminating the AV_NOPTS_VALUE frames at their source.
Runaway encode guard (encoder.c, transcoder.c)
Abort video encoding when output frames exceed 25x decoded frames
(lpms_ERR_ENC_RUNAWAY), providing a safety net against the FPS filter
exploding frame counts even after the other fixes. Image-sequence inputs
(image2) are exempted since frame expansion is expected there. (From #443)
Known Limitation
A DTS overlap can still occur between the first and second segments when all
of the following are true:
1. The first and second segments of the stream are being processed (by the
same transcoder or different ones)
2. The first segment starts at or near zero PTS
3. mpegts is the output format
4. B-frames are being used
The mpegts muxer offsets all packets to compensate for negative DTS, but
subsequent segment transcodes are unaware of this offset. This is the same
behavior as FFmpeg CLI when muxdelay is set to 0, and is codified in
TestTranscoder_API_DTSOverlap. Adding a constant delay (as FFmpeg does with
its default 1.4s muxdelay) would fix this, but is deferred to avoid
accumulating delays across re-transcodes.
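
The trade-off behind deferring the constant delay can be sketched with a toy Python model. Everything here is illustrative: `mux_segment` is a hypothetical stand-in for the muxer, the timestamps are made-up tick values, and a delay of 20 ticks is chosen only because it covers the most negative DTS in the example.

```python
from typing import List, Tuple

Packet = Tuple[int, int]  # (dts, pts)


def mux_segment(packets: List[Packet], delay: int = 0) -> List[Packet]:
    """Hypothetical muxer model: add a constant delay to every packet,
    then shift the whole segment if any DTS is still negative."""
    delayed = [(dts + delay, pts + delay) for dts, pts in packets]
    min_dts = min(dts for dts, _ in delayed)
    shift = -min_dts if min_dts < 0 else 0
    return [(dts + shift, pts + shift) for dts, pts in delayed]


def dts_is_monotonic(packets: List[Packet]) -> bool:
    """True when DTS strictly increases across the whole sequence."""
    return all(b[0] > a[0] for a, b in zip(packets, packets[1:]))


segment1 = [(-20, 0), (-10, 10), (0, 20), (10, 30)]  # B-frames: negative DTS
segment2 = [(20, 40), (30, 50), (40, 60)]
DELAY = 20  # assumed value, large enough to cover the most negative DTS

without_delay = mux_segment(segment1) + mux_segment(segment2)
with_delay = mux_segment(segment1, DELAY) + mux_segment(segment2, DELAY)
print(dts_is_monotonic(without_delay))  # False: segments 1 and 2 overlap
print(dts_is_monotonic(with_delay))     # True: every segment moves equally

# The cost: each re-transcode adds the delay again, so output timestamps
# drift further from the source with every pass.
once = mux_segment(segment2, DELAY)
twice = mux_segment(once, DELAY)
print(twice[0][1] - segment2[0][1])  # 40 ticks of drift after two passes
```

Because the delay is applied to every segment, no segment needs the muxer's one-off shift, but the offset compounds when outputs are fed back in as inputs.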
Test Plan
- TestTranscoder_API_DTSOverlap: verifies DTS monotonicity across
out-of-order segments (and documents the known B-frame/mpegts edge case)
- TestTranscoder_NOPTS_SkipSegment: transcodes real-world samples with
misplaced SEI and skipped segments that previously produced runaway output
- TestTranscoder_NOPTS_MissingSEIAndPES: verifies PTS synthesis for
inputs where the H.264 parser cannot produce any timestamps
- TestTranscoder_EncodedFrameRunaway: triggers the 25x frame count abort
- TestFixMisplacedSEI_BrokenFiles / TestFixMisplacedSEI_NoChanges:
unit tests for SEI reordering on affected and unaffected samples