feat: support bilibili fetch for festival #24
Ovler-Young wants to merge 1 commit into foamzou:main
Pull request overview
This pull request adds fallback support for fetching Bilibili festival videos that don't have window.__playinfo__ embedded in the HTML. When the regex extraction of playinfo fails, the code now falls back to making API calls to Bilibili's web-interface and player endpoints to retrieve video metadata and streaming URLs.
Changes:
- Added API-based fallback for fetching Bilibili video metadata when HTML regex extraction fails
- Introduced two new response structures (`BilibiliWebInterfaceView` and `BilibiliPlayUrlResponse`) to handle API responses
- Implemented the `fetchBytesFromApi()` function to fetch video data using Bilibili's public APIs
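
The fallback order described in this overview can be sketched as follows. This is a minimal illustration, not the PR's code: `extractPlayInfo`, `loadPlayInfo`, the regular expression, and the injected `fetchFromAPI` callback are hypothetical stand-ins, and the real pattern and HTTP calls in the project may differ.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// playInfoRe is an assumed pattern for the embedded player JSON; the
// project's actual regular expression may differ.
var playInfoRe = regexp.MustCompile(`(?s)window\.__playinfo__\s*=\s*(\{.*?\})\s*</script>`)

// extractPlayInfo returns the embedded JSON and true when the page contains it.
func extractPlayInfo(html string) ([]byte, bool) {
	m := playInfoRe.FindStringSubmatch(html)
	if m == nil {
		return nil, false
	}
	return []byte(m[1]), true
}

// loadPlayInfo prefers the embedded JSON and calls fetchFromAPI only when
// extraction fails. The second result names the source that was used.
func loadPlayInfo(html string, fetchFromAPI func() ([]byte, error)) ([]byte, string, error) {
	if data, ok := extractPlayInfo(html); ok {
		return data, "html", nil
	}
	if fetchFromAPI == nil {
		return nil, "", errors.New("no playinfo in HTML and no API fallback configured")
	}
	data, err := fetchFromAPI()
	if err != nil {
		return nil, "", fmt.Errorf("api fallback: %w", err)
	}
	return data, "api", nil
}

func main() {
	// Stub standing in for the real HTTP requests (hypothetical).
	stub := func() ([]byte, error) { return []byte(`{"code":0}`), nil }

	regular := `<script>window.__playinfo__={"code":0,"data":{"x":1}}</script>`
	festival := `<html><body>festival page without embedded player data</body></html>`

	for _, page := range []string{regular, festival} {
		_, source, err := loadPlayInfo(page, stub)
		fmt.Println(source, err)
	}
}
```

Passing the API call in as a function keeps the fallback decision testable without network access.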
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| processor/bilibili/fetch_media_entity.go | Added two new struct types for API response parsing: BilibiliWebInterfaceView for metadata and BilibiliPlayUrlResponse for playback URLs |
| processor/bilibili/bilibili.go | Modified FetchMetaAndResourceInfo to fall back to API calls on regex failure; added fetchBytesFromApi function implementing the API-based retrieval logic |
    }

    // audio resource
    // audio resource
Duplicate comment "audio resource" appears on consecutive lines. Remove one of the duplicate comments.
Suggested change:

    - // audio resource
            Initialization string `json:"Initialization"`
            IndexRange     string `json:"indexRange"`
        } `json:"SegmentBase"`
        Codecid2 int `json:"codecid"`
Two fields in the same struct share the JSON tag "codecid": "Codecid" (line 113) and "Codecid2" (line 124). In Go's encoding/json, same-level fields with an identical JSON name conflict, so both are silently ignored during unmarshaling and neither field is populated. Remove the duplicate field, or give it a different JSON tag if both values are needed.
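
The effect of a repeated JSON tag can be checked directly. The following is a minimal sketch using simplified, hypothetical types (`dupTagItem` and `fixedItem` are not the PR's structs): with `encoding/json`, two same-level fields that carry the same tag are both skipped, so after decoding both stay at their zero value, while the single-field version receives the value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dupTagItem reproduces the reported pattern: two same-level fields share
// the "codecid" tag. (Hypothetical, reduced type for illustration.)
type dupTagItem struct {
	Id       int `json:"id"`
	Codecid  int `json:"codecid"`
	Codecid2 int `json:"codecid"`
}

// fixedItem keeps a single field for the "codecid" key.
type fixedItem struct {
	Id      int `json:"id"`
	Codecid int `json:"codecid"`
}

// decodeDup unmarshals raw into the struct with the duplicated tag.
func decodeDup(raw string) (dupTagItem, error) {
	var v dupTagItem
	err := json.Unmarshal([]byte(raw), &v)
	return v, err
}

// decodeFixed unmarshals raw into the struct without the duplicate.
func decodeFixed(raw string) (fixedItem, error) {
	var v fixedItem
	err := json.Unmarshal([]byte(raw), &v)
	return v, err
}

func main() {
	const raw = `{"id":30280,"codecid":12}`

	d, err := decodeDup(raw)
	fmt.Printf("%+v %v\n", d, err)

	f, err := decodeFixed(raw)
	fmt.Printf("%+v %v\n", f, err)
}
```

No error is returned in either case, which is why the problem is easy to miss; `go vet` reports repeated struct tags through its structtag check.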
            Initialization string `json:"Initialization"`
            IndexRange     string `json:"indexRange"`
        } `json:"SegmentBase"`
        Codecid2 int `json:"codecid"`
Two fields in the same struct share the JSON tag "codecid": "Codecid" (line 132) and "Codecid2" (line 143). In Go's encoding/json, same-level fields with an identical JSON name conflict, so both are silently ignored during unmarshaling and neither field is populated. Remove the duplicate field, or give it a different JSON tag if both values are needed.
            Initialization string `json:"Initialization"`
            IndexRange     string `json:"indexRange"`
        } `json:"SegmentBase"`
        Codecid2 int `json:"codecid"`
Two fields in the same struct share the JSON tag "codecid": "Codecid" (line 153) and "Codecid2" (line 164). In Go's encoding/json, same-level fields with an identical JSON name conflict, so both are silently ignored during unmarshaling and neither field is populated. Remove the duplicate field, or give it a different JSON tag if both values are needed.
    var bestAudio *struct {
        Id           int      `json:"id"`
        BaseUrl      string   `json:"baseUrl"`
        BackupUrl    []string `json:"backupUrl"`
        Bandwidth    int      `json:"bandwidth"`
        MimeType     string   `json:"mimeType"`
        Codecid      int      `json:"codecid"`
        Codecs       string   `json:"codecs"`
        Width        int      `json:"width"`
        Height       int      `json:"height"`
        FrameRate    string   `json:"frameRate"`
        Sar          string   `json:"sar"`
        StartWithSap int      `json:"startWithSap"`
        SegmentBase  struct {
            Initialization string `json:"Initialization"`
            IndexRange     string `json:"indexRange"`
        } `json:"SegmentBase"`
        Codecid2 int `json:"codecid"`
    }
The inline anonymous struct type for bestAudio (lines 171-189) duplicates the Audio struct definition from BilibiliPlayUrlResponse.Data.Dash.Audio (lines 126-144). Consider extracting this as a named type to avoid duplication and improve maintainability. For example, create a DashAudioItem type that can be reused in both places.
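
One way to read this suggestion is sketched below. `DashAudioItem` follows the name proposed above, but its field set is reduced for illustration, and `pickBestAudio` is a hypothetical helper. The selection rule (highest bandwidth wins) is an assumption, since the PR's own criterion is not shown here.

```go
package main

import "fmt"

// DashAudioItem is the named type proposed in the review. Only the fields
// used in this sketch are included; the full definition would carry the
// remaining fields of the anonymous struct.
type DashAudioItem struct {
	Id        int      `json:"id"`
	BaseUrl   string   `json:"baseUrl"`
	BackupUrl []string `json:"backupUrl"`
	Bandwidth int      `json:"bandwidth"`
	Codecid   int      `json:"codecid"`
}

// pickBestAudio returns a pointer to the entry with the highest bandwidth,
// or nil when items is empty. "Highest bandwidth" is an assumed criterion.
func pickBestAudio(items []DashAudioItem) *DashAudioItem {
	var best *DashAudioItem
	for i := range items {
		if best == nil || items[i].Bandwidth > best.Bandwidth {
			best = &items[i] // point at the slice element, not a loop copy
		}
	}
	return best
}

func main() {
	items := []DashAudioItem{
		{Id: 30216, BaseUrl: "https://example.invalid/low.m4s", Bandwidth: 67000},
		{Id: 30280, BaseUrl: "https://example.invalid/high.m4s", Bandwidth: 192000},
	}
	if best := pickBestAudio(items); best != nil {
		fmt.Println(best.Id, best.Bandwidth/1000)
	}
}
```

With a named type, the response struct can declare its audio list as `[]DashAudioItem` and the selection variable becomes a plain `*DashAudioItem`, so the field list lives in one place.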
    if bestAudio != nil {
        mediaMeta.Audios = append(mediaMeta.Audios, meta.Audio{
            Url:     bestAudio.BaseUrl,
            BitRate: bestAudio.Bandwidth / 1000,
        })
    }
There's no handling for the case when playData.Data.Dash.Audio is empty. If no audio streams are available, the function will return a MediaMeta with an empty Audios slice and only video streams. Consider checking if the Audio slice is empty and either returning an error similar to the original implementation (line 76: "no audio data") or documenting that this is intentional for festival videos that may only have video streams.
Suggested change:

    - if bestAudio != nil {
    -     mediaMeta.Audios = append(mediaMeta.Audios, meta.Audio{
    -         Url:     bestAudio.BaseUrl,
    -         BitRate: bestAudio.Bandwidth / 1000,
    -     })
    - }
    + if bestAudio == nil {
    +     return meta.MediaMeta{}, errors.New("no audio data")
    + }
    + mediaMeta.Audios = append(mediaMeta.Audios, meta.Audio{
    +     Url:     bestAudio.BaseUrl,
    +     BitRate: bestAudio.Bandwidth / 1000,
    + })
    if len(metaData.Data.Staff) == 0 {
        mediaMeta.Artist = metaData.Data.Owner.Name
    } else {
        var names []string
        for _, staff := range metaData.Data.Staff {
            names = append(names, staff.Name)
        }
        mediaMeta.Artist = strings.Join(names, ", ")
    }
The artist extraction logic (lines 160-168) duplicates the same logic in the getSinger function (lines 238-254). Consider refactoring to reuse the getSinger function, or create a shared function that works with both AudioMeta and BilibiliWebInterfaceView structures. This would improve maintainability and ensure consistent behavior.
Suggested change:

    - if len(metaData.Data.Staff) == 0 {
    -     mediaMeta.Artist = metaData.Data.Owner.Name
    - } else {
    -     var names []string
    -     for _, staff := range metaData.Data.Staff {
    -         names = append(names, staff.Name)
    -     }
    -     mediaMeta.Artist = strings.Join(names, ", ")
    - }
    + mediaMeta.Artist = getSinger(metaData.Data)
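
If `getSinger` cannot accept the `BilibiliWebInterfaceView` data directly, the shared-function alternative could take plain values instead. The sketch below assumes a hypothetical helper `artistFrom`; each call site would first collect the owner name and staff names from its own structure.

```go
package main

import (
	"fmt"
	"strings"
)

// artistFrom joins the staff names with ", " and falls back to the owner
// name when no staff is listed. (Hypothetical helper, not part of the PR.)
func artistFrom(ownerName string, staffNames []string) string {
	if len(staffNames) == 0 {
		return ownerName
	}
	return strings.Join(staffNames, ", ")
}

func main() {
	fmt.Println(artistFrom("uploader", nil))
	fmt.Println(artistFrom("uploader", []string{"vocalist", "composer"}))
}
```

Because the helper depends only on strings, both the HTML-based path and the API-based path can call it without sharing a struct type.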
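Videos that have a dedicated Bilibili theme page, such as https://www.bilibili.com/festival/VSF2026live?bvid=BV1zAFrzaEQz, may not contain playinfo, so this change tries to fetch the data through the API instead.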
close #25