feat: add yt-dlp agent for YouTube video/audio downloads #232
marcusquinn merged 2 commits into main
…le conversion
Add a new yt-dlp subagent with helper script supporting:
- Video download (single, playlist, channel) with format selection
- Audio extraction (MP3, M4A, Opus, WAV, FLAC)
- Transcript/subtitle download (SRT conversion)
- Local video file audio extraction via ffmpeg
- SponsorBlock integration, download archive, metadata embedding
- Auto-install of yt-dlp + ffmpeg dependencies
- Organized output to ~/Downloads/yt-dlp-{type}-{name}-{timestamp}/
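A minimal sketch of how the output directory name in the last item could be assembled. `make_output_dir`, the character whitelist, and the timestamp format are all assumptions for illustration and are not taken from the helper script:

```bash
#!/usr/bin/env bash
# Sketch only: make_output_dir and the timestamp format are assumptions,
# not the helper script's actual implementation.
make_output_dir() {
    local type="$1" name="$2" base="${3:-$HOME/Downloads}"
    local safe_name timestamp dir
    # Replace every byte outside a conservative whitelist with "_".
    safe_name=$(printf '%s' "$name" | tr -c 'A-Za-z0-9._-' '_')
    timestamp=$(date +%Y%m%d-%H%M%S)
    dir="$base/yt-dlp-$type-$safe_name-$timestamp"
    # Create the directory and print its path for the caller.
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Demo against a throwaway base directory instead of ~/Downloads.
demo_dir=$(make_output_dir audio "Some Talk (2024)" "$(mktemp -d)")
echo "created: $demo_dir"
```

Sanitizing the title before it becomes part of a path keeps spaces and shell-special characters out of the directory name.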
Walkthrough
A new YouTube downloader helper script (.agent/scripts/yt-dlp-helper.sh) is introduced with a command suite for downloading videos, audio, playlists, channels, and transcripts, and for converting local files. Supporting documentation and configuration index updates integrate it into the DevOps framework.
Sequence Diagram(s)
sequenceDiagram
actor User
participant Script as yt-dlp-helper.sh
participant Validator as Environment<br/>Checks
participant Analyzer as URL Analysis<br/>& Parsing
participant Executor as yt-dlp<br/>Tool
participant Encoder as ffmpeg<br/>(optional)
participant FileSystem as Output<br/>Directory
User->>Script: Call with command & URL
Script->>Validator: Check yt-dlp & ffmpeg
alt missing dependencies
Validator-->>Script: Error + installation guidance
Script-->>User: Exit with error code
end
Validator-->>Script: Ready
Script->>Analyzer: Detect URL type & extract title
Analyzer-->>Script: Type, title, safe naming
Script->>Script: Parse options & resolve format
Script->>Script: Build output directory with timestamp
alt Command: video/audio/playlist/channel
Script->>Executor: Invoke yt-dlp with args
Executor-->>FileSystem: Write media file(s)
else Command: convert (local)
Script->>Encoder: Run ffmpeg conversion
Encoder-->>FileSystem: Write converted file(s)
else Command: transcript
Script->>Executor: Download subtitles/transcripts only
Executor-->>FileSystem: Write transcript file(s)
else Command: info
Script->>Executor: Fetch metadata JSON
Executor-->>Script: Return info
Script-->>User: Display parsed output
end
FileSystem-->>Script: File(s) written
Script-->>User: Success message + output location
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
The review requires careful attention to shell script security (variable quoting, injection risks), validation of all 28+ functions with diverse logic patterns, format resolution mappings, yt-dlp/ffmpeg argument construction correctness, and error handling coverage across 10+ distinct command implementations.
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Mon Jan 26 00:43:28 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
            return 1
            ;;
    esac
    return 0
main() always returns 0 and the script ends with exit 0, so callers can’t detect failures from download_*/show_info/etc. Consider propagating the invoked command’s exit status so CI/automation can reliably fail on errors.
Other Locations
.agent/scripts/yt-dlp-helper.sh:1026
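One way to propagate the status, sketched with placeholder functions. The `download_video` and `show_help` bodies below are stand-ins for illustration, not the helper script's code:

```bash
#!/usr/bin/env bash
# Sketch only: placeholder bodies stand in for the real subcommands.
download_video() {
    return 3   # simulate a failed download
}

show_help() {
    echo "usage: yt-dlp-helper.sh <command> [url]"
}

main() {
    local command="${1:-help}"
    local status=0
    case "$command" in
        video) download_video || status=$? ;;
        help)  show_help || status=$? ;;
        *)     echo "unknown command: $command" >&2; status=1 ;;
    esac
    # Return the subcommand's status instead of a hard-coded 0.
    return "$status"
}

demo_status=0
main video || demo_status=$?
echo "main returned $demo_status"
```

At the bottom of a real script, calling `main "$@"` and then `exit $?` would hand that status to the calling shell or CI job.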
    print(f\"Channel: {data.get('channel', data.get('uploader', 'N/A'))}\")
    print(f\"Duration: {data.get('duration_string', 'N/A')}\")
    print(f\"Upload date: {data.get('upload_date', 'N/A')}\")
    print(f\"View count: {data.get('view_count', 'N/A'):,}\")

        --write-subs \
        --sub-langs "$SUB_LANGS" \
        --convert-subs srt \
        --write-info-json \
Code Review
This pull request introduces a comprehensive and well-structured yt-dlp helper script, along with excellent documentation. My review focuses on improving the robustness and maintainability of the bash script. I've identified a high-severity bug concerning the handling of file paths with spaces, a medium-severity bug in the Python snippet for displaying video information, and a medium-severity opportunity to reduce code duplication in the main command dispatch logic. The proposed changes will make the script more resilient and easier to maintain.
build_common_args() {
    local args=()

    # Metadata
    if [[ "$NO_METADATA" != true ]]; then
        args+=(--embed-metadata --embed-chapters --embed-thumbnail)
    fi

    # Info JSON
    if [[ "$NO_INFO_JSON" != true ]]; then
        args+=(--write-info-json)
    fi

    # SponsorBlock
    if [[ "$NO_SPONSORBLOCK" != true ]]; then
        args+=(--sponsorblock-remove sponsor)
    fi

    # Download archive
    if [[ "$USE_ARCHIVE" == true ]]; then
        mkdir -p "$CONFIG_DIR"
        args+=(--download-archive "$ARCHIVE_FILE")
    fi

    # Rate limiting
    if [[ "$NO_SLEEP" != true ]]; then
        args+=(--sleep-interval 1 --max-sleep-interval 5)
    fi

    # Cookies
    if [[ "$USE_COOKIES" == true ]]; then
        args+=(--cookies-from-browser chrome)
    fi

    # Error handling
    args+=(--ignore-errors --no-overwrites --continue)

    echo "${args[@]}"
    return 0
}
The current implementation of build_common_args returns a string of arguments, which is then used unquoted in the calling functions (e.g., download_video). This approach is vulnerable to word-splitting issues and will fail if file paths, such as $CONFIG_DIR or $ARCHIVE_FILE, contain spaces (e.g., if $HOME is /Users/John Doe).
To fix this, I recommend modifying build_common_args to populate a global array with the arguments. The calling functions can then use this array with proper quoting ("${ARRAY[@]}") to ensure arguments are passed correctly, even if they contain spaces. This change is necessary in all download_* functions that use build_common_args.
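The word-splitting failure described above can be reproduced in isolation. This is a sketch under assumptions: `count_args`, `build_args_string`, `build_args_array`, and the `/tmp/John Doe` path are made up for the demonstration and do not come from the script:

```bash
#!/usr/bin/env bash
# Sketch only: ARCHIVE_FILE is a made-up path that contains a space.
ARCHIVE_FILE="/tmp/John Doe/archive.txt"

# Helper that reports how many arguments it received.
count_args() {
    echo "$#"
}

# String-based approach: echo flattens the array into one string, and
# the caller's unquoted expansion splits it again on whitespace.
build_args_string() {
    local args=(--download-archive "$ARCHIVE_FILE")
    echo "${args[@]}"
}
# shellcheck disable=SC2046
split_count=$(count_args $(build_args_string))

# Array-based approach: each element stays exactly one argument.
build_args_array() {
    COMMON_ARGS=(--download-archive "$ARCHIVE_FILE")
}
build_args_array
array_count=$(count_args "${COMMON_ARGS[@]}")

echo "string: $split_count args, array: $array_count args"
```

The string version hands three arguments to the command because the path is split at the space, while the array version hands over the intended two.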
# Build common yt-dlp arguments
# Populates the global COMMON_ARGS array.
build_common_args() {
    COMMON_ARGS=()
    # Metadata
    if [[ "$NO_METADATA" != true ]]; then
        COMMON_ARGS+=(--embed-metadata --embed-chapters --embed-thumbnail)
    fi
    # Info JSON
    if [[ "$NO_INFO_JSON" != true ]]; then
        COMMON_ARGS+=(--write-info-json)
    fi
    # SponsorBlock
    if [[ "$NO_SPONSORBLOCK" != true ]]; then
        COMMON_ARGS+=(--sponsorblock-remove sponsor)
    fi
    # Download archive
    if [[ "$USE_ARCHIVE" == true ]]; then
        mkdir -p "$CONFIG_DIR"
        COMMON_ARGS+=(--download-archive "$ARCHIVE_FILE")
    fi
    # Rate limiting
    if [[ "$NO_SLEEP" != true ]]; then
        COMMON_ARGS+=(--sleep-interval 1 --max-sleep-interval 5)
    fi
    # Cookies
    if [[ "$USE_COOKIES" == true ]]; then
        COMMON_ARGS+=(--cookies-from-browser chrome)
    fi
    # Error handling
    COMMON_ARGS+=(--ignore-errors --no-overwrites --continue)
}

        "$url" 2>/dev/null | python3 -c "
import json, sys
try:
    data = json.load(sys.stdin)
    print(f\"Title: {data.get('title', 'N/A')}\")
    print(f\"Channel: {data.get('channel', data.get('uploader', 'N/A'))}\")
    print(f\"Duration: {data.get('duration_string', 'N/A')}\")
    print(f\"Upload date: {data.get('upload_date', 'N/A')}\")
    print(f\"View count: {data.get('view_count', 'N/A'):,}\")
    print(f\"Like count: {data.get('like_count', 'N/A')}\")
    print(f\"Description: {(data.get('description', 'N/A') or 'N/A')[:200]}...\")
    print()
    print('Available formats:')
    for f in data.get('formats', []):
        res = f.get('resolution', 'N/A')
        ext = f.get('ext', 'N/A')
        vcodec = f.get('vcodec', 'none')
        acodec = f.get('acodec', 'none')
        filesize = f.get('filesize') or f.get('filesize_approx')
        size_str = f'{filesize / 1024 / 1024:.1f}MB' if filesize else 'N/A'
        if vcodec != 'none' or acodec != 'none':
            print(f'  {f.get(\"format_id\", \"?\"): <10} {res: <12} {ext: <6} v:{vcodec[:10]: <10} a:{acodec[:10]: <10} {size_str}')
except Exception as e:
    print(f'Error parsing info: {e}', file=sys.stderr)
"
The Python script used to display video information has a bug. The line print(f\"View count: {data.get('view_count', 'N/A'):,}\") will raise a ValueError if view_count is not present in the JSON, because data.get() will return 'N/A' and the format specifier :, cannot be used with a string. Additionally, the like_count is not formatted with a comma, which is inconsistent with view_count.
I suggest refactoring this part of the Python script to use a helper function that safely formats the numbers, adding commas for thousands separation, and handles missing values gracefully. This will fix the bug and improve consistency.
"$url" 2>/dev/null | python3 -c "
import json, sys
def _format_count(count):
    if isinstance(count, int):
        return '{:,}'.format(count)
    return count if count is not None else 'N/A'
try:
    data = json.load(sys.stdin)
    print(f\"Title: {data.get('title', 'N/A')}\")
    print(f\"Channel: {data.get('channel', data.get('uploader', 'N/A'))}\")
    print(f\"Duration: {data.get('duration_string', 'N/A')}\")
    print(f\"Upload date: {data.get('upload_date', 'N/A')}\")
    print(f\"View count: {_format_count(data.get('view_count'))}\")
    print(f\"Like count: {_format_count(data.get('like_count'))}\")
    print(f\"Description: {(data.get('description', 'N/A') or 'N/A')[:200]}...\")
    print()
    print('Available formats:')
    for f in data.get('formats', []):
        res = f.get('resolution', 'N/A')
        ext = f.get('ext', 'N/A')
        vcodec = f.get('vcodec', 'none')
        acodec = f.get('acodec', 'none')
        filesize = f.get('filesize') or f.get('filesize_approx')
        size_str = f'{filesize / 1024 / 1024:.1f}MB' if filesize else 'N/A'
        if vcodec != 'none' or acodec != 'none':
            print(f'  {f.get(\"format_id\", \"?\"): <10} {res: <12} {ext: <6} v:{vcodec[:10]: <10} a:{acodec[:10]: <10} {size_str}')
except Exception as e:
    print(f'Error parsing info: {e}', file=sys.stderr)
"

    case "$command" in
        "video")
            if [[ -z "$url" ]]; then
                print_error "URL required. Usage: yt-dlp-helper.sh video <url>"
                return 1
            fi
            download_video "$url" "$@"
            ;;
        "audio")
            if [[ -z "$url" ]]; then
                print_error "URL required. Usage: yt-dlp-helper.sh audio <url>"
                return 1
            fi
            download_audio "$url" "$@"
            ;;
        "playlist")
            if [[ -z "$url" ]]; then
                print_error "URL required. Usage: yt-dlp-helper.sh playlist <url>"
                return 1
            fi
            download_playlist "$url" "$@"
            ;;
        "channel")
            if [[ -z "$url" ]]; then
                print_error "URL required. Usage: yt-dlp-helper.sh channel <url>"
                return 1
            fi
            download_channel "$url" "$@"
            ;;
        "transcript")
            if [[ -z "$url" ]]; then
                print_error "URL required. Usage: yt-dlp-helper.sh transcript <url>"
                return 1
            fi
            download_transcript "$url" "$@"
            ;;
        "info")
            if [[ -z "$url" ]]; then
                print_error "URL required. Usage: yt-dlp-helper.sh info <url>"
                return 1
            fi
            show_info "$url" "$@"
            ;;
        "convert")
            if [[ -z "$url" ]]; then
                print_error "File/directory required. Usage: yt-dlp-helper.sh convert <path> [options]"
                return 1
            fi
            convert_local "$url" "$@"
            ;;
        "install")
            install_ytdlp
            ;;
        "update")
            update_ytdlp
            ;;
        "config")
            generate_config
            ;;
        "status")
            check_installation_status
            ;;
        "help"|"-h"|"--help"|"")
            show_help
            ;;
        *)
            print_error "$ERROR_UNKNOWN_COMMAND $command"
            show_help
            return 1
            ;;
    esac
The main function contains repetitive code for validating the url argument across multiple commands (video, audio, playlist, etc.). This duplication makes the code harder to maintain, as any change to the validation logic needs to be applied in multiple places.
To improve this, you can refactor the case statement to group commands that require a URL and perform the validation once. A nested case statement can then be used to call the appropriate function. This will reduce code duplication and make the main dispatcher cleaner.
    case "$command" in
        "video"|"audio"|"playlist"|"channel"|"transcript"|"info")
            if [[ -z "$url" ]]; then
                print_error "URL required. Usage: yt-dlp-helper.sh $command <url>"
                return 1
            fi
            case "$command" in
                "video") download_video "$url" "$@" ;;
                "audio") download_audio "$url" "$@" ;;
                "playlist") download_playlist "$url" "$@" ;;
                "channel") download_channel "$url" "$@" ;;
                "transcript") download_transcript "$url" "$@" ;;
                "info") show_info "$url" "$@" ;;
            esac
            ;;
        "convert")
            if [[ -z "$url" ]]; then
                print_error "File/directory required. Usage: yt-dlp-helper.sh convert <path> [options]"
                return 1
            fi
            convert_local "$url" "$@"
            ;;
        "install")
            install_ytdlp
            ;;
        "update")
            update_ytdlp
            ;;
        "config")
            generate_config
            ;;
        "status")
            check_installation_status
            ;;
        "help"|"-h"|"--help"|"")
            show_help
            ;;
        *)
            print_error "$ERROR_UNKNOWN_COMMAND $command"
            show_help
            return 1
            ;;
    esac

- Propagate exit status from main() instead of always returning 0
- Fix view_count formatting crash when value is None/missing
- Respect --no-info-json flag in transcript download command
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Mon Jan 26 00:46:30 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.agent/scripts/yt-dlp-helper.sh:
- Around line 945-1026: The main function always returns 0 and the script exits
0, masking failures; change main so it captures and propagates the exit status
of the invoked subcommand instead of unconditionally returning 0: after each
command invocation (e.g., download_video, download_audio, download_playlist,
download_channel, download_transcript, show_info, convert_local, install_ytdlp,
update_ytdlp, generate_config, check_installation_status, show_help) capture its
exit code (e.g., save $? into a variable) and then exit with that code (or
return it from main) so failures bubble up to the calling shell/CI rather than
always succeeding.
- Around line 499-531: The download_transcript function unconditionally adds the
--write-info-json flag, ignoring a global --no-info-json option; update
download_transcript to honor the global setting by only including
--write-info-json when the user has not requested --no-info-json—check the
global option/EXTRA_ARGS (or a NO_INFO_JSON boolean if present) and
conditionally append --write-info-json to the yt-dlp invocation instead of
always passing it; reference the download_transcript function and the
EXTRA_ARGS/--write-info-json flag when making the change.
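The second item could be addressed along these lines. This is a hedged sketch: `build_transcript_args` and `TRANSCRIPT_ARGS` are hypothetical names, the `en` default for `SUB_LANGS` and the `--skip-download` flag are assumptions, and the real `download_transcript` may be structured differently:

```bash
#!/usr/bin/env bash
# Sketch only: build_transcript_args and TRANSCRIPT_ARGS are hypothetical
# names; --skip-download is assumed, the other flags appear in the diff.
SUB_LANGS="${SUB_LANGS:-en}"

build_transcript_args() {
    TRANSCRIPT_ARGS=(
        --skip-download
        --write-subs
        --sub-langs "$SUB_LANGS"
        --convert-subs srt
    )
    # Honor the global opt-out instead of always writing the info JSON.
    if [[ "${NO_INFO_JSON:-false}" != true ]]; then
        TRANSCRIPT_ARGS+=(--write-info-json)
    fi
}

# Build the argument list once with the opt-out set and once without.
NO_INFO_JSON=true
build_transcript_args
args_without_json="${TRANSCRIPT_ARGS[*]}"

NO_INFO_JSON=false
build_transcript_args
args_with_json="${TRANSCRIPT_ARGS[*]}"

printf 'opt-out: %s\ndefault: %s\n' "$args_without_json" "$args_with_json"
```

A caller would then pass `"${TRANSCRIPT_ARGS[@]}"` to yt-dlp, which also keeps each argument intact if a value ever contains whitespace.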
🧹 Nitpick comments (1)
.agent/tools/video/yt-dlp.md (1)
31-280: Trim inline examples; prefer pointers and file:line references
This doc embeds extensive command/config blocks. The guidance for .agent/**/*.md calls for progressive disclosure and authoritative references to implementation rather than inline code samples.
Based on learnings, replace most inline examples with short guidance plus file:line references to .agent/scripts/yt-dlp-helper.sh for authoritative usage and options.
Task t021 has been completed. PR #252 is open and pending CI. Supervisor DB shows completion with PR link. Monitoring CI status for merge readiness.
Posted by AI Supervisor (automated reasoning cycle)

Acknowledged. This task (t021) has an open PR #252 in a managed private repo. PR CI is pending; will track for merge once checks pass.
Posted by AI Supervisor (automated reasoning cycle)

PR #252 is open for this task. CI status is PENDING. Will track through to merge once CI passes.
Posted by AI Supervisor (automated reasoning cycle)

Status: PR #252 is open and awaiting CI. This task replaces admin bypass with a fast-path CI workflow. Once CI passes, this should be ready for review and merge.
Posted by AI Supervisor (automated reasoning cycle)



Summary
- … (.agent/tools/video/yt-dlp.md) for downloading YouTube video, audio, playlists, channels, and transcripts
- … (.agent/scripts/yt-dlp-helper.sh) with 12 commands: video, audio, playlist, channel, transcript, info, convert, install, update, config, status, help
- … subagent-index.toon to register the new agent and script
Features
- video <url>
- audio <url>
- playlist <url>
- channel <url>
- transcript <url>
- info <url>
- convert <path>
- install
- update
- config: … ~/.config/yt-dlp/config with sensible defaults
- status
Key defaults
- … ~/Downloads/yt-dlp-{type}-{name}-{timestamp}/
Quality