
AlvaroRausell/subtitor


sub - Adversarial Subtitle Translator

A Go CLI tool for translating SRT subtitle files using an adversarial multi-agent LLM approach. Three AI agents work together to produce high-quality translations:

  1. Chunker Agent: Groups subtitle entries into semantic batches that make sense to translate together (keeping split sentences intact)
  2. Translator Agent: Translates each chunk to the target language
  3. Reviewer Agent: Evaluates translations and provides feedback for improvement

The translator and reviewer engage in an adversarial loop: if the reviewer rejects a translation, it returns specific feedback and the translator tries again, up to a configurable number of attempts (max_attempts, 3 by default). The loop is meant to raise translation quality while preserving the original tone, including profanity, idioms, and emotional register.
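The loop described above can be sketched in Go roughly as follows. This is a minimal illustration, not the project's actual code: the `Verdict`, `Translator`, `Reviewer`, and `TranslateWithReview` names are hypothetical, and the agents are stubbed with plain functions instead of LLM calls.

```go
package main

import (
	"errors"
	"fmt"
)

// Verdict is a hypothetical reviewer result: approval plus optional feedback.
type Verdict struct {
	Approved bool
	Feedback string
}

// Translator and Reviewer are hypothetical stand-ins for the LLM-backed agents.
type Translator func(source, feedback string) (string, error)
type Reviewer func(source, translation string) (Verdict, error)

// ErrUnresolved signals that no attempt was approved within maxAttempts.
var ErrUnresolved = errors.New("translation not approved within max attempts")

// TranslateWithReview runs the translate/review loop. On ErrUnresolved it
// still returns the last attempt so the caller can keep a best-effort result.
func TranslateWithReview(source string, maxAttempts int, tr Translator, rv Reviewer) (string, error) {
	var last, feedback string
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		out, err := tr(source, feedback)
		if err != nil {
			return last, err
		}
		last = out
		verdict, err := rv(source, out)
		if err != nil {
			return last, err
		}
		if verdict.Approved {
			return out, nil
		}
		feedback = verdict.Feedback // fed into the next translation attempt
	}
	return last, ErrUnresolved
}

func main() {
	// Stub agents: the reviewer rejects the first, softened attempt.
	tr := func(src, fb string) (string, error) {
		if fb == "" {
			return "Hola.", nil
		}
		return "¡Hola, maldita sea!", nil
	}
	rv := func(src, out string) (Verdict, error) {
		if out == "Hola." {
			return Verdict{Feedback: "tone softened; keep the profanity"}, nil
		}
		return Verdict{Approved: true}, nil
	}
	out, err := TranslateWithReview("Hello, damn it!", 3, tr, rv)
	fmt.Println(out, err)
}
```

Returning the last attempt alongside the error mirrors the documented behavior that unresolved chunks still produce best-effort output.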

Features

  • Multi-agent adversarial translation with feedback loops
  • Semantic chunking to keep related dialogue together
  • Tone preservation (profanity, slang, idioms translated faithfully)
  • Parallel processing with configurable worker count
  • Glossary support for consistent terminology
  • Progress bars and colored output
  • Verbose modes for debugging
  • Job system for manually resolving difficult translations
  • Support for OpenRouter and Groq LLM providers

Installation

# Clone the repository
git clone https://github.com/AlvaroRausell/subtitor.git
cd subtitor

# Build the binary
go build -o sub ./cmd/sub

# Optionally, move to your PATH
mv sub /usr/local/bin/

Quick Start

# 1. Initialize configuration
sub init

# 2. Set your API key (choose one)
export OPENROUTER_API_KEY=your-openrouter-key
# or
export GROQ_API_KEY=your-groq-key

# 3. Translate a subtitle file
sub translate movie.srt -l Spanish

Commands

sub init

Creates the configuration file at ~/.subrc.

sub init

sub translate

Translates an SRT subtitle file to the target language.

sub translate <input.srt> -l <language> [flags]

Flags:

| Flag | Description | Default |
| --- | --- | --- |
| -l, --language | Target language (required) | - |
| -o, --output | Output file path | <input>_<language>.srt |
| -g, --glossary | Glossary name or path | - |
| -j, --jobs | Number of parallel workers | 1 |
| -v, --verbose | Verbose output (use twice for full prompts) | - |
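
As an illustration of the default output path, the following sketch derives `<input>_<language>.srt` from the input file name. `DefaultOutputPath` is a hypothetical helper; whether sub normalizes the language (for example by lowercasing it) is not documented here, so the language is used exactly as given.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// DefaultOutputPath derives "<input>_<language>.srt" from the input path.
// Assumption: the language string is appended unchanged.
func DefaultOutputPath(input, language string) string {
	ext := filepath.Ext(input)             // ".srt" for "movie.srt"
	base := strings.TrimSuffix(input, ext) // "movie"
	return base + "_" + language + ".srt"
}

func main() {
	fmt.Println(DefaultOutputPath("movie.srt", "Spanish")) // movie_Spanish.srt
}
```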

Examples:

# Basic translation
sub translate movie.srt -l Spanish

# With custom output path
sub translate movie.srt -l French -o movie_fr.srt

# With glossary and parallel processing
sub translate anime.srt -l English -g anime -j 4

# With verbose output
sub translate movie.srt -l German -v    # Condensed output
sub translate movie.srt -l German -vv   # Full prompts/responses
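
The -j flag sets the number of parallel workers. How sub schedules chunks internally is not shown in this README; one common way to bound concurrency in Go is a fixed worker pool, sketched below. `ProcessChunks` is a hypothetical name, and the per-chunk function stands in for the translate/review step.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// ProcessChunks runs fn over chunks with at most `jobs` goroutines and
// returns the results in the original chunk order.
func ProcessChunks(chunks []string, jobs int, fn func(string) string) []string {
	if jobs < 1 {
		jobs = 1
	}
	results := make([]string, len(chunks))
	indexes := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < jobs; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indexes {
				// Each index is handed to exactly one worker, so writes never overlap.
				results[i] = fn(chunks[i])
			}
		}()
	}
	for i := range chunks {
		indexes <- i
	}
	close(indexes)
	wg.Wait()
	return results
}

func main() {
	out := ProcessChunks([]string{"one", "two", "three"}, 4, strings.ToUpper)
	fmt.Println(out)
}
```

Writing results by index keeps subtitle order stable regardless of which worker finishes first.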

sub review

Interactively review and fix translations that couldn't be resolved automatically.

sub review <job-id>

When a translation run leaves chunks unresolved (the maximum number of attempts was used without reviewer approval), sub saves a job file. Use this command to provide those translations manually.

Interactive commands during review:

  • Enter translation text for each line
  • Type skip to skip a chunk
  • Type quit to save progress and exit
  • Press Enter with empty input to use the last attempt
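
The interactive commands above amount to a small input dispatcher. A minimal sketch, assuming input is trimmed and the keywords are matched case-sensitively (neither is confirmed by this README); `Action` and `Interpret` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Action is what the review loop should do with one line of input.
type Action int

const (
	UseInput       Action = iota // use the typed text as the translation
	UseLastAttempt               // empty input: keep the last attempt
	SkipChunk                    // "skip"
	Quit                         // "quit": save progress and exit
)

// Interpret maps one line of reviewer input to an action and, for UseInput,
// the trimmed translation text.
func Interpret(line string) (Action, string) {
	text := strings.TrimSpace(line)
	switch text {
	case "":
		return UseLastAttempt, ""
	case "skip":
		return SkipChunk, ""
	case "quit":
		return Quit, ""
	default:
		return UseInput, text
	}
}

func main() {
	for _, line := range []string{"", "skip", "quit", "Hola, mundo"} {
		action, text := Interpret(line)
		fmt.Printf("%q -> action=%d text=%q\n", line, action, text)
	}
}
```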

sub jobs

List pending review jobs.

sub jobs

sub jobs clean

Delete all pending review jobs.

sub jobs clean

sub glossary init

Create a new glossary template.

sub glossary init <name>

Creates a YAML file at ~/.sub/glossaries/<name>.yaml.

sub glossary list

List available glossaries.

sub glossary list

Configuration

The configuration file is located at ~/.subrc. It is created when you run sub init.

# API Keys (can also use env vars: OPENROUTER_API_KEY, GROQ_API_KEY)
openrouter_api_key: ""
groq_api_key: ""

# Global provider: "openrouter" or "groq"
provider: "openrouter"

# Models for each agent
# OpenRouter: anthropic/claude-3.5-sonnet, openai/gpt-4o, meta-llama/llama-3.1-70b-instruct
# Groq: llama-3.1-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768
models:
  chunker: "anthropic/claude-3-haiku"
  translator: "anthropic/claude-3.5-sonnet"
  reviewer: "anthropic/claude-3.5-sonnet"

# Adversarial settings
max_attempts: 3           # Max translation attempts per chunk

# API retry settings
max_retries: 3
retry_delay_ms: 1000      # Base delay with exponential backoff

# Chunking settings
max_chunk_size: 20        # Max subtitles per chunk

# Default parallelism
default_jobs: 1
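
The retry settings above describe a base delay with exponential backoff. The sketch below shows one way such a policy can work. It assumes the delay doubles on each retry and that max_retries counts retries after the initial call; the README confirms neither. `BackoffDelay` and `Retry` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// BackoffDelay returns base * 2^attempt for a zero-based retry index.
// Assumption: the backoff factor is 2.
func BackoffDelay(base time.Duration, attempt int) time.Duration {
	return base << uint(attempt)
}

// Retry calls fn once and then up to maxRetries more times, waiting between
// tries. The sleep function is injected so the demo can avoid real delays.
func Retry(maxRetries int, base time.Duration, sleep func(time.Duration), fn func() error) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		if attempt < maxRetries {
			sleep(BackoffDelay(base, attempt))
		}
	}
	return err
}

func main() {
	calls := 0
	err := Retry(3, 1000*time.Millisecond, func(d time.Duration) {
		fmt.Println("would wait", d) // printed instead of sleeping
	}, func() error {
		calls++
		if calls < 3 {
			return errors.New("temporary failure")
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

In real use, `time.Sleep` would be passed as the sleep function.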

Environment Variables

| Variable | Description |
| --- | --- |
| OPENROUTER_API_KEY | API key for OpenRouter |
| GROQ_API_KEY | API key for Groq |
| SUB_PROVIDER | Override provider setting |

Environment variables take precedence over config file values.
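
The precedence rule can be sketched as a small lookup helper. `Resolve` is a hypothetical name, and treating an empty environment variable as unset is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"os"
)

// Resolve returns the environment value when it is set and non-empty,
// otherwise the value from the config file.
// The lookup function is injected so the rule can be tested without
// touching the real environment; os.LookupEnv fits this signature.
func Resolve(envName, fileValue string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(envName); ok && v != "" {
		return v
	}
	return fileValue
}

func main() {
	provider := Resolve("SUB_PROVIDER", "openrouter", os.LookupEnv)
	fmt.Println("provider:", provider)
}
```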

Glossaries

Glossaries provide translation guidance for specific terms. They are not exact find-and-replace rules; instead, the LLM uses them as context for making translation decisions.

Example glossary (~/.sub/glossaries/anime.yaml):

Nakama: "Translate as 'crew' or 'companions' depending on context"
Shinigami: "Use 'Soul Reaper' consistently"
honorifics: "Keep Japanese honorifics (-san, -kun, -chan) as-is"
Baka: "Translate as 'idiot' or 'fool' based on emotional intensity"
senpai: "Keep as 'senpai' - do not translate"
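
Because glossary entries are guidance rather than replacement rules, they presumably reach the model as prompt text. The sketch below shows one way to render entries into such a section. The wording of the header line and the `GlossaryPrompt` name are assumptions, not the prompt sub actually uses.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// GlossaryPrompt renders glossary entries as a guidance section for a prompt.
// It returns an empty string when there are no entries.
func GlossaryPrompt(entries map[string]string) string {
	if len(entries) == 0 {
		return ""
	}
	terms := make([]string, 0, len(entries))
	for term := range entries {
		terms = append(terms, term)
	}
	sort.Strings(terms) // map order is random; sort for stable prompts

	var b strings.Builder
	b.WriteString("Glossary guidance (use as context, not literal replacement):\n")
	for _, term := range terms {
		fmt.Fprintf(&b, "- %s: %s\n", term, entries[term])
	}
	return b.String()
}

func main() {
	fmt.Print(GlossaryPrompt(map[string]string{
		"Nakama": "Translate as 'crew' or 'companions' depending on context",
		"senpai": "Keep as 'senpai' - do not translate",
	}))
}
```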

Usage:

# By name (looks in ~/.sub/glossaries/)
sub translate anime.srt -l English -g anime

# By path
sub translate anime.srt -l English -g /path/to/custom.yaml
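
The -g value can be either a name or a path. How sub tells them apart is not stated; the sketch below assumes that anything containing a slash or ending in .yaml/.yml is a path, and everything else is a name looked up in the glossary directory. `GlossaryPath` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// GlossaryPath resolves a -g value to a file path.
// Assumption: values with a "/" or a YAML extension are paths; bare names
// map to <glossaryDir>/<name>.yaml.
func GlossaryPath(value, glossaryDir string) string {
	ext := strings.ToLower(filepath.Ext(value))
	if strings.Contains(value, "/") || ext == ".yaml" || ext == ".yml" {
		return value
	}
	return filepath.Join(glossaryDir, value+".yaml")
}

func main() {
	dir := filepath.Join("~", ".sub", "glossaries") // shown unexpanded for illustration
	fmt.Println(GlossaryPath("anime", dir))
	fmt.Println(GlossaryPath("/path/to/custom.yaml", dir))
}
```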

Translation Quality

The adversarial approach helps ensure quality translations:

  1. Chunker groups related subtitles together, ensuring split sentences ("This is..." / "...my home") are translated as a unit

  2. Translator follows strict guidelines:

    • Preserve original tone (including profanity and vulgar language)
    • Translate idioms to culturally equivalent expressions
    • Maintain emotional register (casual, formal, angry, etc.)
    • Keep formatting tags (<i>, <b>) intact
    • Follow glossary guidance

  3. Reviewer checks for:

    • Accuracy of meaning
    • Natural flow in target language
    • Tone preservation (rejects if softened/censored)
    • Glossary adherence
    • Proper handling of split sentences

If the reviewer rejects a translation, it provides specific feedback. The translator uses this feedback to improve the next attempt.
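
Some of these guidelines can also be checked mechanically. The reviewer in sub is an LLM agent, so the following is not how it works; it is only a sketch of a deterministic check for the "keep formatting tags intact" rule. `SameTags` is a hypothetical helper that compares the tags found in the source and in the translation.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// tagPattern matches simple opening and closing tags such as <i> and </b>.
var tagPattern = regexp.MustCompile(`</?[a-zA-Z][^>]*>`)

// SameTags reports whether both strings contain the same multiset of tags.
// Tag order is ignored, since word order can change between languages.
func SameTags(source, translation string) bool {
	a := tagPattern.FindAllString(source, -1)
	b := tagPattern.FindAllString(translation, -1)
	if len(a) != len(b) {
		return false
	}
	sort.Strings(a)
	sort.Strings(b)
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(SameTags("<i>This is my home.</i>", "<i>Esta es mi casa.</i>"))
	fmt.Println(SameTags("<i>This is my home.</i>", "Esta es mi casa."))
}
```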

Handling Unresolved Translations

If a chunk uses up the maximum number of attempts without reviewer approval:

  1. The job is saved to ~/.sub/jobs/<job-id>.yaml
  2. A warning is printed with the job ID
  3. The output file is still written (with best-effort translations)

To resolve manually:

# List pending jobs
sub jobs

# Review a specific job
sub review abc12

# Clean up all jobs
sub jobs clean

Project Structure

sub/
├── cmd/sub/main.go           # CLI entry point
├── internal/
│   ├── agents/               # LLM agents (chunker, translator, reviewer)
│   ├── config/               # Configuration loading
│   ├── glossary/             # Glossary management
│   ├── jobs/                 # Job persistence
│   ├── llm/                  # LLM provider clients
│   ├── pipeline/             # Translation orchestration
│   ├── srt/                  # SRT file parsing
│   └── ui/                   # Terminal output formatting
├── go.mod
└── go.sum

License

MIT License
