ICF-MT is a human-in-the-loop system for translating and post-editing Informed Consent Forms (ICFs).
The key features are:
- Layout-preserving translation: Upload PDF or DOCX files and get translated documents that keep the original structure.
- Interactive post-editing: Review translations side-by-side with the source text and edit in a rich-text editor.
- Entity highlighting: Medical terms, drugs, diseases, and organizations are automatically tagged and synced across both languages.
- Readability analysis: Sentence-level readability scores across the source and translated content.
- Edit tracking: Review the changes between the initial machine output and your edits.
- Export: Download finished translations as PDF or DOCX files.
- MT backends: Swap in different translation models from HuggingFace or add new language pairs.
- Evaluation: Compare model performance with automated MT metrics (COMET, BLEU, TER) and generate MQM projects for human evaluation.
First, install the UV package manager:
macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
For more installation options, visit the UV installation guide.
git clone https://github.com/josecols/icf-mt.git
cd icf-mt
uv sync
- Activate the UV environment:
source .venv/bin/activate
- Start the API server:
python -m api.translate
The API will be available at http://localhost:8000
- Navigate to the editor directory:
cd editor
- Install dependencies:
npm install
- Start the development server:
npm run dev
The editor will be available at http://localhost:3000
The project includes several command-line scripts for batch processing:
Translate a document:
uv run python -m scripts.translate <form_name> <model_name> <input_file>
- form_name: Identifier for the ICF form file (use "example" to translate the included ICF example form).
- model_name: Translation model to use (e.g., "tower", "tower_plus").
- input_file: Path to the input document (.docx or .pdf).
- Output: HTML file saved to the data/processed/ directory.
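To translate several forms in one run, the command above can be generated per file. The following is a minimal sketch, not part of the project: the helper name `build_translate_commands`, the `site_b` form name, and the file paths are hypothetical, and only the command shape and the .docx/.pdf restriction come from the usage line above.

```python
import shlex
from pathlib import Path

SUPPORTED_SUFFIXES = {".docx", ".pdf"}  # input formats listed for scripts.translate


def build_translate_commands(model_name, jobs):
    """Return one argv list per (form_name, input_file) job with a supported extension."""
    commands = []
    for form_name, input_file in jobs:
        # Skip files the script is not documented to accept.
        if Path(input_file).suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        commands.append([
            "uv", "run", "python", "-m", "scripts.translate",
            form_name, model_name, input_file,
        ])
    return commands


# Hypothetical job list: only the "example" form name appears in this README.
jobs = [("example", "forms/example.docx"), ("site_b", "forms/site_b.txt")]
commands = build_translate_commands("tower", jobs)
for argv in commands:
    # Printed for inspection; subprocess.run(argv, check=True) would execute it.
    print(shlex.join(argv))
```

Building argument lists rather than shell strings avoids quoting problems when paths contain spaces.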
Convert document formats:
uv run python -m scripts.convert <input_file>
- Converts PDF or Word documents to HTML format.
- Output: HTML file saved to the data/processed/ directory.
Evaluate translation quality:
uv run python -m scripts.evaluate <source_file> <reference_file> <hypothesis_file>
- Compares machine translation against reference using BLEU, COMET, and other metrics.
- Output: JSON evaluation file with quality scores.
The FastAPI server provides the following endpoints for integration with the ICF Translation Editor:
POST /translate
- Upload documents for translation via streaming response.
- Accepts: Multipart file upload (.docx, .pdf).
- Returns: Server-sent events with translation progress and results.
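A client consuming this endpoint needs to split the stream into individual server-sent events. The sketch below parses the standard `data:` framing from any iterable of text lines. The JSON payloads and the `progress` field are assumptions made for illustration, since the event schema is not described here.

```python
import json


def iter_sse_data(lines):
    """Yield the `data:` payload of each complete server-sent event.

    `lines` is any iterable of text lines, such as a decoded streaming response.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line dispatches the buffered event.
            yield "\n".join(buffer)
            buffer = []
        # Comment lines (starting with ":") and other fields are ignored here.
    # An event not terminated by a blank line is discarded, as the SSE format specifies.


# Hypothetical stream: the real /translate payload fields may differ.
sample_stream = [
    'data: {"progress": 0.5}\n',
    "\n",
    ": keep-alive\n",
    'data: {"progress": 1.0}\n',
    "\n",
]
events = [json.loads(payload) for payload in iter_sse_data(sample_stream)]
print(events)
```

In practice the lines would come from a streaming HTTP client reading the response of the multipart upload instead of a hard-coded list.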
POST /entities
- Extract named entities from text segments.
- Accepts: JSON with source and translation arrays.
- Returns: Identified entities (addresses, medical terms, organizations, etc.).
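As a rough illustration of the request shape, the sketch below builds (but does not send) a JSON POST with `source` and `translation` arrays using only the standard library. The helper name `build_segment_request` is hypothetical, the arrays are assumed to hold plain strings, and the response format is not shown because it is not specified here.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # default address from the setup steps


def build_segment_request(endpoint, source, translation):
    """Build a JSON POST request carrying source and translation segment arrays."""
    body = json.dumps({"source": source, "translation": translation}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumption: each array element is one text segment as a plain string.
request = build_segment_request(
    "/entities",
    source=["You may stop taking part at any time."],
    translation=["Puede dejar de participar en cualquier momento."],
)
print(request.get_method(), request.full_url)
# With the API server running, urllib.request.urlopen(request) would send it.
```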
POST /readability
- Analyze readability scores for text segments.
- Accepts: JSON with source and translation arrays.
- Returns: Flesch Reading Ease (English) and Szigriszt-Pazos (Spanish) scores.
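For reference, both scores follow published formulas based on word, sentence, and syllable counts. The sketch below implements those standard formulas directly from counts. How the server tokenizes text and counts syllables is not described here, so its results may differ from this illustration.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease (English): higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def szigriszt_pazos(words, sentences, syllables):
    """Szigriszt-Pazos perspicuity index (Spanish): higher scores mean easier text."""
    return 206.835 - 62.3 * (syllables / words) - (words / sentences)


# Illustrative counts only: 100 words, 5 sentences, 150 syllables.
english_score = flesch_reading_ease(words=100, sentences=5, syllables=150)
spanish_score = szigriszt_pazos(words=100, sentences=5, syllables=150)
print(round(english_score, 2), round(spanish_score, 2))
```

The two indices use different scales, so a score for the English source and a score for the Spanish translation are not directly comparable as raw numbers.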
Parts of this work were completed on Hyak, UW's high-performance computing cluster. This resource was funded by the UW Student Technology Fee.
