josecols/icf-mt

Python 3.12+ License: Apache 2.0 UV FastAPI Next.js

ICF-MT


ICF-MT is a human-in-the-loop system for translating and post-editing Informed Consent Forms (ICFs).

The key features are:

  • Layout-preserving translation: Upload PDF or DOCX files and get translated documents that keep the original structure.
  • Interactive post-editing: Review translations side-by-side with the source text and edit in a rich-text editor.
  • Entity highlighting: Medical terms, drugs, diseases, and organizations are automatically tagged and synced across both languages.
  • Readability analysis: Sentence-level readability scores for both the source and the translated content.
  • Edit tracking: Review the changes between the initial machine output and your edits.
  • Export: Download finished translations as PDF or DOCX files.
  • MT backends: Swap in different translation models from HuggingFace or add new language pairs.
  • Evaluation: Compare model performance with automated MT metrics (COMET, BLEU, TER) and generate MQM projects for human evaluation.

Installation

1. Prerequisites

First, install the UV package manager:

macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

For more installation options, visit the UV installation guide.

2. Installation from source

git clone https://github.com/josecols/icf-mt.git
cd icf-mt
uv sync

Getting Started

API Server

  1. Activate the UV environment:
source .venv/bin/activate
  2. Start the API server:
python -m api.translate

The API will be available at http://localhost:8000
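
To confirm the server is actually listening before pointing the editor at it, a plain TCP check is enough. This is a minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical and not part of this repository, and the port is the one stated above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 8000 is the port documented above; adjust if your setup differs.
    print("API reachable:", is_port_open("localhost", 8000))
```

The check only verifies that something accepts connections on the port, not that the API itself is healthy.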

Web Editor

  1. Navigate to the editor directory:
cd editor
  2. Install dependencies:
npm install
  3. Start the development server:
npm run dev

The editor will be available at http://localhost:3000

Usage

Scripts

The project includes several command-line scripts for batch processing:

Translate a document:

uv run python -m scripts.translate <form_name> <model_name> <input_file>
  • form_name: Identifier for the ICF form file (use "example" to translate the included ICF example form).
  • model_name: Translation model to use (e.g., "tower", "tower_plus").
  • input_file: Path to the input document (.docx or .pdf).
  • Output: HTML file saved to data/processed/ directory.
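
To run the same form through several models, the documented command can be wrapped in a small driver. The sketch below only assembles and (optionally) runs the command line shown above; the helper names and the input path are hypothetical, and the model names are the two examples given in this README.

```python
import subprocess


def build_translate_command(form_name: str, model_name: str, input_file: str) -> list[str]:
    """Assemble the argv for the documented `scripts.translate` invocation."""
    return ["uv", "run", "python", "-m", "scripts.translate", form_name, model_name, input_file]


def translate_with_models(form_name: str, input_file: str, models: list[str],
                          dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run=True, just print) one translation command per model."""
    commands = [build_translate_command(form_name, m, input_file) for m in models]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a translation run fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "path/to/form.docx" is a placeholder, not a file shipped with the project.
    translate_with_models("example", "path/to/form.docx", ["tower", "tower_plus"])
```

Run it from the repository root so that `uv run` resolves the project environment; pass `dry_run=False` to execute the commands instead of printing them.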

Convert document formats:

uv run python -m scripts.convert <input_file>
  • Converts PDF or Word documents to HTML format.
  • Output: HTML file saved to data/processed/ directory.

Evaluate translation quality:

uv run python -m scripts.evaluate <source_file> <reference_file> <hypothesis_file>
  • Compares the machine translation (hypothesis) against a reference translation using BLEU, COMET, and other metrics.
  • Output: JSON evaluation file with quality scores.

API Endpoints

The FastAPI server provides the following endpoints for integration with the ICF Translation Editor:

POST /translate

  • Upload documents for translation via streaming response.
  • Accepts: Multipart file upload (.docx, .pdf).
  • Returns: Server-sent events with translation progress and results.
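
A client consuming this endpoint has to decode the server-sent event stream. The sketch below is a generic parser for the standard `text/event-stream` framing, fed with text lines from whatever HTTP client you use. The event names and payloads in the sample are made up for illustration; this README does not specify what the server actually emits.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"event", "data"} dicts from an iterable of text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event; skip events with no data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)  # multiple data lines are joined with "\n"


if __name__ == "__main__":
    # Hypothetical stream content, for illustration only.
    sample = [
        "event: progress\n",
        'data: {"done": 1, "total": 3}\n',
        "\n",
        ": keep-alive\n",
        "data: first line\n",
        "data: second line\n",
        "\n",
    ]
    for evt in parse_sse(sample):
        print(evt)
```

The parser handles only the `event` and `data` fields; `id` and `retry` are ignored, which is usually sufficient for a progress stream.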

POST /entities

  • Extract named entities from text segments.
  • Accepts: JSON with source and translation arrays.
  • Returns: Identified entities (addresses, medical terms, organizations, etc.).
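
A request to this endpoint might be built as follows, using only the standard library. The JSON keys `source` and `translation` are inferred from the description above and are an assumption, as is the base URL; check the API code for the exact schema. The helper names are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default address from the Getting Started section


def build_json_request(path: str, source: list[str], translation: list[str]) -> urllib.request.Request:
    """Build a POST request carrying source and translation segments as JSON."""
    # Key names are assumed from the endpoint description, not confirmed.
    body = json.dumps({"source": source, "translation": translation}).encode("utf-8")
    return urllib.request.Request(
        API_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_segments(path: str, source: list[str], translation: list[str]):
    """Send the request and decode the JSON response (requires the API server to be running)."""
    with urllib.request.urlopen(build_json_request(path, source, translation)) as resp:
        return json.load(resp)
```

With the server running, `post_segments("/entities", ["Take one tablet daily."], ["Tome una tableta al día."])` would return whatever structure the endpoint produces; the response schema is not documented here, so inspect it before relying on specific fields.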

POST /readability

  • Analyze readability scores for text segments.
  • Accepts: JSON with source and translation arrays.
  • Returns: Flesch Reading Ease (English) and Szigriszt-Pazos (Spanish) scores.
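
For reference, both scores follow the same pattern: a constant minus penalties for long sentences and long words. The sketch below implements the standard published formulas from precomputed counts. How this project tokenizes text and counts syllables is not described here, so the scores returned by the API may differ from a naive count.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (English): higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def szigriszt_pazos(words: int, sentences: int, syllables: int) -> float:
    """Szigriszt-Pazos perspicuity index (Spanish): higher scores indicate easier text."""
    return 206.835 - 62.3 * (syllables / words) - (words / sentences)


if __name__ == "__main__":
    # Illustrative counts: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_reading_ease(100, 5, 150), 3))
    print(round(szigriszt_pazos(100, 5, 150), 3))
```

The two formulas weight syllables and sentence length differently, so English and Spanish scores for the same passage are not directly comparable.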

Acknowledgments

Parts of this work were completed on Hyak, UW's high-performance computing cluster. This resource was funded by the UW Student Technology Fee.
