Ottoman NER

A focused toolkit for Ottoman Turkish Named Entity Recognition

About

Ottoman NER is a Python package for Named Entity Recognition (NER) in Ottoman Turkish texts. It provides a single interface for training, evaluating, and running NER models built for historical Ottoman Turkish documents.

Key Features

Focused NER Solution: Dedicated solely to Ottoman Turkish named entity recognition
Simple API: Single class interface for all NER operations
Easy Training: Train custom models with JSON configuration
Pretrained Weights: Official model published on Hugging Face (fatihburakkaragoz/ottoman-ner-latin)
Built-in Evaluation: Precision, recall, and F1 computed with seqeval
Fast Prediction: Real-time entity recognition
CLI Interface: Command-line tools for all operations
PyPI Ready: Easy installation via pip

Supported Entity Types

PER: Person names (Sultan Abdülhamid, Ahmet Paşa)
LOC: Locations (İstanbul, Rumeli, Anadolu)
ORG: Organizations (Divan-ı Hümayun, Meclis-i Mebusan)
MISC: Miscellaneous entities (dates, events, titles)
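
Under the BIO scheme used in the training data, each of the four entity types contributes a B- tag and an I- tag, and a single O tag marks non-entity tokens, which gives nine labels in total (the num_labels value in the sample configuration). The following is a minimal sketch of how such a label list could be built. The helper name build_bio_labels and the label ordering are illustrative assumptions, not part of the package API.

```python
from typing import Dict, List

# Entity types listed in this README.
ENTITY_TYPES = ["PER", "LOC", "ORG", "MISC"]


def build_bio_labels(entity_types: List[str]) -> List[str]:
    """Return ["O", "B-X", "I-X", ...] for the given entity types.

    Hypothetical helper: the ordering here is an assumption, not
    necessarily the order used by the published model.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)
label2id: Dict[str, int] = {label: index for index, label in enumerate(labels)}

print(labels)
print(len(labels))
```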

Installation

From PyPI (Recommended)

pip install ottoman-ner

From Source

git clone https://github.com/fbkaragoz/ottoman-ner.git
cd ottoman-ner
pip install -e .

# Install with development dependencies
# (the quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"

# Install with full features (visualization, experiment tracking)
pip install -e ".[full]"

Quick Start

1. Using Pre-trained Models

from ottoman_ner import OttomanNER

# Initialize the NER system
ner = OttomanNER()

# Load the published pre-trained model (downloads from Hugging Face Hub)
ner.load_model()

# Make predictions
text = "Sultan Abdülhamid İstanbul'da yaşıyordu."
entities = ner.predict(text)

for entity in entities:
    print(f"{entity['text']} -> {entity['label']} ({entity['confidence']:.2f})")

The load_model() call defaults to the official Hugging Face release. Pass a local directory or another Hub repository name to use custom weights.
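
Because predict() returns a list of dictionaries with text, label, and confidence keys (as the loop above shows), plain Python is enough for post-processing. The sketch below uses a hand-written sample in place of real model output. The helper filter_entities and the 0.80 threshold are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Hand-written stand-in for ner.predict(text); the values are illustrative only.
sample_entities = [
    {"text": "Sultan Abdülhamid", "label": "PER", "confidence": 0.98},
    {"text": "İstanbul", "label": "LOC", "confidence": 0.95},
    {"text": "yaşıyordu", "label": "MISC", "confidence": 0.41},
]


def filter_entities(entities: List[dict], min_confidence: float = 0.80) -> Dict[str, List[str]]:
    """Keep entities at or above the threshold and group their text by label."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["confidence"] >= min_confidence:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


grouped = filter_entities(sample_entities)
for label, texts in grouped.items():
    print(label, texts)
```

A threshold like this trades recall for precision, so it is worth tuning on a development set rather than fixing in advance.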

2. Training Custom Models

from ottoman_ner import OttomanNER

# Initialize
ner = OttomanNER()

# Train from configuration file
results = ner.train_from_config("configs/training.json")
print(f"Training completed! F1 Score: {results['eval_f1']:.4f}")

3. Model Evaluation

from ottoman_ner import OttomanNER

# Initialize and evaluate
ner = OttomanNER()
results = ner.evaluate(
    model_path="fatihburakkaragoz/ottoman-ner-latin",
    test_file="data/test.txt"
)

print(f"F1 Score: {results['overall_f1']:.4f}")
print(f"Precision: {results['overall_precision']:.4f}")
print(f"Recall: {results['overall_recall']:.4f}")
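
The overall scores are entity-level: seqeval counts a prediction as correct only when both the span boundaries and the entity type match the gold annotation. The following is a simplified pure-Python sketch of that idea for a single sentence. It is not seqeval's implementation, and the function names extract_spans and score are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start, current = None, None
    # The trailing "O" is a sentinel that closes a span ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and current == tag[2:]
        if current is not None and not continues:
            spans.add((current, start, i))
            start, current = None, None
        if tag.startswith("B-"):
            start, current = i, tag[2:]
        # Simplification: an I- tag with no matching open span is ignored.
    return spans


def score(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Entity-level precision, recall, and F1 with exact span matching."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = ["B-PER", "I-PER", "B-LOC", "O", "O", "O"]
pred = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
metrics = score(gold, pred)
print(metrics)
```

In this example the person span matches exactly, while the location is predicted at the wrong position, so one of two predicted spans and one of two gold spans count as correct.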

4. Hugging Face Pipeline

from transformers import pipeline

pipe = pipeline(
    task="token-classification",
    model="fatihburakkaragoz/ottoman-ner-latin",
    aggregation_strategy="simple"
)

pipe("Sultan Abdülhamid İstanbul'da yaşıyordu.")
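
With aggregation_strategy="simple", the Transformers pipeline returns one dictionary per merged entity, with entity_group, score, word, start, and end keys. The sketch below converts that shape into the text/label/confidence dictionaries used in the first Quick Start example. It works on a hand-written stand-in for the pipeline output, and the helpers make_sample and to_entities are hypothetical.

```python
from typing import List

text = "Sultan Abdülhamid İstanbul'da yaşıyordu."


def make_sample(word: str, label: str, score: float) -> dict:
    """Build one hand-written result in the pipeline's aggregated format."""
    start = text.index(word)
    return {
        "entity_group": label,
        "score": score,
        "word": word,
        "start": start,
        "end": start + len(word),
    }


# Stand-in for pipe(text); labels and scores are illustrative only.
pipeline_output = [
    make_sample("Sultan Abdülhamid", "PER", 0.98),
    make_sample("İstanbul", "LOC", 0.95),
]


def to_entities(source_text: str, results: List[dict]) -> List[dict]:
    """Map aggregated pipeline results to text/label/confidence dictionaries."""
    entities = []
    for item in results:
        entities.append({
            # Slice the source text so the surface form is preserved exactly.
            "text": source_text[item["start"]:item["end"]],
            "label": item["entity_group"],
            # Real pipeline scores are NumPy floats; cast for JSON output.
            "confidence": float(item["score"]),
        })
    return entities


entities = to_entities(text, pipeline_output)
for entity in entities:
    print(f"{entity['text']} -> {entity['label']} ({entity['confidence']:.2f})")
```

Slicing by character offsets rather than reading the word field avoids differences that tokenization can introduce in the reconstructed text.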

Tip: To control the exact label order used during training, add "labels": ["O", "B-PER", ...] under the "model" key of the training configuration.

Command Line Interface

The ottoman-ner command exposes training, evaluation, and prediction as subcommands:

Training

# Train a new model
ottoman-ner train --config configs/training.json

# Train with verbose output
ottoman-ner --verbose train --config configs/training.json

Evaluation

# Evaluate a trained model
ottoman-ner eval --model-path fatihburakkaragoz/ottoman-ner-latin --test-file data/test.txt

# Save evaluation results
ottoman-ner eval --model-path fatihburakkaragoz/ottoman-ner-latin --test-file data/test.txt --output-dir results/

Prediction

# Predict on single text
ottoman-ner predict --text "Sultan Abdülhamid İstanbul'da yaşıyordu"

# Predict on file
ottoman-ner predict --input-file input.txt --output-file predictions.json

If --model-path is omitted, the CLI downloads and caches the published Hugging Face model on first use.

Configuration

Create a training configuration file in JSON format:

{
  "experiment": {
    "experiment_name": "my-ottoman-ner"
  },
  "model": {
    "model_name_or_path": "dbmdz/bert-base-turkish-cased",
    "num_labels": 9
  },
  "data": {
    "train_file": "data/train.txt",
    "dev_file": "data/dev.txt",
    "test_file": "data/test.txt",
    "max_length": 512
  },
  "training": {
    "output_dir": "models/my-model",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-5,
    "evaluation_strategy": "steps",
    "eval_steps": 100,
    "save_steps": 100,
    "load_best_model_at_end": true,
    "metric_for_best_model": "eval_f1"
  }
}
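
The same configuration can be generated from Python, which makes it easier to keep num_labels in step with an explicit label list (the optional "labels" key under "model"). This is a minimal sketch: the label order, the check_config helper, and the temporary output path are assumptions made for illustration, and the package may validate configurations differently.

```python
import json
import tempfile
from pathlib import Path

# Assumed label order; adjust to match your own data.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC",
          "B-ORG", "I-ORG", "B-MISC", "I-MISC"]

config = {
    "experiment": {"experiment_name": "my-ottoman-ner"},
    "model": {
        "model_name_or_path": "dbmdz/bert-base-turkish-cased",
        "num_labels": len(labels),  # derived, so it cannot drift from the list
        "labels": labels,
    },
    "data": {
        "train_file": "data/train.txt",
        "dev_file": "data/dev.txt",
        "test_file": "data/test.txt",
        "max_length": 512,
    },
    "training": {
        "output_dir": "models/my-model",
        "num_train_epochs": 3,
        "per_device_train_batch_size": 4,
        "learning_rate": 2e-5,
    },
}


def check_config(cfg: dict) -> list:
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    for section in ("experiment", "model", "data", "training"):
        if section not in cfg:
            problems.append(f"missing section: {section}")
    model = cfg.get("model", {})
    if "labels" in model and model.get("num_labels") != len(model["labels"]):
        problems.append("num_labels does not match the number of labels")
    return problems


problems = check_config(config)

# Written to a temporary directory here; use configs/training.json in practice.
config_path = Path(tempfile.mkdtemp()) / "training.json"
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
reloaded = json.loads(config_path.read_text(encoding="utf-8"))
print(problems, reloaded["model"]["num_labels"])
```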

Data Format

Ottoman NER expects CoNLL-style data with BIO tags: one token and its tag per line, separated by whitespace, with a blank line between sentences.

Sultan B-PER
Abdülhamid I-PER
İstanbul B-LOC
'da O
yaşıyordu O
. O

Osmanlı B-ORG
Devleti I-ORG
'nin O
başkenti O
İstanbul B-LOC
'dur O
. O
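
A short reader for this layout can help with checking a dataset before training. The sketch below assumes whitespace-separated columns with the token first and the tag last. The function parse_conll is a hypothetical helper, not the loader the package uses internally.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # [(token, tag), ...]

SAMPLE = """\
Sultan B-PER
Abdülhamid I-PER
İstanbul B-LOC
'da O
yaşıyordu O
. O

Osmanlı B-ORG
Devleti I-ORG
'nin O
başkenti O
İstanbul B-LOC
'dur O
. O
"""


def parse_conll(text: str) -> List[Sentence]:
    """Split CoNLL-style text into sentences of (token, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"line {line_number}: expected 'token tag', got {line!r}")
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conll(SAMPLE)
for sentence in sentences:
    print(" ".join(token for token, _ in sentence))
```

To read a real file, pass its contents, for example parse_conll(open("data/train.txt", encoding="utf-8").read()).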

Project Background & Acknowledgments

This project builds upon foundational work in Ottoman Turkish NLP and represents a focused effort to provide a clean, maintainable NER solution for historical Turkish texts.

References

Karagöz et al. (2024) — "Towards a Clean Text Corpus for Ottoman Turkish" ACL Anthology
Özateş et al. (2025) — "Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models" arXiv:2501.04828

Special Thanks

Sincere gratitude to Assoc. Prof. Şaziye Betül Özateş and the Boğaziçi University Computational Linguistics Lab (BUColin) for their foundational contributions to historical Turkish NLP.

Requirements

Python 3.8+
PyTorch 1.9+
Transformers 4.20+
See requirements.txt for complete dependencies

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Ottoman NER in your research, please cite:

@software{ottoman_ner_2024,
  title={Ottoman NER: A Toolkit for Ottoman Turkish Named Entity Recognition},
  author={Karagöz, Fatih Burak},
  year={2024},
  url={https://github.com/fbkaragoz/ottoman-ner},
  version={2.0.0}
}

Related Projects

For broader Ottoman Turkish NLP research and experimental tools, see the ottominer repository (coming soon).
