A focused toolkit for Ottoman Turkish Named Entity Recognition
Ottoman NER is a specialized Python package for Named Entity Recognition (NER) in Ottoman Turkish texts. This package provides a clean, modern interface for training, evaluating, and using NER models specifically designed for historical Ottoman Turkish documents.
- Focused NER Solution: Dedicated solely to Ottoman Turkish named entity recognition
- Simple API: Single class interface for all NER operations
- Easy Training: Train custom models with JSON configuration
- Pretrained Weights: Official model published on Hugging Face (`fatihburakkaragoz/ottoman-ner-latin`)
- Built-in Evaluation: Comprehensive evaluation metrics with seqeval
- Fast Prediction: Real-time entity recognition
- CLI: Command-line tools for training, evaluation, and prediction
- PyPI Ready: Easy installation via pip
Supported entity types:

- PER: Person names (Sultan Abdülhamid, Ahmet Paşa)
- LOC: Locations (İstanbul, Rumeli, Anadolu)
- ORG: Organizations (Divan-ı Hümayun, Meclis-i Mebusan)
- MISC: Miscellaneous entities (dates, events, titles)
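With four entity types under BIO tagging, the tag set has 1 + 4 × 2 = 9 labels, which matches `"num_labels": 9` in the training configuration. The sketch below builds that label set and the id mappings a token-classification head needs. The tag order is an assumption for illustration; the package's actual order may differ, which is what the `labels` configuration option lets you pin down.

```python
# Entity types listed above; the ordering of tags is an assumption.
ENTITY_TYPES = ["PER", "LOC", "ORG", "MISC"]


def build_bio_labels(entity_types):
    """Return the BIO label list: "O" followed by a B-/I- pair per entity type."""
    labels = ["O"]
    for ent in entity_types:
        labels.extend([f"B-{ent}", f"I-{ent}"])
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}

print(len(LABELS))  # 9
print(LABELS)
```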
Install from PyPI:

```bash
pip install ottoman-ner
```

Or install from source:

```bash
git clone https://github.com/fbkaragoz/ottoman-ner.git
cd ottoman-ner
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"

# Install with full features (visualization, experiment tracking)
pip install -e ".[full]"
```

Tip: Include `"labels": ["O", "B-PER", ...]` under `model` in the configuration if you want to control the exact label order used during training.
Quick start with the published model:

```python
from ottoman_ner import OttomanNER

# Initialize the NER system
ner = OttomanNER()

# Load the published pre-trained model (downloads from the Hugging Face Hub)
ner.load_model()

# Make predictions
text = "Sultan Abdülhamid İstanbul'da yaşıyordu."
entities = ner.predict(text)

for entity in entities:
    print(f"{entity['text']} -> {entity['label']} ({entity['confidence']:.2f})")
```

The `load_model()` call defaults to the official Hugging Face release (`fatihburakkaragoz/ottoman-ner-latin`). Pass a local directory or another Hub repository name to use custom weights.
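The Quick Start loop shows that each prediction is a dict with `text`, `label`, and `confidence` keys. A small post-processing sketch under that assumption: keep predictions at or above a confidence threshold and group their surface text by entity type. `group_by_label` and its 0.5 default threshold are hypothetical choices for illustration, and the sample list is hand-written, not real model output.

```python
from collections import defaultdict

# Hand-written sample in the shape used by the Quick Start loop;
# the confidence values are illustrative, not model output.
entities = [
    {"text": "Sultan Abdülhamid", "label": "PER", "confidence": 0.97},
    {"text": "İstanbul", "label": "LOC", "confidence": 0.95},
    {"text": "Rumeli", "label": "LOC", "confidence": 0.41},
]


def group_by_label(entities, min_confidence=0.5):
    """Keep entities at or above min_confidence and group their text by label."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["confidence"] >= min_confidence:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


print(group_by_label(entities))
# {'PER': ['Sultan Abdülhamid'], 'LOC': ['İstanbul']}
```

Raising or lowering `min_confidence` trades recall for precision on the filtered output.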
To train a custom model from a JSON configuration file (format described below):

```python
from ottoman_ner import OttomanNER

# Initialize
ner = OttomanNER()

# Train from a configuration file
results = ner.train_from_config("configs/training.json")
print(f"Training completed! F1 Score: {results['eval_f1']:.4f}")
```

To evaluate a model on a CoNLL-format test file:

```python
from ottoman_ner import OttomanNER

# Initialize and evaluate
ner = OttomanNER()
results = ner.evaluate(
    model_path="fatihburakkaragoz/ottoman-ner-latin",
    test_file="data/test.txt",
)

print(f"F1 Score: {results['overall_f1']:.4f}")
print(f"Precision: {results['overall_precision']:.4f}")
print(f"Recall: {results['overall_recall']:.4f}")
```

The published model can also be loaded directly with the Hugging Face `transformers` pipeline:

```python
from transformers import pipeline

pipe = pipeline(
    task="token-classification",
    model="fatihburakkaragoz/ottoman-ner-latin",
    aggregation_strategy="simple",  # merge subword/B-/I- tokens into whole entity spans
)
print(pipe("Sultan Abdülhamid İstanbul'da yaşıyordu."))
```
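With `aggregation_strategy="simple"`, the `transformers` token-classification pipeline returns one dict per entity with `entity_group`, `score`, `word`, `start`, and `end` keys. The hypothetical helper below maps those onto the `text`/`label`/`confidence` keys used in the Quick Start loop, so downstream code can consume either path. When the original string is passed in, the surface form is sliced from it using the character offsets, since the decoded `word` may not match the input substring exactly. The sample input is hand-written to mimic the pipeline's output shape, not real model output.

```python
def pipeline_to_entities(pipe_output, text=None):
    """Convert aggregated pipeline dicts into {'text', 'label', 'confidence', ...} dicts."""
    converted = []
    for item in pipe_output:
        # Prefer slicing the original input via character offsets when available.
        surface = text[item["start"]:item["end"]] if text is not None else item["word"]
        converted.append({
            "text": surface,
            "label": item["entity_group"],
            "confidence": float(item["score"]),  # pipeline scores may be numpy floats
            "start": item["start"],
            "end": item["end"],
        })
    return converted


# Hand-written stand-in for pipe(sentence) output; scores are illustrative only.
sentence = "Sultan Abdülhamid İstanbul'da yaşıyordu."


def fake_item(label, surface, score):
    start = sentence.index(surface)
    return {"entity_group": label, "score": score, "word": surface,
            "start": start, "end": start + len(surface)}


sample_output = [fake_item("PER", "Sultan Abdülhamid", 0.98), fake_item("LOC", "İstanbul", 0.96)]
for ent in pipeline_to_entities(sample_output, sentence):
    print(f"{ent['text']} -> {ent['label']} ({ent['confidence']:.2f})")
```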
Ottoman NER provides a command-line interface for training, evaluation, and prediction:

```bash
# Train a new model
ottoman-ner train --config configs/training.json

# Train with verbose output
ottoman-ner --verbose train --config configs/training.json

# Evaluate a trained model
ottoman-ner eval --model-path fatihburakkaragoz/ottoman-ner-latin --test-file data/test.txt

# Save evaluation results
ottoman-ner eval --model-path fatihburakkaragoz/ottoman-ner-latin --test-file data/test.txt --output-dir results/

# Predict on a single text
ottoman-ner predict --text "Sultan Abdülhamid İstanbul'da yaşıyordu"

# Predict on a file
ottoman-ner predict --input-file input.txt --output-file predictions.json
```

If `--model-path` is omitted, the CLI downloads and caches the published Hugging Face model on first use.
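Both `ner.evaluate()` and `ottoman-ner eval` report overall precision, recall, and F1 computed with seqeval, which scores at the entity level: a predicted entity counts only if its type and its exact token span both match a gold entity. The sketch below reimplements that idea in plain Python for illustration. It is a simplified approximation of seqeval's default (conlleval-compatible) mode, not the package's code; `bio_spans` and `entity_prf` are hypothetical names, and results may differ from seqeval on unusual tag sequences.

```python
def bio_spans(tags):
    """Extract (label, start, end) entity spans from a BIO tag sequence.

    `end` is exclusive. An I- tag that does not continue an entity of the
    same type starts a new entity (lenient, conlleval-style handling).
    """
    spans = []
    label, start = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes the last span
        prefix, _, ent = tag.partition("-")
        continues = prefix == "I" and ent == label
        if label is not None and not continues:
            spans.append((label, start, i))
            label, start = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            label, start = ent, i
    return spans


def entity_prf(true_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall, and F1 over sentences."""
    true_set, pred_set = set(), set()
    for sent_id, (t, p) in enumerate(zip(true_seqs, pred_seqs)):
        true_set.update((sent_id, *span) for span in bio_spans(t))
        pred_set.update((sent_id, *span) for span in bio_spans(p))
    tp = len(true_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Gold tags for "Sultan Abdülhamid İstanbul 'da yaşıyordu ."; the prediction
# truncates the PER span, so only the LOC entity counts as correct.
gold = [["B-PER", "I-PER", "B-LOC", "O", "O", "O"]]
pred = [["B-PER", "O", "B-LOC", "O", "O", "O"]]
print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Because partial matches earn no credit, a model that finds "Sultan" but misses "Abdülhamid" is penalized in both precision and recall.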
Create a training configuration file in JSON format:

```json
{
  "experiment": {
    "experiment_name": "my-ottoman-ner"
  },
  "model": {
    "model_name_or_path": "dbmdz/bert-base-turkish-cased",
    "num_labels": 9
  },
  "data": {
    "train_file": "data/train.txt",
    "dev_file": "data/dev.txt",
    "test_file": "data/test.txt",
    "max_length": 512
  },
  "training": {
    "output_dir": "models/my-model",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-5,
    "evaluation_strategy": "steps",
    "eval_steps": 100,
    "save_steps": 100,
    "load_best_model_at_end": true,
    "metric_for_best_model": "eval_f1"
  }
}
```

Ottoman NER expects CoNLL-format data with BIO tagging: one token and its tag per line, separated by whitespace, with a blank line between sentences.
```
Sultan B-PER
Abdülhamid I-PER
İstanbul B-LOC
'da O
yaşıyordu O
. O

Osmanlı B-ORG
Devleti I-ORG
'nin O
başkenti O
İstanbul B-LOC
'dur O
. O
```
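A minimal reader for this format can catch malformed training files before a run starts. The sketch assumes UTF-8 files, whitespace-separated columns with the tag in the last column, blank lines between sentences, and the nine-tag BIO set implied by `"num_labels": 9`. `read_conll` and `VALID_TAGS` are hypothetical names, not part of the package API.

```python
import os
import tempfile
from pathlib import Path

# BIO tags implied by the four entity types and "num_labels": 9 (assumed set).
VALID_TAGS = {"O"} | {f"{p}-{t}" for t in ("PER", "LOC", "ORG", "MISC") for p in ("B", "I")}


def read_conll(path):
    """Parse a CoNLL file into a list of (tokens, tags) sentences, validating tags."""
    sentences, tokens, tags = [], [], []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, 1):
        line = line.strip()
        if not line:  # blank line ends the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        parts = line.split()
        token, tag = parts[0], parts[-1]
        if tag not in VALID_TAGS:
            raise ValueError(f"{path}:{line_no}: unknown tag {tag!r}")
        tokens.append(token)
        tags.append(tag)
    if tokens:  # file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences


sample = (
    "Sultan B-PER\nAbdülhamid I-PER\nİstanbul B-LOC\n'da O\n"
    "\n"
    "Osmanlı B-ORG\nDevleti I-ORG\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
try:
    sents = read_conll(tmp.name)
finally:
    os.remove(tmp.name)
print(len(sents), sents[0][1])
```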
This project builds upon foundational work in Ottoman Turkish NLP and represents a focused effort to provide a clean, maintainable NER solution for historical Turkish texts.
- Karagöz et al. (2024) — "Towards a Clean Text Corpus for Ottoman Turkish" ACL Anthology
- Özateş et al. (2025) — "Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models" arXiv:2501.04828
Sincere gratitude to Assoc. Prof. Şaziye Betül Özateş and the Boğaziçi University Computational Linguistics Lab (BUColin) for their foundational contributions to historical Turkish NLP.
- Python 3.8+
- PyTorch 1.9+
- Transformers 4.20+
- See `requirements.txt` for complete dependencies
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you use Ottoman NER in your research, please cite:
```bibtex
@software{ottoman_ner_2024,
  title={Ottoman NER: A Toolkit for Ottoman Turkish Named Entity Recognition},
  author={Karagöz, Fatih Burak},
  year={2024},
  url={https://github.com/fbkaragoz/ottoman-ner},
  version={2.0.0}
}
```

For broader Ottoman Turkish NLP research and experimental tools, see the upcoming ottominer repository.