GitHub - llangnickel/GermanClinicalTM

Information extraction from German clinical care documents in the context of Alzheimer's Disease

This repository contains the code for several clinical text mining pipelines that were set up at the University Hospital Bonn to retrieve relevant information from German clinical care documents in the context of Alzheimer's Disease. While a number of data items are stored in machine-readable formats, such as structured entries in hospital information systems (HIS) or additional Excel tables, further valuable information is stored in text documents, from which it has to be extracted. For this data, we set up modular, rule-based text mining workflows that require only minimal sets of training data. These modules can easily be reused and adapted to other memory clinic settings. All pipelines are based on UIMA Ruta.
For data privacy reasons, we unfortunately cannot publish our data. However, we provide our source code as well as some synthetic data written by a physician.

Pipeline overview

How to use

Clone the repository to a local folder: git clone https://github.com/llangnickel/GermanClinicalTM.git
and then change into its directory: cd GermanClinicalTM

As a first step, all documents need to be preprocessed with a sentence detector, a tokenizer and a lemmatizer. Both the sentence detector and the tokenizer are based on regular expressions, which can be found here. The lemmatizer is taken from the Mate tools.
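To give a rough idea of what regular-expression-based tokenization looks like, here is a minimal sketch. The pattern and the function name `tokenize` are assumptions made for illustration; they are not the expressions used in this repository, and the sketch does not cover sentence detection or lemmatization. German umlauts are only matched as word characters when a UTF-8 locale is active.

```shell
#!/usr/bin/env bash
# Illustrative only: split text read from stdin into word and punctuation
# tokens, one token per line. The pattern is an assumption, not the
# repository's actual regular expression.
tokenize() {
  grep -oE '[[:alnum:]]+|[[:punct:]]'
}

# Example call on a short ASCII-only sentence.
printf '%s\n' 'Der Patient berichtet seit 2 Jahren von Vergesslichkeit.' | tokenize
```

Each word, number and punctuation mark ends up on its own line, which is the form a downstream lemmatizer would typically consume.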

We provide some example medical reports that can be processed with the Section pipeline.
To preprocess the documents, run:
./preprocessing.sh example_data/sections/ out_preprocessing
To annotate the different paragraphs, run:
./ruta.sh out_preprocessing out_sections binary/RutaSections.jar RutaRules/sections/sections.ruta
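The two steps above can be chained in a small wrapper. This is a sketch under assumptions: the helper names `run` and `run_pipeline` and the `DRY_RUN` variable are hypothetical and not part of the repository; only the two script calls and their arguments are taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Hypothetical wrapper: preprocessing followed by the Ruta annotation step.
# Arguments: input dir, preprocessing output dir, final output dir,
#            pipeline jar, Ruta rule file.
# The "&&" ensures Ruta only runs if preprocessing succeeded.
run_pipeline() {
  local input_dir="$1" pre_dir="$2" out_dir="$3" jar="$4" rules="$5"
  run ./preprocessing.sh "$input_dir" "$pre_dir" &&
    run ./ruta.sh "$pre_dir" "$out_dir" "$jar" "$rules"
}

# Dry run for the Section pipeline: prints the two commands without running them.
DRY_RUN=1 run_pipeline example_data/sections/ out_preprocessing out_sections \
  binary/RutaSections.jar RutaRules/sections/sections.ruta
```

Calling `run_pipeline` without `DRY_RUN=1` from the repository root would execute both scripts in order.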

To annotate deficits such as memory, language, attention or planning disturbances from the anamnesis section, run:
./preprocessing.sh example_data/anamnesis/ out_anamnesis_pre
and then:
./ruta.sh out_anamnesis_pre/ out_anamnesis binary/RutaDeficits.jar RutaRules/deficits/deficits.ruta
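Since both commands depend on several files being in place, a quick pre-flight check can save a failed run. The function `check_paths` below is a hypothetical helper, not part of the repository; the paths passed to it are the ones named in the deficits example and are assumed to be relative to the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: prints every missing path to stderr and
# returns non-zero if at least one of the given paths does not exist.
check_paths() {
  local missing=0 path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the inputs of the deficits pipeline before starting it.
if check_paths preprocessing.sh ruta.sh example_data/anamnesis \
     binary/RutaDeficits.jar RutaRules/deficits/deficits.ruta; then
  echo "all inputs present"
else
  echo "fix the paths listed above before running the pipeline"
fi
```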

