Classifying Business Processes by Level of Structuredness: A Relationship-Based Approach

This repository contains the source code for running and evaluating a structuredness classification algorithm on business process event logs.

Abstract

The choice of a suitable modeling notation in process discovery depends on the structuredness of the process. Structured processes are best captured in imperative notations, loosely structured and knowledge-intensive processes benefit from declarative notations, and semi-structured processes often require hybrid approaches. However, current process mining practice provides little systematic guidance for assessing structuredness before discovery.

This repository implements an automated classification algorithm that determines the structuredness of a business process from activity relationship data. The method analyzes temporal and existential dependencies between activities to detect characteristic workflow patterns and compute aggregated relationship metrics. These components are combined into a continuous structuredness score, which is compared against thresholds to assign the process to one of four classes: structured, semiStructured, looselyStructured, or unstructured.

The tool enables process analysts to pre-classify processes before modeling, supporting the selection of an appropriate notation and improving the practical value of discovered models.

Overview

The Process Structuredness Classifier is a Python-based command-line tool that:

Loads temporal and existential activity relationships from preprocessed data files.
Detects control-flow blocks (e.g., XOR, PAR) and aggregates related blocks into larger structured process fragments ("super-blocks").
Scores the process based on a set of reference metrics.
Maps the score to a structuredness class using configurable thresholds.
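The block-detection step can be illustrated with a deliberately simplified sketch that reads XOR and PAR candidates off a pairwise relationship matrix. The encoding is hypothetical and is not the repository's JSON schema. Each activity pair maps to a `(temporal, existential)` tuple. `"exclusive"` means the two activities never occur in the same trace, `"equivalence"` means they always occur together, and `"none"` means there is no fixed order. The function name `detect_pair_blocks` is also made up. The real `block_detection.py` very likely applies richer rules and additionally aggregates blocks into super-blocks, which this sketch omits.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical encoding (NOT the repository's actual input format):
# (a, b) -> (temporal, existential)
#   temporal:    "before" (a precedes b), "after", or "none" (no fixed order)
#   existential: "equivalence" (always co-occur), "exclusive" (never co-occur),
#                "implication", or "independence"
Relations = Dict[Tuple[str, str], Tuple[str, str]]


def detect_pair_blocks(activities: List[str], rel: Relations) -> List[Tuple[str, str, str]]:
    """Return (block_type, a, b) for every pair that looks like an XOR or PAR block."""
    blocks = []
    for a, b in combinations(activities, 2):
        temporal, existential = rel.get((a, b), ("none", "independence"))
        if existential == "exclusive":
            # Never in the same trace: candidate XOR (choice) block
            blocks.append(("XOR", a, b))
        elif existential == "equivalence" and temporal == "none":
            # Always together, but in either order: candidate PAR (parallel) block
            blocks.append(("PAR", a, b))
    return blocks


rel = {
    ("A", "B"): ("before", "equivalence"),
    ("A", "C"): ("before", "equivalence"),
    ("B", "C"): ("none", "equivalence"),
    ("A", "D"): ("before", "implication"),
    ("D", "E"): ("none", "exclusive"),
}
print(detect_pair_blocks(["A", "B", "C", "D", "E"], rel))
# [('PAR', 'B', 'C'), ('XOR', 'D', 'E')]
```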

Features

Four-class classification: unstructured, looselyStructured, semiStructured, structured
Modular architecture: block detection, block aggregation, scoring.
Verbose mode for detailed inspection of detected structures.
Configurable metrics and thresholds (constants.py).
Tabular summary output for multiple process logs.
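Mapping the continuous score to one of the four classes is essentially a threshold lookup. The sketch below assumes that a higher score means a more structured process, which is consistent with the example output where a score of 0.81 is classified as structured. The cut-off values and the names `THRESHOLDS` and `classify_score` are placeholders. The actual constants are defined in constants.py and are not reproduced here.

```python
from bisect import bisect_right

# The four class labels used by the tool, ordered from least to most structured.
CLASSES = ["unstructured", "looselyStructured", "semiStructured", "structured"]

# Hypothetical cut-offs, NOT the values from constants.py:
# score < 0.25 -> unstructured, 0.25 <= score < 0.50 -> looselyStructured,
# 0.50 <= score < 0.75 -> semiStructured, score >= 0.75 -> structured
THRESHOLDS = [0.25, 0.50, 0.75]


def classify_score(score: float) -> str:
    """Map a structuredness score to a class (assumes higher = more structured)."""
    return CLASSES[bisect_right(THRESHOLDS, score)]


print(classify_score(0.81))  # structured
print(classify_score(0.40))  # looselyStructured
```

Storing the boundaries in a single sorted list means a threshold can be moved without changing any branching logic.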

Requirements

Python 3.8 or higher
The Python packages listed in requirements.txt

Installation

Clone the repository:

```
git clone git@github.com:INSM-TUM/process-classification-activity-relationships.git
cd process-classification-activity-relationships
```

Install dependencies:
```
pip install -r requirements.txt
```

Usage

Command-Line Interface

Run the classification:

```
python classify_process.py --dir <path_to_data> [--verbose]
```

Arguments

--dir (string, optional): Directory containing the input files (default: data_evaluation/data).
--verbose (flag, optional): Enables detailed output of detected blocks, super-blocks, and scoring.
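The two documented arguments map directly onto a standard `argparse` setup. This is a sketch of how such an interface can be declared, not a copy of classify_process.py. Only the flag names and the default directory come from this README, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Classify process structuredness.")
    # Directory scanned for <log_name>_<true_class>.json input files
    parser.add_argument("--dir", default="data_evaluation/data",
                        help="Directory containing the input files")
    # Boolean flag: True when given on the command line, False otherwise
    parser.add_argument("--verbose", action="store_true",
                        help="Print detected blocks, super-blocks and scoring details")
    return parser


args = build_parser().parse_args(["--dir", "data_development", "--verbose"])
print(args.dir, args.verbose)  # data_development True
```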

Example

```
python classify_process.py --dir data_evaluation/data --verbose
```

Input Files

The classifier expects input files in JSON format containing precomputed pairwise activity relationships (temporal + existential) for a process.
Files must follow the naming pattern:

<log_name>_<true_class>.json
Example:
Order_Processing_structured.json
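Log names can contain underscores (as in Order_Processing), so the true class has to be read from the last underscore-separated segment. The following minimal sketch assumes the four class labels listed above. `parse_input_name` is a hypothetical helper and not a function from utils.py.

```python
from pathlib import Path
from typing import Tuple

VALID_CLASSES = {"structured", "semiStructured", "looselyStructured", "unstructured"}


def parse_input_name(filename: str) -> Tuple[str, str]:
    """Split '<log_name>_<true_class>.json' into (log_name, true_class)."""
    path = Path(filename)
    if path.suffix != ".json":
        raise ValueError(f"not a JSON file: {filename}")
    # Split on the LAST underscore so that log names may contain underscores
    log_name, sep, true_class = path.stem.rpartition("_")
    if not sep or not log_name or true_class not in VALID_CLASSES:
        raise ValueError(f"unexpected file name: {filename}")
    return log_name, true_class


print(parse_input_name("Order_Processing_structured.json"))
# ('Order_Processing', 'structured')
```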

How to generate the JSON input files

1. Start from an event log in XES format
   You need an event log of the process you want to classify, in the standard .xes format.
2. Generate the activity relationship matrix (YAML)
   Use the activity-relationship-matrix-discovery tool to transform the event log into a YAML file containing the activity relationship matrix. The tool computes temporal and existential dependencies between each pair of activities. Export the result as a .yaml file.
3. Convert the YAML to JSON
   Run the helper script helper/matrix_yaml_to_json.py from this repository. It reads the .yaml file from step 2 and converts it into the JSON format required by the classifier.
4. Place the JSON file in your chosen input directory
   Name it according to the expected pattern (<log_name>_<true_class>.json) and store it in the folder you will pass to the --dir argument when running the classifier.

Output

The classifier prints a console table summarizing the results across all input files, for example:

```
Log               #SBs  Insiders  Outsiders  Base-Score  SB vs. SB  Out vs. SB  Out vs. Out  Refinement  Score  Class Real  Class Calculated  Match
Order_Processing  2     A,B,C,D   E,F        0.66        0.15       0.10        0.10         0.25        0.81   structured  structured        ✅
```

In verbose mode, detected blocks and super-blocks as well as details about all refinements and metrics are printed for each file.

Project Structure

Code Files

classify_process.py: Main script to run the classification. Handles command-line arguments, calls the classification pipeline, and prints the results table.
block_detection.py: Implements the detection of control-flow blocks (e.g., XOR, PAR) and the combination of these into super-blocks.
score_process.py: Computes process metrics based on detected structures and calculates the final structuredness score.
utils.py: Contains data loading functions and helper utilities for working with activity relationships.
constants.py: Defines configurable thresholds and other constants used throughout the project.
helper/matrix_yaml_to_json.py: Utility script to convert YAML-formatted activity relationship matrices into the JSON format required by the classifier.
helper/count_trace_variants.py: Script to load an event log in XES format, identify unique trace variants, and print their counts.
helper/verify_block_detection.py: Test utility that compares detected control-flow blocks and super-blocks for the development data against expected outputs, useful for verifying correctness after logic changes.
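Counting trace variants, as helper/count_trace_variants.py does, amounts to grouping traces by their sequence of activity labels. The sketch below uses only the standard library and assumes that activity names are stored in the `concept:name` string attribute of each event, which is the usual XES convention. It illustrates the idea and is not the helper's actual code. The helper may rely on a process mining library instead, and `count_variants` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def _local(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.xes-standard.org/}trace'."""
    return tag.rsplit("}", 1)[-1]


def count_variants(xes_text: str) -> Counter:
    """Count how often each distinct sequence of event names occurs."""
    root = ET.fromstring(xes_text)
    variants: Counter = Counter()
    for trace in root:
        if _local(trace.tag) != "trace":
            continue
        labels = []
        for event in trace:
            if _local(event.tag) != "event":
                continue  # skip trace-level attributes such as the case id
            for attr in event:
                if _local(attr.tag) == "string" and attr.get("key") == "concept:name":
                    labels.append(attr.get("value"))
        variants[tuple(labels)] += 1
    return variants


SAMPLE = """<log xmlns="http://www.xes-standard.org/">
  <trace><event><string key="concept:name" value="A"/></event>
         <event><string key="concept:name" value="B"/></event></trace>
  <trace><event><string key="concept:name" value="A"/></event>
         <event><string key="concept:name" value="B"/></event></trace>
  <trace><event><string key="concept:name" value="A"/></event>
         <event><string key="concept:name" value="C"/></event></trace>
</log>"""

for variant, count in count_variants(SAMPLE).most_common():
    print(count, " -> ".join(variant))
# 2 A -> B
# 1 A -> C
```

For a file on disk, `ET.parse(path).getroot()` returns the same kind of root element. For very large logs, a streaming parser would be the more memory-friendly choice.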

Example Data

data_development/: Dataset used during the algorithm’s iterative development phase. Contains:
- JSON files in the correct input format for the classifier.
- Event logs in .xes format.
- BPMN process models (where applicable).
- Images of the process models for reference.
data_evaluation/: Independent dataset used for evaluation and testing of the final algorithm. Contains:
- JSON files in the correct input format for the classifier.
- Event logs in .xes format.
- BPMN process models (where applicable).
- Images of the process models for reference.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions or issues, please contact:

Kerstin Andree ([email protected])
Michael Lachner ([email protected])
