Accelera

Accelera is a hybrid Python/C++ machine learning framework for building graph-based pipelines, running independent branches in parallel, generating HTML reports, and experimenting with automated preprocessing and loop parallelization.

Features

Graph ML pipelines: build DAG-style workflows with preprocessing, model, predict, metric, merge, and branch nodes.
Parallel branch execution: compare multiple preprocessing/model/metric combinations in one pipeline run through the C++ graph backend.
Custom model support: plug in sklearn-compatible estimators or extend CustomClassifier, CustomRegressor, CustomClusterer, and CustomTransformer.
Reporting: generate graph visualizations and HTML metric reports through GraphReport, ModelReport, and AutoML preprocessing reports.
Auto preprocessing: tabular, text, image-classification, and segmentation preprocessing utilities with saved preprocessors and visual summaries.
Dataset retriever: list and download shared CSV datasets into a local cache with accelera.src.utils.dataset_retriever.DatasetRetriever.
C/C++ code parallelizer: extract loops with Clang AST, derive loop features, call classifier/generator services, and inject OpenMP pragmas into parallelizable for loops. This module is Linux-only.
Benchmark backend prototype: Express/MongoDB backend scaffolding for benchmarks, users, metrics, and submissions.

Current Status

The core DAG pipeline, custom estimator interfaces, reports, dataset retrieval, and preprocessing utilities are implemented in this repo.
The AutoML search agent API exists, but the default search algorithm is still a placeholder.
The benchmark backend is an early prototype.
The code parallelizer requires Linux, LLVM/Clang, built pybind bindings, and the remote classifier/generator endpoints configured in accelera/src/config.py.

Quick Start

git clone https://github.com/Mohamed-Ashraf273/accelera.git
cd accelera

python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
pip install psutil requests gdown graphviz

export PYTHONPATH="$PWD"

# Linux only, required before CMake if you want to build code-parallelizer
# bindings and also because the current Linux CMake config expects LLVM.
sudo bash shell/install_llvm.sh 18

cmake -S . -B build
cmake --build build -j"$(nproc)"

Run Examples

# Parallel sklearn-vs-Accelera pipeline comparison
python examples/sklearn_comp.py

# Full branching pipeline demo with a custom PyTorch classifier and reports
python examples/demo.py

# Run tests
pytest accelera

For notebooks, open examples/dataset_retriever_demo.ipynb, examples/code_optimizer_demo.ipynb, examples/autopreprocessing-classification-v3.ipynb, or examples/segmentation-training-gp.ipynb after exporting PYTHONPATH.

Minimal Usage

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler

from accelera.src.accelera_pipe.core.pipeline import Pipeline

X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_informative=15,
    random_state=42,
)
X_test, y_test = X[:200], y[:200]

pipe = Pipeline()
pipe.branch(
    "preprocessing",
    pipe.preprocess("standard", StandardScaler(), branch=True),
    pipe.preprocess("minmax", MinMaxScaler(), branch=True),
).model(
    "logreg",
    LogisticRegression(max_iter=1000),
).predict(
    "predict",
    test_data=X_test,
).metric(
    "accuracy",
    "accuracy_score",
    y_true=y_test,
)

predictions, executed_graph = pipe(X, y, select_strategy="max")
best_result = executed_graph(X_test, y_test)
print(predictions)
print(best_result)

More Usage Examples

Dataset Retriever

from accelera.src.utils.dataset_retriever import retriever

print(retriever.available_datasets())

retriever.connect()
housing_df = retriever.retrieve_dataset("Housing", df=True)
print(housing_df.head())
retriever.close()

Tabular Auto Preprocessing

from accelera.src.automl.core.classical_training_preprocessing import (
    ClassicalTrainingPreprocessing,
)
from accelera.src.utils.dataset_retriever import retriever

retriever.connect()
df = retriever.retrieve_dataset("Titanic-Dataset", df=True)

preprocessor = ClassicalTrainingPreprocessing(
    df,
    target_col="Survived",
    problem_type="classification",
    folder_path="./titanic_preprocessing_report",
)
X_train, y_train, X_val, y_val = preprocessor.common_preprocessing()

retriever.close()

Text Auto Preprocessing

import pandas as pd

from accelera.src.automl.core.text_training_preprocessing import (
    TextTrainingPreprocessing,
)

reviews_df = pd.DataFrame(
    {
        "review": ["Great product", "Very bad experience", "I like it"],
        "class": [1, 0, 1],
    }
)

text_preprocessor = TextTrainingPreprocessing(
    reviews_df,
    target_col="class",
    text_col="review",
    folder_path="./reviews_report",
)
X_train, y_train, X_val, y_val = text_preprocessor.common_preprocessing()

Image Auto Preprocessing

from accelera.src.automl.core.classification_image_training_preprocessing import (
    ClassificationImageTrainingPreprocessing,
)

image_preprocessor = ClassificationImageTrainingPreprocessing(
    training_folder_images="./PetImages",  # replace with your class folders
    folder_path="./PetImagesReport",
    split_training=True,
    val_size=0.2,
    images_size=(224, 224),
    augment=True,
)
training_loader, validation_loader = image_preprocessor.common_preprocessing()

Pipeline Graph Report

from accelera.src.utils.accelera_utils import serialize
from accelera.src.accelera_pipe.wrappers.graph_report import GraphReport

predictions, executed_graph = pipe(X, y, select_strategy="max")
serialize(pipe, "pipeline.xml")

report = GraphReport("pipeline_report", "pipeline.xml", predictions)
report.execute()

Standalone Model Report

from sklearn.metrics import accuracy_score

from accelera.src.accelera_pipe.wrappers.model_report import ModelReport

accuracy = accuracy_score(y_test, model.predict(X_test))
results = [
    {
        "metric name": "accuracy",
        "result": accuracy,
        "plot_func": None,
        "labels_name": None,
        "headers_name": None,
    }
]

report = ModelReport("model_report", results=results)
report.execute()

C/C++ Loop Parallelization

from accelera.src.utils.parallelizer import parallelizer

parallelizer.parallelize("examples/test_loops.c")
# Writes examples/parallelized_test_loops.c

Project Map

accelera/
├── accelera/
│   ├── api/                 # generated public API modules
│   ├── bindings/            # pybind11 bindings
│   └── src/
│       ├── accelera_pipe/   # DAG pipeline, execution graph
│       ├── automl/          # preprocessing, reports, AutoML agent scaffold
│       ├── benchmark/       # Node.js backend prototype
│       ├── custom/          # estimator base classes
│       ├── utils/           # dataset retriever, parallelizer and code utilities
│       └── wrappers/        # HTML/report helpers
├── src/                     # C++ core, nodes, AST, and utility sources
├── include/                 # C++ headers
├── examples/                # scripts and notebooks
├── docs/                    # MkDocs documentation
├── shell/                   # setup scripts
└── CMakeLists.txt

Useful Commands

# Regenerate API exports after changing Python modules
python api_gen.py

# Run formatting/lint hooks
pre-commit run --all-files --hook-stage manual

# Serve docs locally
mkdocs serve

License

Apache License 2.0. See LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Accelera

Features

Current Status

Quick Start

Run Examples

Minimal Usage

More Usage Examples

Dataset Retriever

Tabular Auto Preprocessing

Text Auto Preprocessing

Image Auto Preprocessing

Pipeline Graph Report

Standalone Model Report

C/C++ Loop Parallelization

Project Map

Useful Commands

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 151 Commits
.github		.github
accelera		accelera
data		data
docs		docs
examples		examples
include		include
shell		shell
src		src
tools		tools
.clang-format		.clang-format
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
api_gen.py		api_gen.py
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Accelera

Features

Current Status

Quick Start

Run Examples

Minimal Usage

More Usage Examples

Dataset Retriever

Tabular Auto Preprocessing

Text Auto Preprocessing

Image Auto Preprocessing

Pipeline Graph Report

Standalone Model Report

C/C++ Loop Parallelization

Project Map

Useful Commands

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages