Releases: AdrianAntico/RetroFit
Small fixes
The regression report had an issue when I added the deciles output to the report for the classification case. That's been fixed.
For the create_model_data function, RetroFit no longer throws an error if you pass a single string variable name to NumericColumnNames, CategoricalColumnNames, or TextColumnNames. Additionally, if a user passes a list to TargetColumnName (e.g. ["TargetVar"]), there will be no downstream errors.
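One way to accept both forms is to normalize the arguments up front. The sketch below is illustrative only: `as_name_list` and `as_single_name` are hypothetical helpers, not RetroFit functions, and the actual handling inside create_model_data may differ.

```python
from typing import List, Sequence, Union

ColumnArg = Union[str, Sequence[str], None]


def as_name_list(value: ColumnArg) -> List[str]:
    """Return column names as a list, whether given a string, a sequence, or None."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def as_single_name(value: Union[str, Sequence[str]]) -> str:
    """Return a single target name, unwrapping a one-element list if needed."""
    if isinstance(value, str):
        return value
    names = list(value)
    if len(names) != 1:
        raise ValueError(f"expected exactly one target column, got {names!r}")
    return names[0]


numeric_cols = as_name_list("XREGS1")        # a single string becomes a one-item list
categorical_cols = as_name_list(["A", "B"])  # a list passes through unchanged
target = as_single_name(["TargetVar"])       # a one-item list becomes a plain string
```

Checking for `str` before treating the value as a sequence matters, because a Python string is itself iterable and would otherwise be split into single characters.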
Bug fixes for shap values and polars dataframes
Reports now show SHAP values. The polars unpivot method has an issue: it mixes up the value_name and variable_name parameters. I excluded those parameters from the unpivot call and updated the aliases afterwards.
Also, when running CatBoost in CPU mode, no parameter changes are needed to run a model.
Insights reports for Regression and Classification
Added
- Unified Model Insights Reports for regression and classification
- Interactive HTML reports with metrics, calibration, PDPs, SHAP, etc.
- Threshold-based classification visuals
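As a rough illustration of what a threshold-based classification visual is built from, the sketch below computes confusion-matrix counts, precision, and recall at several probability cutoffs. `threshold_metrics` is a hypothetical helper and the data is made up; RetroFit's report code may compute these quantities differently.

```python
from typing import Dict, List


def threshold_metrics(y_true: List[int], y_prob: List[float], threshold: float) -> Dict[str, float]:
    """Confusion-matrix counts plus precision and recall at one probability cutoff."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"threshold": threshold, "tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "precision": precision, "recall": recall}


# Toy labels and predicted probabilities (made up for illustration)
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]

# One row per cutoff; a report could plot these rows as curves over the threshold
rows = [threshold_metrics(y_true, y_prob, t) for t in (0.3, 0.5, 0.7)]
```

Raising the cutoff in this toy data trades recall for precision, which is the trade-off a threshold plot is meant to expose.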
v0.2.0 First Fully Usable RetroFit Release (Major ML Engine Rewrite)
Version 0.2.0 is the first truly usable and production-ready version of RetroFit.
Prior to this release, RetroFit existed mostly as an early-stage prototype.
This update brings a complete rewrite of key components in MachineLearning.py, modern packaging, and the first full interpretability features.
Feature Engineering Class
Feature engineering is now a class-based setup. Users can choose between datatable, polars, and pandas for feature engineering operations.
The ML examples in the README currently reflect usage of the datatable version, with expanded feature engineering to highlight how it is used.
V0.1.4
RetroFit class:
Added XGBoost and LightGBM. Scoring now also accepts new data passed in by the user. Examples are on the README.
Added RetroFit class v1
Added the first of many versions of the RetroFit class for machine learning.
####################################
# Goals
####################################
Class Initialization
Model Initialization
Training
Grid Tuning
Scoring
Model Evaluation
Model Interpretation
####################################
# Functions
####################################
ML1_Single_Train()
ML1_Single_Score()
####################################
# Attributes
####################################
self.ModelArgs = ModelArgs
self.ModelArgsNames = [*self.ModelArgs]
self.Runs = len(self.ModelArgs)
self.DataSets = DataSets
self.DataSetsNames = [*self.DataSets]
self.ModelList = dict()
self.ModelListNames = []
self.FitList = dict()
self.FitListNames = []
self.EvaluationList = dict()
self.EvaluationListNames = []
self.InterpretationList = dict()
self.InterpretationListNames = []
self.CompareModelsList = dict()
self.CompareModelsListNames = []
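To make the name attributes concrete, here is a minimal sketch of how a constructor could populate a few of them. `RetroFitSketch` is a hypothetical stand-in that only mirrors the assignments listed above, and the dictionary keys and values are placeholders, not the real contents of ModelArgs or DataSets.

```python
class RetroFitSketch:
    """Hypothetical stand-in mirroring the attribute list above; not the real class."""

    def __init__(self, ModelArgs: dict, DataSets: dict):
        self.ModelArgs = ModelArgs
        # Unpacking a dict into a list yields its keys in insertion order
        self.ModelArgsNames = [*self.ModelArgs]
        self.Runs = len(self.ModelArgs)
        self.DataSets = DataSets
        self.DataSetsNames = [*self.DataSets]
        self.ModelList = dict()
        self.ModelListNames = []


# Placeholder inputs; real entries would hold algorithm arguments and data frames
model_args = {"Ftrl": {"example_arg": 1}}
data_sets = {"train_data": None, "validation_data": None, "test_data": None}

x = RetroFitSketch(model_args, data_sets)
print(x.ModelArgsNames)    # prints ['Ftrl']
print(x.Runs)              # prints 1
print(x.DataSetsNames[2])  # prints test_data
```

Because the name lists follow insertion order, an index such as `DataSetsNames[2]` picks the third data set that was added to the dictionary.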
####################################
# Example Usage
####################################
# Setup Environment
import timeit
import datatable as dt
from datatable import sort, f, by
import retrofit
from retrofit import FeatureEngineering as fe
from retrofit import MachineLearning as ml
# Load some data
# BenchmarkData.csv is located in the tests folder
Path = "./BenchmarkData.csv"
data = dt.fread(Path)
# Create partitioned data sets
Data = fe.FE2_AutoDataParition(
    data=data,
    ArgsList=None,
    DateColumnName=None,
    PartitionType='random',
    Ratios=[0.7,0.2,0.1],
    ByVariables=None,
    Sort=False,
    Processing='datatable',
    InputFrame='datatable',
    OutputFrame='datatable')
# Prepare modeling data sets
DataSets = ml.ML0_GetModelData(
    Processing='Ftrl',
    TrainData=Data['TrainData'],
    ValidationData=Data['ValidationData'],
    TestData=Data['TestData'],
    ArgsList=None,
    TargetColumnName='Leads',
    NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
    CategoricalColumnNames=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
    TextColumnNames=None,
    WeightColumnName=None,
    Threads=-1,
    InputFrame='datatable')
# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
    Algorithms='Ftrl',
    TargetType="Regression",
    TrainMethod="Train")
# Initialize RetroFit
x = ml.RetroFit(ModelArgs, DataSets)
# Train Model
x.ML1_Single_Train(Algorithm='Ftrl')
# Score data
x.ML1_Single_Score(DataName=x.DataSetsNames[2], ModelName=x.ModelListNames[0], Algorithm='Ftrl')
# Scoring data names
x.DataSets.keys()
# Check ModelArgs Dict
x.ModelArgs
# Check the names of data sets collected
x.DataSetsNames
# List of model names
x.ModelListNames
# List of model fitted names
x.FitListNames
# List of comparisons
x.CompareModelsListNames
V0.1.0
Enhanced FE2_AutoDataPartition() for Processing = 'datatable' and 'polars'
Added methods for xgboost and lightgbm for ML0_GetModelData()
Modified sorting and subsetting tasks for Processing = 'polars'
V0.0.9
Added polars processing to FE2_AutoDataPartition(), added examples to README, and fixed some bugs in the other functions
New Functions
Created framework for organizing modules and functions within modules.
New functions include:
FE2_AutoDataParition()
# Example
import datatable as dt
import retrofit
from retrofit import FeatureEngineering as fe
from retrofit import utils as u
# random
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
DataSets = fe.FE2_AutoDataParition(
    data=data,
    ArgsList=None,
    DateColumnName='CalendarDateColumn',
    PartitionType='random',
    Ratios=[0.70,0.20,0.10],
    ByVariables=None,
    Processing='datatable',
    InputFrame='datatable',
    OutputFrame='datatable')
TrainData = DataSets['TrainData']
ValidationData = DataSets['ValidationData']
TestData = DataSets['TestData']
ArgsList = DataSets['ArgsList']
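For readers unfamiliar with ratio-based splitting, the sketch below shows one plain-Python way a random partition with Ratios=[0.70,0.20,0.10] could work. `random_partition` is a hypothetical helper written for illustration; it is not the library's implementation and ignores options such as ByVariables.

```python
import random
from typing import List, Sequence


def random_partition(rows: Sequence, ratios: Sequence[float], seed: int = 0) -> List[list]:
    """Shuffle rows, then cut them into consecutive slices sized by ratios."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    parts, start, cumulative = [], 0, 0.0
    for i, ratio in enumerate(ratios):
        cumulative += ratio
        # The last slice takes whatever remains, so rounding never drops a row
        is_last = i == len(ratios) - 1
        end = len(shuffled) if is_last else round(cumulative * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    return parts


train, validation, test = random_partition(range(100), [0.70, 0.20, 0.10])
```

Cutting on cumulative ratios rather than rounding each slice size separately keeps the slices contiguous and guarantees every row lands in exactly one partition.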
FE1_DummyVariables()
import datatable as dt
import retrofit
from retrofit import FeatureEngineering as fe
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
Output = fe.FE1_DummyVariables(
    data=data,
    ArgsList=None,
    CategoricalColumnNames=['MarketingSegments','MarketingSegments2'],
    Processing='datatable',
    InputFrame='datatable',
    OutputFrame='datatable')
data = Output['data']
ArgsList = Output['ArgsList']
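Conceptually, dummy-variable creation adds one 0/1 indicator column per level of each categorical column. The sketch below shows the idea on a list of dictionaries; `dummy_variables` and the `column_level` naming scheme are assumptions made for illustration and may not match what FE1_DummyVariables produces.

```python
from typing import Dict, List


def dummy_variables(rows: List[Dict[str, object]], columns: List[str]) -> List[Dict[str, object]]:
    """Add one 0/1 column per observed level of each categorical column."""
    # Collect the distinct levels in each column, sorted for a stable column order
    levels = {col: sorted({str(row[col]) for row in rows}) for col in columns}
    encoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in columns:
            for level in levels[col]:
                new_row[f"{col}_{level}"] = 1 if str(row[col]) == level else 0
        encoded.append(new_row)
    return encoded


# Toy rows (made up); column names borrowed from the example above
rows = [
    {"MarketingSegments": "A", "Leads": 10},
    {"MarketingSegments": "B", "Leads": 12},
    {"MarketingSegments": "A", "Leads": 7},
]
encoded = dummy_variables(rows, ["MarketingSegments"])
```

The levels are learned from the data passed in, so scoring new data with the same encoding would require storing them, which is presumably one role of the returned ArgsList.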
ML0_GetModelData()
# ML0_GetModelData Example:
import datatable as dt
from datatable import sort, f, by
import retrofit
from retrofit import FeatureEngineering as fe
from retrofit import MachineLearning as ml
# Load some data
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
# Create partitioned data sets
DataSets = fe.FE2_AutoDataParition(
    data=data,
    ArgsList=None,
    DateColumnName='CalendarDateColumn',
    PartitionType='random',
    Ratios=[0.70,0.20,0.10],
    ByVariables=None,
    Processing='datatable',
    InputFrame='datatable',
    OutputFrame='datatable')
# Collect partitioned data
TrainData = DataSets['TrainData']
ValidationData = DataSets['ValidationData']
TestData = DataSets['TestData']
del DataSets
# Create catboost data sets
DataSets = ml.ML0_GetModelData(
    TrainData=TrainData,
    ValidationData=ValidationData,
    TestData=TestData,
    ArgsList=None,
    TargetColumnName='Leads',
    NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
    CategoricalColumnNames=['MarketingSegments','MarketingSegments2','MarketingSegments3','Label'],
    TextColumnNames=None,
    WeightColumnName=None,
    Threads=-1,
    Processing='catboost',
    InputFrame='datatable')
# Collect catboost training data
catboost_train = DataSets['train_data']
catboost_validation = DataSets['validation_data']
catboost_test = DataSets['test_data']