
ML-Toolbox

Image source: https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote01_MLsetup.html

Table of Contents

- About
- File Structure
- ML Philosophy

About

Each machine learning algorithm is a tool, and ML-Toolbox is a collection of such tools. The goal of this project is to understand machine learning algorithms by studying the theory behind them. That theory, in turn, helps in choosing the right tool for the task at hand.

File Structure

```
ML-Toolbox/
 ┣ 📂assets/
 ┃ ┣ 📂data/                                 # datasets
 ┃ ┣ 📂img/
 ┃ ┣ 📂scripts/                              # preprocessing scripts
 ┣ 📂Concept Learning/
 ┃ ┣ 📄titanic_survival_prediction.ipynb
 ┣ 📂Perceptron/
 ┃ ┣ 📄gender_prediction.ipynb
 ┣ 📂Apriori Algorithm/
 ┃ ┣ 📄correlated_courses.ipynb
 ┣ 📄README.md

// WIP

 ┣ 📂SVMs/
 ┃ ┣ 📄gender_prediction.ipynb

// TO DO

 ┣ 📂KNNs/
 ┣ 📂Naive Bayes/
 ┣ 📂Logistic Regression/
 ┣ 📂Linear Regression/
 ┣ 📂Gaussian Processes/
 ┣ 📂Support Vector Machine/
 ┣ 📂Kernels/
 ┃ ┣ 📂Perceptron/
 ┃ ┣ 📂Linear Regression/
 ┃ ┣ 📂Support Vector Machine/
 ┣ 📂Decision Trees/
 ┣ 📂Random Forests/
 ┣ 📂Bagging/
 ┣ 📂Boosting/
 ┣ 📂preprocessing/Bias Variance Decomposition/
 ┣ 📂Neural Networks/
 ┣ 📂CNNs/
 ┣ 📂RNNs/
 ┣ 📂K Means Clustering/
 ┣ 📂GMMs/
 ┣ 📂Kernel Density Estimation/
 ┣ 📂PCA/
 ┣ 📂Autoencoders/
 ┣ 📂VAEs/
```

ML Philosophy

Definition

Formally, the primary goal of machine learning is to discover the underlying (but unknown) joint distribution P(x, y), which captures the relationship between inputs (x) and outputs (y) in the real world. If we had access to this distribution, we could plug in any input x and read off the y with the highest probability for that input.
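As a toy illustration of this idea, the sketch below tabulates a small joint distribution P(x, y) (all numbers invented) and reads off the most probable y for each x:

```python
import numpy as np

# Hypothetical toy joint distribution P(x, y) over x in {0, 1, 2} and y in {0, 1}.
# Rows index x, columns index y; all entries sum to 1.
P = np.array([
    [0.30, 0.05],  # P(x=0, y=0), P(x=0, y=1)
    [0.10, 0.20],
    [0.05, 0.30],
])

def bayes_optimal(P, x):
    """Return the y maximizing P(y | x) = P(x, y) / P(x)."""
    # P(x) is constant in y, so taking the argmax over the joint row suffices.
    return int(np.argmax(P[x]))

print([bayes_optimal(P, x) for x in range(3)])  # most probable y for each x
```

Real problems are hard precisely because we never see P directly; we only see samples drawn from it.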


ML as a combination of Knowledge and Data

One way to think about machine learning is as a way to combine knowledge and data to solve problems. ML problems can be visualized within the central region of this Pareto chart.

![ML Pareto chart](./assets/img/ml_pareto_chart.png)
Image source: https://gpss.cc/gpss24/slides/Ek2024.pdf

If we had complete knowledge, we could express the problem as a formula or algorithm, like the relationship speed = distance / time. If we had access to complete data, the solution would simply be a lookup, like finding a place on a map. Machine learning comes into play when we have limited knowledge and limited data. The position of a problem on this chart helps us select the appropriate tools; for example, for problems where we have a lot of data but very little knowledge, we tend to use neural networks.
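The three regimes can be sketched side by side. Everything below (the numbers, the `speed_formula` helper, the lookup table, the fitting procedure) is a hypothetical illustration:

```python
# Three ways to answer "what is the speed?", one per regime of the chart.

# 1. Complete knowledge: a closed-form formula.
def speed_formula(distance, time):
    return distance / time

# 2. Complete data: a lookup table covering every (distance, time) pair we care about.
speed_table = {(100.0, 2.0): 50.0, (90.0, 3.0): 30.0}

# 3. Limited knowledge + limited data: fit a model from a few noisy samples.
# Least-squares fit of speed ~ w * (distance / time); w should land near 1.0.
samples = [(100.0, 2.0, 49.5), (90.0, 3.0, 30.4), (60.0, 1.5, 40.2)]
num = sum((d / t) * s for d, t, s in samples)
den = sum((d / t) ** 2 for d, t, s in samples)
w = num / den  # learned coefficient

print(speed_formula(100.0, 2.0))   # 50.0, from knowledge alone
print(speed_table[(100.0, 2.0)])   # 50.0, from data alone
print(w * (100.0 / 2.0))           # learned estimate, close to 50.0
```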

How to combine Knowledge and Data?

![Error decomposition](./assets/img/error_decomposition.png)
Image source: https://gpss.cc/gpss24/slides/Ek2024.pdf

Given a problem, our goal is to find the optimal solution h*. In practice, we cannot search over all possible solutions, so we restrict ourselves to a specific class of solutions. This is the first point where we inject knowledge into the problem. For image-related tasks, for example, we might choose CNN-based architectures such as ResNet or U-Net, based on our prior experience that they work well for images. Fixing this solution class introduces an approximation error: the gap between h* and the best possible solution from our chosen class, h_opt. This error arises from the limitations of the chosen hypothesis space. In our case, perhaps InceptionNet would have given the best results, but we limited ourselves to ResNet-like architectures.
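A minimal numeric sketch of approximation error, using polynomial classes in place of neural architectures: the true function here is quadratic, so a straight-line hypothesis class leaves an irreducible gap, while a class that contains the truth does not.

```python
import numpy as np

# Noiseless ground truth, so every bit of fitting error below comes purely
# from the choice of hypothesis class, not from limited data.
x = np.linspace(-1, 1, 200)
y = x ** 2

line = np.polyval(np.polyfit(x, y, deg=1), x)  # best straight line (too small a class)
quad = np.polyval(np.polyfit(x, y, deg=2), x)  # class that contains the true function

print(float(np.mean((y - line) ** 2)))  # noticeably > 0: approximation error
print(float(np.mean((y - quad) ** 2)))  # ~0: no approximation error
```

No amount of extra data would shrink the first number; only enlarging (or better choosing) the hypothesis class can.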

After choosing the hypothesis class, we preprocess the data, which is the second point where knowledge is injected. h_opt is the solution we would obtain with ideal data, but with limited or biased data we obtain ĥ_opt. For example, if all training images have bright backgrounds, the model may perform poorly on images with darker backgrounds. The difference between h_opt and ĥ_opt is the estimation error, caused by limited data. This error can be reduced by using domain knowledge, such as applying data augmentation to improve robustness to lighting variations.
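Estimation error can be sketched by estimating a simple quantity from samples of increasing size; the distribution and sample sizes below are invented for illustration.

```python
import numpy as np

# With ideal (infinite) data the estimate would equal the true mean exactly;
# with limited data there is a gap, which shrinks as the sample grows.
rng = np.random.default_rng(0)
true_mean = 3.0

errors = []
for n in (10, 100, 10_000):
    # Average the absolute estimation gap over 200 repeated draws of size n.
    trials = [abs(rng.normal(true_mean, 1.0, n).mean() - true_mean)
              for _ in range(200)]
    errors.append(sum(trials) / len(trials))

print(errors)  # gap between ideal-data and limited-data estimates, shrinking with n
```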

Finally, we choose hyperparameters such as the learning rate, batch size, and optimizer. ĥ_opt corresponds to the best solution achievable with ideal hyperparameter choices, but in practice we obtain h̄ because we cannot search the entire hyperparameter space. The gap between ĥ_opt and h̄ is the optimization error, which can again be reduced by injecting knowledge, for example by selecting an appropriate optimizer.
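A toy sketch of optimization error: the same gradient-descent routine lands at different final losses depending on the learning rate we happen to pick. The function and the candidate rates below are illustrative.

```python
# Minimize f(w) = (w - 4)^2 with plain gradient descent. The best achievable
# loss is 0; whatever loss we actually end at is the optimization error.

def gradient_descent(lr, steps=50, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 4)  # gradient of (w - 4)^2 is 2(w - 4)
    return (w - 4) ** 2        # final loss after the budgeted steps

# Too small a rate barely moves; too large a rate diverges; a good rate converges.
results = {lr: gradient_descent(lr) for lr in (0.001, 0.1, 1.05)}
best_lr = min(results, key=results.get)
print(best_lr, results[best_lr])
```

Injecting knowledge here means picking a sensible learning rate (or a better optimizer) instead of searching blindly.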

Overall, a model can suffer from three types of error (approximation, estimation, and optimization), and each of them can be reduced by injecting knowledge.

