Image source: https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote01_MLsetup.html
Each machine learning algorithm is a tool, and ML-Toolbox is a small collection of such tools. The goal of this project is to understand machine learning algorithms by studying the theory behind them; this theory helps in choosing the right tool for the task at hand.
ML-Toolbox/
┣ 📂assets/
┃ ┣ 📂data/ # datasets
┃ ┣ 📂img/
┃ ┣ 📂scripts/ # preprocessing scripts
┣ 📂Concept Learning/
┃ ┣ 📄titanic_survival_prediction.ipynb
┣ 📂Perceptron/
┃ ┣ 📄gender_prediction.ipynb
┣ 📂Apriori Algorithm/
┃ ┣ 📄correlated_courses.ipynb
┣ 📄README.md
// WIP
┣ 📂SVMs/
┃ ┣ 📄gender_prediction.ipynb
// TO DO
┣ 📂KNNs/
┣ 📂Naive Bayes/
┣ 📂Logistic Regression/
┣ 📂Linear Regression/
┣ 📂Gaussian Processes/
┣ 📂Support Vector Machine/
┣ 📂Kernels/
┃ ┣ 📂Perceptron/
┃ ┣ 📂Linear Regression/
┃ ┣ 📂Support Vector Machine/
┣ 📂Decision Trees/
┣ 📂Random Forests/
┣ 📂Bagging/
┣ 📂Boosting/
┣ 📂preprocessing/Bias Variance Decomposition/
┣ 📂Neural Networks/
┣ 📂CNNs/
┣ 📂RNNs/
┣ 📂K Means Clustering/
┣ 📂GMMs/
┣ 📂Kernel Density Estimation/
┣ 📂PCA/
┣ 📂Autoencoders/
┣ 📂VAEs/
Formally, the primary goal of machine learning is to discover the underlying (but unknown) joint distribution P(x, y), which captures the relationship between inputs (x) and outputs (y) in the real world. If we had access to this distribution, we could plug in any input x and read off the y with the maximum probability for that input.
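A toy sketch of this idea, with a made-up discrete distribution (the numbers and labels here are hypothetical, not from the project): given the joint P(x, y), prediction reduces to an argmax over y.

```python
# Hypothetical toy joint distribution: P_xy[x][y]; all entries sum to 1.
P_xy = {
    0: {"cat": 0.10, "dog": 0.20},
    1: {"cat": 0.25, "dog": 0.05},
    2: {"cat": 0.15, "dog": 0.25},
}

def predict(x):
    # P(y | x) = P(x, y) / P(x); P(x) is constant in y, so
    # maximising the joint P(x, y) over y gives the same answer.
    return max(P_xy[x], key=P_xy[x].get)

print(predict(0))  # dog: P(0, dog) = 0.20 > P(0, cat) = 0.10
```

In practice we never have P(x, y) itself; learning algorithms approximate this argmax from finite samples.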
One way to think about machine learning is as combining knowledge and data to solve problems. ML problems can be visualized within the central region of this chart.
Image source: https://gpss.cc/gpss24/slides/Ek2024.pdf
If we had complete knowledge, we could express the problem as a formula or algorithm, like the relationship speed = distance / time. If we had access to complete data, the solution would simply be a lookup, like finding a place on a map. Machine learning comes into play when we have limited knowledge and limited data. The position of a problem on this chart helps us select the appropriate tools: for problems with a lot of data but very little knowledge, for example, we tend to use neural networks.
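The two extremes can be sketched in a few lines (the values and the `capitals` table are illustrative, not from the project):

```python
# Complete knowledge: the answer is a formula.
def speed(distance, time):
    return distance / time

# Complete data: the answer is a lookup, like finding a place on a map.
capitals = {"France": "Paris", "Japan": "Tokyo"}

print(speed(100.0, 2.0))  # 50.0
print(capitals["Japan"])  # Tokyo

# Machine learning sits in between: a model family (limited knowledge)
# is fitted to examples (limited data) to produce an approximate answer.
```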
Image source: https://gpss.cc/gpss24/slides/Ek2024.pdf
Given a problem, our goal is to find the optimal solution h*. In practice, we cannot search over all possible solutions, so we restrict ourselves to a specific class of solutions. This is the first point where we inject knowledge into the problem. For image-related tasks, for example, we might choose CNN-based architectures such as ResNet or U-Net, based on prior experience that they work well for images. Fixing this solution class introduces an approximation error: the gap between h* and hopt, the best possible solution from our chosen class. This error arises from the limitations of the chosen hypothesis space. In our case, perhaps an InceptionNet would have given the best results, but we limited ourselves to ResNet-like architectures.
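A minimal numerical sketch of approximation error (a hypothetical setup, not from the project): the true relationship is quadratic, but we restrict the hypothesis class to straight lines, so even the best line in the class retains an irreducible gap.

```python
import numpy as np

x = np.linspace(-1, 1, 200)
y_true = x ** 2  # the "real world" target, unknown to the learner

# Best hypothesis within the restricted (linear) class: plays the role of hopt.
linear_fit = np.polyval(np.polyfit(x, y_true, deg=1), x)
# Best hypothesis within a class rich enough to contain the truth (h*).
quad_fit = np.polyval(np.polyfit(x, y_true, deg=2), x)

linear_mse = np.mean((linear_fit - y_true) ** 2)
quad_mse = np.mean((quad_fit - y_true) ** 2)
print(linear_mse)  # clearly nonzero: the gap between h* and hopt
print(quad_mse)    # ~0: the class contains the true function
```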
After choosing the hypothesis class, we preprocess the data, which is the second point where knowledge is injected. hopt is the solution we would obtain with ideal data, but with limited or biased data, we instead obtain ĥopt. For example, if all training images have bright backgrounds, the model may perform poorly on images with darker backgrounds. The difference between hopt and ĥopt is the estimation error, caused by limited data. This error can be reduced by injecting domain knowledge, such as applying data augmentation to improve robustness to lighting variations.
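The augmentation idea can be sketched as follows (a hypothetical helper, with images reduced to arrays of pixel intensities in [0, 1]):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_brightness(image, rng, low=0.5, high=1.5):
    """Rescale pixel intensities by a random factor, simulating
    darker or brighter lighting than the training set contains."""
    factor = rng.uniform(low, high)
    return np.clip(image * factor, 0.0, 1.0)

bright = np.full((4, 4), 0.9)  # training images all have bright backgrounds
augmented = augment_brightness(bright, rng)
print(augmented.min(), augmented.max())  # intensities shifted by "lighting"
```

Training on such augmented copies encodes the knowledge that lighting varies, which the biased dataset alone could not convey.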
Finally, we choose hyperparameters such as the learning rate, batch size, and optimizer. ĥopt corresponds to the best solution achievable with ideal hyperparameter choices, but in practice we obtain ĥ, because we cannot search the entire hyperparameter space. The gap between ĥopt and ĥ is the optimization error, which can again be reduced by injecting knowledge, for example by selecting an appropriate optimizer.
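A toy sketch of shrinking the optimization error (hypothetical objective and grid, not from the project): instead of accepting an arbitrary learning rate, we search a small grid and keep the one with the lowest final loss.

```python
def train(lr, steps=50):
    """Minimize f(w) = (w - 3)^2 by gradient descent; return final loss."""
    w = 0.0
    for _ in range(steps):
        w -= lr * 2 * (w - 3)  # gradient of (w - 3)^2 is 2(w - 3)
    return (w - 3) ** 2

grid = [1e-3, 1e-2, 1e-1]
best_lr = min(grid, key=train)
print(best_lr)  # 0.1 converges fastest on this toy problem
```

Even this crude grid search injects knowledge (a plausible range of learning rates) and moves ĥ closer to ĥopt.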
Overall, a model can suffer from three types of error: approximation, estimation, and optimization error, and each of them can be reduced by injecting knowledge.
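Using the notation above, the three gaps telescope into the total excess error (writing ε(h) for the error of a solution h, a symbol assumed here for the sketch, not from the source):

```
ε(ĥ) − ε(h*) = [ε(hopt) − ε(h*)]      (approximation error)
             + [ε(ĥopt) − ε(hopt)]    (estimation error)
             + [ε(ĥ)    − ε(ĥopt)]    (optimization error)
```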
- Cornell CS4780 Machine Learning for Intelligent Systems by Prof. Kilian Weinberger.
- CS7.403 Statistical Methods in Artificial Intelligence course by IIIT Hyderabad.
- MIT 6.036 Machine Learning by Prof. Tamara Broderick.
- Gaussian Process Summer School 2024.
- Bias Variance Tradeoff by MIT OpenCourseWare and the Stanford NLP Group.
- Kernel Methods in Computer Vision by Prof. Christoph Lampert, and notes on Lagrangian multipliers and KKT conditions.
- Neural Networks and Deep Learning Online Book by Michael Nielsen.
- Talk on Association Rule Mining by Prof. Ami Gates.
