This repository is for sharing the machine learning papers that I have read while researching this field. Since machine learning is a tremendously broad topic, I have limited my focus to a few branches, including image detection and segmentation, image captioning, and image generation.
- YOLO : An object detection framework that focuses on speed. It achieves a real-time frame rate (45 fps) on the detection task.
- YOLO v2 : An improved version of the original YOLO that is both faster and more accurate.
- YOLO v3 : Yet another improvement of YOLO, trading a little speed for better accuracy. It still runs in real time, though.
- Mask-RCNN : Maybe detecting objects by just drawing bounding boxes on the image is not enough (in my humble opinion :p); putting a segmentation mask on each object is the way to go.
- Faster-RCNN : An object detection framework with state-of-the-art accuracy (at the time of publication) that runs at near real-time speed (5 fps).
- Resnet : Who said deeper networks are bad? This architecture uses a really deep network (up to 152 layers) while still having lower complexity than the much shallower VGG nets. It introduces skip (shortcut) connections, which were later adopted by other architectures including YOLO v3 and Faster-RCNN (with a ResNet backbone).
- Show, Attend and Tell : Image captioning with a visual attention module.
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering: Uses Faster-RCNN as a bottom-up extractor of image region features, then generates captions with top-down attention over those regions.
- Image Captioning with Semantic Attention : Similar to ShowTell, but combines top-down and bottom-up features for better image captioning.
- DenseCap : An end-to-end architecture for the dense image captioning task, using the proposed Fully Convolutional Localization Network.
- Captioning Transformer with Stacked Attention Modules : Captioning images with Transformer modules.
- Dense Captioning Events in Videos : As the title says, dense captioning of events in videos.... lol.
- Attention Is All You Need (Transformer Model) : Probably the best research paper of 2017 (again, in my humble opinion). It uses stacked attention modules to replace the usual RNN/LSTM in sequence-to-sequence models.
- Areas of Attention for Image Captioning : Proposes attention for image captioning and compares three ways of defining the attended image areas: an activation grid, object proposals, and spatial transformers. The model with spatial transformers gives the best performance.
- Convolutional Sequence to Sequence Learning : Instead of using an RNN/LSTM for sequence learning, the authors propose convolutional networks, which mitigate problems that RNNs have, such as sequential computation that cannot be parallelized across time steps during training.
- Generative Adversarial Network: Trains two networks, a generator and a discriminator, adversarially against each other until the generator produces samples realistic enough to fool a human. GANs have many applications, including image generation, super-resolution, and style transfer.
- DeLiGAN : Models the GAN latent space as a mixture of Gaussians whose parameters are learned together with the GAN (via a VAE-style reparameterization trick), so that GANs can work on small and diverse datasets.
- Unsupervised Action Discovery : Reading.....
- Through-Wall Human Pose Estimation Using Radio Signals
- Human-level control through deep reinforcement learning
- 2017 Trends in Machine Learning by Andrej Karpathy
- Good inspiration on why you should learn RNNs by Andrej Karpathy
- How OpenAI beat a pro Dota 2 player 1v1, by OpenAI
- How OpenAI beat humans 5v5 in Dota 2, by OpenAI
- Implement Your own Darknet from Scratch with PyTorch by Ayoosh Kathuria
- Transformer Model illustrated by Jay Alammar
- A Good Explanation of Transformer Model by Michał Chromiak
- Implementation of Mask-RCNN
- Implementation of Darknet by PJReddie
- Implementation of Darknet by AlexeyAB
- Python Implementation of Faster-RCNN
- Torch Implementation of DenseCap
- Line-by-Line Attention Transformer implementation
- PyTorch Tutorial
- A collection of papers and projects on image captioning
- PyTorch Github
- Tensorflow Github
- Convolutional Sequence to Sequence
- False Positive (FP): falsely detecting a non-object (background) as an object
- False Negative (FN): failing to detect an object even though the object exists
- True Positive Rate (TPR): the ratio of true positives to all actual positives, TPR = TP/P = TP/(TP+FN)
- False Positive Rate (FPR): the ratio of false positives to all actual negatives, FPR = FP/N = FP/(FP+TN)
- Detection Error Tradeoff (DET): a graphical plot of error rates in binary classification, showing the miss rate (FNR = 1 - TPR) against the FPR
- Receiver Operating Characteristic curve (ROC): a graphical plot of the TPR against the FPR of a binary classifier as its decision threshold varies
- Recall: the ratio of correctly predicted positives (true positives) to all actual positives; it is the same as the True Positive Rate. Recall = TP/P = TP/(TP+FN)
- Precision: the ratio of true positives to all positive predictions (TP and FP), Precision = TP/(TP+FP)
- F1 score: the harmonic mean of precision and recall, F1 = 2 × Precision × Recall / (Precision + Recall)
- Average Precision (AP): a scalar way to evaluate the performance of a classifier, equal to the area under the precision-recall curve
- Jaccard index = Intersection over Union (IoU): the area of overlap divided by the area of union of two regions; in detection, the regions are the rectangular bounding boxes of the ground truth and the prediction
- Cross validation: a model validation technique to assess how the results of a statistical analysis will generalize to an independent data set
- Gradient Descent: an optimization algorithm that minimizes a loss function by repeatedly moving the parameters a small step in the direction of the negative gradient
- Loss Function: a loss (or cost) function maps an event, or the values of one or more variables, onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function.
- Backpropagation: a way of computing gradients of expressions through recursive application of the chain rule
- Activation Function: a function applied to the weighted input of each neuron that determines the neuron's output (loosely, whether and how strongly it "activates"). Nonlinear activation functions let the network learn nonlinear mappings. Some also squash the output into a fixed range, such as (0, 1) for the sigmoid or (-1, 1) for tanh, while ReLU is unbounded above.
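The TPR, FPR, ROC and DET entries above can be made concrete with a small example. The following is a minimal sketch in plain Python. It assumes binary labels (1 = positive, 0 = negative), scores where higher means more confident, and a rule that a sample is predicted positive when its score is at or above the threshold. The function name `roc_points` and the toy data are illustrative, not taken from any paper or library. Each threshold yields one (FPR, TPR) point on the ROC curve; a DET curve would plot the miss rate 1 - TPR against the same FPR.

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points of an ROC curve by sweeping the threshold.

    labels: 1 for a positive sample, 0 for a negative one.
    A sample is predicted positive when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    # A DET curve would plot the miss rate (1 - tpr) against the same fpr.
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}  miss rate={1 - tpr:.2f}")
```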
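The precision, recall and F1 formulas above can be checked on raw counts. This is a minimal plain-Python sketch; the function name and the example counts (8 correct detections, 2 false alarms, 4 missed objects) are hypothetical. Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from raw detection counts."""
    # Precision: fraction of positive predictions that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall (= TPR): fraction of actual positives that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# 8 correct detections, 2 false alarms, 4 missed objects.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")  # prints: precision=0.800 recall=0.667 f1=0.727
```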
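Average precision can be computed without drawing the precision-recall curve. The sketch below assumes the common non-interpolated definition, AP = Σ_k (R_k - R_{k-1}) · P_k over the ranked predictions. This is the same as averaging the precision at the rank of each positive. Detection benchmarks such as PASCAL VOC and COCO use interpolated variants, so their numbers will differ. The sketch also assumes distinct scores (no tie handling); the function name and data are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: sum over ranks of (recall gained) * (precision at that rank).

    Assumes distinct scores; labels are 1 for positive, 0 for negative.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            precision = tp / rank
            recall = tp / total_pos
            ap += (recall - prev_recall) * precision  # area of one step of the PR curve
            prev_recall = recall
    return ap


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(average_precision(scores, labels))  # (1 + 1 + 0.75) / 3 ≈ 0.917
```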
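IoU can be illustrated with a minimal sketch for two axis-aligned boxes. The boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates on a continuous scale, with x2 > x1 and y2 > y1. Some codebases treat coordinates as inclusive pixel indices and add 1 to widths and heights; that convention is not used here. The function name is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
prediction = (5, 5, 15, 15)
print(iou(ground_truth, prediction))  # overlap 25, union 175, so IoU = 1/7 ≈ 0.143
```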
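k-fold cross-validation is one common way to carry out the cross validation described above. The data is split into k folds, the model is trained on k - 1 of them and evaluated on the held-out fold, and the k scores are averaged. This is a minimal plain-Python sketch. The function name, the seed, and the trivial "predict the training mean" model in the usage lines are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds do not follow data order
    # Split as evenly as possible: the first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


# Usage: score a trivial model that always predicts the mean of its training targets.
ys = [1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0]
fold_errors = []
for train, val in k_fold_indices(len(ys), k=5):
    mean = sum(ys[i] for i in train) / len(train)
    mse = sum((ys[i] - mean) ** 2 for i in val) / len(val)
    fold_errors.append(mse)
print(f"mean validation MSE over {len(fold_errors)} folds: {sum(fold_errors) / len(fold_errors):.3f}")
```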
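The loss function and gradient descent entries fit together in a small example: fitting a line y = w·x + b by minimizing the mean squared error (MSE) loss. This is a plain-Python sketch. The MSE loss, the learning rate of 0.05, the step count and the toy data (generated from y = 2x + 1) are illustrative choices. The gradients are derived by hand rather than with an autodiff library.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradients(w, b, xs, ys):
    """Partial derivatives of the MSE loss with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Start from w = b = 0 and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = mse_gradients(w, b, xs, ys)
        # Move each parameter opposite to its gradient, scaled by the learning rate.
        w -= lr * dw
        b -= lr * db
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1
w, b = gradient_descent(xs, ys)
print(f"w={w:.4f} b={b:.4f} loss={mse_loss(w, b, xs, ys):.2e}")
```

If the learning rate is too large (for this data, roughly above 0.15), the updates overshoot and diverge instead of converging.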
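Backpropagation on a single sigmoid neuron, f = σ(w0·x0 + w1·x1 + w2), can be written out by hand. The forward pass stores the intermediate result, and the backward pass multiplies local gradients along the chain rule. The numerical (central-difference) gradient check at the end is a standard way to verify a hand-written backward pass. The example values and function names are illustrative assumptions.

```python
import math


def forward_backward(w, x):
    """Forward pass and backpropagated gradients for f = sigmoid(w0*x0 + w1*x1 + w2)."""
    # Forward pass, keeping the intermediate value needed by the backward pass.
    dot = w[0] * x[0] + w[1] * x[1] + w[2]
    f = 1.0 / (1.0 + math.exp(-dot))
    # Backward pass: start from df/df = 1 and apply the chain rule.
    ddot = f * (1.0 - f) * 1.0  # local gradient of the sigmoid times upstream gradient
    dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot]  # d(dot)/dw_i times ddot
    dx = [w[0] * ddot, w[1] * ddot]  # d(dot)/dx_i times ddot
    return f, dw, dx


def numerical_grad(fn, params, eps=1e-6):
    """Central-difference estimate of d fn / d params, used to check backprop."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        plus[i] += eps
        minus = list(params)
        minus[i] -= eps
        grads.append((fn(plus) - fn(minus)) / (2 * eps))
    return grads


w, x = [2.0, -3.0, -3.0], [-1.0, -2.0]
f, dw, dx = forward_backward(w, x)
check = numerical_grad(lambda p: forward_backward(p, x)[0], w)
print("backprop dw: ", dw)
print("numerical dw:", check)
```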
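Three common activation functions and their output ranges can be shown in a plain-Python sketch, together with a single neuron that applies one of them to its weighted input. The sigmoid is written in a numerically stable two-branch form so that large negative inputs do not overflow `math.exp`. The function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, output in (0, 1); two branches avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Rectified linear unit, output in [0, inf): not bounded above."""
    return max(0.0, z)


def neuron(weights, inputs, bias, activation):
    """Output of one neuron: the activation applied to the weighted sum of its inputs."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


for z in (-3.0, 0.0, 3.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):+.3f}  relu={relu(z):.1f}")
print(neuron([0.5, -1.0], [4.0, 1.0], 0.0, relu))
```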
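The core operation of the Transformer mentioned in the paper list is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The sketch below implements only that equation, in plain Python with matrices stored as lists of rows. Multi-head attention, masking and the learned projections of the full model are deliberately left out, and the toy matrices are made up.

```python
import math


def softmax(row):
    """Softmax of a list of numbers; subtracting the max keeps exp() from overflowing."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with each matrix given as a list of rows."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row: attention-weighted sum of the value rows.
        output.append([sum(wt * v[j] for wt, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(scaled_dot_product_attention(Q, K, V))  # the query matches the first key, so it weighs V[0] more
```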
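The skip connection from the ResNet entry in the paper list can be shown with a toy residual block, y = ReLU(F(x) + x), where F is two small linear layers with a ReLU in between. This is a plain-Python sketch, not the paper's implementation: biases, batch normalization and convolutions are omitted, and the weights are made up. The usage lines show one property behind the design. If F's weights are all zero, the block passes its (non-negative) input straight through, so it is easy for an extra block to represent an identity mapping, which is part of the motivation for residual learning.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def residual_block(x, W1, W2):
    """Toy residual block: y = relu(F(x) + x) with F(x) = W2 @ relu(W1 @ x)."""
    fx = matvec(W2, relu(matvec(W1, x)))  # residual branch F(x)
    # Skip (shortcut) connection: add the unchanged input before the final ReLU.
    return relu([a + b for a, b in zip(fx, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(residual_block(x, zeros, zeros))        # F(x) = 0, so y = relu(x) = [1.0, 2.0]
print(residual_block(x, identity, identity))  # F(x) = x, so y = [2.0, 4.0]
```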