This repository shares the machine learning papers that I have read during my research in this field. Since machine learning is a tremendously broad topic, I have limited my focus to a few branches, including image detection and segmentation, image captioning, and image generation.
- Yolo : An object detection framework that focuses on speed. It achieves a real-time frame rate (45 fps) on the detection task.
- Yolo V2 : An improved version of the original Yolo, with gains in both speed and accuracy.
- Yolo V3 : Yet another improvement of Yolo. It trades some speed for better accuracy, though it still runs in real time.
- Mask-RCNN : Maybe detecting objects by putting a box on the image is not enough (in my humble opinion :p); putting a segmentation mask on the image is the way to go.
- Faster-RCNN : An object detection framework with state-of-the-art accuracy that runs at near real-time speed (5 fps).
- Resnet : Who said deeper networks are bad? This architecture uses a really deep network (up to 152 layers) without becoming overly complex. It introduced the skip (shortcut) connection, which other architectures, including Yolov3 and Faster-RCNN, later used.
- Show, Attend and Tell : Image captioning with an attention module.
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering : Uses Faster-RCNN as the feature extractor, then generates the caption from the extracted features.
- Image Captioning with Semantic Attention : Similar to ShowTell, but combines top-down and bottom-up features for better image captioning.
- DenseCap : An end-to-end architecture for the dense image captioning task, built on a proposed Fully Convolutional Localization Network.
- Captioning Transformer with Stacked Attention Modules : Captions images with Transformer modules.
- Dense Captioning Events in Videos : As the title says, dense captioning for events in videos... lol.
- Attention is all you need (Transformer Model) : Probably the best research paper of 2017 (again, in my humble opinion). It uses stacked attention modules to replace the usual RNN/LSTM in sequence-to-sequence models.
- Areas of Attention for Image Captioning : Proposes and compares three methods for image captioning, based on activation grids, object proposals, and spatial transformers. The model with the spatial transformer gives the best performance.
- Convolutional Sequence to Sequence Learning : Instead of using RNN/LSTM for sequence learning, the authors propose a convolutional network to mitigate the problems that RNNs have.
- Generative Adversarial Network : Trains two networks adversarially, in competition with each other, so that the generated output can later fool humans. It has many applications, including image generation, super-resolution, and style transfer.
- DeLiGAN : Combines GAN with VAE so that GAN can work on small and diverse datasets.
- Unsupervised Action Discovery : Currently reading...
- 2017 Trends in Machine Learning by Andrej Karpathy
- Good Inspiration on why you should learn RNN by Andrej Karpathy
- How OpenAI beat a pro Dota 2 player 1vs1 by OpenAI
- How OpenAI beat humans 5vs5 in Dota 2 by OpenAI
- Implement Your own Darknet from Scratch with PyTorch by Ayoosh Kathuria
- Transformer Model illustrated by Jay Alammar
- A Good Explanation of Transformer Model by Michał Chromiak
- Implementation of Mask-RCNN
- Implementation of Darknet by PJReddie
- Implementation of Darknet by AlexeyAB
- Python Implementation of Faster-RCNN
- Torch Implementation of DenseCap
- Line-by-Line Attention Transformer implementation
- PyTorch Tutorial
- A collection of papers and projects on Image Captioning
- PyTorch Github
- Convolutional Sequence to Sequence
- False Positive: wrongly detecting a non-object or background region as an object
- False Negative: failing to detect an object even though the object exists
- True Positive Rate (TPR): the ratio of True Positives to all actual Positives, TPR = TP/P
- False Positive Rate (FPR): the ratio of False Positives to all actual Negatives, FPR = FP/N
- Detection Error Tradeoff (DET): graphical plot of the error rates in binary classification, plotting the Miss Detection rate (FNR) against the FPR
- Receiver Operating Characteristic curve (ROC): graphical plot of binary classification performance, plotting the TPR against the FPR
- Recall: the ratio of true positives to all actual positives; it is the same quantity as the True Positive Rate. Recall = TP/P = TP/(TP+FN)
- Precision: the ratio of true positives to all positive predictions (TP and FP). Precision = TP/(TP+FP)
- F1 score: the harmonic mean of precision and recall. F1 = 2 * Precision * Recall / (Precision + Recall)
- Average Precision: a scalar measure of classifier performance, equal to the area under the precision-recall curve
- Jaccard index = Intersection over Union (IoU): the area of the intersection divided by the area of the union of the two rectangular bounding boxes (ground truth and prediction)
- Cross validation: a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set
- Gradient Descent:
- Loss Function:
- Backpropagation:
- Activation Function:
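
The evaluation terms in the glossary above (precision, recall, FPR, F1 score, and IoU) can be sketched in a few lines of plain Python. This is only a minimal illustration and not taken from any of the listed implementations: the function names, the `(x1, y1, x2, y2)` box format, and the choice to return `0.0` when a denominator is zero are all assumptions made for this example.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP). Returns 0.0 if nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Recall (True Positive Rate) = TP / (TP + FN), where TP + FN = P."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN), where FP + TN = N."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def f1_score(p, r):
    """F1 = harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes get zero intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
    p = precision(8, 2)
    r = recall(8, 8)
    print("precision:", p, "recall:", r, "F1:", f1_score(p, r))
    # Two 2x2 boxes that overlap in a 1x1 square.
    print("IoU:", iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

In detection work, a prediction is commonly counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold; the threshold value depends on the benchmark and is not fixed by the definitions above.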