Background

This repository shares the machine learning papers I have read during my research in this field. Because machine learning is a tremendously broad topic, I have limited my focus to a few branches, including image detection and segmentation, image captioning, and image generation.

ML Related Papers

Image Detection and Segmentation

Yolo : An object detection framework that focuses on speed. It achieves a real-time frame rate (45 fps) on the detection task.
Yolo V2 : An improved version of the original Yolo that improves both speed and accuracy.
Yolo V3 : Yet another improvement of Yolo. It trades some speed for better accuracy, but it still runs in real time.
Mask-RCNN : Maybe detecting objects by putting boxes on the image is not enough (in my humble opinion :p); putting a segmentation mask on each object is the way to go.
Faster-RCNN : An object detector with state-of-the-art accuracy that runs at near real-time speed (5 fps).
Resnet : Who said deeper networks are bad? This architecture uses a really deep network (up to 152 layers) while still keeping complexity low. It introduces skip (shortcut) connections, which are also used in other architectures, including Yolo V3 and Faster-RCNN.
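
The skip connection mentioned for Resnet can be illustrated with a minimal sketch. The function names (`relu`, `linear`, `residual_block`) and the tiny hand-written weights are assumptions made for illustration; real ResNet blocks use convolutions and batch normalization rather than the plain linear layers shown here.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(v, w1, w2):
    """Compute relu(F(v) + v), where F is two linear layers with a ReLU between.

    The "+ v" term is the skip connection: the input bypasses F and is
    added back to its output.
    """
    f = linear(w2, relu(linear(w1, v)))
    return relu([a + b for a, b in zip(f, v)])


x = [1.0, -2.0, 3.0]
zero = [[0.0] * 3 for _ in range(3)]
# With all-zero weights F(x) is zero, so the block reduces to relu(x).
print(residual_block(x, zero, zero))  # [1.0, 0.0, 3.0]
```

Because the input is added back, a block whose weights are near zero behaves almost like the identity, which is one common intuition for why very deep stacks of such blocks remain trainable.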

Image/Video Captioning

Show, Attend and Tell : Image captioning with an attention module.
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering : Uses Faster-RCNN as the feature extractor, then generates the caption.
Image Captioning with Semantic Attention : Similar to ShowTell, but combines top-down and bottom-up features for better image captioning.
DenseCap : An end-to-end architecture for the dense image captioning task, using a proposed Fully Convolutional Localization Network.
Captioning Transformer with Stacked Attention Modules : Captions images with Transformer modules.
Dense Captioning Events in Videos : As the title says, dense captioning for events in videos... lol.

Attention Mechanism/ RNN / LSTM

Attention is all you need (Transformer Model) : Probably the best research paper of 2017 (again, in my humble opinion). It uses stacked attention modules to replace the usual RNN/LSTM in sequence-to-sequence models.
Areas of Attention for Image Captioning : Compares and proposes three methods for image captioning, based on activation grids, object proposals, and spatial transformers. The model with the spatial transformer gives the best performance.
Convolutional Sequence to Sequence Learning : Instead of using an RNN/LSTM for sequence learning, the authors propose a convolutional network to mitigate the problems that RNNs have.
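
The attention module at the heart of the Transformer can be sketched in a few lines. This is a minimal, pure-Python version of scaled dot-product attention for a single head, without masking or learned projections; the function names and the toy inputs are assumptions for illustration, not code from any of the papers above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors.

    The weights are softmax(q . k / sqrt(d_k)) over all keys.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Two identical keys get equal weight, so the output is the mean of the values.
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[0.0, 2.0], [4.0, 6.0]]
result = scaled_dot_product_attention([[1.0, 1.0]], keys, values)
```

Every query attends to every key in one step, so there is no recurrence over the sequence as there is in an RNN/LSTM.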

Image Generation

Generative Adversarial Network : Trains two networks adversarially so that they compete with each other, until the generated output can fool a human. It has many applications, including image generation, super-resolution, and style transfer.
DeLiGAN : Reparameterizes the GAN latent space as a mixture of Gaussians (borrowing the reparameterization trick from VAE) so that a GAN can work on small and diverse datasets.
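
The adversarial training idea can be made concrete by writing down the two losses. The sketch below assumes the discriminator outputs a probability that its input is real; the function names and the example probabilities are hypothetical, and the generator loss shown is the commonly used non-saturating variant rather than the only possible choice.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Loss the discriminator minimizes.

    d_real: discriminator outputs D(x) on real samples.
    d_fake: discriminator outputs D(G(z)) on generated samples.
    Equals -(mean(log D(x)) + mean(log(1 - D(G(z))))).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Non-saturating generator loss: -mean(log D(G(z)))."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = [0.5, 0.5, 0.5]
d_loss = discriminator_loss(confused, confused)  # equals 2 * log(2)
g_loss = generator_loss(confused)                # equals log(2)
```

The discriminator loss falls as it separates real from generated samples, while the generator loss falls as generated samples are scored as real, which is the competition described above.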

Other

Unsupervised Action Discovery : Currently reading.

Blogs & Videos

2017 Trend in Machine Learning by Andrej Karpathy
Good Inspiration on why you should learn RNN by Andrej Karpathy
How OpenAI beat pro Dota2 gamer 1vs1 by OpenAI
How OpenAI beat human 5vs5 in Dota2 game by OpenAI
Implement Your own Darknet from Scratch with PyTorch by Ayoosh Kathuria
Transformer Model illustrated by Jay Alammar
A Good Explanation of Transformer Model by Michał Chromiak

Git Good Project

Statistical Terms (maybe for people like me C:)

False Positive: wrongly detecting a non-object or background region as an object
False Negative: failing to detect an object even though the object exists
True Positive Rate (TPR): the ratio of True Positives to all actual Positives, TPR = TP/P
False Positive Rate (FPR): the ratio of False Positives to all actual Negatives, FPR = FP/N
Detection Error Tradeoff (DET): graphical plot of the error rates in binary classification, the miss detection rate (FNR) against the FPR
Receiver Operating Characteristic curve (ROC): graphical plot of binary classification performance, TPR against FPR
Recall: the ratio of true positives to all actual positives; it is the same as the True Positive Rate. Recall = TP/P = TP/(TP+FN)
Precision: the ratio of true positives to all positive predictions (TP and FP). Precision = TP/(TP+FP)
F1 score: the harmonic mean of precision and recall. F1 = 2 * Precision * Recall / (Precision + Recall)
Average Precision: a scalar measure of classifier performance, the area under the precision-recall curve
Jaccard index = Intersection over Union (IoU): the area of the intersection divided by the area of the union of the two rectangular bounding boxes (ground truth and prediction)
Cross validation: a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set
Gradient Descent:
Loss Function:
Backpropagation:
Activation Function:
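
The precision, recall, F1, and IoU definitions above can be checked with a small sketch. The function names and the `(x1, y1, x2, y2)` box format are assumptions for illustration, and F1 is computed in its standard harmonic-mean form.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from raw detection counts.

    A ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# 8 correct detections, 2 false alarms, 2 missed objects.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)

# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

In detection benchmarks a prediction is usually counted as a True Positive only when its IoU with a ground-truth box exceeds a chosen threshold, so the two functions are typically used together.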
