
MIET.AI.Course

Plan of the lecture course "Artificial Intelligence (in Computer Vision)"

Lecture 0. Python

  1. Introductory discussion
  2. Python basics

Source: https://docs.python.org/3/tutorial/

Lecture 1. Tabular data analysis

  1. scikit-learn
  2. XGBoost
  3. Comparing linear regression and XGBoost on a concrete data-processing example (a minimal sketch follows this list)
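
As a companion to item 3, here is a minimal, hedged sketch of such a comparison. It uses scikit-learn's small diabetes dataset as a stand-in for the lecture's actual data, and assumes the `xgboost` package is installed; it is not the lecture's own experiment.

```python
# Compare linear regression and gradient-boosted trees on a small tabular dataset.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "xgboost": XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.1f}")
```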

Sources:

https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions

https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/

https://habr.com/en/company/ods/blog/327250/

https://dl.acm.org/doi/pdf/10.1145/2939672.2939785?download=true

Lecture 2. Convolutional neural networks and image classification

  1. Introduction to training neural networks and the problems that have to be solved along the way
  2. MNIST and LeNet (a model sketch follows this list)
  3. The ImageNet challenge
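
For item 2, a LeNet-style network small enough to train on MNIST can be written in a few lines. This is a hedged sketch in PyTorch (an assumption; the lecture may use a different framework), with layer sizes following the classic LeNet-5 layout for 28x28 inputs.

```python
# A LeNet-style convolutional network for 28x28 MNIST digits.
import torch
from torch import nn

class LeNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = LeNet()(torch.randn(8, 1, 28, 28))  # -> shape (8, 10)
```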

Sources:

https://arxiv.org/pdf/1609.04747.pdf

https://www.eecis.udel.edu/~shatkay/Course/papers/NetworksAndCNNClasifiersIntroVapnik95.pdf

https://arxiv.org/pdf/1502.03167.pdf

http://www.vlfeat.org/matconvnet/matconvnet-manual.pdf

http://www.image-net.org

Nikolenko et al., Deep Learning (in Russian)

Goodfellow et al., Deep Learning

Lecture 3. Neural-network object detection

  1. R-CNN: region proposals via selective search
  2. Fast R-CNN
  3. Faster R-CNN (an inference sketch follows this list)
  4. YOLO, SSD
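
As a quick way to try a detector from this family, the sketch below runs torchvision's pretrained Faster R-CNN on a placeholder image tensor. This assumes a recent torchvision (the `weights=` argument appeared around 0.13) and is not the lecture's own implementation.

```python
# Running a pretrained Faster R-CNN from torchvision on one image.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = torch.rand(3, 480, 640)          # stand-in for a real RGB image in [0, 1]
with torch.no_grad():
    (prediction,) = model([image])       # list of images in, list of dicts out
# Each prediction holds 'boxes' (x1, y1, x2, y2), 'labels', and 'scores'.
keep = prediction["scores"] > 0.5
print(prediction["boxes"][keep], prediction["labels"][keep])
```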

https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e

http://openaccess.thecvf.com/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf

http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks

http://openaccess.thecvf.com/content_cvpr_2017/html/Redmon_YOLO9000_Better_Faster_CVPR_2017_paper.html

https://arxiv.org/pdf/1512.02325.pdf

Lecture 4. Neural-network keypoint detection: OpenPose

  1. Shotton, Jamie, Ross Girshick, Andrew Fitzgibbon, Toby Sharp, Mat Cook, Mark Finocchio, Richard Moore et al. "Efficient human pose estimation from single depth images." IEEE transactions on pattern analysis and machine intelligence 35, no. 12 (2012): 2821-2840.
  2. Tompson, Jonathan, Ross Goroshin, Arjun Jain, Yann LeCun, and Christoph Bregler. "Efficient object localization using convolutional networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 648-656. 2015.
  3. Ramakrishna, Varun, Daniel Munoz, Martial Hebert, James Andrew Bagnell, and Yaser Sheikh. "Pose machines: Articulated pose estimation via inference machines." In European Conference on Computer Vision, pp. 33-47. Springer, Cham, 2014.
  4. Cao, Zhe, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. "OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields." arXiv preprint arXiv:1812.08008 (2018).
  5. Sun, Ke, Bin Xiao, Dong Liu, and Jingdong Wang. "Deep high-resolution representation learning for human pose estimation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5693-5703. 2019.
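
The heatmap-based estimators above (e.g. items 2 and 5) predict one heatmap per joint and read off the peak location. The sketch below shows only that decoding step with NumPy on synthetic data; the multi-person grouping via Part Affinity Fields in OpenPose (item 4) is a separate stage not shown here.

```python
# Decoding joint coordinates from per-joint heatmaps.
import numpy as np

def decode_keypoints(heatmaps: np.ndarray, threshold: float = 0.1):
    """heatmaps: (num_joints, H, W) -> list of (x, y, score) or None per joint."""
    keypoints = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        score = hm[y, x]
        keypoints.append((int(x), int(y), float(score)) if score > threshold else None)
    return keypoints

demo = np.zeros((17, 64, 48), dtype=np.float32)
demo[0, 10, 20] = 0.9                      # pretend joint 0 peaks at (x=20, y=10)
print(decode_keypoints(demo)[0])           # -> (20, 10, 0.9)
```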

Lecture 5. GANs

  1. Gui, Jie, Zhenan Sun, Yonggang Wen, Dacheng Tao, and Jieping Ye. "A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications." arXiv preprint arXiv:2001.06937 (2020).

  2. Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).

  3. Pu, Yunchen, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, and Lawrence Carin. "Variational autoencoder for deep learning of images, labels and captions." In Advances in neural information processing systems, pp. 2352-2360. 2016.

  4. Makhzani, Alireza, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. "Adversarial autoencoders." arXiv preprint arXiv:1511.05644 (2015).

  5. Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing systems, pp. 2672-2680. 2014.

  6. Chen, Xi, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. "Infogan: Interpretable representation learning by information maximizing generative adversarial nets." In Advances in neural information processing systems, pp. 2172-2180. 2016.

  7. Reed, Scott E., Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, and Honglak Lee. "Learning what and where to draw." In Advances in neural information processing systems, pp. 217-225. 2016.

  8. Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134. 2017.

  9. Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401-4410. 2019.
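
To make the adversarial objective of Goodfellow et al. (item 5) concrete, here is a hedged sketch of a single training step with tiny placeholder networks standing in for a real generator and discriminator; it uses the common non-saturating generator loss, written in PyTorch as an assumption.

```python
# One training step of a GAN: discriminator separates real from generated,
# generator tries to make generated samples be labeled as real.
import torch
from torch import nn

latent_dim, data_dim, batch = 16, 64, 32
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(batch, data_dim)            # stand-in for real samples
ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

# Discriminator step: real -> 1, generated -> 0.
fake = G(torch.randn(batch, latent_dim)).detach()
loss_d = bce(D(real), ones) + bce(D(fake), zeros)
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step (non-saturating): push D to label generated samples as real.
fake = G(torch.randn(batch, latent_dim))
loss_g = bce(D(fake), ones)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```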

Lecture 6. Preparing data for training neural networks

  1. Confidence intervals for estimating classification accuracy (a worked sketch follows this list)
  2. Estimating the required size of a test set
  3. Data sources
  4. Crowdsourcing platforms: MTurk, Toloka
  5. Simulated (synthetic) data
  6. Training tricks (pseudo-labeling, augmentation)
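
For item 1, a rough normal-approximation interval around a measured accuracy illustrates how test-set size controls the uncertainty of the estimate. This is a sketch only; for small test sets or accuracies near 0 or 1, Wilson or Clopper-Pearson intervals (as discussed in the sources below) are the better choice.

```python
# Normal-approximation confidence interval for accuracy measured on n test samples.
import math

def accuracy_confidence_interval(accuracy: float, n: int, z: float = 1.96):
    """95% interval by default (z = 1.96)."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)

# 920 correct answers out of 1000 test samples: roughly 0.92 +/- 0.017
print(accuracy_confidence_interval(0.92, 1000))  # ~(0.903, 0.937)
```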

Sources:

https://ru.coursera.org/lecture/stats-for-data-analysis/dovieritiel-nyie-intiervaly-s-pomoshch-iu-kvantiliei-yboDc

https://sebastianraschka.com/blog/2018/model-evaluation-selection-part4.html

https://www.mturk.com

https://toloka.yandex.ru/tasks

https://github.com/immersive-limit/Unity-ComputerVisionSim

Lecture 7. Methods for accelerating neural-network computation

  1. A code example using SIMD instructions
  2. The OpenVINO library
  3. Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
  4. Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. "Mobilenetv2: Inverted residuals and linear bottlenecks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510-4520. 2018.
  5. Courbariaux, Matthieu, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1." arXiv preprint arXiv:1602.02830 (2016).
  6. Rastegari, Mohammad, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. "Xnor-net: Imagenet classification using binary convolutional neural networks." In European conference on computer vision, pp. 525-542. Springer, Cham, 2016.
  7. BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet
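
The MobileNet papers above (items 3-4) get their speedup from depthwise-separable convolutions. The hedged PyTorch sketch below only compares parameter counts against a standard convolution to show where the savings come from; SIMD and OpenVINO specifics from items 1-2 are not covered here.

```python
# Depthwise-separable convolution (the MobileNet building block) versus a
# standard convolution: parameter counts illustrate the size/speed savings.
from torch import nn

def standard_conv(cin, cout, k=3):
    return nn.Conv2d(cin, cout, k, padding=k // 2)

def depthwise_separable(cin, cout, k=3):
    return nn.Sequential(
        nn.Conv2d(cin, cin, k, padding=k // 2, groups=cin),  # depthwise: one filter per channel
        nn.Conv2d(cin, cout, kernel_size=1),                  # pointwise: 1x1 channel mixing
    )

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print(n_params(standard_conv(256, 256)))        # 590,080 parameters
print(n_params(depthwise_separable(256, 256)))  #  68,352 parameters
```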

Lecture 8. Classical computer vision methods: background subtraction

  1. Collins, Robert T., Alan J. Lipton, Takeo Kanade, Hironobu Fujiyoshi, David Duggins, Yanghai Tsin, David Tolliver et al. "A system for video surveillance and monitoring." VSAM final report 2000 (2000): 1-68.
  2. Stauffer, Chris, and W. Eric L. Grimson. "Adaptive background mixture models for real-time tracking." In Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), vol. 2, pp. 246-252. IEEE, 1999.
  3. Goyette, Nil, Pierre-Marc Jodoin, Fatih Porikli, Janusz Konrad, and Prakash Ishwar. "Changedetection.net: A new change detection benchmark dataset." In 2012 IEEE computer society conference on computer vision and pattern recognition workshops, pp. 1-8. IEEE, 2012.
  4. Van Droogenbroeck, Marc, and Olivier Paquot. "Background subtraction: Experiments and improvements for ViBe." In 2012 IEEE computer society conference on computer vision and pattern recognition workshops, pp. 32-37. IEEE, 2012.
  5. Hofmann, Martin, Philipp Tiefenbacher, and Gerhard Rigoll. "Background segmentation with feedback: The pixel-based adaptive segmenter." In 2012 IEEE computer society conference on computer vision and pattern recognition workshops, pp. 38-43. IEEE, 2012.
  6. Wang, Rui, Filiz Bunyak, Guna Seetharaman, and Kannappan Palaniappan. "Static and moving object detection using flux tensor with split gaussian models." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 414-418. 2014.
  7. Lim, Long Ang, and Hacer Yalim Keles. "Learning multi-scale features for foreground segmentation." Pattern Analysis and Applications (2019): 1-12.
  8. Ya. Ya. Petrichkovich and A. V. Khamukhin. Analysis of the influence of the background subtraction method on the overall effectiveness of computer vision systems (in Russian).
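
OpenCV ships a Gaussian-mixture background subtractor (MOG2) in the spirit of the Stauffer-Grimson model cited above (item 2). The sketch below shows the typical loop; the video path is a placeholder, not a file from the course.

```python
# Gaussian-mixture background subtraction with OpenCV's MOG2.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
capture = cv2.VideoCapture("video.mp4")   # placeholder path
while True:
    ok, frame = capture.read()
    if not ok:
        break
    mask = subtractor.apply(frame)        # 255 = foreground, 127 = shadow, 0 = background
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
    cv2.imshow("foreground", mask)
    if cv2.waitKey(1) == 27:              # Esc to stop
        break
capture.release()
```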

Lecture 9. Classical computer vision methods: keypoint detection and its enhancement with neural networks

  1. Harris, Christopher G., and Mike Stephens. "A combined corner and edge detector." Alvey vision conference. Vol. 15. No. 50. 1988.
  2. Derpanis, Konstantinos G. "The harris corner detector." York University 2 (2004).
  3. Lowe, David G. "Distinctive image features from scale-invariant keypoints." International journal of computer vision 60.2 (2004): 91-110.
  4. Lindeberg, Tony. "Feature detection with automatic scale selection." International journal of computer vision 30.2 (1998): 79-116.
  5. Rublee, Ethan, et al. "ORB: An efficient alternative to SIFT or SURF." 2011 International conference on computer vision. IEEE, 2011.
  6. Rosten, Edward, and Tom Drummond. "Machine learning for high-speed corner detection." European conference on computer vision. Springer, Berlin, Heidelberg, 2006.
  7. Calonder, Michael, et al. "BRIEF: Computing a local binary descriptor very fast." IEEE transactions on pattern analysis and machine intelligence 34.7 (2011): 1281-1298.
  8. DeTone, Daniel, Tomasz Malisiewicz, and Andrew Rabinovich. "Superpoint: Self-supervised interest point detection and description." Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2018.
  9. Barroso-Laguna, Axel, et al. "Key.Net: Keypoint detection by handcrafted and learned CNN filters." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
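
The classical detectors and binary descriptors above (items 5-7) are available directly in OpenCV. A hedged sketch of detecting and matching ORB features; the image paths are placeholders.

```python
# Detecting and matching ORB keypoints with OpenCV.
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors are compared with Hamming distance.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matches, best distance {matches[0].distance}")
```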

Lecture 10. Generalized image descriptors and the triplet loss

  1. Sun, Yi, et al. "Deep learning face representation by joint identification-verification." Advances in neural information processing systems. 2014.
  2. Xiong, Xuehan, and Fernando De la Torre. "Supervised descent method and its applications to face alignment." CVPR 2013.
  3. Taigman, Yaniv, et al. "Deepface: Closing the gap to human-level performance in face verification." CVPR 2014.
  4. Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
  5. Kemelmacher-Shlizerman, Ira, et al. "The megaface benchmark: 1 million faces for recognition at scale." CVPR 2016.
  6. Deng, Jiankang, et al. "Arcface: Additive angular margin loss for deep face recognition." CVPR 2019.
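
The triplet loss named in the lecture title (and used by FaceNet, item 4) pulls an anchor embedding toward a positive of the same identity and pushes it away from a negative by at least a margin. A minimal PyTorch sketch on random embeddings; margin and dimensions are illustrative.

```python
# Triplet loss over L2-normalized embeddings.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    d_pos = (anchor - positive).pow(2).sum(dim=1)   # squared L2 distances
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

# Embeddings are usually L2-normalized before the loss is applied.
a, p, n = (F.normalize(torch.randn(32, 128), dim=1) for _ in range(3))
print(triplet_loss(a, p, n))
```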

Lecture 11. Recurrent neural networks in computer vision: GRU, LSTM, visual question answering

  1. Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
  2. Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." 2014.
  3. Coskun, Huseyin, et al. "Long short-term memory kalman filters: Recurrent neural estimators for pose regularization." Proceedings of the IEEE International Conference on Computer Vision. 2017.
  4. Antol, Stanislaw, et al. "Vqa: Visual question answering." Proceedings of the IEEE international conference on computer vision. 2015.
  5. Nilsson, David, and Cristian Sminchisescu. "Semantic video segmentation by gated recurrent flow propagation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  6. Wang, Fangjinhua, et al. "Itermvs: Iterative probability estimation for efficient multi-view stereo." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
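
A common pattern in the works above is to run an LSTM over a sequence of per-frame CNN features. The sketch below shows only that aggregation step in PyTorch with random tensors standing in for real features; the dimensions and the 10-class head are illustrative assumptions.

```python
# An LSTM aggregating per-frame CNN features over time.
import torch
from torch import nn

frames, batch, feat_dim, hidden = 16, 4, 512, 256
lstm = nn.LSTM(input_size=feat_dim, hidden_size=hidden, batch_first=True)
head = nn.Linear(hidden, 10)                      # e.g. 10 action classes

features = torch.randn(batch, frames, feat_dim)   # stand-in for CNN features per frame
outputs, (h_n, c_n) = lstm(features)              # outputs: (batch, frames, hidden)
logits = head(outputs[:, -1])                     # classify from the last time step
print(logits.shape)                               # torch.Size([4, 10])
```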

Lecture 12. Reinforcement learning

  1. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction (Russian edition), Moscow: DMK Press, 2020.
  2. The Gymnasium library (a usage sketch follows this list).
  3. MuJoCo, a library for simulating the physical interaction of mechanisms with their environment.
  4. CARLA, a road-traffic simulation library for self-driving cars.
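
For item 2, the basic Gymnasium interaction loop with a random policy looks like this; CartPole is used as a toy environment here, while the lecture's MuJoCo or CARLA demos would plug in their own environments.

```python
# The standard Gymnasium reset/step loop with a random policy.
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=0)
total_reward = 0.0
for _ in range(500):
    action = env.action_space.sample()                 # random policy
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        observation, info = env.reset()
env.close()
print(f"accumulated reward: {total_reward}")
```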

Lecture 13. Attention and Transformers

  1. An example of tokenization and TF-IDF weighting (a sketch follows this list)
  2. Word2Vec, CBOW, Skip-Gram: Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
  3. GloVe
  4. Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
  5. The BLEU (bilingual evaluation understudy) metric
  6. Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
  7. Yuan, Li, et al. "Tokens-to-token vit: Training vision transformers from scratch on imagenet." IEEE CVPR. 2021.
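
For item 1, scikit-learn's `TfidfVectorizer` tokenizes a corpus and weights each token by TF-IDF in one call. A minimal sketch on a toy corpus (the sentences are made up for illustration); `get_feature_names_out` assumes a reasonably recent scikit-learn.

```python
# Turning a small corpus into TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "a cat sits on the mat",
    "a dog sits on the log",
    "cats and dogs do not read papers",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)       # sparse (3 documents x vocabulary) matrix
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))
```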

Lecture 14. Stable Diffusion

  1. U-Net segmentation: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." 2015.
  2. Mathematical foundations of stable diffusion: Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International conference on machine learning. PMLR, 2015.
  3. Training scheme (sketched after this list): Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in neural information processing systems 33 (2020): 6840-6851.
  4. Esser, Patrick, Robin Rombach, and Bjorn Ommer. "Taming transformers for high-resolution image synthesis." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
  5. Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
  6. Zhang, Lvmin, Anyi Rao, and Maneesh Agrawala. "Adding conditional control to text-to-image diffusion models." Proceedings of the IEEE/CVF international conference on computer vision. 2023.
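
The DDPM training scheme from item 3 can be summarized in a few lines: corrupt a clean sample with Gaussian noise at a random timestep of the forward process and train the network to predict that noise. In the hedged sketch below, `model` is a placeholder standing in for the U-Net denoiser, and the beta schedule follows the linear one used in the paper.

```python
# The DDPM training objective: predict the injected noise from a noised sample.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)             # cumulative product of (1 - beta_t)

def ddpm_loss(model, x0):
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward diffusion q(x_t | x_0)
    return torch.nn.functional.mse_loss(model(x_t, t), noise)

# Toy check with a dummy "denoiser" that always predicts zero noise:
dummy_model = lambda x, t: torch.zeros_like(x)
print(ddpm_loss(dummy_model, torch.randn(8, 3, 32, 32)))
```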

Lecture 15. AlphaGo and EfficientNet

  1. The alpha-beta pruning algorithm (a minimal sketch follows this list)
  2. AlphaGo: Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
  3. AlphaGo Zero: Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359.
  4. MnasNet: Tan, Mingxing, et al. "Mnasnet: Platform-aware neural architecture search for mobile." CVPR. 2019.
  5. Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
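
For item 1, here is a self-contained sketch of minimax with alpha-beta pruning on a toy game tree given as nested lists, where leaves are static evaluations from the maximizing player's point of view; the tree itself is made up for illustration.

```python
# Minimax with alpha-beta pruning on a toy game tree.
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):            # leaf: return its static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                 # remaining children cannot change the result
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

tree = [[3, 5], [6, [9, 1]], [1, 2]]
print(alphabeta(tree))                        # -> 6
```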
