Leverage deep learning to create powerful image processing apps with TensorFlow 2.0 and Keras.
Also available as an e-book for Kindle readers and in other formats (PDF, ePUB, Mobi).
Computer vision solutions are becoming increasingly common, making their way into fields such as health, automotive, social media, and robotics. With the release of TensorFlow 2, the brand-new version of Google's open source framework for machine learning, it is the perfect time to jump on board and start leveraging deep learning for your visual applications!
This book is a practical guide to building high-performance systems for object detection, segmentation, video processing, smartphone applications, and more. By its end, you will have both the theoretical understanding and practical skills to solve advanced computer vision problems with TensorFlow 2.0.
The book is composed of nine chapters to get you started on computer vision and TensorFlow in no time!
Computer Vision and Neural Networks: This chapter provides some theoretical background on computer vision and deep learning. You will learn to implement a neural network from scratch.
TensorFlow Basics and Training a Model: This chapter goes through TensorFlow 2 and Keras concepts related to computer vision, as well as some more advanced notions to create your own solutions.
Modern Neural Networks: CNNs are introduced, detailing how they have revolutionized computer vision. This chapter also presents regularization tools and modern optimization algorithms.
Influential Classification Tools: You will discover state-of-the-art architectures (Inception, ResNet, ...) and transfer learning techniques to efficiently classify images based on their content.
Object Detection Models: This chapter covers the architecture of two methods for detecting specific objects in images, namely YOLO, known for its speed, and Faster R-CNN, known for its accuracy.
Enhancing and Segmenting Images: You will discover auto-encoders and how networks such as U-Net and FCN can be applied to image denoising, semantic segmentation, and more.
Training on Complex and Scarce Datasets: TensorFlow tools for optimized data pipelines are presented, as well as solutions for data scarcity (image rendering, domain adaptation, VAEs and GANs).
Video and Recurrent Neural Networks: This chapter covers recurrent neural networks, presenting the LSTM architecture. It provides practical code to apply neural models to action recognition in videos.
Optimizing Models and Deploying on Mobile Devices: Solutions are detailed to optimize models for devices with limited capabilities, followed by instructions for deployment on mobile devices and in the browser.
Each chapter comes with detailed code samples, which are freely available on the GitHub repository dedicated to the book. We assume our readers only have basic skills in Python programming and image processing, but we also share advanced concepts for those curious to dig further.
Most of the code is provided as Jupyter notebooks, which walk you through concrete applications step by step. You will develop deep learning solutions to classify images and videos based on their content, to detect vehicles and pedestrians for self-driving systems, to recognize facial expressions, to denoise pictures or generate new ones, and more!
In the final chapter, you will also learn how to deploy your computer vision solutions on smartphones and on the web, once again guided at each step.
Benjamin Planche is a passionate Ph.D. student at the University of Passau and Siemens Corporate Technology. He has been working for more than five years in the fields of computer vision and deep learning, in various research labs around the world (LIRIS in France, Mitsubishi Electric in Japan, Siemens in Germany). Benjamin has a double Master's degree with first-class honors from INSA-Lyon in France and the University of Passau in Germany.
His research efforts are focused on developing smarter visual systems with less data, targeting industrial applications. Benjamin also shares his knowledge and experience on online platforms such as StackOverflow, and applies them to the creation of aesthetic demos.
Eliot Andres is a freelance deep learning and computer vision engineer. He has more than three years of experience in the field, applied to various industries such as banking, health, social media, and video streaming. Eliot has a double Master's degree from École des Ponts and Télécom Paris.
His focus is industrialization: delivering value by applying new technologies to business problems. Eliot keeps his knowledge up to date by publishing articles on his blog and by building prototypes using the latest technologies.