Book Overview

Computer vision solutions are becoming increasingly common, making their way into fields such as healthcare, automotive, social media, and robotics. With the release of TensorFlow 2, the brand-new version of Google's open source framework for machine learning, it is the perfect time to jump on board and start leveraging deep learning for your visual applications!

This book is a practical guide to building high-performance systems for object detection, segmentation, video processing, smartphone applications, and more. By its end, you will have both the theoretical understanding and practical skills to solve advanced computer vision problems with TensorFlow 2.

Leverage TensorFlow 2 for computer vision!

  • Create your own neural networks from scratch
  • Classify images with modern architectures including Inception and ResNet
  • Detect and segment objects in images with YOLO, Mask R-CNN, and U-Net
  • Tackle problems in developing self-driving cars and facial emotion recognition systems
  • Boost your application’s performance with transfer learning, GANs, and domain adaptation
  • Use recurrent neural networks for video analysis
  • Optimize and deploy your networks on mobile devices and in the browser

The book is composed of nine chapters to get you started on computer vision and TensorFlow in no time!

Chapter One

Computer Vision and Neural Networks:
This chapter provides some theoretical background on computer vision and deep learning. You will learn to implement a neural network from scratch.

Chapter Two

TensorFlow Basics and Training a Model:
This chapter goes through TensorFlow 2 and Keras concepts related to computer vision, as well as some more advanced notions to create your own solutions.

Chapter Three

Modern Neural Networks:
CNNs are introduced, detailing how they have revolutionized computer vision. This chapter also presents regularization tools and modern optimization algorithms.

Chapter Four

Influential Classification Tools:
You will discover state-of-the-art architectures such as Inception and ResNet, along with transfer learning techniques to efficiently classify images based on their content.

Chapter Five

Object Detection Models:
This chapter covers the architecture of two methods for detecting specific objects in images: YOLO, known for its speed, and Faster R-CNN, known for its accuracy.

Chapter Six

Enhancing and Segmenting Images:
You will discover auto-encoders and how networks such as U-Net and FCN can be applied to image denoising, semantic segmentation, and more.

Chapter Seven

Training on Complex and Scarce Datasets:
TensorFlow tools for optimized data pipelines are presented, as well as solutions for data scarcity (image rendering, domain adaptation, VAEs, and GANs).

Chapter Eight

Video and Recurrent Neural Networks:
This chapter covers recurrent neural networks, presenting the LSTM architecture. It provides practical code to apply neural models to action recognition in videos.

Chapter Nine

Optimizing Models and Deploying on Mobile Devices:
Solutions are detailed to optimize models for devices with limited capabilities, followed by instructions for deployment on mobile devices and in the browser.

Code Samples

Jupyter notebooks and iOS/Android/web apps

Each chapter comes with detailed code samples, which are freely available in the GitHub repository dedicated to the book. We assume only basic skills in Python programming and image processing, but we also share advanced concepts for those curious to dig further.

Most of the code is provided as Jupyter notebooks, which walk you through concrete applications step by step. You will develop deep learning solutions to classify images and videos based on their content, to detect vehicles and pedestrians for self-driving systems, to recognize facial expressions, to denoise pictures or generate new ones, and more!

In the final chapter, you will also learn how to deploy your computer vision solutions on smartphones and on the web, once again guided at each step.

About The Authors

Benjamin Planche

Benjamin Planche is a passionate Ph.D. student at the University of Passau and Siemens Corporate Technology. He has been working for more than five years in the fields of computer vision and deep learning, in various research labs around the world (LIRIS in France, Mitsubishi Electric in Japan, Siemens in Germany). Benjamin has a double Master's degree with first-class honors from INSA-Lyon in France and the University of Passau in Germany.

His research focuses on developing smarter visual systems with less data, targeting industrial applications. Benjamin also shares his knowledge and experience on online platforms such as Stack Overflow and applies them to the creation of aesthetic demos.

Eliot Andres

Eliot Andres is a freelance deep learning and computer vision engineer. He has more than three years of experience in the field, applied to various industries such as banking, health, social media, and video streaming. Eliot has a double Master's degree from École des Ponts and Télécom Paris.

His focus is industrialization: delivering value by applying new technologies to business problems. Eliot keeps his knowledge up to date by publishing articles on his blog and by building prototypes using the latest technologies.