The course slides:

1. Background

How can we feed images to a neural network?

A digital image is a 2D grid of pixels. A neural network expects a vector of numbers as input. Locality: nearby pixels are more strongly correlated than distant ones. Translation invariance: meaningful patterns can occur anywhere in the image.

Taking advantage of topological structure

Weight sharing: use the same network parameters to detect local patterns at many locations in the image. Hierarchy: local low-level features are composed into larger, more abstract features.
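For intuition, a hedged parameter-count comparison (assuming PyTorch; the 32×32 input and 16 output channels are illustrative, not from the slides): a fully connected layer learns a separate weight for every input-output pair, while a convolutional layer reuses the same small kernels at every location.

```python
import torch.nn as nn

# Illustrative sizes (not from the slides): a 32x32 RGB image, 16 feature maps out.
fc = nn.Linear(3 * 32 * 32, 16 * 32 * 32)          # one weight per input-output pair
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # the same 3x3 kernels reused everywhere

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(fc))    # 50,348,032 parameters
print(count(conv))  # 448 parameters
```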

The ImageNet Challenge

  • Major computer vision benchmark
  • 1.4M images, 1000 classes
  • Image classification

2. Building blocks

From fully connected to locally connected

Implementation: the convolution operation

The kernel slides across the image and produces an output value at each position. We convolve multiple kernels and obtain multiple feature maps or channels.
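A minimal sketch of the sliding-kernel operation in NumPy (a valid convolution; as in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel across the image; one output value per position (valid convolution)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.random.rand(6, 6)
kernel = np.random.rand(3, 3)
print(conv2d_valid(image, kernel).shape)  # (4, 4) = 6 - 3 + 1
```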

Inputs and outputs are tensors

3D objects that have width, height, and channels. Each output channel of the convolution is connected to all of the input channels.
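A sketch of the tensor shapes, assuming PyTorch's (batch, channels, height, width) convention:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)               # (batch, channels, height, width): one 3-channel 32x32 image
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

y = conv(x)
print(y.shape)            # torch.Size([1, 8, 28, 28]) -- valid convolution: 32 - 5 + 1
print(conv.weight.shape)  # torch.Size([8, 3, 5, 5]): each output channel sees all 3 input channels
```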

Variants of the convolution operation

  • Valid convolution: output size = input size - kernel size + 1. The output is slightly smaller than the input.
  • Full convolution (with added padding): output size = input size + kernel size - 1. The output is slightly larger than the input.
  • Same convolution: output size = input size, so feature maps keep the same size as the image. Only makes sense if the kernel size is odd.
  • Strided convolution: the kernel slides along the image with a step > 1. Pro: a lot cheaper to compute!
  • Dilated convolution: the kernel is spread out, with a step > 1 between kernel elements. Pros: can be computed more efficiently, and there is no need to pad zeros. (Output sizes for all variants are checked in the sketch below.)
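The output sizes can be checked with a quick sketch (assuming PyTorch; kernel size 5 and input size 32 are arbitrary choices):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)

valid   = nn.Conv2d(1, 1, kernel_size=5)              # 32 - 5 + 1 = 28
full    = nn.Conv2d(1, 1, kernel_size=5, padding=4)   # pad by k-1: 32 + 5 - 1 = 36
same    = nn.Conv2d(1, 1, kernel_size=5, padding=2)   # pad by (k-1)/2 (odd k): 32
strided = nn.Conv2d(1, 1, kernel_size=5, stride=2)    # floor((32 - 5)/2) + 1 = 14
dilated = nn.Conv2d(1, 1, kernel_size=5, dilation=2)  # effective kernel 9: 32 - 9 + 1 = 24

for name, layer in [("valid", valid), ("full", full), ("same", same),
                    ("strided", strided), ("dilated", dilated)]:
    print(name, layer(x).shape[-1])
```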

Pooling

Def: compute the mean or max over small windows to reduce resolution.
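A minimal sketch, assuming PyTorch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)

max_pool = nn.MaxPool2d(kernel_size=2)   # max over non-overlapping 2x2 windows
avg_pool = nn.AvgPool2d(kernel_size=2)   # mean over non-overlapping 2x2 windows

print(max_pool(x).shape)  # torch.Size([1, 8, 16, 16]) -- resolution halved, channels unchanged
print(avg_pool(x).shape)  # torch.Size([1, 8, 16, 16])
```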

3. Convolutional neural networks

Stacking the building blocks

  • CNNs or convnets
  • Up to 100s of layers
  • Alternate convolutions and pooling to create a hierarchy: higher layers in the model extract more abstract features from the image (see the sketch below).
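A minimal sketch of such a stack (assuming PyTorch; the layer sizes and the 10 output classes are illustrative):

```python
import torch
import torch.nn as nn

convnet = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16, low-level features
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8, more abstract features
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # classifier over 10 hypothetical classes
)

print(convnet(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 10])
```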

4. Going deeper: case studies

LeNet-5

A convnet for handwritten digit recognition. Its design is not suitable for deeper networks.

AlexNet

Architecture: 8 layers, ReLU, dropout, weight decay (regularizer). Infrastructure: large dataset, trained for 6 days on 2 GPUs. Key innovations (the main ingredients are sketched below):

  • ReLU and dropout
  • No need to pair every convolution layer with a pooling layer
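A sketch of this architecture in PyTorch (channel sizes follow the common single-GPU variant, e.g. torchvision's, rather than the original two-GPU split):

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),   # three convolutions in a row,
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),   # no pooling in between
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
)
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)

x = torch.randn(1, 3, 224, 224)                  # expects 224x224 inputs
print(classifier(features(x)).shape)             # torch.Size([1, 1000])
```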

Deeper is better

  • Each layer is a linear classifier by itself
  • More layers - more nonlinearities
  • What limits the number of layers in convnets?

VGGNet

Stack many convolutional layers before pooling. Use same convolutions so that resolution is reduced only in the pooling layers.

VGGNet: stacking 3×3 kernels

Architecture: up to 19 layers, 3×3 kernels only, same convolutions. Infrastructure: trained for 2-3 weeks on 4 GPUs (data parallelism, not model parallelism).
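A VGG-style block might look like this (a sketch, assuming PyTorch; `vgg_block` is a hypothetical helper, not a library function):

```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    """Stack 3x3 same convolutions, then halve the resolution with pooling (VGG-style)."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU()]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

block = vgg_block(3, 64, num_convs=2)                # two 3x3 convs see as much as one 5x5
print(block(torch.randn(1, 3, 32, 32)).shape)        # torch.Size([1, 64, 16, 16])
```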

VGGNet: error plateaus after 16 layers

Challenges of depth

  • Computational complexity
  • Optimization difficulties

Improving optimization

  • Careful initialization
    • Naive random initialization will not work
    • Need some randomness to break symmetry, but the scale must avoid exploding and vanishing gradients
    • A carefully scaled scheme is sketched after this list
  • Sophisticated optimizers
  • Normalization layers
    • Scale the activations to a range where optimization is easy
    • Essential!
  • Network design
    • ResNets
    • Adding residual connections
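For the initialization point, one common carefully scaled scheme is Kaiming (He) initialization; a sketch assuming PyTorch (the slides do not prescribe a specific scheme):

```python
import torch.nn as nn

def init_weights(module):
    """Kaiming (He) initialization: scale random weights so activations neither explode
    nor vanish through many ReLU layers; biases start at zero."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 16, 3, padding=1))
model.apply(init_weights)   # applies init_weights to every submodule
```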

GoogLeNet

Inception

Batch normalization

  • Reduces sensitivity to initialization
  • More robust to large learning rates
  • Introduces stochasticity and acts as a regularizer (introducing noise into the model can make it more robust)
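A sketch of the typical placement (convolution → batch norm → nonlinearity), assuming PyTorch:

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # BN makes the conv bias redundant
    nn.BatchNorm2d(16),   # normalize each channel over batch and spatial dims, then rescale
    nn.ReLU(),
)

block.train()   # training mode: use batch statistics (the source of the stochasticity above)
y = block(torch.randn(8, 3, 32, 32))
block.eval()    # evaluation mode: use running averages instead
```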

ResNet: residual connections

Residual connections facilitate training deeper networks.
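A minimal residual block, assuming PyTorch (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the output is x + F(x), so gradients also flow through the skip path."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.body(x))   # residual connection

print(ResidualBlock(16)(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```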

DenseNet: connect each layer to all previous layers
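A sketch of the dense connectivity pattern, assuming PyTorch (`DenseLayer` and the growth rate of 12 are illustrative):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """Dense connectivity: each layer's input is the concatenation of all earlier feature maps."""
    def __init__(self, in_ch, growth=12):
        super().__init__()
        self.conv = nn.Sequential(nn.BatchNorm2d(in_ch), nn.ReLU(),
                                  nn.Conv2d(in_ch, growth, kernel_size=3, padding=1))

    def forward(self, x):
        return torch.cat([x, self.conv(x)], dim=1)   # keep previous features, append new ones

x = torch.randn(1, 16, 32, 32)
x = DenseLayer(16)(x)   # 16 + 12 = 28 channels
x = DenseLayer(28)(x)   # 28 + 12 = 40 channels
print(x.shape)
```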

Squeeze-and-excitation networks

Features can incorporate global context.
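A sketch of a squeeze-and-excitation block, assuming PyTorch (`SEBlock` and the reduction factor are illustrative):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global pooling gives per-channel context, which is turned
    into per-channel weights that rescale the feature map."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        s = x.mean(dim=(2, 3))                      # squeeze: global average over spatial dims
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)  # excitation: per-channel weights in [0, 1]
        return x * w                                # rescale channels using global context

print(SEBlock(16)(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```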

Neural architecture search

  • Architecture found by evolution
  • Search acyclic graphs composed of predefined layers

Reducing complexity

  • Depthwise convolutions
  • Separable convolutions (see the sketch after this list)
  • Inverted bottlenecks
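A sketch of a depthwise separable convolution and its parameter savings, assuming PyTorch (channel sizes are illustrative):

```python
import torch.nn as nn

channels_in, channels_out, k = 32, 64, 3

# Standard convolution: every output channel mixes all input channels spatially.
standard = nn.Conv2d(channels_in, channels_out, kernel_size=k, padding=1)

# Depthwise separable: per-channel spatial filtering, then a 1x1 convolution to mix channels.
separable = nn.Sequential(
    nn.Conv2d(channels_in, channels_in, kernel_size=k, padding=1, groups=channels_in),  # depthwise
    nn.Conv2d(channels_in, channels_out, kernel_size=1),                                # pointwise
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard))   # 32*64*3*3 + 64 = 18,496
print(count(separable))  # (32*3*3 + 32) + (32*64 + 64) = 2,432
```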

5. Advanced topics

Data augmentation

  • By design, convnets are robust against translation
  • Data augmentation makes them robust against other transformations: rotation, scaling, shearing, warping (see the sketch after this list).
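A sketch using torchvision transforms (an assumption; the slides do not name a library, and the parameters are illustrative):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                # rotation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # scaling / cropping
    transforms.RandomAffine(degrees=0, shear=10),         # shearing
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
# Typically passed to a dataset, e.g. ImageFolder(root, transform=augment),
# so each epoch sees a differently transformed version of every image.
```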

Other topics to explore

  • Pre-training and fine-tuning
  • Group equivariant convnets: invariance to e.g. rotation
  • Recurrence and attention: other building blocks to exploit topological structure. (Convolution is not the only thing we can do to exploit grid structure!)

Beyond image recognition

What else can we do with convnets?

Generative models of images

  • Generative adversarial nets
  • Variational autoencoders
  • Autoregressive models

More convnets

  • Representation learning and self-supervised learning
  • Convnets for video, audio, text, graphs

Convolutional neural networks replaced handcrafted features with handcrafted architectures.

Prior knowledge is not obsolete: it is merely incorporated at a higher level of abstraction.