Advanced Models for Computer Vision

Jun 8, 2020

The course slides:

Goal of the lecture: Know how to redefine the building blocks to perform different visual tasks using different inputs and different forms of supervision.

1. Supervised image (beyond classification)

Task definitions	Train and eval	Tricks of the trade
Object detection	Models and losses	Hard negative mining
Semantic segmentation	Metrics and benchmarks	Transfer learning

Tasks - increasing granularity

classification -> object detection -> semantic segmentation -> instance segmentation

Object detection

A Multitask problem: Classification & Localization

Inputs	Targets
RGB image H×W×3	Class label & object bounding box (for every object present in the scene)
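To make the input/target pairing above concrete, here is a minimal sketch of how one training example might be bundled. The function name `make_detection_example`, the dictionary keys, and the (x_min, y_min, x_max, y_max) pixel convention for boxes are all assumptions for illustration; the lecture does not prescribe a format.

```python
def make_detection_example(height, width, objects):
    """Bundle an H x W x 3 image shape with per-object (label, box) targets.

    Boxes use an assumed (x_min, y_min, x_max, y_max) pixel convention.
    """
    for label, (x_min, y_min, x_max, y_max) in objects:
        # Reject degenerate boxes and boxes that fall outside the image.
        if not (0 <= x_min < x_max <= width and 0 <= y_min < y_max <= height):
            raise ValueError(f"box for class {label} lies outside the image")
    return {
        "image_shape": (height, width, 3),                   # input: RGB image
        "class_labels": [label for label, _ in objects],     # classification target
        "boxes": [list(box) for _, box in objects],          # localization target
    }


# Hypothetical scene with two objects (class ids 3 and 7 are made up).
example = make_detection_example(
    height=480,
    width=640,
    objects=[(3, (50.0, 80.0, 200.0, 300.0)), (7, (320.0, 100.0, 600.0, 400.0))],
)
```

Note that the number of targets varies per image, which is one reason detection is harder to set up than single-label classification.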

Bounding box prediction

How to learn to predict real-valued bounding box coordinates?

Recap: Softmax + cross entropy

Assign data points to categories; output is discrete.

In classification, the size of a mistake is not quantifiable, because the class labels are not ordered: a prediction is either right or wrong.

In classification, the output is discrete; in regression, the output is continuous.
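As a reminder of the recap above, the following is a minimal pure-Python sketch of softmax followed by cross entropy for a single example. The function names and the example scores are assumptions chosen for illustration, not code from the lecture.

```python
import math


def softmax(logits):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability that the model assigns to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


logits = [2.0, 0.5, -1.0]               # hypothetical scores for 3 classes
probs = softmax(logits)
loss_if_class_0 = cross_entropy(logits, 0)  # target is the top-scoring class
loss_if_class_2 = cross_entropy(logits, 2)  # target is the lowest-scoring class
```

The loss depends only on the probability given to the correct class. It does not measure how "far" a wrong class is from the right one, which is why it does not carry over directly to real-valued box coordinates.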

Quadratic loss for regression

Minimize the MSE over samples.

\[l_2(x,t) = |x-t|^2\]
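The quadratic loss above can be sketched for bounding boxes as follows. This assumes each box is a list of four real-valued coordinates in an (x_min, y_min, x_max, y_max) convention; the function names and the sample numbers are hypothetical.

```python
def l2_loss(x, t):
    """Squared Euclidean distance between a prediction x and a target t."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, t))


def mse(predictions, targets):
    """Average the per-sample quadratic loss over all samples."""
    losses = [l2_loss(x, t) for x, t in zip(predictions, targets)]
    return sum(losses) / len(losses)


# Two hypothetical predicted boxes and their ground-truth targets.
predicted = [[10.0, 20.0, 50.0, 60.0], [5.0, 5.0, 15.0, 15.0]]
target = [[12.0, 18.0, 50.0, 65.0], [5.0, 5.0, 15.0, 15.0]]

first_box_loss = l2_loss(predicted[0], target[0])
mean_loss = mse(predicted, target)
```

Unlike cross entropy, this loss grows with the size of the error, so a box that is off by 5 pixels is penalized more than one that is off by 2.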

Property	Classification	Regression
Basic	map inputs to predefined classes	map inputs to continuous values
Output	discrete values	continuous values
Nature of the data	unordered data	ordered data
Algorithms	logistic regression, decision trees, neural networks	linear regression, neural networks

Inputs	Targets
Pairs of RGB images	Dense flow map (2D translation displacement)

Inputs	Targets
RGB video T×H×W×3, (optional) flow map	Action label

Advanced Models for Computer Vision

1. Supervised image (beyond classification)

Tasks - increasing granularity

Object detection

Bounding box prediction

Recap: Softmax + cross entropy

Quadratic loss for regression

Classification then regression

Summary

Case study 1: Faster R-CNN

Case study 2: RetinaNet - one-stage detector

Issue with one-stage detectors

Semantic segmentation

Case study: U-Net

Evaluation metrics

Tricks of the trade

Transfer learning

Transfer learning across different domains

Sim2Real

2. Supervised (beyond single image input) classification

Experiment

Video

Optical flow estimation

Case study: FlowNet

Video models using 3D convolutions

Properties of 3D convolutions

Transfer learning returns

Challenges in video processing

Improve efficiency of video models

3. (Beyond strong supervision) image classification

Self-supervision - Metric learning

Metric learning

New state-of-the-art in representation learning

4. Open questions