Reading: Object Detection for dummies (Lilian Weng)

⚠️

This article is written quite tersely (it feels like she is summarizing some paper). So while reading I looked up outside sources that explain things in more detail; the references are in this note!

☝

Read the main article first; this note is only a supplement.

Part 1 — Gradient Vector, HOG, and SS

“Object detection” (spot where the object is) vs. “Object recognition” (tell whether an object exists in an image) ← they are not the same thing, but the two terms are often used interchangeably.

Image Gradient Vector

Recall

ㅤ	Derivative	Directional Derivative	Gradient
Value type	Scalar	Scalar	Vector
Definition	The rate of change of a function f(x,y,z,…) at a point (x0,y0,z0,…); for a single-variable function this is the slope of the tangent line at the point.	The instantaneous rate of change of f(x,y,z,…) in the direction of a unit vector u.	The vector of all the partial derivatives of a multivariable function; it points in the direction of the greatest rate of increase of the function.
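A quick link between the last two columns (a standard calculus identity, added here as a reminder, not taken from the article): for a differentiable f, the directional derivative along a unit vector u is the dot product of the gradient with u, where θ is the angle between them. It is therefore largest when u points along the gradient.

```latex
D_{\vec{u}} f(\mathbf{x}_0)
  = \nabla f(\mathbf{x}_0) \cdot \vec{u}
  = \lVert \nabla f(\mathbf{x}_0) \rVert \cos\theta,
  \qquad \lVert \vec{u} \rVert = 1
```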

In image processing, we want to know the direction in which colors change from one extreme to the other (e.g. black to white on a grayscale image). ← measure the gradient

However, repeating the gradient computation for every pixel one by one is too slow. Instead, ❓it can be expressed as applying a convolution operator (kernel) to the entire image matrix.
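A minimal sketch of that idea, assuming the simple kernel [-1, 0, 1] for the x direction and its transpose for the y direction, with zero padding at the border. The function names are made up for this note, and the sliding window below is technically a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def slide_kernel(image, kernel):
    """Cross-correlate a 2D image with a kernel (zero padding, same output size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    # Add one shifted copy of the image per kernel entry, instead of looping over pixels.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def image_gradient(image):
    """Return gx, gy, gradient magnitude and direction (degrees) for every pixel."""
    kx = np.array([[-1.0, 0.0, 1.0]])  # right neighbor minus left neighbor
    ky = kx.T                          # bottom neighbor minus top neighbor
    gx = slide_kernel(image, kx)
    gy = slide_kernel(image, ky)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    direction = np.degrees(np.arctan2(gy, gx))
    return gx, gy, magnitude, direction

# Toy image: brightness grows by 10 per column, identical rows.
ramp = np.tile(np.arange(5, dtype=float) * 10, (4, 1))
gx, gy, magnitude, direction = image_gradient(ramp)
```

On this ramp the interior pixels have gx = 20 and gy = 0, i.e. the gradient points along the x axis, which is the direction in which brightness increases.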

Common image processing kernels

⭐ Image Kernels explained visually ← a visual way to understand what each kernel does

Tool: deepViz ← run locally to visualize CNN layers, kernels, …

She has an example of applying a kernel to a black-and-white image ← For color images, we just repeat the same process on each color channel separately.
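A small sketch of the per-channel idea, assuming the image is an H x W x C NumPy array. The helper names are hypothetical and the kernel is applied as a cross-correlation with zero padding.

```python
import numpy as np

def apply_kernel(channel, kernel):
    """Cross-correlate one 2D channel with a kernel (zero padding, same size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(channel, ((ph, ph), (pw, pw)), mode="constant")
    h, w = channel.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def apply_kernel_per_channel(image, kernel):
    """Apply the same kernel to every color channel independently."""
    channels = [apply_kernel(image[:, :, c], kernel) for c in range(image.shape[2])]
    return np.stack(channels, axis=2)

# Sanity check with the identity kernel: the image should come back unchanged.
rng = np.random.default_rng(0)
rgb = rng.random((4, 5, 3))
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
same = apply_kernel_per_channel(rgb, identity)
```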

Histogram of Oriented Gradients (HOG)

Note: HOG (Histogram of Oriented Gradients)

Image Segmentation (Felzenszwalb’s Algorithm)

⭐ Check this article.

When an image contains multiple objects → find the region containing each object → Felzenszwalb's algorithm. ← graph-based approach

Find the weight between 2 pixels → it measures how similar they are → pixels that are more similar should belong to the same component

→ more in the blog!

Problem Formulation

G = (V, E) → undirected graph

V → set of vertices / pixels in the image to be segmented

E → set of edges

Each edge (v_i, v_j) ∈ E has a weight w(v_i, v_j) denoting the dissimilarity between v_i and v_j

S → is a segmentation of graph G, divides G into G' = (V, E') with E' ⊂ E s.t. it contains distinct components/regions C

Graph representation

Internal Difference

The largest weight among the edges that connect the vertices of C to each other (edges going to other components do not count!), taken over the minimum spanning tree of C. For example, in the figure above,

Int(C1) = max(MST(C1, E)) = max(2, 2, 0, 1, 2, 1) = 2

Int(C2) = 5

Int(C3) = max(20, 10) = 20

Component Difference

The smallest weight among the edges connecting the 2 components; if there is no such edge → Dif = ∞

For example, in the figure above,

Dif(C1, C2) = 24

Dif(C1, C3) = 78

Dif(C2, C3) = 55

Next, we look for a criterion to tell 2 components apart

However, applying the criterion above as-is yields a lot of small components. So we need to tweak the condition a little (adding a constant k), as below (Minimum internal difference)

Minimum internal difference

There is a threshold on the difference between components; a larger k yields larger components.

Boundary Predicate

The quality of a segmentation is assessed by a boundary predicate D(C1, C2): it is true if Dif(C1, C2) > MInt(C1, C2) and false otherwise.

→ bigger k, bigger components ← without k, the algorithm produces a lot of very small components; in the extreme case where the Internal Difference is 0, a component is just a single pixel.
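To make Int, Dif, MInt and the boundary predicate concrete, here is a small sketch on a made-up toy graph (not the figure from the article). It assumes the definitions from the Felzenszwalb paper, with threshold τ(C) = k / |C| and MInt(C1, C2) = min(Int(C1) + τ(C1), Int(C2) + τ(C2)); the function names are just for this note.

```python
import math

def internal_difference(component, edges):
    """Int(C): largest edge weight in the minimum spanning tree of C."""
    parent = {v: v for v in component}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    largest = 0.0
    inside = [(w, u, v) for u, v, w in edges if u in component and v in component]
    for w, u, v in sorted(inside):  # Kruskal: lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                # edge joins two subtrees, so it is in the MST
            parent[ru] = rv
            largest = max(largest, w)
    return largest

def component_difference(c1, c2, edges):
    """Dif(C1, C2): smallest weight of an edge between C1 and C2, or infinity."""
    crossing = [w for u, v, w in edges
                if (u in c1 and v in c2) or (u in c2 and v in c1)]
    return min(crossing) if crossing else math.inf

def min_internal_difference(c1, c2, edges, k):
    """MInt(C1, C2) with threshold tau(C) = k / |C|."""
    return min(internal_difference(c1, edges) + k / len(c1),
               internal_difference(c2, edges) + k / len(c2))

def has_boundary(c1, c2, edges, k):
    """Boundary predicate: True means the two components stay separate."""
    return component_difference(c1, c2, edges) > min_internal_difference(c1, c2, edges, k)

# Toy graph as (vertex, vertex, weight) triples.
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("d", "e", 3), ("c", "d", 4)]
c1, c2 = {"a", "b", "c"}, {"d", "e"}
```

With k = 0 the predicate compares Dif = 4 against min(2, 3) = 2, so there is a boundary. With k = 9 the thresholds raise MInt to min(2 + 3, 3 + 4.5) = 5, so the boundary disappears and the two components would be merged.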

Segmentation Algorithm

Check the same section in this article.
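For my own reference, a compact sketch of the merging loop, assuming the usual Kruskal-style formulation: sort edges by weight, and merge the two components of an edge when its weight is at most MInt with τ(C) = k / |C|. It runs on a generic weighted graph with integer vertex ids; building the pixel graph from an image is left out, and the function name is made up.

```python
def segment_graph(num_vertices, edges, k):
    """Return one component label per vertex for edges given as (u, v, weight)."""
    parent = list(range(num_vertices))
    size = [1] * num_vertices
    internal = [0.0] * num_vertices  # Int(C), stored at each component's root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # both ends are already in the same component
        m_int = min(internal[ru] + k / size[ru], internal[rv] + k / size[rv])
        if w <= m_int:  # no boundary between the two components: merge them
            parent[rv] = ru
            size[ru] += size[rv]
            # Edges arrive in increasing order, so w is the largest MST edge so far.
            internal[ru] = w
    return [find(v) for v in range(num_vertices)]

# Toy chain: two tight groups (weights 1) joined by one heavy edge (weight 10).
chain = [(0, 1, 1), (1, 2, 1), (2, 3, 10), (3, 4, 1), (4, 5, 1)]
labels_small_k = segment_graph(6, chain, k=2)
labels_large_k = segment_graph(6, chain, k=100)
```

With k = 2 the heavy edge is rejected and the chain splits into two components; with k = 100 everything merges into one, which matches the "bigger k, bigger components" remark.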

Selective Search

See the article.

Part 2 — CNN, DPM and Overfeat

⭐ [PDF] A guide to convolution arithmetic for deep learning + animated versions of its figures. ← This document explains convolution, pooling, their parameters, and the arithmetic formulas relating those parameters (padding, strides, …)

CNN for Image Classification

Convolution Operation

No padding and 1x1 strides

1x1 border of zero padding and 2x2 strides.
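The output size for both cases follows the standard formula o = floor((i + 2p - k) / s) + 1 along each axis. A tiny sketch; the input and kernel sizes in the usage lines are my assumption (3x3 kernel on a 4x4 input, then on a 5x5 input), since the figures themselves are not reproduced in this note.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output length along one axis: floor((i + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# No padding, 1x1 strides (assumed 4x4 input, 3x3 kernel).
no_padding = conv_output_size(4, 3)
# 1x1 zero padding, 2x2 strides (assumed 5x5 input, 3x3 kernel).
padded_strided = conv_output_size(5, 3, padding=1, stride=2)
```

Under those assumed sizes the first case gives a 2x2 output and the second a 3x3 output.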

👉 Note: ResNet (Residual Networks)

Evaluation Metrics: mAP

Deformable Parts Model (Felzenszwalb et al., 2010)

👉Note: Deformable Parts Model (Felzenszwalb et al., 2010)

Overfeat

☝

She explains this part more clearly than the outside sources I found. Her “summary” is the best one!

Overfeat is a pioneering model that integrates the object detection, localization and classification tasks all into one convolutional neural network.

👉 Note: Overfeat in object detection

The Overfeat model architecture is very similar to AlexNet. It is trained as follows:

The training stages of the Overfeat model

Part 3 - R-CNN Family

👉 R-CNN & Fast R-CNN & Faster R-CNN & Mask R-CNN