### Output after convolution

You expect each cell of the feature map to predict an object through one of its bounding boxes if the center of the object falls in the receptive field of that cell. (The receptive field is the region of the input image visible to the cell. Refer to the link on convolutional neural networks for further clarification.)

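The feature map has size N x N x Depth, where Depth = B x (5 + C).

B is the number of bounding boxes each cell predicts. 5 = 4 + 1: the 4 stands for the four values used to predict the bounding box, and the 1 stands for the object score, the probability that this bounding box contains an object. C is the number of classes to predict.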
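To make the depth formula concrete, here is a minimal sketch. The values B = 3 and C = 80 are assumed example values, and the per-box layout (4 box values, then the object score, then C class scores, stored box after box) is an assumption for illustration; the helper names are hypothetical.

```python
def feature_map_depth(num_boxes, num_classes):
    """Depth = B x (5 + C): 4 box values + 1 object score + C class scores per box."""
    return num_boxes * (5 + num_classes)


def split_cell_vector(cell, num_boxes, num_classes):
    """Split one cell's depth vector into (box values, object score, class scores) per box.

    Assumes the values of each box are stored contiguously, box after box.
    """
    step = 5 + num_classes
    parts = []
    for b in range(num_boxes):
        chunk = cell[b * step:(b + 1) * step]
        parts.append((chunk[:4], chunk[4], chunk[5:]))
    return parts


# Assumed example values: B = 3 boxes per cell, C = 80 classes.
depth = feature_map_depth(num_boxes=3, num_classes=80)
print(depth)  # 255
```

With these example values, every cell of the N x N grid carries a vector of 255 numbers.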

### How to compute the predicted box coordinates

#### Anchor Boxes

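Anchor boxes are a set of values obtained in advance by clustering. They can be understood as the widths and heights that come closest to those of real objects.
In yolov3, every cell of the feature map predicts 3 bounding boxes, but only the one with the largest IOU with the ground truth box is used for the prediction.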
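The "largest IOU" selection can be sketched as follows. This is a simplified illustration, not necessarily how any particular implementation does it: it compares only widths and heights (both boxes are assumed to share the same center), and the anchor values and function names are illustrative.

```python
def shape_iou(wh_a, wh_b):
    """IOU of two boxes that share the same center, each given as (width, height)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def best_anchor(gt_wh, anchors):
    """Return the index of the anchor whose shape has the largest IOU with the ground truth."""
    ious = [shape_iou(gt_wh, a) for a in anchors]
    return max(range(len(anchors)), key=lambda i: ious[i])


# Illustrative anchor (width, height) pairs for one cell, in pixels.
anchors = [(10, 13), (16, 30), (33, 23)]
idx = best_anchor((30, 25), anchors)
print(idx)  # 2
```

Here the 30 x 25 ground truth box overlaps the third anchor most, so that anchor's bounding box is the one chosen.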

#### Prediction

The symbols used in the prediction are:

• bx, by, bw, bh are the predicted values: the x, y center coordinates, width and height of the predicted bounding box.
• tx, ty, tw, th are what the network outputs: the values along the depth dimension of the feature map produced by the convolution.
• cx, cy are the top-left coordinates of the current grid cell.
• pw, ph are the anchor dimensions for the box, obtained in advance by clustering.
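The symbols above are combined as bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy, bw = pw * exp(tw), bh = ph * exp(th), which is the formulation given in the YOLOv3 paper. A minimal sketch of that transform follows; the function names are hypothetical, and all values are assumed to be in grid-cell units.

```python
import math


def sigmoid(x):
    """Squash x into (0, 1) so the predicted center stays inside its cell."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Turn raw network outputs into a box (center x, center y, width, height)."""
    bx = sigmoid(tx) + cx   # offset from the cell's top-left corner
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # scale the anchor width
    bh = ph * math.exp(th)  # scale the anchor height
    return bx, by, bw, bh


# With all-zero outputs the box sits at the cell center and keeps the anchor's size.
box = decode_box(0.0, 0.0, 0.0, 0.0, cx=6, cy=6, pw=3.0, ph=2.0)
print(box)  # (6.5, 6.5, 3.0, 2.0)
```

Because the sigmoid output lies strictly between 0 and 1, the decoded center can never leave the cell whose top-left corner is (cx, cy).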

### Multi-scale detection

yolov3 borrows the idea of the feature pyramid and introduces multi-scale detection, which improves detection of small objects.
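As a rough illustration of what multi-scale detection means for the output size, the sketch below assumes a 416 x 416 input and three detection scales with strides 32, 16 and 8. These values come from the commonly used yolov3 configuration, not from the text above, and the function names are hypothetical.

```python
def grid_sizes(input_size, strides=(32, 16, 8)):
    """Side length N of the N x N feature map at each detection scale."""
    return [input_size // s for s in strides]


def total_boxes(input_size, boxes_per_cell=3, strides=(32, 16, 8)):
    """Total number of bounding boxes predicted across all scales."""
    return sum(g * g * boxes_per_cell for g in grid_sizes(input_size, strides))


sizes = grid_sizes(416)
count = total_boxes(416)
print(sizes)  # [13, 26, 52]
print(count)  # 10647
```

The finest grid (stride 8) has the smallest cells, which is why it is the one best suited to small objects.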

For an explanation of NMS (non-maximum suppression), see https://blog.csdn.net/zchang81/article/details/70211851
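For quick reference, greedy NMS can be sketched in a few lines. This is a generic illustration rather than the code from the linked post; the corner box format (x1, y1, x2, y2), the 0.5 threshold and the function names are assumptions.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any box that overlaps a kept one too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IOU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept.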