Keywords: object detection, Convolutional Neural Network (CNN), You Only Look Once (YOLO), Faster R-CNN (Region-based Convolutional Neural Networks), SSD





Each grid cell has a depth of D. The value of D depends on the number of classes we want to detect. For C object classes and B bounding boxes per grid cell, D is

D = B × (5 + C) (3.1)


The output of the network is structured as follows. There are 13x13 = 169 grid cells in total, and each grid cell can detect up to B bounding boxes. Each bounding box has 5 + C properties, so a grid cell holds D = B × (5 + C) values (this is its depth). With S grid cells along each side (here S = 13), the output tensor has size


Tensor = S × S × B × (5 + C) (3.2)


Figure 3.3: This 13x13 tensor can be considered as a 13x13 grid laid over the input image, where each cell of the tensor holds the 5 box definitions and 30 class probabilities.
In our case, the number of classes is C = 30 and B = 5.
The logic is that if a cell contains an object, we identify which object it is by taking the largest class probability in that cell.
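
Picking the largest class probability can be sketched as follows. This is a minimal illustration only: the function name `best_class` and the sample probabilities are hypothetical, and only 4 classes are used to keep the example short.

```python
def best_class(class_probs):
    """Return (index, probability) of the largest class probability."""
    best_index = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return best_index, class_probs[best_index]


# Hypothetical class probabilities for one box (4 classes for brevity).
sample_probs = [0.05, 0.70, 0.15, 0.10]
class_index, class_prob = best_class(sample_probs)
print(class_index, class_prob)
```

In the setting described in the text, the list would hold C = 30 values instead of 4.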

Figure 3.4: In YOLOv2-tiny, each grid cell has B = 5 bounding boxes



  • x: x coordinate of the box center

  • y: y coordinate of the box center

  • w: box width

  • h: box height

  • P(obj): probability that an object exists in this box

Each grid cell can predict B bounding boxes. Since each bounding box prediction consists of 5 + C values, the total number of predicted values for one grid cell is B × (5 + C). I will consider the case B = 5 and C = 30, so one grid cell has length D = 5 × 35 = 175.
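
The arithmetic above can be checked with a short sketch. The values of B, C and the 13x13 grid come from the text. The helper `split_cell` and the box-by-box ordering of the values inside a cell are assumptions made for illustration, not a description of the actual memory layout of the network output.

```python
B = 5   # bounding boxes per grid cell (value used in the text)
C = 30  # number of classes (value used in the text)
S = 13  # grid cells along each side (value used in the text)

BOX_FIELDS = 5 + C    # x, y, w, h, P(obj) plus C class probabilities
D = B * BOX_FIELDS    # depth of one grid cell, equation (3.1)


def split_cell(cell_values):
    """Split the D values of one grid cell into B boxes of 5 + C values.

    Assumes the values are stored box by box (illustrative assumption).
    """
    if len(cell_values) != D:
        raise ValueError("expected %d values, got %d" % (D, len(cell_values)))
    return [cell_values[b * BOX_FIELDS:(b + 1) * BOX_FIELDS] for b in range(B)]


cell = list(range(D))       # placeholder values standing in for predictions
boxes = split_cell(cell)
print(D, S * S * D, len(boxes), len(boxes[0]))
```

Here `S * S * D` is the total number of values the network outputs for one image.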
Note that x, y, w and h are not expressed in pixels, because the images on which we apply object detection do not all have the same size. For example, one image may have size 1080x1920x3, while another may have size 2160x4096x3 (where 3 stands for the RGB channels). Therefore, before we feed images to the network, we resize them to 416x416x3 so that they all have the same size. I will show later what the actual output looks like. For now, we do not need to worry about the exact values.
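
To illustrate why size-independent coordinates are convenient, here is a minimal sketch that maps a box back to pixel corners of the original image. It assumes, purely for illustration, that x, y, w and h are fractions of the image width and height in the range 0 to 1. The exact encoding used by the network is not given at this point in the text, and the function name `to_pixel_corners` is hypothetical.

```python
def to_pixel_corners(x, y, w, h, image_width, image_height):
    """Convert a centre-format box given as image fractions to pixel corners.

    Assumes x, y, w, h are fractions of the image size (illustrative
    assumption, not the documented network encoding).
    """
    left = (x - w / 2) * image_width
    top = (y - h / 2) * image_height
    right = (x + w / 2) * image_width
    bottom = (y + h / 2) * image_height
    return left, top, right, bottom


# The same relative box applied to two images of different sizes.
print(to_pixel_corners(0.5, 0.5, 0.5, 0.25, 1920, 1080))
print(to_pixel_corners(0.5, 0.5, 0.5, 0.25, 4096, 2160))
```

The same four numbers describe the same relative region in both images, which is the property the paragraph above relies on.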



Figure 3.5: One of the B bounding boxes a grid cell predicts
This image represents one of the B bounding boxes that a grid cell predicts. The first 5 values are always present, while the number of class values C depends on the number of classes. P(obj) × Cᵢ is the probability that an object of the i-th class exists in this bounding box.

C represents the conditional probabilities that, given that an object exists in the box, the object belongs to a specific class:

Cᵢ = P(the object belongs to the i-th class | an object exists in this box) (3.3)

So the probability that an object of the i-th class exists in this bounding box is given by:

P(obj) × Cᵢ

If this value is greater than a threshold, we consider the network to have predicted that an object of the i-th class exists in this bounding box.
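
The thresholding rule can be sketched as follows. The function name `detect_classes`, the sample values and the default threshold of 0.5 are hypothetical choices for illustration. The text does not state which threshold is used.

```python
def detect_classes(p_obj, class_probs, threshold=0.5):
    """Return (class index, score) pairs whose score exceeds the threshold.

    The score of class i is P(obj) * C_i, as described in the text.
    The default threshold is a hypothetical value.
    """
    detections = []
    for i, c_i in enumerate(class_probs):
        score = p_obj * c_i
        if score > threshold:
            detections.append((i, score))
    return detections


# Hypothetical box: an object is very likely present, class 1 dominates.
print(detect_classes(0.9, [0.1, 0.8, 0.1]))
# Hypothetical box: an object is unlikely, so no class passes the threshold.
print(detect_classes(0.2, [0.1, 0.8, 0.1]))
```

A low P(obj) suppresses every class score, so boxes that probably contain no object produce no detections regardless of their class probabilities.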
3.3. Network Architecture
In YOLOv2-tiny, the input to the network is a 416x416x3 image. The network contains no fully connected layers.

Layer          Kernel    Stride/Filters    Output shape

Input                                      416x416x3
Convolution    3×3       1/16              416x416x16
MaxPooling     2×2       2                 208x208x16
Convolution    3×3       1/32              208x208x32
MaxPooling     2×2       2                 104x104x32
Convolution    3×3       1/64              104x104x64
MaxPooling     2×2       2                 52x52x64
Convolution    3×3       1/128             52x52x128
MaxPooling     2×2       2                 26x26x128
Convolution    3×3       1/256             26x26x256
MaxPooling     2×2       2                 13x13x256
Convolution    3×3       1/512             13x13x512
MaxPooling     2×2       1                 13x13x512
Convolution    3×3       1/1024            13x13x1024
Convolution    3×3       1/1024            13x13x1024
Convolution    1×1       1/175             13x13x175


Table 3.1: Details of Network
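
The output shapes in Table 3.1 can be traced with a short sketch. It assumes that every convolution uses stride 1 with padding that preserves the spatial size, and that pooling divides the spatial size by its stride (rounding up) while leaving the depth unchanged. The filter counts are read from the output depths in the table. The names `LAYERS` and `trace_shapes` are hypothetical.

```python
# (layer kind, value): filters for "conv", stride for "pool".
LAYERS = [
    ("conv", 16), ("pool", 2),
    ("conv", 32), ("pool", 2),
    ("conv", 64), ("pool", 2),
    ("conv", 128), ("pool", 2),
    ("conv", 256), ("pool", 2),
    ("conv", 512), ("pool", 1),
    ("conv", 1024),
    ("conv", 1024),
    ("conv", 175),  # 1x1 convolution producing D = B * (5 + C) = 175 channels
]


def trace_shapes(input_shape=(416, 416, 3)):
    """Return the shape after the input and after each layer in LAYERS."""
    shapes = [input_shape]
    height, width, depth = input_shape
    for kind, value in LAYERS:
        if kind == "conv":
            depth = value                    # size-preserving convolution
        else:
            height = -(-height // value)     # ceiling division by the stride
            width = -(-width // value)
        shapes.append((height, width, depth))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print("x".join(str(n) for n in shape))
```

Five stride-2 pooling layers reduce 416 to 13, which gives the 13x13 grid, and the final 1x1 convolution sets the depth to 175.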
Chapter 4 Experimental Results
4.1. Dataset

In our project we used the FlickrLogos-32 dataset. It contains photos showing brand logos and is meant for the evaluation of multi-class logo detection/recognition as well as logo retrieval methods on real-world images. Logos of 32 different logo classes and 6000 negative images were collected by downloading them from Flickr. The dataset includes images, ground truth, annotations (bounding boxes plus binary masks), evaluation scripts and pre-computed visual features.


