Figure 4.3: Results of mean average precision (mAP).
Method                  mAP
FRCN + AlexNet [18]     73.5
FRCN + VGG [18]         74.4
YOLOv2-Tiny (ours)      82.5

Table 4.1: Experimental results on FlickrLogos-32, compared with previous work [18]. Logo classes evaluated: Adidas, Aldi, Apple, Becks, BMW, Carlsberg, Chimay, Coca-Cola, Corona, DHL, Erdinger, Esso, FedEx, Ferrari, Ford, Fosters, Google, Guinness, Heineken, HP, Huawei, Milka, Nvidia, Paulaner, Pepsi, Rittersport, Shell, Singha, Starbucks, Stella Artois, Texaco, Tsingtao, Under Armour, UPS.
The first two methods [18] use FRCN with region proposals. For both training and testing, FRCN takes in raw images along with region proposals of significant features inside those images, in the form of bounding boxes; this is the localization step. It then classifies each region proposal as a type of logo or as "background," meaning that the area in the proposal is not a logo at all. If the region proposal contains a logo, it also outputs a bounding-box regression offset that adjusts the proposal to more closely fit the region containing the logo. This approach exhibits fairly good detection performance, especially on distinctive logos such as Huawei's, at 94%. You Only Look Once (YOLO) is a very fast real-time object detection system. Running our TensorFlow model on an Nvidia GeForce 1070, it processes frames in real time at 30-35 FPS (frames per second) with a mAP (mean average precision) of 82%, and it tracks logos very smoothly. On a mobile Android phone (Honor 9), we obtained the results shown in Figure 4.4 by conducting a series of experiments that quantitatively measured logo detection performance.
Figure 4.4: Logo detection on the Honor 9.
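As a reference for the real-time loop described above, here is a minimal sketch of how detections can be drawn on live video using darkflow and OpenCV. The cfg and weight paths, detection threshold, and GPU fraction are assumed placeholder values, not the exact settings of our experiments:

    import cv2
    from darkflow.net.build import TFNet

    # Placeholder paths and settings; adapt to the actual trained model.
    options = {
        "model": "cfg/yolov2-tiny.cfg",      # network definition (assumed path)
        "load": "bin/yolov2-tiny.weights",   # trained weights (assumed path)
        "threshold": 0.5,                    # confidence cutoff for detections
        "gpu": 0.8,                          # fraction of GPU memory to use
    }
    tfnet = TFNet(options)

    cap = cv2.VideoCapture(0)  # webcam; replace with a video file path if needed
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # return_predict yields dicts with label, confidence, topleft, bottomright
        for det in tfnet.return_predict(frame):
            tl = (det["topleft"]["x"], det["topleft"]["y"])
            br = (det["bottomright"]["x"], det["bottomright"]["y"])
            cv2.rectangle(frame, tl, br, (0, 255, 0), 2)
            cv2.putText(frame, f'{det["label"]} {det["confidence"]:.2f}',
                        (tl[0], tl[1] - 5), cv2.FONT_HERSHEY_SIMPLEX,
                        0.5, (0, 255, 0), 1)
        cv2.imshow("logo detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()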
Training darkflow and our custom CNN architecture took an immense amount of time. We trained our models with a batch size of 64 split into 8 mini-batches, which let us process 64 images efficiently at every step. When training on an Nvidia GeForce 1070, each step took 2 seconds. This allowed us to train each model for 2000 epochs, so that we could observe the early-stopping point and the weights that gave us the best accuracy. YOLO's implementation allowed us to save weight files every 10000 steps, so we simply let it train overnight and collected the accuracy figures in the morning using a script. Our results show that the model performs best on our dataset at a little under 2000 epochs. Note that our CNN changes its learning rate every 10000 steps, which raises the question of why we did not check IoU and mAP beyond that point. We did test this, and we report the result at 1500 epochs because that is where our IoU and mAP converge; after that many epochs little further learning is gained, even though the loss was still decreasing at a slow rate. We trained up to 2000 epochs and the accuracy peaked at epoch 1500. We also experimented with different learning rates, but accuracy never improved.
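To make the setup above concrete, the following sketch shows how such a training run can be configured through darkflow's Python options. The dataset, annotation, and weight paths are assumptions, while the batch size, epoch count, and checkpoint interval mirror the values reported above:

    from darkflow.net.build import TFNet

    options = {
        "model": "cfg/yolov2-tiny.cfg",     # network definition (assumed path)
        "load": "bin/yolov2-tiny.weights",  # initial weights to fine-tune (assumed)
        "train": True,                      # run darkflow in training mode
        "dataset": "data/images",           # training images (assumed path)
        "annotation": "data/annotations",   # Pascal VOC XML labels (assumed path)
        "batch": 64,                        # 64 images per step, as described above
        "epoch": 2000,                      # train up to 2000 epochs
        "save": 10000,                      # checkpoint weights every 10000 steps
        "gpu": 0.8,                         # fraction of GPU memory to use
    }

    tfnet = TFNet(options)
    tfnet.train()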