Accordingly, the efficiency and accuracy of such approaches are still far from
sufficient.
To overcome these drawbacks, a fast and reliable two-step scene text detection and
localization method is proposed. The proposed method uses a fully convolutional
network (FCN) model that directly produces word- or text-line-level predictions,
eliminating unnecessary and heavy intermediate stages. A conventional convolutional
neural network (CNN) is not an effective solution for determining the precise
location of text in an image. Therefore, the FCN model is used, as it is expected to
yield more accurate results.
Figure 2. The proposed text detection and recognition methods and the Uzbek language TTS synthesizer
Furthermore, several factors must be considered when designing neural networks
for text detection. Because the sizes of text regions vary greatly, detecting long
text lines requires features from the late stages of the network, whereas predicting
accurate geometry around small text regions requires low-level information from
the early stages. The network must therefore use features from several levels to
satisfy both requirements. The proposed method gradually merges the feature maps
while keeping the upsampling branches small. As a result, the network can use
features from different levels while keeping the computational cost low. The
model can be decomposed into three parts: the feature extractor, the feature-merging
branch and the output layer. The feature extractor can be a convolutional network
pre-trained on the ImageNet dataset, with interleaved convolution and pooling
layers. Four levels of feature maps, denoted $f_i$, are obtained from the feature
extractor; their sizes are 1/32, 1/16, 1/8 and 1/4 of the input image, respectively.
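As a quick check of this arithmetic, the sketch below computes the spatial sizes of the four feature-map levels for a given input. The 512 × 512 input and the requirement that both sides be multiples of 32 are illustrative assumptions. Each level is exactly twice the size of the previous one, which is why a single ×2 upsampling step can connect consecutive levels.

```python
def feature_map_sizes(height: int, width: int) -> list[tuple[int, int]]:
    """Return (H, W) of f_1..f_4, which are 1/32, 1/16, 1/8 and 1/4 of the input."""
    strides = [32, 16, 8, 4]  # f_1 is the deepest (coarsest) level
    if height % 32 or width % 32:
        # Assumption: inputs are resized so every level has an integer size
        raise ValueError("input sides should be multiples of 32")
    return [(height // s, width // s) for s in strides]


if __name__ == "__main__":
    # Illustrative 512 x 512 input
    for i, (h, w) in enumerate(feature_map_sizes(512, 512), start=1):
        print(f"f_{i}: {h} x {w}")
```

For a 512 × 512 input this gives 16 × 16, 32 × 32, 64 × 64 and 128 × 128 maps for $f_1$ to $f_4$.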
Mathematically, the feature-merging process is expressed as:
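$$
g_i = \begin{cases} \operatorname{unpool}(h_i), & i \le 3 \\ \operatorname{conv}_{3\times 3}(h_i), & i = 4 \end{cases} \tag{5}
$$

$$
h_i = \begin{cases} f_i, & i = 1 \\ \operatorname{conv}_{3\times 3}\big(\operatorname{conv}_{1\times 1}([g_{i-1}; f_i])\big), & \text{otherwise} \end{cases} \tag{6}
$$

where $g_i$ is the merge base, $h_i$ is the merged feature map, and the operator $[\,\cdot\,;\,\cdot\,]$
denotes concatenation along the channel axis.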
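The following NumPy sketch evaluates Eqs. (5) and (6) on random tensors to make the data flow concrete. It is not the author's implementation: the weights are random, the channel widths (128, 64, 32) and the depths of $f_1$ to $f_4$ are illustrative assumptions, `unpool` is approximated by nearest-neighbour ×2 upsampling, and the batch normalization and activation layers a trained network would normally place after each convolution are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)  # random weights: illustration only, not a trained model


def unpool(x: np.ndarray) -> np.ndarray:
    """Double height and width by nearest-neighbour repetition (stand-in for unpool)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution. x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Same'-padded 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps H x W
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def merge_features(f: list, widths=(128, 64, 32), out_width: int = 32) -> np.ndarray:
    """Apply Eqs. (5)-(6) to f = [f_1, f_2, f_3, f_4] (coarsest first); return g_4."""
    h = f[0]  # Eq. (6), i = 1: h_1 = f_1
    for j in range(2, 5):  # the equations' 1-based level index j = 2, 3, 4
        g = unpool(h)  # Eq. (5): g_{j-1} = unpool(h_{j-1}), since j-1 <= 3
        cat = np.concatenate([g, f[j - 1]], axis=0)  # [g_{j-1}; f_j] on the channel axis
        c = widths[j - 2]
        w1 = rng.standard_normal((c, cat.shape[0])) * 0.01
        w3 = rng.standard_normal((c, c, 3, 3)) * 0.01
        h = conv3x3(conv1x1(cat, w1), w3)  # Eq. (6): h_j
    w_out = rng.standard_normal((out_width, h.shape[0], 3, 3)) * 0.01
    return conv3x3(h, w_out)  # Eq. (5), i = 4: g_4 = conv3x3(h_4)


if __name__ == "__main__":
    size = 64  # illustrative input side (multiple of 32)
    depths = [(512, 32), (256, 16), (128, 8), (64, 4)]  # assumed (channels, stride) of f_1..f_4
    f = [rng.standard_normal((c, size // s, size // s)) for c, s in depths]
    print(merge_features(f).shape)  # (32, 16, 16): 1/4 of the input resolution
```

Keeping the merged branches narrow (a few dozen channels at the high-resolution levels) is what keeps the computational cost of the upsampling path small.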
In the next stage, once a text region has been detected, it can be cropped and
processed further to recognize the text. For this purpose, a Tesseract OCR model
trained on the characters of the Uzbek Latin and Cyrillic alphabets can be used. The
recognized text is then sent to the Uzbek language TTS synthesizer.
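A minimal sketch of this crop-then-recognize step is shown below, assuming the detector returns each text region as four (x, y) corner points. For simplicity it crops the axis-aligned bounding box of each quadrilateral rather than rectifying rotated text, and the OCR engine is passed in as a function so the sketch runs without Tesseract installed. With the `pytesseract` wrapper, for instance, that function could be `lambda crop: pytesseract.image_to_string(crop, lang="uzb+uzb_cyrl")`, where `uzb` and `uzb_cyrl` are Tesseract's standard Uzbek Latin and Cyrillic language data; a custom-trained model would be referenced by its own name. The function names below are hypothetical.

```python
from typing import Callable, Sequence

import numpy as np

Point = tuple[float, float]


def crop_region(image: np.ndarray, quad: Sequence[Point], pad: int = 2) -> np.ndarray:
    """Crop the axis-aligned bounding box of a detected quadrilateral.

    quad holds (x, y) corner points; pad adds a small margin, clipped to the image.
    """
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    height, width = image.shape[:2]
    x0 = int(np.clip(np.floor(min(xs)) - pad, 0, width))
    x1 = int(np.clip(np.ceil(max(xs)) + pad, 0, width))
    y0 = int(np.clip(np.floor(min(ys)) - pad, 0, height))
    y1 = int(np.clip(np.ceil(max(ys)) + pad, 0, height))
    return image[y0:y1, x0:x1]


def recognize_regions(image: np.ndarray, quads: Sequence[Sequence[Point]],
                      ocr: Callable[[np.ndarray], str]) -> list[str]:
    """Crop every detected region and run the supplied OCR function on it."""
    texts = []
    for quad in quads:
        crop = crop_region(image, quad)
        if crop.size == 0:  # region lies entirely outside the image
            continue
        text = ocr(crop).strip()
        if text:
            texts.append(text)
    return texts


if __name__ == "__main__":
    img = np.zeros((100, 200), dtype=np.uint8)  # stand-in for a scene image
    boxes = [[(10, 20), (60, 20), (60, 40), (10, 40)]]
    fake_ocr = lambda crop: f"{crop.shape[1]}x{crop.shape[0]} px region"  # placeholder OCR
    print(recognize_regions(img, boxes, fake_ocr))  # ['54x24 px region']
```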
In this chapter of the dissertation, as a result of studying and analyzing the words
in the Uzbek dictionary, an electronic database of 31.5 thousand words was created
and arranged in alphabetical order. The Uzbek language speech synthesizer is
based on the concatenation method and contains the pronunciations of the words.
For this purpose, the 31.5 thousand words of the Uzbek vocabulary were studied, and
all words were broken down into 2.5 thousand units, i.e. syllables.
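The syllable-based unit inventory can be illustrated with the sketch below. The splitting rule is a simplified assumption, not the dissertation's algorithm: every syllable contains exactly one vowel, the digraphs sh, ch, o' and g' count as single letters, and of the consonants between two vowels only the last one begins the next syllable (ki-tob, mak-tab, o-i-la). Concatenative synthesis then amounts to joining the recorded waveform of each syllable unit; the `units` dictionary of recordings is hypothetical.

```python
import numpy as np

VOWELS = {"a", "e", "i", "o", "u", "o'"}
DIGRAPHS = {"sh", "ch", "o'", "g'"}  # letters written with two characters


def letters(word: str) -> list[str]:
    """Split an Uzbek Latin word into letters, keeping digraphs whole."""
    word = word.lower()
    for mark in "ʻ‘’`":  # unify apostrophe variants used in o' and g'
        word = word.replace(mark, "'")
    out, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        out.append(word[i:i + step])
        i += step
    return out


def syllabify(word: str) -> list[str]:
    """Simplified rule: one vowel per syllable; of the consonants between two
    vowels, only the last one begins the next syllable."""
    ls = letters(word)
    vowel_pos = [i for i, ch in enumerate(ls) if ch in VOWELS]
    if len(vowel_pos) < 2:
        return ["".join(ls)]
    # Adjacent vowels split directly; otherwise cut before the last consonant
    cuts = [b if b - a == 1 else b - 1 for a, b in zip(vowel_pos, vowel_pos[1:])]
    bounds = [0] + cuts + [len(ls)]
    return ["".join(ls[s:e]) for s, e in zip(bounds, bounds[1:])]


def synthesize(word: str, units: dict[str, np.ndarray]) -> np.ndarray:
    """Concatenative synthesis: join the stored waveform of each syllable.
    Raises KeyError if a syllable has no recorded unit."""
    return np.concatenate([units[s] for s in syllabify(word)])


if __name__ == "__main__":
    for w in ["kitob", "maktab", "o'zbek", "oila", "ma'no"]:
        print(w, "->", "-".join(syllabify(w)))
    toy_units = {"ki": np.ones(3), "tob": np.zeros(2)}  # hypothetical recordings
    print(synthesize("kitob", toy_units).shape)  # (5,)
```

Because many words share the same syllables, a few thousand recorded units are enough to cover a vocabulary an order of magnitude larger, which is the motivation for a syllable-level inventory.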
To ensure correct pronunciation of the recognized text and to keep the Uzbek
language database up to date, each recognized word is compared with the database.
If the word exists in the database, the system sends it to the Uzbek language TTS
synthesizer; otherwise, the word is sent to a language specialist, who confirms
whether it is a new word.
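The lookup-and-route logic described above can be sketched as follows. The `WordRouter` class, its normalization rules and the in-memory review queue are hypothetical illustrations; a real system would query the actual 31.5 thousand-word database and use its own workflow for the language specialist.

```python
from dataclasses import dataclass, field


@dataclass
class WordRouter:
    """Send known words to the TTS synthesizer; queue unknown ones for a specialist."""
    lexicon: set[str]
    review_queue: list[str] = field(default_factory=list)

    @staticmethod
    def normalize(word: str) -> str:
        # Lower-case, unify apostrophe variants used in o'/g', strip punctuation
        word = word.lower()
        for mark in "ʻ‘’`":
            word = word.replace(mark, "'")
        return word.strip(".,;:!?\"()«»")

    def route(self, text: str) -> list[str]:
        """Return the words to synthesize now; queue the rest for confirmation."""
        to_tts = []
        for raw in text.split():
            word = self.normalize(raw)
            if not word:
                continue
            if word in self.lexicon:
                to_tts.append(word)
            elif word not in self.review_queue:
                self.review_queue.append(word)
        return to_tts

    def confirm(self, word: str) -> None:
        """The specialist approved the word: add it to the database."""
        self.lexicon.add(word)
        if word in self.review_queue:
            self.review_queue.remove(word)


if __name__ == "__main__":
    router = WordRouter({"kitob", "maktab"})  # toy lexicon
    print(router.route("Kitob va maktab."))  # ['kitob', 'maktab']
    print(router.review_queue)  # ['va']
```

Once a specialist calls `confirm`, later occurrences of the word go straight to the synthesizer, so the database grows with use.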
In the fourth chapter of the dissertation,