PRELIMINARY PROCESSING OF SPEECH SIGNALS BASED ON PERCEPTUAL
ANALYSIS
A.Sh.Mukhamadiev (TUIT, Head of the department),
Sh.E.Tashmetov (TUIT, student)
Recent decades have seen a sharp increase in interest in the development and use of methods and
algorithms for processing speech signals [1,2]. One of the most interesting areas in this field is
speech recognition. In recent years, speech recognition methods have come into wide use in daily
life (for example, for entering credit card numbers and other access codes in computer-based systems
that process mobile data). Automatic speech pattern recognition systems have been developed very
intensively in the foreign literature in recent years. However, automatic recognition of speech
patterns for the Uzbek language has not been sufficiently developed.
The purpose of this report is to develop an algorithm for perceptual analysis of speech signals.
The algorithm considered here is the initial stage in solving the problem of extracting the features
that characterize a speech signal.
Determining the absolute threshold of audibility is known to be the first stage of perceptual
analysis of speech. It makes it possible to remove from the speech signal information that is not
perceived by the human ear and therefore has no value for analysis.
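The report does not give a formula for the absolute threshold. As an illustration only, the sketch below uses a widely cited closed-form approximation of the threshold in quiet (usually attributed to Terhardt); the choice of this approximation and the function name `absolute_threshold_db` are assumptions, not part of the report.

```python
import numpy as np

def absolute_threshold_db(freq_hz):
    """Approximate threshold in quiet, in dB SPL, for frequencies in Hz.

    Assumption: the common closed-form approximation with f in kHz,
        Tq(f) = 3.64 f^-0.8 - 6.5 exp(-0.6 (f - 3.3)^2) + 1e-3 f^4.
    Frequencies must be strictly positive (f = 0 would divide by zero).
    """
    f_khz = np.asarray(freq_hz, dtype=float) / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

# The curve is high at low frequencies and lowest around 3 to 4 kHz.
thresholds = absolute_threshold_db([100.0, 1000.0, 3300.0])
```

Spectral components whose level falls below this curve can be treated as inaudible and dropped from further analysis.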
Consider the discretized speech signal S(n). The perceptual processing of S(n) can be represented
as five consecutive stages.
In the first stage, the input samples S(n) are normalized according to the window length N and
the number of bits b used to represent each sample:
$$x(n) = \frac{S(n)}{N \cdot 2^{\,b-1}}.$$
Next, the input signal x(n) is segmented into sections of length N with some overlap, and then
the power spectral density p(k), 0 ≤ k ≤ N − 1, is estimated [3].
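The first stage can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Hann window, the hop size, and the `spl_offset_db` reference level are assumptions, since the report does not specify them, and all function names are hypothetical.

```python
import numpy as np

def normalize(samples, n_window, bits):
    """x(n) = S(n) / (N * 2**(b - 1)), as in the first stage."""
    return np.asarray(samples, dtype=float) / (n_window * 2.0 ** (bits - 1))

def split_frames(x, n_window, hop):
    """Split x into frames of length n_window that start hop samples apart.

    Assumes len(x) >= n_window; hop < n_window gives overlapping frames.
    """
    count = 1 + (len(x) - n_window) // hop
    return np.stack([x[i * hop: i * hop + n_window] for i in range(count)])

def psd_db(frame, spl_offset_db=0.0):
    """Periodogram estimate p(k) in dB for k = 0 .. N-1.

    Assumptions: a Hann window is applied before the FFT, and
    spl_offset_db is a hypothetical reference level used to map the
    result to a sound pressure level scale.
    """
    window = np.hanning(len(frame))
    power = np.abs(np.fft.fft(frame * window)) ** 2
    # Floor the power to avoid log10(0) on silent frames.
    return spl_offset_db + 10.0 * np.log10(np.maximum(power, 1e-20))

# Usage: a 16-bit test tone placed exactly on bin 32 of a 512-point window.
n = np.arange(1024)
tone = 32767.0 * np.sin(2.0 * np.pi * 32.0 * n / 512.0)
x = normalize(tone, n_window=512, bits=16)
frames = split_frames(x, n_window=512, hop=256)   # 50 % overlap
p = psd_db(frames[0])
```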
In the second stage, tonal and noise masking signals are determined. Once the power spectral
density has been estimated and the sound pressure level normalized, the tonal and non-tonal masking
components of the signal must be isolated. A local maximum of the spectrum that exceeds the
neighboring components by at least 7 dB is considered a tonal masking signal.
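A minimal sketch of the tonal test described above. The report does not say which neighboring components are compared, so the `reach` parameter (how many bins on each side are checked against the 7 dB margin) is a hypothetical choice; noise masking signals are not covered here.

```python
import numpy as np

def find_tonal_maskers(psd_db, reach=2, margin_db=7.0):
    """Return the indices of spectral peaks treated as tonal masking signals.

    A bin k qualifies if it is a local maximum and exceeds the bins at
    distances 2 .. reach on both sides by at least margin_db.
    Assumption: reach (>= 1) is illustrative, not taken from the report.
    """
    p = np.asarray(psd_db, dtype=float)
    maskers = []
    for k in range(reach, len(p) - reach):
        is_local_max = p[k] > p[k - 1] and p[k] >= p[k + 1]
        stands_out = all(
            p[k] >= p[k - d] + margin_db and p[k] >= p[k + d] + margin_db
            for d in range(2, reach + 1)
        )
        if is_local_max and stands_out:
            maskers.append(k)
    return maskers

# Usage: the strong peak at index 4 qualifies, the weak bump at index 10 does not.
spectrum = [0, 0, 0, 5, 20, 5, 0, 0, 0, 3, 4, 3, 0, 0]
tonal = find_tonal_maskers(spectrum)
```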
In the third stage, the number of masking signals is reduced by applying two criteria. Under the
first criterion, tonal and noise masking signals that lie below the absolute threshold are discarded.
The second criterion uses a sliding 0.5 Bark window: any pair of masking signals separated by less
than 0.5 Bark is replaced by the stronger of the two.
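The two criteria can be sketched as below. The Hz-to-Bark mapping is a standard approximation (due to Zwicker) that the report does not state, and the function names, the parallel-list input format, and the greedy left-to-right pass are assumptions made for illustration.

```python
import numpy as np

def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (assumed standard mapping)."""
    f = np.asarray(freq_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def thin_maskers(freqs_hz, levels_db, quiet_db, window_bark=0.5):
    """Reduce a set of masking signals using the two criteria.

    freqs_hz, levels_db, quiet_db are parallel sequences: masker frequency,
    masker level, and the absolute threshold at that frequency.
    """
    # Criterion 1: discard maskers below the absolute threshold.
    kept = [(f, p) for f, p, q in zip(freqs_hz, levels_db, quiet_db) if p >= q]
    kept.sort(key=lambda m: m[0])
    # Criterion 2: within a 0.5 Bark distance keep only the stronger masker.
    result = []
    for f, p in kept:
        if result and hz_to_bark(f) - hz_to_bark(result[-1][0]) < window_bark:
            if p > result[-1][1]:
                result[-1] = (f, p)
        else:
            result.append((f, p))
    return result

# Usage: 1000 Hz and 1020 Hz are closer than 0.5 Bark, so the weaker one
# is dropped; the 3000 Hz masker lies below its threshold and is discarded.
thinned = thin_maskers([1000, 1020, 2000, 3000],
                       [60, 70, 50, -10],
                       [3, 3, 0, -5])
```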
In the fourth stage, individual masking thresholds are calculated. Once the "thinned" sets of
tonal and noise masking signals have been obtained, the individual tonal and noise masking thresholds
can be computed. Each individual threshold represents the masking contribution at a given frequency
sample due to one tonal or noise masking signal.
In the fifth stage, global masking thresholds are defined. At this stage, the individual masking
thresholds are combined to obtain the global threshold for each frequency sample in the subset. The
global masking threshold for each frequency sample is a signal-dependent additive combination of
powers: the absolute threshold plus the individual thresholds that arise from the spread, along the
basilar membrane, of the tonal and noise masking signals determined from the power spectrum.
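The additive combination can be illustrated as follows, assuming that all thresholds are given in dB and that "additive" means summation in the power domain. The array layout and the function name are hypothetical; computing the individual thresholds themselves is outside this sketch.

```python
import numpy as np

def global_threshold_db(quiet_db, tonal_db, noise_db):
    """Combine thresholds additively in the power domain.

    quiet_db: absolute threshold per frequency sample, shape (F,)
    tonal_db: individual tonal thresholds, shape (T, F), may be empty
    noise_db: individual noise thresholds, shape (M, F), may be empty
    Returns the global masking threshold in dB, shape (F,).
    """
    quiet = np.asarray(quiet_db, dtype=float)
    n_freq = len(quiet)
    tonal = np.asarray(tonal_db, dtype=float).reshape(-1, n_freq)
    noise = np.asarray(noise_db, dtype=float).reshape(-1, n_freq)
    # Convert dB to power, sum all contributions, convert back to dB.
    total = (10.0 ** (0.1 * quiet)
             + (10.0 ** (0.1 * tonal)).sum(axis=0)
             + (10.0 ** (0.1 * noise)).sum(axis=0))
    return 10.0 * np.log10(total)

# Usage: two frequency samples, one tonal and one noise masking signal.
g = global_threshold_db([10.0, 10.0], [[10.0, 20.0]], [[10.0, 0.0]])
```

With no masking signals present, the result reduces to the absolute threshold itself.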
Thus, an algorithm for perceptual analysis of speech signals has been defined. To evaluate its
efficiency, experimental studies on extracting features of speech signals were carried out. The
results showed that the developed algorithm is workable for the problem of extracting features of
speech signals.
In conclusion, the algorithm considered here can be used to build various software packages
aimed at recognizing isolated words.