Machine Learning: 2 Books in 1: Machine Learning for Beginners, Machine Learning Mathematics. An Introduction Guide to Understand Data Science Through the Business Application

Date: 22.06.2022

$(x_1, \alpha_1), \ldots, (x_m, \alpha_m)$, if $d > 2^m$ then, with
probability greater than $1 - e^{-1} > 0.63$, there is a coordinate
$j \in \{1, \ldots, d\}$ such that all confidence vectors $\alpha_i$ in the
sample are 0 on coordinate $j$, i.e. $\alpha_i[j] = 0$ for all
$i = 1, \ldots, m$. Let $e_j \in H$ be the standard basis vector
corresponding to this coordinate. Then, in the equation shown in the picture
below, $F_S^{(3)}(\cdot)$ denotes the empirical risk with respect to the
function $f^{(3)}(\cdot)$.
On the other hand, if $F^{(3)}(\cdot)$ denotes the actual risk with respect
to the function $f^{(3)}(\cdot)$, the equation shown in the picture below is
obtained.
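The probability bound in this passage can be checked numerically. The sketch below assumes, as the example states further on, that every entry $\alpha_i[j]$ is an independent fair coin flip. The function name `prob_all_zero_coordinate` is hypothetical and not part of the source.

```python
import math


def prob_all_zero_coordinate(m: int, d: int) -> float:
    """Probability that at least one of d coordinates is 0 in all m samples.

    Assumes every entry alpha_i[j] is an independent fair coin flip.
    """
    p_single = 0.5 ** m  # a fixed coordinate is 0 in all m samples
    return 1.0 - (1.0 - p_single) ** d


threshold = 1.0 - math.exp(-1.0)  # about 0.632
for m in (1, 2, 5, 10):
    d = 2 ** m + 1  # smallest dimension satisfying d > 2^m
    p = prob_all_zero_coordinate(m, d)
    print(f"m={m:2d} d={d:5d} probability={p:.4f} above threshold: {p > threshold}")
```

Larger values of `d` only increase the probability, so the bound holds for every dimension above $2^m$.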
Thus, for any sample size $m$, a convex, Lipschitz-continuous objective can
be constructed in a sufficiently high dimension so that, with probability at
least 0.63 over the sample,
$\sup_h |F^{(3)}(h) - F_S^{(3)}(h)| \ge 1/2$. In addition, since
$f^{(3)}(\cdot\,;\cdot)$ is non-negative, $e_j$ is an empirical minimizer,
even though its expected value $F^{(3)}(e_j) = 1/2$ is far from the optimal
expected value $\min_h F^{(3)}(h) = F^{(3)}(0) = 0$.
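The gap between empirical and expected risk can be illustrated with a small simulation. This is a sketch under a stated assumption: the excerpt does not reproduce the definition of $f^{(3)}$, so the code takes it to be the Euclidean norm of the element-wise product $\alpha * (h - x)$, with $x = 0$. All function names are hypothetical.

```python
import math
import random


def f3(h, x, alpha):
    """Assumed objective: Euclidean norm of alpha * (h - x), element-wise."""
    return math.sqrt(sum((a * (hi - xi)) ** 2 for a, hi, xi in zip(alpha, h, x)))


def find_zero_coordinate(alphas):
    """Return an index j with alpha[j] == 0 in every sample, or None."""
    for j in range(len(alphas[0])):
        if all(alpha[j] == 0 for alpha in alphas):
            return j
    return None


def average_risk(h, alphas):
    """Average of f3 over the given confidence vectors, with x = 0."""
    zero = [0.0] * len(h)
    return sum(f3(h, zero, alpha) for alpha in alphas) / len(alphas)


def draw(rng, n, d):
    """n confidence vectors of dimension d with fair coin-flip entries."""
    return [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
m, d = 3, 100  # d > 2**m = 8
sample = draw(rng, m, d)
j = find_zero_coordinate(sample)
if j is not None:
    e_j = [1.0 if k == j else 0.0 for k in range(d)]
    empirical = average_risk(e_j, sample)            # 0 on the training sample
    estimated = average_risk(e_j, draw(rng, 2000, d))  # Monte Carlo estimate of the actual risk
    print(f"j={j} empirical={empirical} estimated={estimated:.3f}")
```

Under this assumption, $f^{(3)}(e_j; (0, \alpha))$ reduces to $\alpha[j]$, so the empirical risk is 0 on the sample while the estimate on fresh draws is close to 1/2.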
To present this case in a way that does not depend on the sample size, let
$H$ be the unit sphere of an infinite-dimensional Hilbert space with
orthonormal basis $e_1, e_2, \ldots$, where for $v \in H$ we refer to its
coordinates $v[j] = \langle v, e_j \rangle$ with respect to this basis. The
confidences $\alpha$ now map every coordinate to $[0, 1]$; that is, $\alpha$
is an infinite sequence of reals in $[0, 1]$. The element-wise product
$\alpha * v$ is defined with respect to this mapping, and the objective
function $f^{(3)}(\cdot)$ of the equation shown in the first picture of this
example is well defined in this infinite-dimensional space.
Consider again the distribution over $z = (x, \alpha)$ where $x = 0$ and
$\alpha$ is an infinite sequence of independent and identically distributed
uniform Bernoulli random variables (that is, a Bernoulli process with each
$\alpha[i]$ uniform over $\{0, 1\}$ and independent of all other
$\alpha[j]$). It follows that, for any finite sample, there is almost surely
a coordinate $j$ with $\alpha_i[j] = 0$ for all $i$, and therefore an
empirical minimizer $e_j$ with $F_S^{(3)}(e_j) = 0$ and
$F^{(3)}(e_j) = 1/2 > 0 = F^{(3)}(0)$.
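A short justification of why such a coordinate exists for any finite sample can be written out as follows. It is a sketch that uses only the distribution described above and a fixed sample size $m$. The event name $A_j$ is notation introduced here for illustration.

```latex
% A_j: the event that coordinate j is 0 in all m samples (notation assumed here).
% The entries are independent fair coin flips, so the events A_j are independent.
\[
  \Pr(A_j) = 2^{-m},
  \qquad
  \Pr\Bigl(\,\bigcap_{j=1}^{d} A_j^{c}\Bigr) = \bigl(1 - 2^{-m}\bigr)^{d}
  \;\to\; 0 \quad \text{as } d \to \infty .
\]
```

The probability that none of the first $d$ coordinates is all-zero shrinks geometrically in $d$, so with infinitely many coordinates the probability of finding no such coordinate is 0.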
Consequently, the empirical values $F_S^{(3)}(h)$ do not converge uniformly
to their expectations, and empirical minimization is not guaranteed to solve
the learning problem. Furthermore, one can construct a sharper
counterexample, in which the unique empirical minimizer $\hat{h}_S$ is far
from having the optimal expected value. To accomplish this, $f^{(3)}(\cdot)$
is augmented with a small term that ensures its empirical minimizer is
unique and not too close to the origin. Consider the following objective,
where $\varepsilon = 0.01$:

$$f^{(4)}(h; (x, \alpha)) = f^{(3)}(h; (x, \alpha)) + \varepsilon \sum_i 2^{-i} (h[i] - 1)^2$$
The objective is still convex and $(1 + \varepsilon)$-Lipschitz. In
addition, since the added term is strictly convex, $f^{(4)}(h; z)$ is also
strictly convex with respect to $h$, which is why the empirical minimizer is
unique.
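The augmented objective $f^{(4)}$ defined above can be sketched in code by truncating the infinite sum to the first `d` coordinates. This is an illustration under assumptions, not the book's implementation. $f^{(3)}$ is again taken to be the Euclidean norm of $\alpha * (h - x)$, which this excerpt does not define, and coordinates are numbered from $i = 1$ to match the basis $e_1, e_2, \ldots$. The names `f3`, `f4` and `EPSILON` are hypothetical.

```python
import math

EPSILON = 0.01


def f3(h, x, alpha):
    """Assumed objective: Euclidean norm of alpha * (h - x), element-wise."""
    return math.sqrt(sum((a * (hi - xi)) ** 2 for a, hi, xi in zip(alpha, h, x)))


def f4(h, x, alpha, eps=EPSILON):
    """f3 plus eps * sum_i 2^{-i} (h[i] - 1)^2, truncated to len(h) coordinates.

    Python index k corresponds to coordinate i = k + 1.
    """
    penalty = sum(2.0 ** -(k + 1) * (hk - 1.0) ** 2 for k, hk in enumerate(h))
    return f3(h, x, alpha) + eps * penalty


d = 30
zero = [0.0] * d
alpha = [1] * d
value_at_origin = f4(zero, zero, alpha)
# With x = 0 the f3 term vanishes at h = 0, leaving eps * (1 - 2**-d).
print(value_at_origin)
```

In the truncated version the value at the origin is slightly below $\varepsilon$; in the untruncated sum the weights $2^{-i}$ add up to 1, so the value at the origin is exactly $\varepsilon$.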
Consider the same distribution over $z$: $x = 0$, while the $\alpha[i]$ are
independent and identically distributed, uniform over $\{0, 1\}$. The
empirical minimizer is the minimizer of $F_S^{(4)}(h)$ subject to the
constraint $\|h\| \le 1$. Identifying the solution of this constrained
optimization problem is complicated but, fortunately, not necessary. It is
sufficient to show that the optimum of the unconstrained optimization
problem, $h^*_{UC} = \arg\min_h F_S^{(4)}(h)$ (without the constraint
$h \in H$), has norm $\|h^*_{UC}\| \ge 1$.
Note that in the unconstrained problem, whenever $\alpha_i[j] = 0$ for all
$i = 1, \ldots, m$, only the second term of $f^{(4)}$ depends on $h[j]$, and
we have $h^*_{UC}[j] = 1$. Since this happens almost surely for some
coordinate $j$, it can be concluded that the solution to the constrained
optimization problem lies on the boundary of $H$, that is,
$\|\hat{h}_S\| = 1$. For such a solution, the equation shown in the picture
below holds, while $F^* \le F^{(4)}(0) = \varepsilon$.
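The pictured equation is not reproduced in this excerpt. One derivation consistent with the surrounding text is sketched below. It assumes that $f^{(3)}(h; (x, \alpha)) = \|\alpha * (h - x)\|$ (the Euclidean norm of the element-wise product), which this excerpt does not define, and it uses $x = 0$, $\alpha[i] \in \{0, 1\}$ and $\|\hat{h}_S\| = 1$.

```latex
\begin{align*}
F^{(4)}(\hat{h}_S)
  &\ge \mathbb{E}_{\alpha}\Bigl[\sqrt{\textstyle\sum_i \alpha[i]\,\hat{h}_S[i]^2}\Bigr]
     && \text{(added term is non-negative; } \alpha[i]^2 = \alpha[i]\text{)} \\
  &\ge \mathbb{E}_{\alpha}\Bigl[\textstyle\sum_i \alpha[i]\,\hat{h}_S[i]^2\Bigr]
     && \text{(} \sqrt{t} \ge t \text{ for } t \in [0, 1]\text{)} \\
  &= \textstyle\sum_i \hat{h}_S[i]^2\,\mathbb{E}\bigl[\alpha[i]\bigr]
   = \tfrac{1}{2}\,\|\hat{h}_S\|^2
   = \tfrac{1}{2}.
\end{align*}
```

Under these assumptions the unique empirical minimizer has expected risk of at least 1/2, whereas the optimal expected risk is at most $\varepsilon = 0.01$.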
