_{h∈H} F_S(h)”, where “F_S(ĥ_S) = inf_{h∈H} F_S(h)” is referred to as the “minimal empirical risk”. Since multiple hypotheses may minimize the empirical risk, “ĥ_S” does not refer to one specific hypothesis, and there may be several rules which are all “ERM”.
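As a purely illustrative Python sketch (the threshold hypothesis class, the data distribution, and all names below are assumptions made for this example, not part of the source), the snippet computes the minimal empirical risk and every hypothesis attaining it, showing that the ERM hypothesis need not be unique:

    import numpy as np

    rng = np.random.default_rng(0)

    # Finite hypothesis class: threshold classifiers h_t(x) = 1 if x >= t else 0.
    thresholds = np.linspace(0.0, 1.0, 21)

    def empirical_risk(t, X, Y):
        # F_S(h_t) = (1/m) * sum_i 1[h_t(x_i) != y_i]  (0-1 loss)
        return ((X >= t).astype(int) != Y).mean()

    # A sample S of size m from an assumed distribution D:
    # x ~ Uniform[0, 1], y = 1[x >= 0.4] with 10% label noise.
    m = 100
    X = rng.uniform(0.0, 1.0, size=m)
    Y = (X >= 0.4).astype(int)
    flip = rng.random(m) < 0.1
    Y[flip] = 1 - Y[flip]

    # The minimal empirical risk F_S(h_hat_S) and all hypotheses attaining it;
    # several thresholds typically tie, so h_hat_S is not unique.
    risks = np.array([empirical_risk(t, X, Y) for t in thresholds])
    print("minimal empirical risk:", risks.min())
    print("ERM thresholds:", thresholds[risks == risks.min()])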
“Rule A” can, therefore, be said to be an “AERM (Asymptotic Empirical Risk Minimizer) with rate ε_erm(m) under distribution D” when:
“E_{S∼D^m}[F_S(A(S)) − F_S(ĥ_S)] ≤ ε_erm(m)”
A learning rule can be considered “universally an AERM” with “rate ε_erm(m)” if it is an AERM with “rate ε_erm(m)” under all distributions “D” over “Z”. A learning rule can be considered “always AERM” with “rate ε_erm(m)” if, for any sample “S” of size “m”, it holds that “F_S(A(S)) − F_S(ĥ_S) ≤ ε_erm(m)”.
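To make the AERM condition concrete, the following Python sketch (again under assumed choices, not the source's construction) Monte Carlo estimates the excess empirical risk E_{S∼D^m}[F_S(A(S)) − F_S(ĥ_S)] for a rule A that minimizes the empirical risk plus a vanishing penalty; A is not an exact ERM, but its excess empirical risk shrinks as m grows, which is exactly the AERM behaviour:

    import numpy as np

    rng = np.random.default_rng(1)
    thresholds = np.linspace(0.0, 1.0, 21)

    def sample(m):
        # Assumed distribution D: x ~ Uniform[0, 1], y = 1[x >= 0.4] with 10% label noise.
        X = rng.uniform(0.0, 1.0, size=m)
        Y = (X >= 0.4).astype(int)
        flip = rng.random(m) < 0.1
        Y[flip] = 1 - Y[flip]
        return X, Y

    def empirical_risks(X, Y):
        # F_S(h_t) for every threshold t under the 0-1 loss.
        preds = (X[None, :] >= thresholds[:, None]).astype(int)
        return (preds != Y[None, :]).mean(axis=1)

    def excess_empirical_risk(m):
        X, Y = sample(m)
        risks = empirical_risks(X, Y)
        # Rule A: minimize empirical risk plus a vanishing penalty toward t = 0.5,
        # so A is not an exact ERM, but F_S(A(S)) - F_S(h_hat_S) -> 0 as m grows.
        penalized = risks + np.abs(thresholds - 0.5) / np.sqrt(m)
        return risks[int(np.argmin(penalized))] - risks.min()

    for m in (10, 100, 1000):
        gaps = [excess_empirical_risk(m) for _ in range(200)]
        print(m, np.mean(gaps))   # estimates E_S[F_S(A(S)) - F_S(h_hat_S)] <= eps_erm(m)

Here the penalty is at most 0.5/√m for any sample, so this rule is in fact “always AERM” with rate ε_erm(m) = 0.5/√m.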
It can be concluded that “rule A” generalizes with rate “ε_gen(m)” under distribution D if, for all m,
“E_{S∼D^m}[|F(A(S)) − F_S(A(S))|] ≤ ε_gen(m)”,
and a rule “universally generalizes with rate ε_gen(m) if it generalizes with rate ε_gen(m) under all distributions D over Z”.
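The generalization rate can likewise be estimated numerically. The Python sketch below (using the same assumed threshold class and distribution as the earlier examples) approximates E_{S∼D^m}[|F(A(S)) − F_S(A(S))|] for an ERM rule, with a large fresh sample standing in for the population risk F:

    import numpy as np

    rng = np.random.default_rng(2)
    thresholds = np.linspace(0.0, 1.0, 21)

    def sample(m):
        X = rng.uniform(0.0, 1.0, size=m)
        Y = (X >= 0.4).astype(int)
        flip = rng.random(m) < 0.1
        Y[flip] = 1 - Y[flip]
        return X, Y

    def risk(t, X, Y):
        return ((X >= t).astype(int) != Y).mean()

    def generalization_gap(m, n_test=100_000):
        X, Y = sample(m)
        # A(S): an ERM over the threshold class.
        risks = [risk(t, X, Y) for t in thresholds]
        t_hat = thresholds[int(np.argmin(risks))]
        # Approximate the population risk F(A(S)) with a large fresh sample.
        X_big, Y_big = sample(n_test)
        return abs(risk(t_hat, X_big, Y_big) - min(risks))

    for m in (10, 100, 1000):
        gaps = [generalization_gap(m) for _ in range(50)]
        print(m, np.mean(gaps))   # estimates E_S[|F(A(S)) - F_S(A(S))|] <= eps_gen(m)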
Impact of Uniform Convergence on Learnability
Uniform convergence is said to hold for a learning problem “if the empirical risks of hypotheses in the hypothesis class converge to their population risk uniformly, with a distribution-independent rate”:
“sup_D E_{S∼D^m}[sup_{h∈H} |F(h) − F_S(h)|] → 0 as m → ∞”
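For a finite hypothesis class, the inner quantity sup_{h∈H} |F(h) − F_S(h)| can be estimated directly. The Python sketch below is only an illustration: it fixes a single assumed distribution D (whereas the definition also takes a supremum over D, which a simulation cannot cover) and estimates E_{S∼D^m}[sup_{h∈H} |F(h) − F_S(h)|] for growing m:

    import numpy as np

    rng = np.random.default_rng(3)
    thresholds = np.linspace(0.0, 1.0, 21)

    def sample(m):
        X = rng.uniform(0.0, 1.0, size=m)
        Y = (X >= 0.4).astype(int)
        flip = rng.random(m) < 0.1
        Y[flip] = 1 - Y[flip]
        return X, Y

    def all_risks(X, Y):
        # Empirical (or, on a huge sample, near-population) risk of every h in H.
        preds = (X[None, :] >= thresholds[:, None]).astype(int)
        return (preds != Y[None, :]).mean(axis=1)

    # Approximate the population risks F(h) once, with a very large sample.
    X_pop, Y_pop = sample(500_000)
    population = all_risks(X_pop, Y_pop)

    def sup_deviation(m):
        X, Y = sample(m)
        return np.max(np.abs(population - all_risks(X, Y)))   # sup_h |F(h) - F_S(h)|

    for m in (10, 100, 1000):
        devs = [sup_deviation(m) for _ in range(200)]
        print(m, np.mean(devs))   # estimates E_S[sup_h |F(h) - F_S(h)|] for this fixed D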
It is easy to demonstrate that if uniform convergence holds, the problem is learnable using the “ERM learning rule”.
In 1971, Vapnik and Chervonenkis demonstrated that “the finiteness of a straightforward combinatorial measure known as the VC dimension indicates uniform convergence, for binary classification issues (where Z = X × {0, 1}, each hypothesis is a mapping from X to {0, 1}, and f(h; (x, y)) = 1_{h(x)≠y})”. Also, it can be confirmed that binary classification problems with infinite “VC dimension” cannot be learned in a distribution-independent sense. This identifies finite “VC dimension”, and therefore uniform convergence, as a necessary and sufficient condition for learnability.
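As a toy illustration of the VC dimension (an example assumed here, not taken from the source), the class of threshold classifiers h_t(x) = 1_{x≥t} on the real line shatters any single point but no set of two points, so its VC dimension is 1. The brute-force Python check below makes this concrete:

    def shattered(points, hypotheses):
        # A point set is shattered if every possible labeling is realized by some hypothesis.
        labelings = {tuple(h(x) for x in points) for h in hypotheses}
        return len(labelings) == 2 ** len(points)

    # Threshold classifiers h_t(x) = 1 if x >= t else 0, over a grid of thresholds.
    hypotheses = [lambda x, t=t: int(x >= t) for t in [i / 10 for i in range(-10, 21)]]

    print(shattered([0.5], hypotheses))        # True: any single point is shattered
    print(shattered([0.3, 0.7], hypotheses))   # False: the labeling (1, 0) is unrealizable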
This VC-dimension characterization extends to “regression” as well, namely “regression with squared loss, where h is now a real-valued function, and f(h; (x, y)) = (h(x) − y)^2”. The property of having a “finite fat-shattering dimension” at all finite scales replaces that of having a finite “VC dimension”, but the basic equivalence still holds: a problem is learnable if and only if uniform convergence holds.
These findings are typically based on clever reductions to binary classification. However, the “General Learning Setting” considered here is far less specific than classification and regression, and includes scenarios that are difficult to reduce to binary classification.
In 1998, Vapnik sought to show that “in the General Learning Setting, learnability with the ERM learning rule is equivalent to uniform convergence”, to bolster the need for uniform convergence in this setting, while noting that the result may not hold in “trivial” situations. Specifically, these are cases obtained by taking an “arbitrary learning problem with hypothesis class H and adding to H a single hypothesis h̃ such that f(h̃, z) < inf_{h∈H} f(h, z) for all z ∈ Z”, as shown in the picture below. This particular learning problem can be “trivially” learned using the “ERM learning rule” which always chooses “h̃”, even though “H” may be arbitrarily complex, with no uniform convergence and no other prior assumptions. It must be noted that this is not applicable to binary classification, where “f(h; (x, y)) = 1_{h(x)≠y}”, since on any “(x, y)” there will be hypotheses with “f(h; (x, y)) = f(h̃; (x, y))”; therefore, if “H” is highly complex, with infinite “VC dimension”, then multiple hypotheses will have 0 empirical error on any given training data set.
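This “trivial” construction can be mimicked in a few lines of Python (a hypothetical sketch with made-up loss values, not Vapnik's own example): take an arbitrary class H, encoded here by random loss vectors, and add one hypothesis h̃ whose loss is strictly below every other hypothesis's loss on every z. The ERM rule then always returns h̃, regardless of how complex H is:

    import numpy as np

    rng = np.random.default_rng(4)

    # An arbitrary (possibly very complex) class H, encoded by its loss values on Z:
    # each hypothesis is just a random loss vector with f(h; z) >= 1 everywhere.
    Z_size = 10
    H = [rng.uniform(1.0, 2.0, size=Z_size) for _ in range(1000)]

    # The added hypothesis h_tilde: f(h_tilde; z) = 0 < f(h; z) for all h in H, z in Z.
    h_tilde = np.zeros(Z_size)
    hypotheses = H + [h_tilde]

    def erm(sample_indices):
        # Return the index of a hypothesis minimizing the empirical risk on the sample.
        emp = [losses[sample_indices].mean() for losses in hypotheses]
        return int(np.argmin(emp))

    S = rng.integers(0, Z_size, size=5)        # any sample, however small
    print(erm(S) == len(hypotheses) - 1)       # True: the ERM rule always returns h_tilde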
In order to exclude such “trivial” scenarios, Vapnik proposed the concept of “strict consistency”, an even stronger version of consistency. It is defined by the equation below, where the convergence is in probability:
“∀ c ∈ ℝ,  inf_{h∈H: F(h)≥c} F_S(h) → inf_{h∈H: F(h)≥c} F(h)  as m → ∞”