Binary Classification: “Let Z = X × {0,1}, let H be a set of functions h: X → {0,1}, and let f(h; (x, y)) = 1[h(x) ≠ y]. Here, f(·) is simply the 0−1 loss function, measuring whether the binary hypothesis h(·) misclassified the example (x, y).”
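As a concrete illustration, here is a minimal Python sketch of this 0−1 loss; the threshold hypothesis below is a hypothetical stand-in for some h ∈ H, not something given in the text:

```python
def zero_one_loss(h, x, y):
    """0-1 loss f(h; (x, y)): 1 if h misclassifies (x, y), else 0."""
    return 0 if h(x) == y else 1

# Hypothetical hypothesis: label x as 1 whenever x >= 0.5.
h = lambda x: 1 if x >= 0.5 else 0
print(zero_one_loss(h, 0.7, 1))  # 0: correctly classified
print(zero_one_loss(h, 0.7, 0))  # 1: misclassified
```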
This setting’s ultimate goal is to select a hypothesis “h ∈ H”, based on a finite number of samples, with the least possible risk. In general, we expect the approximation of the risk to improve as the sample size grows. It is assumed that “learning rules that enable us to choose such hypotheses are consistent”. In formal terms, we say that “rule A” is consistent with rate “εcons(m)” under distribution “D” if for all “m”,

“E_{S∼D^m}[F(A(S)) − F*] ≤ εcons(m)”,

where “F* = inf_{h∈H} F(h)” and the rate “εcons(m)” is required to be monotone decreasing with “εcons(m) → 0”.
We cannot choose a “D-based” learning rule because “D” is unknown. Rather, we need a “stronger requirement that the rule is consistent with rate εcons(m) under all distributions D over Z”. The key definition is as follows: “A learning problem is learnable, if there exist a learning rule A and a monotonically decreasing sequence εcons(m), such that εcons(m) → 0 as m → ∞, and

∀D, E_{S∼D^m}[F(A(S)) − F*] ≤ εcons(m).

A learning rule A for which this holds is denoted as a universally consistent learning rule.”
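To see what this definition asks for operationally, the sketch below estimates the expected excess risk E_{S∼D^m}[F(A(S)) − F*] by Monte Carlo for one toy distribution. The distribution, the hypothesis class of thresholds, and the choice of empirical risk minimization (introduced below) as the rule A are all illustrative assumptions, not taken from the text:

```python
import random

random.seed(0)

# Toy distribution D over Z = X x {0,1}: x uniform on [0,1], true label
# 1 when x >= 0.3, flipped with probability 0.1 (label noise). Hypothetical.
def draw_example():
    x = random.random()
    y = 1 if x >= 0.3 else 0
    if random.random() < 0.1:
        y = 1 - y
    return (x, y)

# Small finite hypothesis class H of threshold classifiers (hypothetical).
H = [lambda x, t=t: 1 if x >= t else 0 for t in [i / 20 for i in range(21)]]

def empirical_risk(h, sample):          # F_S(h): average 0-1 loss on S
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

def rule_A(sample):                     # A(S): an empirical risk minimizer over H
    return min(H, key=lambda h: empirical_risk(h, sample))

# Approximate the true risk F(h) = E_{z~D}[f(h; z)] with a large held-out sample.
holdout = [draw_example() for _ in range(50_000)]
def risk(h):
    return empirical_risk(h, holdout)

F_star = min(risk(h) for h in H)        # F* = inf_{h in H} F(h)

for m in (10, 100, 1000):
    reps = 200                          # Monte Carlo estimate of E_{S~D^m}[F(A(S)) - F*]
    excess = sum(risk(rule_A([draw_example() for _ in range(m)])) - F_star
                 for _ in range(reps)) / reps
    print(f"m = {m:4d}   estimated expected excess risk = {excess:.4f}")
```

For this particular D the estimate should shrink as m grows, which is what a consistency rate εcons(m) → 0 promises; universal consistency requires the same behaviour under every distribution D over Z.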
The above definition of learnability, which requires a uniform rate over all distributions, is the most appropriate concept for studying the learnability of a hypothesis class. It is a direct generalization of “agnostic PAC-learnability” to “Vapnik’s General Setting of Learning”, as studied by Haussler in 1992. A potential path to learning is minimization of the empirical risk “F_S(h)” over a sample “S”, defined as

“F_S(h) = (1/m) ∑_{z∈S} f(h; z)”.
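A minimal sketch of this computation, reusing the hypothetical 0−1 loss and a hypothetical sample S for illustration:

```python
def zero_one_loss(h, x, y):
    return 0 if h(x) == y else 1

def empirical_risk(h, sample):
    """F_S(h): average loss of h over the sample S = [z_1, ..., z_m]."""
    return sum(zero_one_loss(h, x, y) for x, y in sample) / len(sample)

# Hypothetical sample S of (x, y) pairs and threshold hypothesis h.
S = [(0.2, 0), (0.4, 1), (0.6, 1), (0.9, 0)]
h = lambda x: 1 if x >= 0.5 else 0
print(empirical_risk(h, S))  # 0.5: h errs on (0.4, 1) and (0.9, 0)
```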
The notation used in this setting is summarized below:

Z, z          Instance domain and a specific instance.
H, h          Hypothesis class and a specific hypothesis.
f(h; z)       Loss of hypothesis h on instance z.
B             sup_{h,z} |f(h; z)|.
D             Underlying distribution on instance domain Z.
S             Empirical sample z_1, ..., z_m, sampled i.i.d. from D.
m             Size of empirical sample S.
A(S)          Learning rule A applied to empirical sample S.
εcons(m)      Rate of consistency for a learning rule.
F(h)          Risk of hypothesis h, E_{z∼D}[f(h; z)].
F*            inf_{h∈H} F(h).
F_S(h)        Empirical risk of hypothesis h on sample S, (1/m) ∑_{z∈S} f(h; z).
ĥ_S           An ERM hypothesis, F_S(ĥ_S) = inf_{h∈H} F_S(h).
εerm(m)       Rate of AERM for a learning rule.
εstable(m)    Rate of stability for a learning rule.
εgen(m)       Rate of generalization for a learning rule.
The “rule A” is an “Empirical Risk Minimizer (ERM)” if it minimizes the empirical risk:

“F_S(A(S)) = F_S(ĥ_S) = inf_{h∈H} F_S(h)”.