“E_{S∼D^m}[|F(A(S)) − F_S(A(S))|] ≤ ε_gen(m)”
Impact of Uniform Convergence on Learnability
Uniform convergence is said to hold for a learning problem “if the empirical risks of hypotheses in the hypothesis class converge to their population risk uniformly, with a distribution-independent rate”:

“sup_D E_{S∼D^m}[sup_{h∈H} |F(h) − F_S(h)|] → 0 as m → ∞”
It is easy to show that if uniform convergence holds, then the problem is learnable using the “ERM learning rule”.
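The claim follows from a standard three-term decomposition; as a sketch in this document's notation, write ĥ = A(S) for the ERM output and let h* be any fixed hypothesis in H:

```latex
\begin{align*}
F(\hat{h}) - F(h^{*})
  &= \bigl(F(\hat{h}) - F_S(\hat{h})\bigr)
   + \bigl(F_S(\hat{h}) - F_S(h^{*})\bigr)
   + \bigl(F_S(h^{*}) - F(h^{*})\bigr) \\
  % the middle term is \le 0 because ERM minimizes the empirical risk F_S
  &\le \sup_{h \in H}\bigl|F(h) - F_S(h)\bigr| + 0 + \sup_{h \in H}\bigl|F(h) - F_S(h)\bigr| \\
  &= 2\,\sup_{h \in H}\bigl|F(h) - F_S(h)\bigr|.
\end{align*}
```

Taking expectations over S ∼ D^m and the infimum over h*, the excess risk of ERM is at most twice the uniform-convergence quantity, so it vanishes as m → ∞.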
In 1971, Vapnik and Chervonenkis demonstrated that, for binary classification problems (where “Z = X × {0, 1}”, each hypothesis is a mapping from X to {0, 1}, and “f(h; (x, y)) = 1{h(x) ≠ y}”), “the finiteness of a straightforward combinatorial measure known as the VC dimension implies uniform convergence”. Conversely, binary classification problems with infinite “VC-dimension” are not learnable in a distribution-independent sense. This identifies finiteness of the “VC-dimension”, and hence uniform convergence, as a necessary and sufficient condition for learnability.
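Quantitatively, the distribution-independent rate can be stated in terms of d = VCdim(H); a standard form of the bound (the universal constant C, and whether the logarithmic factor appears, varies across references) is:

```latex
\sup_{D}\; \mathbb{E}_{S \sim D^{m}}
  \Bigl[\, \sup_{h \in H} \bigl| F(h) - F_S(h) \bigr| \,\Bigr]
\;\le\; C \,\sqrt{\frac{d\,\bigl(1 + \log(m/d)\bigr)}{m}}
```

so the supremum tends to 0 whenever d is finite, matching the definition of uniform convergence above.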
This characterization extends to “regression” as well, namely “regression with squared loss, where h is now a real-valued function, and f(h; (x, y)) = (h(x) − y)^2”. Here, the property of having a “finite fat-shattering dimension” at all finite scales replaces finiteness of the VC dimension, but the basic equivalence still holds: a problem is learnable if and only if uniform convergence holds.
These results are typically established via reductions to binary classification. However, the “General Learning Setting” is far more general than classification and regression, and includes scenarios that are difficult to reduce to binary classification.
In 1998, Vapnik sought to show that “in the General Learning Setting, learnability with the ERM learning rule is equivalent to uniform convergence”, bolstering the case for uniform convergence in this setting, while noting that the result may fail in “trivial” situations. Specifically, consider an “arbitrary learning problem with hypothesis class H”, and add “to H a single hypothesis h̃ such that f(h̃, z) < inf_{h∈H} f(h, z) for all z ∈ Z”, as shown in the picture below. This learning problem is “trivially” learnable using the “ERM learning rule”: since h̃ has strictly smaller loss than every other hypothesis on every z, it minimizes the empirical risk on every sample, so ERM always returns “h̃”, whose excess risk is zero. This holds even though “H” can be arbitrarily complex, with no prior assumptions and no uniform convergence. It must be
noted that this does not occur in binary classification, where “f(h; (x, y)) = 1{h(x) ≠ y}”, since on any “(x, y)” there will be hypotheses with “f(h; (x, y)) = f(h̃; (x, y))”. Therefore, if “H” is highly complex, with infinite “VC dimension”, multiple hypotheses will have 0 empirical error on any given training data set.
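As a toy illustration of this last point (not from the source text; the hypothesis class, distribution, and names are invented for the sketch), take the infinite-VC-dimension class of indicators of finite subsets of [0, 1]. An ERM hypothesis that simply memorizes the sample achieves 0 empirical error while its population error stays at 1:

```python
import random

# Hypothesis class with infinite VC dimension: indicators of finite
# subsets of [0, 1]. True labels: y = 1 for every x, with x drawn
# from the uniform distribution on [0, 1].

def make_memorizer(sample_xs):
    """ERM hypothesis that outputs 1 exactly on the training points."""
    memo = set(sample_xs)
    return lambda x: 1 if x in memo else 0

random.seed(0)
train = [random.random() for _ in range(100)]   # all labels are 1
h = make_memorizer(train)

empirical_error = sum(h(x) != 1 for x in train) / len(train)
test = [random.random() for _ in range(100)]    # fresh points, labels all 1
population_error = sum(h(x) != 1 for x in test) / len(test)

print(empirical_error)   # 0.0 -- perfect fit on the sample
print(population_error)  # 1.0 -- fresh points are almost surely unseen
```

Many such memorizing hypotheses (one per superset of the sample) tie for 0 empirical error, so ERM has no way to prefer a well-generalizing one.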
In order to rule out such “trivial” scenarios, Vapnik proposed the notion of “strict consistency”, an even stronger version of consistency. It is defined by the equation below, where the convergence is in probability.
“∀c ∈ R, inf