CONDITIONAL RANDOM FIELD PREDICTION MODEL FOR
SEGMENTING BREAST MASSES FROM DIGITAL MAMMOGRAMS
Fazilov Shavkat Khayrullaevich I, Abdieva Khabiba Sobirovna II
I Professor, Research Institute for Development of Digital Technologies and Artificial Intelligence, sh.fazilov@mail.ru
II PhD student, Department of Software Engineering, Samarkand State University, orif.habiba1994@gmail.com
Abstract. The segmentation of masses from mammograms is a challenging task due to their variability in shape, appearance, and size, as well as their low signal-to-noise ratio.
Key words: segmentation, CRF, prediction model, mammogram, computer-aided diagnosis systems
Assume that we have an annotated dataset $D$ including images of the region of interest (ROI) of the mass, represented by $x:\Omega \to \mathbb{R}$ (with $\Omega \subset \mathbb{R}^2$), and the corresponding manually provided segmentation mask $y:\Omega \to \{-1,+1\}$, where $D = \{(x, y)_i\}_{i=1}^{|D|}$. Also assume that the parameter of our structured output prediction model is denoted by $\theta$ and that the graph $G = (V, E)$ links the image $x$ and the labels $y$, where $V$ is the set of graph nodes and $E$ the set of graph edges. The parameter of the structured prediction model is learned by minimizing the following empirical loss function:
$$\theta^{*} = \arg\min_{\theta} \frac{1}{|D|} \sum_{i=1}^{|D|} \ell(x_i, y_i, \theta), \qquad (1)$$
where $\ell(x, y, \theta)$ is a continuous and convex loss function, to be minimized, that defines the structured model. We use the CRF formulation for solving (1). In particular, the CRF formulation uses the loss
$$\ell(x_i, y_i, \theta) = A(x_i, \theta) - E(y_i, x_i; \theta), \qquad (2)$$
Here, $A(x, \theta) = \log \sum_{y \in \{-1,+1\}^{|\Omega|}} \exp\{E(y, x; \theta)\}$ is the log-partition function that ensures normalization, and
$$E(y, x; \theta) = \sum_{k=1}^{K} \sum_{i \in V} \theta_{1,k}\, \psi^{(1,k)}(y(i), x) + \sum_{l=1}^{L} \sum_{(i,j) \in E} \theta_{2,l}\, \psi^{(2,l)}(y(i), y(j), x). \qquad (3)$$
In (3), $\psi^{(1,k)}(\cdot,\cdot)$ denotes one of the $K$ potential functions between label and pixel nodes, $\psi^{(2,l)}(\cdot,\cdot)$ one of the $L$ potential functions on the edges between label nodes, $\theta = [\theta_{1,1}, \ldots, \theta_{1,K}, \theta_{2,1}, \ldots, \theta_{2,L}]^{T} \in \mathbb{R}^{K+L}$, and $y(i)$ is the $i$-th component of the vector $y$.
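As an illustration of (1)-(3), the Python sketch below evaluates the energy $E(y, x; \theta)$ on a small 4-connected pixel grid with $K = L = 1$. The potential functions psi_1 and psi_2 and the toy image are hypothetical placeholders chosen only to make the sketch runnable; they are not the potentials used in our model.

import numpy as np

# Hypothetical potential functions, used only to make the sketch runnable.
def psi_1(y_i, x_i):
    # Unary potential psi^(1,1): brighter pixels favour the label y_i = +1.
    return y_i * (x_i - 0.5)

def psi_2(y_i, y_j):
    # Pairwise potential psi^(2,1) (Ising-like); the dependence on x is
    # dropped here for simplicity.
    return 1.0 if y_i == y_j else -1.0

def energy(y, x, theta):
    # E(y, x; theta) from (3) with K = L = 1 on a 4-connected pixel grid.
    theta_1, theta_2 = theta
    rows, cols = x.shape
    value = 0.0
    for r in range(rows):
        for c in range(cols):
            value += theta_1 * psi_1(y[r, c], x[r, c])          # nodes i in V
            if r + 1 < rows:                                     # vertical edges
                value += theta_2 * psi_2(y[r, c], y[r + 1, c])
            if c + 1 < cols:                                     # horizontal edges
                value += theta_2 * psi_2(y[r, c], y[r, c + 1])
    return value

x = np.random.rand(8, 8)            # toy ROI image
y = np.where(x > 0.5, 1, -1)        # toy annotation mask over {-1,+1}
print(energy(y, x, theta=(1.0, 0.5)))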
Conditional Random Field. The solution of (1) using the CRF loss function in (2) involves the computation of the log-partition function $A(x, \theta)$. The tree-reweighted (TRW) belief propagation algorithm provides an upper bound on this log-partition function, based on the variational representation
$$A(x, \theta) = \max_{\mu \in M}\ \theta^{T} \mu + H(\mu), \qquad (4)$$
where $M = \{\mu' : \exists \theta,\ \mu' = \mu\}$ denotes the marginal polytope, $\mu = \sum_{y \in \{-1,+1\}^{|\Omega|}} P(y \mid x, \theta)\, f(y)$, with $f(y)$ denoting the set of indicator functions of the possible configurations of each clique and variable in the graph (as denoted in (3)), $P(y \mid x, \theta) = \exp\{E(y, x; \theta) - A(x, \theta)\}$ is the conditional probability of the annotation $y$ given the image $x$ and the parameters $\theta$ (where we assume that this conditional probability function belongs to the exponential family), and $H(\mu) = -\sum_{y \in \{-1,+1\}^{|\Omega|}} P(y \mid x, \theta) \log P(y \mid x, \theta)$ is the entropy.
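For intuition, the following brute-force sketch (illustrative only, with hypothetical unary and pairwise weights) computes $A(x, \theta)$, $P(y \mid x, \theta)$ and the exact marginals by enumerating every labelling of a tiny $2 \times 2$ image; the $2^{|\Omega|}$ terms in this enumeration are exactly what makes the computation intractable for realistic ROI sizes and motivates the TRW approximation discussed next.

import itertools
import numpy as np
from scipy.special import logsumexp

x = np.random.rand(2, 2)        # tiny "ROI" so that all 2^|Omega| = 16 labellings fit
theta_1, theta_2 = 1.0, 0.5     # hypothetical unary / pairwise weights

def energy(y):
    # A simple instance of (3): one unary and one pairwise (Ising-like) potential.
    unary = theta_1 * np.sum(y * (x - 0.5))
    pairwise = theta_2 * (np.sum(y[:-1, :] * y[1:, :]) + np.sum(y[:, :-1] * y[:, 1:]))
    return unary + pairwise

labellings = [np.array(c).reshape(2, 2)
              for c in itertools.product([-1, 1], repeat=4)]
energies = np.array([energy(y) for y in labellings])

A = logsumexp(energies)         # log-partition function A(x, theta)
p = np.exp(energies - A)        # P(y | x, theta); sums to 1

# Exact node marginals mu(y(i) = +1). The 2^|Omega| terms of this enumeration
# are what makes exact inference infeasible for realistic ROI sizes.
mu_plus = sum(p_k * (y == 1) for p_k, y in zip(p, labellings))
print(A, mu_plus)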
It is important to note that, for general graphs with cycles (as in this study), the marginal polytope $M$ is difficult to characterize and the entropy $H(\mu)$ is not tractable. TRW addresses these issues by first replacing the marginal polytope with a superset $\mathbb{L} \supset M$ that only accounts for the local consistency constraints on the marginals, and then approximating the entropy with an upper bound. Specifically,
$$\mathbb{L} = \Big\{ \mu : \sum_{y(c) \setminus y(i)} \mu(y(c)) = \mu(y(i)),\ \ \sum_{y(i)} \mu(y(i)) = 1 \Big\}, \qquad (5)$$
replaces $M$ in (4) and represents the local polytope (with $\mu(y(i)) = \sum_{y'} P(y' \mid x, \theta)\, \delta(y'(i) - y(i))$, where $\delta(\cdot)$ is the Dirac delta function), $c$ indexes a graph clique, and the entropy approximation that replaces $H(\mu)$ in (4) is defined by
$$\widetilde{H}(\mu) = \sum_{i} H\big(\mu(y(i))\big) - \sum_{c} \rho_{c}\, I\big(\mu(y(c))\big), \qquad (6)$$
where $H\big(\mu(y(i))\big) = -\sum_{y(i)} \mu(y(i)) \log \mu(y(i))$ is the univariate entropy of the variable $y(i)$, $I\big(\mu(y(c))\big) = \sum_{y(c)} \mu(y(c)) \log \frac{\mu(y(c))}{\prod_{i \in c} \mu(y(i))}$ is the mutual information of the cliques in our model, and $\rho_{c}$ is a free parameter providing the upper bound on the entropy.
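A direct transcription of (6) for pairwise cliques might look as follows; the pseudomarginals and the edge appearance probabilities $\rho_c$ are assumed to be given (e.g., produced by the message passing described next) and strictly positive.

import numpy as np

def entropy_bound(node_marg, edge_marg, rho):
    # TRW entropy approximation (6): sum_i H(mu_i) - sum_c rho_c I(mu_c).
    # node_marg: dict i -> length-2 array with mu(y(i)) over {-1,+1}
    # edge_marg: dict (i, j) -> 2x2 array with mu(y(i), y(j))
    # rho:       dict (i, j) -> edge appearance probability rho_c
    # All pseudomarginals are assumed strictly positive and normalised.
    h_nodes = sum(-(m * np.log(m)).sum() for m in node_marg.values())
    i_edges = 0.0
    for (i, j), m_ij in edge_marg.items():
        independent = np.outer(node_marg[i], node_marg[j])   # prod_{i in c} mu(y(i))
        i_edges += rho[(i, j)] * (m_ij * np.log(m_ij / independent)).sum()
    return h_nodes - i_edges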
Therefore, the estimation of $A(x; \theta)$ and of the associated marginals in (4) is based on the following message-passing updates:
$$m_{c}(y(i)) \propto \sum_{y(c) \setminus y(i)} \exp\Big\{\tfrac{1}{\rho_{c}}\, \psi_{c}(y(i), y(j); \theta)\Big\} \prod_{j \in c \setminus i} \exp\big\{\psi_{j}(y(j), x; \theta)\big\} \frac{\prod_{d:\, j \in d} m_{d}(y(j))^{\rho_{d}}}{m_{c}(y(j))}, \qquad (7)$$
where $\psi_{i}(y(i), x; \theta) = \sum_{k=1}^{K} \theta_{1,k}\, \psi^{(1,k)}(y(i), x)$ and $\psi_{c}(y(i), y(j); \theta) = \sum_{l=1}^{L} \theta_{2,l}\, \psi^{(2,l)}(y(i), y(j), x)$. When the message-passing algorithm converges, the beliefs for the associated marginals are expressed as follows:
$$\mu_{c}(y(c)) \propto \exp\Big\{\tfrac{1}{\rho_{c}}\, \psi_{c}(y(i), y(j); \theta)\Big\} \prod_{i \in c} \exp\big\{\psi_{i}(y(i), x; \theta)\big\} \frac{\prod_{d:\, i \in d} m_{d}(y(i))^{\rho_{d}}}{m_{c}(y(i))},$$
$$\mu_{i}(y(i)) \propto \exp\big\{\psi_{i}(y(i), x; \theta)\big\} \prod_{d:\, i \in d} m_{d}(y(i))^{\rho_{d}}, \qquad (8)$$
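The sketch below outlines the updates in (7) and the node beliefs in (8) for a pairwise graph with binary labels. The containers unary, pairwise and rho are hypothetical placeholders for the potentials $\psi_i$, $\psi_c$ and the edge appearance probabilities; a practical implementation would additionally schedule the messages and monitor convergence rather than run a fixed number of sweeps.

import numpy as np

def trw_inference(unary, edges, pairwise, rho, sweeps=50):
    # Sketch of the message updates (7) and the node beliefs (8) for binary
    # labels {-1, +1} on a pairwise graph.
    # unary:    dict i -> length-2 array with psi_i(y(i), x; theta)
    # edges:    list of node pairs c = (i, j)
    # pairwise: dict (i, j) -> 2x2 array with psi_c(y(i), y(j); theta)
    # rho:      dict (i, j) -> edge appearance probability rho_c
    msg = {(c, i): np.ones(2) for c in edges for i in c}   # m_c(y(i)), uniform start

    def incoming(j):
        # prod_{d : j in d} m_d(y(j))^{rho_d}
        prod = np.ones(2)
        for d in edges:
            if j in d:
                prod = prod * msg[(d, j)] ** rho[d]
        return prod

    for _ in range(sweeps):
        for c in edges:
            i, j = c
            for recv, send in ((i, j), (j, i)):
                pw = pairwise[c] if recv == i else pairwise[c].T
                # Summand of (7): exp(psi_c / rho_c) * exp(psi_send) *
                # prod_{d: send in d} m_d^rho_d / m_c, summed over y(send).
                summand = (np.exp(pw / rho[c]) *
                           np.exp(unary[send]) *
                           incoming(send) / msg[(c, send)])
                new = summand.sum(axis=1)
                msg[(c, recv)] = new / new.sum()            # normalise for stability

    # Node beliefs mu_i(y(i)) from (8).
    beliefs = {}
    for i in unary:
        b = np.exp(unary[i]) * incoming(i)
        beliefs[i] = b / b.sum()
    return beliefs

# Toy usage on a two-node graph with a single edge (rho_c = 1 reduces to standard BP).
unary = {0: np.array([0.2, 1.0]), 1: np.array([0.5, 0.1])}
edges = [(0, 1)]
pairwise = {(0, 1): np.array([[1.0, -1.0], [-1.0, 1.0]])}
print(trw_inference(unary, edges, pairwise, rho={(0, 1): 1.0}))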
The learning process involved in the estimation of $\theta$ is typically based on gradient descent that minimizes the loss in (2) and runs until convergence, which is assessed through the rate of change of $\theta$ between successive gradient-descent iterations.
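Such a training loop can be outlined as follows. The callables feature_expectations and empirical_features are hypothetical wrappers that return, respectively, the expectation of the potentials under the (approximate TRW) marginals and the potentials evaluated at the ground-truth annotation, so that their difference is the gradient of (2); the learning rate and the stopping tolerance are illustrative.

import numpy as np

def train_crf(dataset, theta0, feature_expectations, empirical_features,
              lr=1e-3, tol=1e-5, max_iters=1000):
    # Gradient descent on the CRF loss (2), averaged over the training set as in (1).
    # For this loss, d l / d theta = E_{P(y|x,theta)}[psi] - psi(y_i, x_i), where
    # the expectation is estimated from the (approximate) TRW marginals.
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iters):
        grad = np.zeros_like(theta)
        for x, y in dataset:
            grad += feature_expectations(x, theta) - empirical_features(y, x)
        grad /= len(dataset)
        theta_new = theta - lr * grad
        # Convergence: theta barely changes between successive iterations.
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta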