Complexity in phonetics and phonology
ness is information content, a measure of the probability of a particular
element in a given communication system. The higher the probability of an
element, the lower its information content, and conversely, the lower its
probability, the higher its information content. Markedness diagnostics can
thus be replaced by observations about probability, which can be determined
based on a number of factors.² While the exact nature of these factors,
their interaction, and the specific definition of probability require
further empirical investigation, it is plausible to hypothesize a
relationship between complexity and probability. For example, if low
probability correlates with higher information content, then it may in turn
correlate with higher complexity. At the same time, a related hypothesis
needs to be tested, one signalled by Pellegrino et al. (2007): it is
possible that information rate (the quantity of information per unit per
second) may turn out to be more relevant than, or closely related to,
information content (the quantity of information per unit).
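
The inverse relation between probability and information content can be
made concrete with a minimal sketch. It assumes the standard Shannon
definition of information content as the negative base-2 logarithm of
probability, which the text does not spell out. The helper names, the toy
probabilities, and the reading of information rate as bits per unit
multiplied by units per second are illustrative assumptions, not measures
taken from the studies cited.

```python
import math


def information_content(probability: float) -> float:
    """Information content in bits, assuming the Shannon definition log2(1/p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in the interval (0, 1]")
    return math.log2(1.0 / probability)


def information_rate(bits_per_unit: float, units_per_second: float) -> float:
    """Assumed reading of 'information per unit per second': bits per second."""
    return bits_per_unit * units_per_second


# Hypothetical probabilities of four elements in a toy communication system.
toy_probabilities = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

for element, p in toy_probabilities.items():
    # Less probable elements receive a higher information content.
    print(element, p, information_content(p))
```

Under these assumptions, the most probable element carries the least
information, and halving the probability adds exactly one bit.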
2.2. Theory-driven vs. data-driven approaches
Overall we identify two main types of studies of phonological complexity,
which we refer to as theory-driven and data-driven, respectively.
The theory-driven approach is well illustrated by Chomsky and Halle’s
(1968) SPE, where counting distinctive features is considered to be the
relevant measure of complexity, not unlike Lehmann’s (1974) proposal,
albeit restricted to phonology. In chapter 9 of SPE, Chomsky and Halle
develop a complexity metric. Starting from the assumption that a natural
class should be defined with fewer distinctive features than a non-natural
(or less natural) class, Chomsky and Halle observe some contradictions.
For example, the class of voiced obstruents is captured by more features
than the class of all voiced segments, including vowels. Nevertheless, the
first class is intuitively more natural than the second one, and would
therefore be expected to have the simpler definition. The solution they
propose is to include the concept of markedness in the formal framework,
and to “revise the evaluation measure so that unmarked values do not
contribute to complexity” (Chomsky and Halle, 1968:402).
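
The effect of this revision can be illustrated with a minimal sketch that
contrasts a plain feature count with a count restricted to marked values.
The "m"/"u" encoding is a simplification, and the feature specifications
below are invented for illustration; they are not the markedness values
assigned in SPE.

```python
from typing import Dict

# A specification maps feature names to "m" (marked) or "u" (unmarked).
Specification = Dict[str, str]


def feature_count(spec: Specification) -> int:
    """Unrevised measure: every specified feature contributes to the count."""
    return len(spec)


def marked_complexity(spec: Specification) -> int:
    """Revised measure: only marked values contribute to complexity."""
    return sum(1 for value in spec.values() if value == "m")


# Hypothetical specifications (illustrative values only).
spec_x: Specification = {"voice": "u", "nasal": "u", "continuant": "u"}
spec_y: Specification = {"voice": "m", "nasal": "u"}

# spec_x mentions more features, yet it is simpler under the revised measure.
print(feature_count(spec_x), marked_complexity(spec_x))
print(feature_count(spec_y), marked_complexity(spec_y))
```

In this toy case the two measures rank the specifications in opposite
orders, which is the kind of reversal the revised evaluation measure is
meant to allow.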
This adjustment allows them
to define complexity, and more specifically the complexity of a segment
inventory, in the following way: “The complexity of a system is equal to
the sum of the marked features of its members” (Chomsky and Halle,
1968:409), or in other words, “related to the sum of the complexities of the