210
◾
Linux with Operating System Concepts
is not a blank, and so [^ ] matches the string. To express that the string should contain no blanks, we need to first indicate that we are matching the entire string using ^ and $, and then we have to indicate that there can be many nonblanks using either * or +. We could solve this problem through the regex ^[^ ]+$. This states that the entire string, from start to finish, must consist of one or more characters, none of which is a blank. If you want to also match the empty string, use ^[^ ]*$.
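The difference between these three patterns can be checked with a short script. This is only a sketch: it assumes Python's re module as a stand-in for egrep, and the helper name matches and the sample strings are made up for illustration. For patterns this simple, the two tools should agree.

```python
import re

def matches(pattern, text):
    """Return True if pattern matches anywhere in text, as egrep would for one line."""
    return re.search(pattern, text) is not None

# Illustrative inputs: no blank, one blank, and the empty string.
samples = ["hello", "hello world", ""]

for pattern in ["[^ ]", "^[^ ]+$", "^[^ ]*$"]:
    # Show which sample strings each pattern accepts.
    results = {sample: matches(pattern, sample) for sample in samples}
    print(pattern, results)
```

Only the anchored patterns reject "hello world", and only ^[^ ]*$ accepts the empty string.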
6.2.6 Matching Metacharacters Literally
All of our metacharacters are punctuation marks. What happens when the regular expression requires one of these punctuation marks as a literal character? For instance, you want to match a number that contains a decimal point. Your first inclination might be to express this as [0-9]+.[0-9]+, meaning that the number has some digits followed by a period followed by some digits. But recall that the decimal point or period (.) means “any character.” This would cause the regular expression to match 123.567 but also 123a567 as well as 1234567, where the period matches the ‘a’ and a digit such as the ‘4’, respectively. To match a punctuation mark that is one of the metacharacters, we could use [[:punct:]], but this would match any punctuation mark, not just the one(s) we expect.
There are two solutions to this problem. First, we could specify the punctuation mark inside of [] as in [.]. This will work for most punctuation marks although it would not work for [ or ]. Alternatively, we can specify that the punctuation mark should be treated literally and not as a metacharacter. To accomplish this, we precede the punctuation mark with a \ character. The \ means “treat the next character literally,” or “escape the meaning of the next character.” This is why the \ is known as an escape character. Using \, we see that [0-9]+\.[0-9]+ forces the period to be treated literally rather than as a metacharacter. So this revised regex will match any sequence of one or more digits followed by a period followed by one or more digits.
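The effect of the unescaped period, and of both fixes, can be seen by trying the three sample strings from the discussion above. This is a minimal sketch that again assumes Python's re module in place of egrep; the variable names and the helper found are hypothetical.

```python
import re

unescaped = re.compile(r"[0-9]+.[0-9]+")    # . matches any character
escaped = re.compile(r"[0-9]+\.[0-9]+")     # \. matches only a period
bracketed = re.compile(r"[0-9]+[.][0-9]+")  # [.] also matches only a period

def found(regex, text):
    """Return True if regex matches somewhere in text."""
    return regex.search(text) is not None

for text in ["123.567", "123a567", "1234567"]:
    # Columns: unescaped, escaped, bracketed.
    print(text, found(unescaped, text), found(escaped, text), found(bracketed, text))
```

The unescaped pattern accepts all three strings, while the escaped and bracketed forms accept only 123.567.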
For another example, let us define an expression to match a simple multiplication problem of the form 3*4=12. Recall that the * means “0 or more occurrences,” but here we would want the * to be treated as a literal character. We could define our regular expression in a couple of different ways:
    [0-9]+[*][0-9]+=[0-9]+
    [0-9]+\*[0-9]+=[0-9]+
The above expressions define a pattern of “at least one digit followed by an asterisk followed by at least one digit followed by an equal sign followed by at least one digit.” Notice that there is no way to ensure that the multiplication problem is correct, so this expression can match 3*4=7 as well as 3*40=12. The following regular expression would not work because the * is being used to modify the +, which is not legal syntax.
    [0-9]+*[0-9]+=[0-9]+
NOTE: Since = is not a metacharacter, we do not need to precede it with \, although \= is equivalent to = (i.e., the \ does not impact the =).
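The multiplication patterns can be exercised the same way. The sketch below assumes Python's re module rather than egrep, and the names bracket_form, escape_form, escaped_equals, and verdicts are invented for the example. Python rejects the pattern with the unescaped * outright; other tools may treat that pattern differently.

```python
import re

bracket_form = re.compile(r"[0-9]+[*][0-9]+=[0-9]+")    # * made literal with []
escape_form = re.compile(r"[0-9]+\*[0-9]+=[0-9]+")      # * made literal with \
escaped_equals = re.compile(r"[0-9]+\*[0-9]+\=[0-9]+")  # \= behaves like plain =

def verdicts(text):
    """Return whether each of the three patterns matches text."""
    patterns = (bracket_form, escape_form, escaped_equals)
    return tuple(p.search(text) is not None for p in patterns)

# The unescaped * follows a +, so Python refuses to compile the pattern.
try:
    re.compile(r"[0-9]+*[0-9]+=[0-9]+")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False

for text in ["3*4=12", "3*4=7", "3*40=12", "3x4=12"]:
    print(text, verdicts(text))
print("unescaped * compiles:", unescaped_compiles)
```

All three patterns agree on every input, including the arithmetically wrong 3*4=7, since the regex checks only the form of the string.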
We can omit the \ when expressing some of the metacharacters under limited circumstances. Since the $ is used to indicate “ends the expression,” if a $ does not appear at the end of the regular expression, it is treated literally and therefore does not require the \. Similarly, the ^ is expected to appear in only two positions, at the beginning of an expression or as the first character inside the []. If a ^ appears anywhere else, it is treated literally. On the other hand, the characters [ and ] must be preceded by the \ if you wish to treat them literally. This is unlike other metacharacters which could be expressed in [] because we would not be able to specify [[]] to indicate literal [ or ] marks.
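Escaping the square brackets can be illustrated with a pattern that finds a bracketed number. This is a sketch under the assumption that Python's re module stands in for egrep; the name first_index and the sample strings are hypothetical. The positional treatment of ^ and $ described above depends on the tool, and Python treats both as anchors wherever they appear, so only the bracket case is shown.

```python
import re

# \[ and \] match literal square brackets; [0-9]+ between them is an ordinary set.
index = re.compile(r"\[[0-9]+\]")

def first_index(text):
    """Return the first bracketed number in text, or None if there is none."""
    match = index.search(text)
    return match.group(0) if match else None

print(first_index("scores[12] = 7"))
print(first_index("scores(12) = 7"))
```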
6.2.7 Controlling Repetition
Through the use of * and +, we can specify repetition of one or more characters. But * and + may not be precise enough for circumstances where we want to limit the number of repetitions. To control the number of repetitions expected, we use an additional set of metacharacters, {}, sometimes called the curly brackets or curly braces. There are three ways that we can apply the curly brackets as shown below. In each, n and m are integer values.
•
{