Linux with Operating System Concepts

Linux-with-Operating-System-Concepts-Fox-Richard-CRC-Press-2014

empty string. Thus, this regex only matches the empty string.
6.2.4 Matching from a List of Options
So far, our expressions have allowed us to match against strings that have a variable number of characters, but only if the characters appear in the order specified. For instance, a+b+c+ can match one or more a’s, then one or more b’s, then one or more c’s, but only if they appear in that order.
What if we want to match any string that contains any number of a’s, b’s, and c’s, but in no particular order? If we want to match any three-character sequence that consists of only a’s, b’s, and c’s, as in abc, acb, bca, and so forth, we will need an additional metacharacter.
The [ ] metacharacters, often referred to as brackets or square brackets, allow us to specify a list of options. The list indicates that the next character in the string can match any single character in the list.
Inside of the brackets we can specify characters using one of three notations: an enumerated list as in [abcd], a range as in [a-d], or a class such as [[:alpha:]]. The class :alpha: represents all alphabetic characters. Obviously we would not use :alpha: if we only wanted to match against a subset of letters like a-d. Alternatively, we could use a-zA-Z to indicate all letters. Notice that when describing a class, we use double brackets instead of single brackets.
Consider the regular expression [abc][abc][abc]. This expression will match any string that contains three consecutive characters that are a’s, b’s, or c’s in any combination. This expression will match abc, acb, and bca. And, because we are not restricting the number of times any character appears, it will also match aaa, bbb, aab, aca, and so forth. We could also use a range to define this expression, as in [a-c][a-c][a-c].
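As a rough illustration, the sketch below uses Python's re module as a stand-in for a Linux regex tool (an assumption; the sample strings are also invented). re.search looks for a match anywhere in the string, which mirrors the "contains three consecutive characters" behavior described above.

```python
import re

# Two spellings of the same pattern: an enumerated list and a range.
enumerated = re.compile(r"[abc][abc][abc]")
ranged = re.compile(r"[a-c][a-c][a-c]")

# Hypothetical sample strings chosen for this sketch.
samples = ["abc", "acb", "bca", "aaa", "xxbcaxx", "abd", "a1b2c3"]

# search() succeeds if three consecutive a/b/c characters occur anywhere.
results = {s: bool(enumerated.search(s)) for s in samples}

for s in samples:
    print(s, results[s], bool(ranged.search(s)))
```

Strings such as abd and a1b2c3 fail because they never contain three consecutive characters drawn from the list.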
We can combine [ ] with *, +, and ? to control the number of times we expect the characters to appear. For instance, [abc]+ will match any string that contains a sequence of 1 or more characters in the set a, b, c while [abc]* will also match the empty string. In this latter case, we actually have a regular expression that will match anything because any string can contain 0 a’s, b’s, and c’s. For instance, 12345 contains no a’s, b’s, or c’s, and so it can match [abc]* when * is interpreted as 0.
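The difference between + and * here can be checked with a small sketch. It again assumes Python's re module in place of a Linux tool; the variable names are illustrative only.

```python
import re

plus = re.compile(r"[abc]+")
star = re.compile(r"[abc]*")

text = "12345"

# No a, b, or c occurs anywhere, so + (one or more) cannot match.
plus_match = plus.search(text)

# * (zero or more) succeeds with a zero-length match at the start.
star_match = star.search(text)

print(plus_match)
print(repr(star_match.group()), star_match.span())
```

The zero-length match is what the text means by * being "interpreted as 0".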
Now we have a means of expressing a regular expression where order is not important. The expression [abc]+ will match any of these four strings that we saw earlier that matched a*b*c*:
• aaaabbbbcccc
• abcccc
• accccc
• aaaaaabbbbbb
This expression will also match strings like the following.
• abcabcabcabc
• abacab
• aaaaaccccc
• a
• cccccbbbbbbaaaa
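To see where order matters, the sketch below compares a*b*c* with [abc]+ on the strings listed above. It assumes Python's re module and uses fullmatch so that each pattern must account for the whole string; that whole-string requirement is an addition for the sake of the comparison, not something the text applies at this point.

```python
import re

ordered = re.compile(r"a*b*c*")   # a's, then b's, then c's
any_order = re.compile(r"[abc]+")  # any mix of a, b, c

strings = [
    "aaaabbbbcccc", "abcccc", "accccc", "aaaaaabbbbbb",
    "abcabcabcabc", "abacab", "aaaaaccccc", "a", "cccccbbbbbbaaaa",
]

for s in strings:
    print(s, bool(ordered.fullmatch(s)), bool(any_order.fullmatch(s)))

# Strings that only the order-free pattern accepts in full.
only_any_order = [
    s for s in strings
    if any_order.fullmatch(s) and not ordered.fullmatch(s)
]
print(only_any_order)
```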
We can combine any characters in the brackets as in [abcxyz], [abcd1234], or [abcdABCD]. If we have a number of characters to enumerate, a range is more practical. We would certainly prefer to use a range like [a-z] rather than list all of the letters. We can also combine ranges and enumerations. For instance, the three sequences above could also be written as [a-cx-z], [a-d1-4], and [a-dA-D] respectively. Now consider the list of all lower case consonants. We could enumerate them all as [bcdfghjklmnpqrstvwxyz] or we could use several ranges as in [b-df-hj-np-tv-z].
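A quick way to confirm that the enumerated and ranged consonant lists are equivalent is to test every lowercase letter against both. This is a minimal sketch assuming Python's re module as the regex engine.

```python
import re
import string

enumerated = re.compile(r"[bcdfghjklmnpqrstvwxyz]")
ranged = re.compile(r"[b-df-hj-np-tv-z]")

# Collect the lowercase letters each bracket expression accepts.
letters_enum = [ch for ch in string.ascii_lowercase if enumerated.fullmatch(ch)]
letters_range = [ch for ch in string.ascii_lowercase if ranged.fullmatch(ch)]

print("".join(letters_range))
print(letters_enum == letters_range)
```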
While we can use ranges for letters and digits, there is no range available for the punctuation marks. You could enumerate all of the punctuation marks in brackets to capture “any punctuation mark” but this would be tedious. Instead, we also have a class named :punct: which is applied in double brackets, as in [[:punct:]]. Table 6.2 provides a listing of the classes available in Linux.
Let us now combine all of the metacharacters we have learned with some examples. We want to find a string that consists only of letters. We can use ^[a-zA-Z]+$ or ^[[:alpha:]]+$. The ^ and $ force the regex to match an entire string. Thus, any string that contains nonletters will not match. If we had used only [a-zA-Z]+, then it could match any string that contains letters but could also have other characters that precede or follow the letters such as abc123, 123abc, abc!def, as well as ^#!$a*%&. Why do we use the + in this regex? If we had used *, this could also match the empty string, that is, a string with no characters. The + insists that there be at least one letter and the ^ and $ insist that the only characters found are letters.
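The effect of the anchors and of + versus * can be sketched as follows. The example assumes Python's re module, which does not support POSIX classes such as [[:alpha:]], so only the [a-zA-Z] form is shown; the empty string is added to the samples to exercise the * case.

```python
import re

anchored = re.compile(r"^[a-zA-Z]+$")       # whole string, at least one letter
unanchored = re.compile(r"[a-zA-Z]+")       # letters anywhere in the string
star_anchored = re.compile(r"^[a-zA-Z]*$")  # whole string, possibly empty

samples = ["abc", "abc123", "123abc", "abc!def", "^#!$a*%&", ""]

anchored_hits = [s for s in samples if anchored.search(s)]
unanchored_hits = [s for s in samples if unanchored.search(s)]
star_hits = [s for s in samples if star_anchored.search(s)]

print(anchored_hits)
print(unanchored_hits)
print(star_hits)
```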
We could similarly match a string of only binary digits. Binary digits are 0 and 1. So instead of [a-zA-Z] or [[:alpha:]], we use [01]. The regex is ^[01]+$. Again, we use the ^ and $ to force the expression to match entire strings and we use + instead of * to disallow the empty string. If we wanted to match strings that comprised solely digits, but any digits, we would use either ^[0-9]+$ or ^[[:digit:]]+$.
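The two digit patterns can be compared on a few invented inputs. As before, this is a sketch that assumes Python's re module; the [[:digit:]] form is omitted because Python's re does not support POSIX classes.

```python
import re

binary = re.compile(r"^[01]+$")
digits = re.compile(r"^[0-9]+$")

# Hypothetical sample strings, including the empty string.
samples = ["0", "101101", "10201", "2022", "", "12a"]

binary_hits = [s for s in samples if binary.search(s)]
digit_hits = [s for s in samples if digits.search(s)]

print(binary_hits)
print(digit_hits)
```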
If we want to match a string of only punctuation marks, we would use ^[[:punct:]]+$. Unlike the previous examples, we would not use […] and enumerate the list of punctuation marks. Why not? There are too many and we might (carelessly) miss some. There is no range to indicate all punctuation marks, such as [!-?], so we must either list them all, or use :punct:.
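Python's re module has no [[:punct:]] class, so the sketch below approximates it by building a bracket expression from string.punctuation, the 32 ASCII punctuation characters. Treating that set as equivalent to :punct: is an assumption that holds for plain ASCII input in the default C locale; letting the library supply the list also avoids the risk of missing a character by hand.

```python
import re
import string

# Escape each punctuation character so it is taken literally inside [ ].
punct_class = "[" + re.escape(string.punctuation) + "]"
punct_only = re.compile("^" + punct_class + "+$")

# Hypothetical sample strings.
samples = ["!?;", "...", "hi!", "", "#$%&"]
punct_hits = [s for s in samples if punct_only.search(s)]

print(punct_hits)
```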
If we want to match a string that consists only of digits and letters where the digits precede the letters, we would use ^[0-9]+[[:alpha:]]+$. If we wanted to match a string that consists only of letters and digits where the first character must be a letter and then can be followed by any (0 or more) letters and digits, we would use ^[[:alpha:]][0-9a-zA-Z]*$.
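Both patterns can be tried on a handful of invented strings. The sketch assumes Python's re module and substitutes [a-zA-Z] for [[:alpha:]], since Python's re does not support POSIX classes.

```python
import re

# One or more digits followed by one or more letters, nothing else.
digits_then_letters = re.compile(r"^[0-9]+[a-zA-Z]+$")

# A letter first, then zero or more letters or digits.
letter_first = re.compile(r"^[a-zA-Z][0-9a-zA-Z]*$")

# Hypothetical sample strings.
samples = ["123abc", "abc123", "x", "12ab34", "9", "a_b"]

dtl_hits = [s for s in samples if digits_then_letters.search(s)]
lf_hits = [s for s in samples if letter_first.search(s)]

print(dtl_hits)
print(lf_hits)
```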
6.2.5 Matching Characters That Must Not Appear
In some cases, you will have to express a pattern that seeks to match a string that does not contain specific character(s). We might want to match a string that has no blank spaces in it. You might think to use [...]+ where the ... is “all characters except the blank space.” That would require enumerating quite a list as it would have to include every letter, every digit, and every punctuation mark. In such a case, we would prefer to indicate “no space” by using the notation [^ ]. The ^, when used inside of [], means “do not match” against the characters listed in the brackets. The blank space after ^ indicates that the only character we do not want to match against is the blank.
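A minimal sketch of the negated list, assuming Python's re module and two invented sample strings, shows that [^ ] on its own matches a single nonblank character wherever one first occurs.

```python
import re

not_blank = re.compile(r"[^ ]")

# The first nonblank character found is the match.
first = not_blank.search("hi there")
print(first.group(), first.span())

# A string made up solely of blanks gives the pattern nothing to match.
only_blanks = not_blank.search("   ")
print(only_blanks)
```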
Unfortunately, our regex [^ ] will have the same flaw as earlier expressions in that if it locates any single nonblank character within the string, it is a match to the string. If our string is “hi there,” the [^ ] regex will match the ‘h’ at the beginning of the string because it
TABLE 6.2
Classes Defined for Regular Expressions
