Partial Result of egrep Searching for IP Addresses.
Regular Expressions
◾
221
A regular expression that will match a string without a period is not as simple as [^\.] because this only says “match if any portion of the string does not contain a period.” Thus, the expression [^\.] will match any string that contains at least one character other than a period, that is, any string that is not solely periods.
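The difference can be checked on a few sample lines. The following is a minimal sketch, not the author's own solution: the sample strings are invented, grep -E is used as the standard equivalent of egrep, and anchoring a negated class to the whole line (^[^.]*$) is one possible way to require that no character is a period.

```shell
# Hypothetical sample input: no period, some periods, only periods.
sample=$(printf '%s\n' 'localhost' 'www.example.com' '...')

# [^.] matches any line that contains at least one non-period character.
# (Inside brackets the period is already literal, so no backslash is needed.)
loose=$(printf '%s\n' "$sample" | grep -E '[^.]')

# ^[^.]*$ requires every character on the line to be a non-period.
strict=$(printf '%s\n' "$sample" | grep -E '^[^.]*$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

With this input, the loose expression keeps both localhost and www.example.com, while the anchored one keeps only localhost.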
The option -m allows you to specify a number so that grep stops searching after it reaches that number of matching lines. This is useful if you are only looking for a few matches or even one. For instance, if you want to know whether a particular expression matches anywhere in a file, you could use -m 1.
If you only want grep to output the actual matches (i.e., the portion of the line that matched), use -o. Finally, -l (lower case “L”) and -L output only the names of files that contain matches or that do not contain matches, respectively, rather than the matched lines.
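The four options can be tried side by side. This is an illustrative sketch only: the two files, their contents, and the simplified IP address pattern are assumptions made for the demonstration, and grep -E stands in for egrep.

```shell
# Work in a scratch directory so the example touches nothing else.
dir=$(mktemp -d)
printf '%s\n' 'host 127.0.0.1' 'nameserver 172.28.102.11' 'no address here' > "$dir/a.txt"
printf '%s\n' 'plain text only' > "$dir/b.txt"

# Simplified IPv4 pattern, an assumption for this demo (it does not range-check octets).
ip='[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'

first=$(grep -E -m 1 "$ip" "$dir/a.txt")               # stop after the first matching line
only=$(grep -E -o "$ip" "$dir/a.txt")                  # print just the matched text
with=$(cd "$dir" && grep -E -l "$ip" a.txt b.txt)      # names of files with a match
without=$(cd "$dir" && grep -E -L "$ip" a.txt b.txt)   # names of files with no match

printf '%s\n' "$first" "$only" "$with" "$without"
rm -r "$dir"
```

Here -m 1 returns only the host line, -o returns the two addresses without the surrounding text, -l names a.txt, and -L names b.txt.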
6.4.3 Additional egrep Examples
Let us now concentrate on a few examples that search the standard Linux dictionary for words that fit certain patterns. If you examine the contents of this dictionary (stored in /usr/share/dict/words), you will see words that are numbers, words that contain numbers, words that start with upper-case letters, words that are all lower case, words that are all upper case, words made up of upper-case letters and numbers, and words with punctuation marks.
We start by looking for all 30-letter words.
egrep '[[:alpha:]]{30}' /usr/share/dict/words
The output of this command is two words:
dichlorodiphenyltrichloroethane
pneumonoultramicroscopicsilicovolcanoconiosis
2:loopback 127.0.0.0
3:link-local 169.254.0.0
14:restrict 127.0.0.1
18:#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
26:#broadcast 192.168.1.255 autokey # broadcast server
28:#broadcast 224.0.1.1 autokey # multicast server
29:#multicastclient 224.0.1.1 # multicast client
30:#manycastserver 239.255.254.254 # manycast server
31:#manycastclient 239.255.254.254 autokey # manycast client
35:#server 127.127.1.0 # local clock
17:host 127.0.0.1
25:#uri ldap://127.0.0.1/
26:#uri ldaps://127.0.0.1/
4:nameserver 172.28.102.11
5:nameserver 172.28.102.13
FIGURE 6.3
Results from Sample Grep Instruction.
222
◾
Linux with Operating System Concepts
A glance at this output shows that the two words differ in length, so they cannot both be 30 letters long. In fact, the first word has 31 letters and the second has 45. The expression matched them because it only requires 30 consecutive letters somewhere on the line. To obtain exactly 30-letter words, we have to match a string from the start of the line to the end. We enhance our regular expression to be ^[[:alpha:]]{30}$ and find that there are no 30-letter words.
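The effect of the anchors can be reproduced without the full dictionary. The sketch below uses a three-word stand-in list (an assumption; only the two long words come from the output above) and grep -E in place of egrep.

```shell
# Stand-in word list; the two long words are the ones reported above.
words=$(printf '%s\n' 'cat' 'dichlorodiphenyltrichloroethane' \
  'pneumonoultramicroscopicsilicovolcanoconiosis')

# Unanchored: any line containing a run of 30 letters, so longer words match too.
unanchored=$(printf '%s\n' "$words" | grep -E '[[:alpha:]]{30}')

# Anchored: the 30 letters must span the whole line.
anchored=$(printf '%s\n' "$words" | grep -E '^[[:alpha:]]{30}$')

# Print each unanchored match with its length to confirm neither is 30 letters.
printf '%s\n' "$unanchored" | awk '{ print length($0), $0 }'
printf 'anchored matches: [%s]\n' "$anchored"
```

The unanchored expression returns both long words, and the anchored expression returns nothing, which agrees with the result described in the text.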
Let us look for 20-letter words that start with a capital letter.
egrep '[A-Z][[:alpha:]]{19}' /usr/share/dict/words
We use 19 in curly brackets because [A-Z] represents the first letter, so we are now looking for words that, after the first capital letter, have 19 additional letters. This regular expression has the same drawback as our previous grep command’s expression in that we did not enforce that the regex should precisely match the entire line. By adding ^ and $ to our regular expression, we wind up with exactly five words:
Archaeopterygiformes
Biblicopsychological
Chlamydobacteriaceae
Llanfairpwllgwyngyll
Mediterraneanization
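As a side note, grep also offers the -x option, which accepts only matches that cover the whole line and so has the same effect as adding ^ and $. The sketch below compares the two forms on an invented three-word sample; the LC_ALL=C setting is an addition here, used so that [A-Z] means the ASCII upper-case letters regardless of locale.

```shell
# Hypothetical sample: a 20-letter capitalized word, a 21-letter one, a lower-case one.
words=$(printf '%s\n' 'Mediterraneanization' 'Mediterraneanizations' 'mediterraneanization')

# Explicit anchors, as in the text.
anchors=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[A-Z][[:alpha:]]{19}$')

# -x: the match must cover the entire line, so no anchors are written.
wholeline=$(printf '%s\n' "$words" | LC_ALL=C grep -E -x '[A-Z][[:alpha:]]{19}')

printf 'anchors:   %s\nwholeline: %s\n' "$anchors" "$wholeline"
```

Both forms select only Mediterraneanization from this sample.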
Let us try another one. If you look through the file, you will see many words with various
punctuation marks. Some words are hyphenated, some have periods (e.g., abbreviations), and
some use / as in AC/DC. There are a handful of words that contain other punctuation marks.
We want to find all words that contain punctuation marks aside from hyphens, periods, and slashes. How can we do this? The regular expression for any punctuation mark is easy, [[:punct:]]. Here, we want to limit the expression to not include three of those marks. We cannot do so with :punct:, so instead, we need to enumerate the remaining punctuation marks. We will not include ' and " because these quote marks have special meanings when typing the grep command. We will also not use \ (the escape character) or [], as we will need to enumerate our list of punctuation marks inside of []. This leaves us with [`~!@#$%^&*()_=+{}|;:<>,?]. Our grep command is
egrep '[`~!@#$%^&*()_=+{}|;:<>,?]' /usr/share/dict/words
which results in the following list.
2,4,5-t
2,4-d
A&M
A&P
AT&T
&c
hee-hee!
he-he!
IT&T
R&D
USC&GS
Can we simplify the above regex? Yes, we examine how to do so below.
To this point, we have used grep to search files for instances of strings. By piping the results of one command to grep, we can also reduce the output of other commands. Two common uses are to pipe the result of ls -l to grep and ps to grep. You might recall from earlier chapters that ps ax will list all processes irrespective of user or terminal window. This creates a substantial amount of output. This can be viewed by piping the result to less, but if we were interested in only particular processes, we could instead pipe the result to grep.
Consider wanting to view active processes that have utilized CPU time beyond 0:00. First, we want to issue ps ax. From this list, we want to find all processes whose Time is not 0:00. We will search using egrep '0:00' and then invert the match using -v. Our command is
ps ax | egrep -v '0:00'
Notice that the grep command does not include a filename. Why not? Because the input to grep is being piped from the previous command.
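Because real ps output differs from machine to machine, the filter is easier to study on a fixed sample. The sketch below feeds a canned stand-in for ps ax output to grep -E -v; every PID, time, and command in it is invented for illustration.

```shell
# Canned stand-in for `ps ax` output (all values are hypothetical).
ps_sample=$(cat <<'EOF'
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:03 /sbin/init
  412 ?        S      0:00 [kworker/0:1]
  977 tty1     Ss+    0:00 /sbin/agetty tty1
 1290 ?        Sl     1:47 /usr/bin/Xorg
EOF
)

# -v inverts the match: keep only the lines that do NOT contain 0:00.
busy=$(printf '%s\n' "$ps_sample" | grep -E -v '0:00')
printf '%s\n' "$busy"
```

The header line and the two processes with nonzero times remain. One caveat of this simple pattern is that it tests the whole line, so a process whose time is 10:00, or whose command happens to contain 0:00, would be dropped as well.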
Similarly, imagine that you want to view any files in /etc that are readable and writable. We want to perform ls -l /etc. Now, we pipe the result to grep. What regular expression do we seek? First, we want files, not directories. So we expect the permissions to start with a hyphen. Next, we want files that are both readable and writable by their owner. Thus, the first three characters in the long listing are expected to be '-rw'. We want to ensure that these are the first three characters of the line, so we add '^' to the beginning of the regular expression. This leaves us with the command:
ls -l /etc | egrep '^-rw'
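The same filter can be tried safely on a scratch directory instead of /etc. In the sketch below the directory and file names are hypothetical, and grep -E is used in place of egrep.

```shell
# Build a scratch directory with one read-write file, one read-only file, one directory.
dir=$(mktemp -d)
touch "$dir/readwrite.conf" "$dir/readonly.conf"
mkdir "$dir/subdir"
chmod 644 "$dir/readwrite.conf"   # -rw-r--r--
chmod 444 "$dir/readonly.conf"    # -r--r--r--

# Keep regular files (leading hyphen) whose owner has read and write permission.
matches=$(ls -l "$dir" | grep -E '^-rw')
printf '%s\n' "$matches"
rm -rf "$dir"
```

Only the readwrite.conf entry survives: the directory line starts with d, the read-only file starts with -r-, and the "total" line that ls -l prints first does not start with a hyphen.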
Another example is to find the files in /etc that are not both owned by root and in root’s private group. That is, which files do not have 'root root' in their long listing? For this, we want to specify “anything but 'root root',” so we will again use -v. Our instruction is
ls -l /etc | egrep -v 'root root'
Let us reconsider our earlier solution to list all words in the Linux dictionary that contained punctuation marks other than the hyphen, period, and slash. We can base this on the above strategy of piping the result of a Linux instruction to egrep -v. In this case, we will use egrep '[[:punct:]]' /usr/share/dict/words to obtain all words that contain a punctuation mark. Now we pipe the result to egrep -v '[./-]' to rule out all of those that contain a hyphen, period, or slash. Note that this is stricter than the enumerated expression: a word such as hee-hee! is discarded because it contains a hyphen, even though it also contains another punctuation mark. The entire statement is
egrep '[[:punct:]]' /usr/share/dict/words | egrep -v '[./-]'
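The two approaches are close but not identical, which a small comparison makes visible. The sketch below runs both on a short sample: AT&T, hee-hee!, 2,4-d, and AC/DC appear in the text above, while e.g., well-known, and plain are added here as contrasts. grep -E stands in for egrep.

```shell
# Sample entries: four from the text plus three contrasts added for this demo.
words=$(printf '%s\n' 'AT&T' 'hee-hee!' '2,4-d' 'AC/DC' 'e.g.' 'well-known' 'plain')

# Approach 1: enumerate the punctuation marks of interest.
class='[`~!@#$%^&*()_=+{}|;:<>,?]'
enumerated=$(printf '%s\n' "$words" | grep -E "$class")

# Approach 2: any punctuation mark, then discard lines containing . / or -
piped=$(printf '%s\n' "$words" | grep -E '[[:punct:]]' | grep -E -v '[./-]')

printf 'enumerated:\n%s\n\npiped:\n%s\n' "$enumerated" "$piped"
```

On this sample the enumerated class keeps AT&T, hee-hee!, and 2,4-d, whereas the pipeline keeps only AT&T, because -v removes every line that contains a hyphen, period, or slash regardless of what else is on it. Which behavior is wanted depends on how the original question is read.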
You might wonder about the order in which the three punctuation marks are placed in the brackets. The hyphen, when used between two characters in brackets, indicates a range. To have the hyphen treated as a character to match and not as part of a range, we place it at the end of the list in brackets (placing it first also works). The period and slash can be placed in either order.
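The role of the hyphen's position can be confirmed with a one-line input. In this sketch, LC_ALL=C is an added assumption that pins the character order to ASCII so the range behaves predictably, and grep -E -c is used to count matching lines.

```shell
# Hyphen last or first is a literal character: the hyphen in a-b is matched.
last=$(printf '%s\n' 'a-b' | LC_ALL=C grep -E -c '[./-]')
first=$(printf '%s\n' 'a-b' | LC_ALL=C grep -E -c '[-./]')

# Hyphen between two characters forms a range. In ASCII, .-/ spans only
# the period and the slash, so the hyphen in a-b is no longer matched.
middle=$(printf '%s\n' 'a-b' | LC_ALL=C grep -E -c '[.-/]')

printf 'last=%s first=%s middle=%s\n' "$last" "$first" "$middle"
```

The first two counts are 1 and the third is 0, showing that moving the hyphen between the other two characters silently changes what the expression matches.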
6.4.4 A Word of Caution: Use Single Quote Marks
To wrap up this section, let us revisit the decision to place our regular expressions in quote marks as we have done throughout this section when using grep. As it turns out, in most cases, those quotes are not needed. For instance, ls -l /etc | egrep ^-rw will work as well as the instruction listed earlier. But the instruction ls -l /etc | egrep -v root root does not work correctly. The quotes are needed here because of the blank space between root and root; without them, grep takes the first root as the pattern and the second as the name of a file to search. So our command should be
ls -l /etc | egrep -v 'root root'
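What goes wrong without the quotes can be observed directly. The sketch below runs both forms against a canned stand-in for ls -l /etc output (the owners, sizes, and file names are invented) from inside an empty scratch directory, so that no file named root exists.

```shell
dir=$(mktemp -d)

# Canned stand-in for `ls -l /etc` output (all values are hypothetical).
listing=$(printf '%s\n' '-rw-r--r-- 1 root root 10 Jan  1 12:00 hosts' \
                        '-rw-r----- 1 root lp   20 Jan  1 12:00 printers.conf')

# Unquoted: the pattern is just "root" and the second word is taken as a file name.
# No file called root exists in the scratch directory, so grep fails with status 2.
( cd "$dir" && printf '%s\n' "$listing" | grep -E -v root root 2>/dev/null )
unquoted_status=$?

# Quoted: the whole string, space included, is the pattern.
quoted=$(printf '%s\n' "$listing" | grep -E -v 'root root')

printf 'unquoted status: %s\nquoted result:   %s\n' "$unquoted_status" "$quoted"
rm -r "$dir"
```

The unquoted form never reads the piped listing at all, because a file argument takes precedence over standard input. The quoted form returns the printers.conf line, the only one lacking 'root root'.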
But there is another reason why we need to enclose the regular expression in quote marks. This reason has to do with the Bash interpreter and filename expansion, or globbing. Recall that ls * will list all items in the current directory. Similarly, an instruction like wc * will apply the wc operation to all files in the current directory. The Bash interpreter performs filename expansion before the instruction is executed. So wc * is first converted into
wc