200
◾
Linux with
Operating System Concepts
28. Explain why you might want to use ping. Explain why you might want to use traceroute
instead of ping.
29. Explain why you might want to use nslookup.
30. How does nslookup differ from dig and host?
31. What is the equivalent in host of the command dig (i.e., the version of host that provides
detailed information like dig)?
32. How would you specify using dig that you only want to obtain information for the
listings of NSs?
33. What is an SOA?
Chapter 6
Regular Expressions
This chapter’s learning objectives are:
• To understand the use of regular expressions
• To understand the usage of the metacharacters of regular expressions
• To understand how to apply each of the metacharacters
• To be able to use grep/egrep to search files for strings
• To be able to use the basic forms of sed and awk to solve search problems
6.1 INTRODUCTION
A regular expression (regex) is a string that expresses a pattern used to match against other
strings. The pattern will either match some portion of another string or not. We can use
regular expressions to define pattern matching rules which can then be used by programs.
For instance, regular expressions are applied in software that performs spam filtering and
natural language understanding.
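The following minimal sketch in Python makes the "match some portion of another string or not" idea concrete. Python's `re` module stands in here for the tools covered later in this chapter (grep, sed, awk). For a simple literal pattern like this one its behavior agrees with grep's, although the regex dialects differ in places. The pattern and sample strings are illustrative choices.

```python
import re

# A regex is a pattern. re.search() scans a string for the first portion
# that matches the pattern and returns a match object, or None if no
# portion of the string matches.
pattern = re.compile(r"cat")

for text in ["concatenate", "dog"]:
    match = pattern.search(text)
    if match:
        print(f"{text!r}: matched {match.group()!r} at index {match.start()}")
    else:
        print(f"{text!r}: no match")
```

Here "concatenate" contains the portion "cat" (starting at index 3), so it matches. "dog" contains no such portion, so `search()` returns `None`.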
Let us look at a concrete example to illustrate the problem that a regular expression can
solve. You want to build a spam filter. You need to come up with a way to filter out any
email message that contains the word Viagra. Searching for the literal string “Viagra” or
“viagra” is easy enough. But clever spammers will try to disguise the word by substituting
letters, adding characters, or removing letters. We might expect any of the following variations:
• v.i.a.g.r.a
• v1agra
• vi_ag_ra
• vi@gr@
• ViAgRa
• Viagr
To build our spam filter, we would not want to list every possible variant of Viagra.
Beyond the six examples listed here, there are thousands of others that someone might
come up with. Instead, we could define a single regular expression that covers many
or most of the possibilities.
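The sketch below shows what such a single regular expression might look like. It is written in Python's `re` syntax, not grep's. The substitution sets (`1`, `l`, `!`, `|` for "i"; `@`, `4` for "a") and the "optional separator" rule are our own assumptions, chosen only to cover the six variations listed above. `VIAGRA` and `looks_like_spam` are hypothetical names used for illustration.

```python
import re

# Assumed separator rule: an optional run of non-alphanumeric characters
# (including "_"), so "v.i.a.g.r.a" and "vi_ag_ra" both match.
SEP = r"[\W_]*"

# Assumed substitution sets; a real filter would need many more.
VIAGRA = re.compile(
    "v" + SEP + "[i1l!|]" + SEP + "[a@4]" + SEP + "g" + SEP + "r" + SEP + "[a@4]?",
    re.IGNORECASE,  # also catches mixed case such as "ViAgRa"
)

def looks_like_spam(message: str) -> bool:
    """Return True if any portion of message matches the disguised-word pattern."""
    return VIAGRA.search(message) is not None

variants = ["v.i.a.g.r.a", "v1agra", "vi_ag_ra", "vi@gr@", "ViAgRa", "Viagr"]
for word in variants:
    print(word, looks_like_spam(word))
print("Let's meet for lunch", looks_like_spam("Let's meet for lunch"))
```

The trailing `[a@4]?` makes the final letter optional, which is what lets "Viagr" through. The separator rule also has a cost: because whitespace counts as a separator, an innocent phrase such as "via grand" is flagged too. This illustrates that a regex only describes a pattern and says nothing about intent.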
Let us consider another example. We want to define a regular expression to match any
person’s full name (we will assume just a first and last name). A name starts with a capital
letter and is followed by lower-case letters. On its own, this describes any capitalized word,
so to distinguish a full name from a single capitalized word, we require two of these with a
space in between. We can define this regex as consisting of an upper-case letter followed by
some number of lower-case letters followed by a space followed by an upper-case letter
followed by some lower-case letters. Note that our regex does not ensure that the names found
are actually people’s names. With the regular expression defined, we can use a Linux program,
for instance grep, to scan through a file to locate all of the strings that are names.
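Here is a minimal Python sketch of the name pattern just described. It assumes "some number of lower-case letters" means at least one, hence `+` rather than `*`. `NAME` and `find_names` are hypothetical names. With GNU grep, a comparable search might be `grep -oE '[A-Z][a-z]+ [A-Z][a-z]+' file`, where `-E` selects extended regular expressions and `-o` prints only the matched portions.

```python
import re

# An upper-case letter, one or more lower-case letters, a space,
# then another capitalized word. The "+" assumes at least one lower-case letter.
NAME = re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+")

def find_names(text: str) -> list[str]:
    """Return every non-overlapping substring that fits the name pattern."""
    return NAME.findall(text)

print(find_names("Alice Smith met Bob Jones in Paris."))
print(find_names("We flew to New York."))
```

The first call yields `['Alice Smith', 'Bob Jones']`. The second yields `['New York']`, a place rather than a person. This is exactly the limitation noted above: the regex matches the shape of a name, not its meaning.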
To define a regular expression, you write a pattern that consists of