Linux with Operating System Concepts



Linux-with-Operating-System-Concepts-Fox-Richard-CRC-Press-2014

separated by blank spaces?” The word “separated” means that there is a blank between words,
but not before the first or after the last. What we need here is to present the last word
as a “special case,” one that does not include the space. How?
^([A-Za-z][a-z]* ){1,4}[A-Za-z][a-z]*$
Now our expression says “one to four words, each followed by a space, followed by one
additional word.” Thus, our words are separated by spaces but there is no space before the
first or after the last word.
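The behavior of this expression can be checked with a short sketch. It assumes Python's `re` module rather than the Linux tools discussed later in the chapter, and the function name `is_two_to_five_words` and the sample strings are hypothetical, chosen only for illustration.

```python
import re

# The expression from the text: one to four words, each followed by a single
# space, and then one final word with no trailing space.
WORDS = re.compile(r"^([A-Za-z][a-z]* ){1,4}[A-Za-z][a-z]*$")


def is_two_to_five_words(line: str) -> bool:
    """Return True if the whole line is two to five space-separated words."""
    return WORDS.search(line) is not None


if __name__ == "__main__":
    samples = [
        "hello world",              # two words
        "one two three four five",  # five words
        "single",                   # only one word, so no separating space
        " leading space",           # space before the first word
        "trailing space ",          # space after the last word
        "a b c d e f",              # six words
    ]
    for sample in samples:
        print(repr(sample), is_two_to_five_words(sample))
```

Only the first two samples should be accepted. The others fail because the final word is written without a space, so a leading space, a trailing space, or a sixth word leaves characters that the anchors `^` and `$` do not allow.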
6.3 EXAMPLES
In this section, we reinforce the discussion of Section 6.2 by providing numerous examples 
and descriptions.
• [0-9]+ Match if the string contains at least one digit.
• ^[0-9]+ Match if the string starts with at least one digit.
• ^[0-9]+$ Match if the string only consists of digits. The empty string will not match.
Use * in place of + to also match the empty string.
• [b-df-hj-np-tv-z]+[aeiou][b-df-hj-np-tv-z]+ Match a string that includes at least one
vowel that is surrounded by at least one consonant on each side. Add ^ and $ around the
regex to specify a string that consists exactly of one vowel with consonants on either side.
• [A-Z]{4,} Match if the string contains at least four consecutive upper-case letters.
• [A-Z]{4} Although this appears to say “match if the string contains exactly four
consecutive upper-case letters,” it will in fact match the same strings as the preceding
example because we are not forcing the characters surrounding the four upper-case letters
to be non-upper-case characters.
• [^A-Z][A-Z]{4}[^A-Z] Match if the string contains exactly four upper-case letters
surrounded by other characters. For instance, this would match “abcDEFGhi” and
“Hi There FRED, how are you?” but not “abcDEFGHijk.” It will also not match “FRED” because
we are insisting that there be non-upper-case characters around the four upper-case
letters. We can fix this as shown below.
• [^A-Z][A-Z]{4}[^A-Z]|^[A-Z]{4}$ Note that this still misses a string such as
“FRED is here,” where the four letters touch only one end of the string;
(^|[^A-Z])[A-Z]{4}([^A-Z]|$) covers those cases as well.
• ^$ Match only the empty string (blank lines).
• ... Match a string that contains at least three characters of any type. Add ^ and $ to
specify a regex that matches a string of exactly three characters.
• [Vv].?[Ii1!].?[Aa@].?[Gg9].?[Rr].?[Aa@] This regex might be used in a spam filter to
match the word “Viagra” along with variations. Notice the .? used in the regex. This states
that between each pair of letters of the word viagra, we can accept one other character.
This could account for such variations as Via!gra or V.I.A.G.R.A. The characters 1, !, @,
and 9 are there to account for variations where letters are replaced with look-alike
characters, for instance @ for a.
• ([A-Z][[:alpha:]]+ )?[A-Z][[:alpha:]]+, [A-Z]{2} [0-9]{5}$ This regex can be used to
match the city/state/zip code of a US postal address. First, we expect a city name. A city
name should appear as an upper-case letter followed by additional letters. The additional
letters may include upper-case letters as in McAllen. Some cities are two names like Los
Angeles. To cover a two-word city, we expect a first name followed by a blank space before
the second name. Notice the ? that follows the close parenthesis to indicate that we would
expect to see this one or zero times. So we either expect a word, a space and a word, or
just a word. This is followed by a comma and space followed by two upper-case letters to
denote the state abbreviation and a space and five digits to end the line. Some addresses
place two spaces between the state and the zip code. To allow this, we need to include an
optional second space. We could use either of [ ]{1,2} or [ ][ ]?. We can also add
(-[0-9]{4})? before the $ to indicate that the four-digit zip code extension is optional.
• [A-Za-z_][A-Za-z_0-9]* In most programming languages, a variable’s name is a collection
of letters, digits, and underscores. The variable name must start with a letter or
underscore, not a digit. In some languages, variable names can also contain a dollar sign,
so we can enhance our regex by adding the $ character in the second set of brackets. In
some languages, variable names are restricted in length. For instance, we might restrict
variables to 32 or fewer characters. To denote this, we can replace the * with {0,31}. We
use 31 instead of 32 because we already have one character specified. Unfortunately, this
would not prevent our regex from matching within a 33-character variable name because
nothing in the regex forbids a further name character after the first 32. We could resolve
this by placing delimiters around the regex. There are a number of delimiters such as
spaces, commas, semicolons, arithmetic symbols, and parentheses. Alternatively, we could
state that before and after the variable name we would not expect additional letters,
digits, or underscores. So we could improve our regex above to be
• [^A-Za-z_0-9][A-Za-z_][A-Za-z_0-9]{0,31}[^A-Za-z_0-9]
• ([(][0-9]{3}[)] )?[0-9]{3}-[0-9]{4} In this expression, we describe a US phone number. A
phone number consists of three digits, a hyphen, and four digits. If the number is long
distance, we include the area code before the number. The area code is three digits
enclosed in parentheses. For instance, a phone number can be 555-5555 or (123) 555-5555. If
the area code is included, a space follows the close paren. In the above regex, the two
parens are placed in [] to differentiate the literal paren from the metacharacter paren as
used to indicate a sequence. We could have also used \( and \). Some people will write the
10-digit phone number (with area code) without the parens. We can include this by adding a
? after each paren as in [(]? and [)]?; however, this would cause the regex to match if
only one of the two parens is supplied as in (123 555-5555. Alternatively, we can provide
three different versions of the regex with an OR between them as in
• [(][0-9]{3}[)] [0-9]{3}-[0-9]{4}|[0-9]{3} [0-9]{3}-[0-9]{4}|[0-9]{3}-[0-9]{4}
• [0-9]+(\.[0-9]+)? This regex will match a numeric value with or without a decimal point.
The period is escaped so that it matches a literal decimal point rather than any character.
We assume that there must be at least one digit and if there is a decimal point, there must
be at least one digit to the right of the decimal point. By placing the ? after the
sequence of period and digits, we are saying that if one appears the other must appear.
With ^ and $ added around the regex, this allows us to have 99.99 or 0.0 but not 0. with
nothing after the decimal point.
• $[0-9]+\.[0-9]{2} Here, the $ indicates that we seek a dollar sign and not “end of
string.” (In extended regular expressions and many other dialects an unescaped $ is always
an anchor, so \$ or [$] is the safer way to write a literal dollar sign.) This is followed
by some number of digits, a period, and two digits. This makes up a dollar amount as in
$123.45. We have three problems with this regex if we want to match any dollar amount.
First, we are excluding a dollar amount that has no cents such as $123. Second, the regex
would not prevent a match against something like $123.45678. We would not expect to see
more than two digits after the decimal point but our regex does not prevent this. Finally,
if the dollar amount contains commas, our regex will not match. To resolve the first
problem, we provide two versions:
• $([0-9]+|[0-9]+\.[0-9]{2}) Now we can match either a dollar sign and digits or a dollar
sign, digits, a period, and two digits. To resolve the second problem, we have to embed our
regex such that it is not followed by digits.
• $([0-9]+|[0-9]+\.[0-9]{2})[^0-9] This in itself would require that a dollar amount not
end a line. So we could enhance it as follows:
• $([0-9]+|[0-9]+\.[0-9]{2})[^0-9]|$([0-9]+|[0-9]+\.[0-9]{2})$ Although this looks a bit
bizarre with two dollar signs around the latter portion of the regex, the first is treated
literally and the last means “end of string.” Note that the [0-9]+ alternative can still
match the $123 of $123.45678, because the period that follows is itself a nondigit. The
commas can be more of a challenge and this is left to an end of chapter exercise.
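The dollar-amount expressions above can be tried out with a minimal sketch, again assuming Python's `re` module rather than the Linux tools covered in the following sections. Python always treats a bare `$` as an anchor, so the literal dollar sign is written `\$` here. The names `BASIC`, `WITH_GUARD`, and `STRICT` are hypothetical, and `STRICT` is an illustrative variant that is not taken from the text: it uses a lookahead, a feature of Python's dialect that the expressions in this chapter do not use.

```python
import re

# First attempt from the text: dollar sign, digits, period, two digits.
BASIC = re.compile(r"\$[0-9]+\.[0-9]{2}")

# Final version from the text: cents are optional, and the amount must be
# followed by a nondigit or by the end of the string.
WITH_GUARD = re.compile(
    r"\$([0-9]+|[0-9]+\.[0-9]{2})[^0-9]|\$([0-9]+|[0-9]+\.[0-9]{2})$"
)

# Hypothetical stricter variant (an assumption, not from the text): after the
# amount, reject a following digit and also a period followed by a digit.
STRICT = re.compile(r"\$[0-9]+(\.[0-9]{2})?(?![0-9]|\.[0-9])")


def finds(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    samples = ["$123.45", "$123", "$123.45678", "It cost $123.", "$5.00 each"]
    for sample in samples:
        print(
            repr(sample),
            "basic:", finds(BASIC, sample),
            "guarded:", finds(WITH_GUARD, sample),
            "strict:", finds(STRICT, sample),
        )
```

In this sketch the guarded expression still reports a match for `$123.45678`, because the digits-only alternative matches `$123` and the period then counts as the required nondigit. The stricter variant rejects that string while still accepting an amount that ends a sentence. None of the three expressions handles commas.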
Now that we have introduced regular expressions (and no doubt confused the reader), 
we will examine in the next three sections how to use regular expressions in three common 
pieces of Linux software, grep, sed, and awk. Numerous examples will be offered which 
will hopefully help the reader understand regular expressions.
6.4 GREP
The name grep comes from 
