Linux with Operating System Concepts



Linux-with-Operating-System-Concepts-Fox-Richard-CRC-Press-2014

• {n} – exactly n occurrences
• {n,m} – between n and m occurrences where n < m
• {n,} – at least n occurrences
For instance, [0-9]{5} will match exactly five digits while [0-9]{5,} will match five or more digits. The notation [0-9]{1,5} will match anywhere from one to five digits.
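As a rough illustration, the three counting forms can be tried out with Python's re module, used here only as a stand-in for a tool such as egrep (an assumption; the text itself does not use Python). The sample strings are made up, and re.fullmatch is used so that the count applies to the whole string rather than to a substring.

```python
import re

# Hypothetical sample strings; fullmatch requires the entire string to match.
samples = ["1234", "12345", "123456"]

def matching(pattern, strings):
    """Return the strings that the pattern matches in full."""
    return [s for s in strings if re.fullmatch(pattern, s)]

exactly_five = matching(r"[0-9]{5}", samples)
five_or_more = matching(r"[0-9]{5,}", samples)
one_to_five = matching(r"[0-9]{1,5}", samples)

print(exactly_five)   # ['12345']
print(five_or_more)   # ['12345', '123456']
print(one_to_five)    # ['1234', '12345']
```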
Keep in mind that a regular expression can match any substring of a string. Consider the following string:
1234567890abc
The regular expression [0-9]{5} will match this string because there is a substring that contains exactly five digits. In fact, [0-9]{5,} will also match this string. To define a regular expression that matches exactly five consecutive digits, we would have to include ^ and $. However, ^[0-9]{5}$ will only match a string that is exactly five digits long. It would not match the string above. If we wish to match strings that contain five consecutive digits but can have other, nondigit, characters, we have to enhance our regular expression.
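The difference between substring matching and anchored matching can be sketched as follows. This is a minimal example that assumes Python's re module behaves like the substring-oriented tools the text has in mind; re.search scans for a match anywhere in the string.

```python
import re

text = "1234567890abc"  # the example string from the text

# re.search looks for the pattern anywhere in the string (substring match).
sub_five = re.search(r"[0-9]{5}", text)
sub_five_plus = re.search(r"[0-9]{5,}", text)

# With ^ and $ the pattern must account for the whole string.
anchored = re.search(r"^[0-9]{5}$", text)
anchored_ok = re.search(r"^[0-9]{5}$", "12345")

print(sub_five.group())       # 12345
print(sub_five_plus.group())  # 1234567890
print(anchored)               # None
print(anchored_ok.group())    # 12345
```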
The expression [^0-9]*[0-9]{5}[^0-9]* says “match a string that consists of 0 or more nondigits followed by five digits followed by 0 or more nondigits.” This regular expression will match any of the following strings:
• abc12345abc
• 12345abc
• abc12345
• 12345
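Read as a description of the entire string, the expression rules out 1234567 and abc123def because neither of these strings consists of exactly five digits in sequence surrounded only by nondigits. Keep in mind, though, that without anchors the expression can still match a substring: because [^0-9]* can match zero characters, a tool such as grep would find the substring 12345 inside 1234567. The string abc123def is never matched because it contains no run of five digits. The expression would match 1ab23456de7 because the five digits are surrounded by nondigits. How does ^[^0-9]*[0-9]{5}[^0-9]*$ differ?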
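One way to explore the closing question is to run both forms over the strings mentioned above. The sketch below assumes Python's re module and substring search via re.search; a tool with different matching rules may behave differently for the unanchored form.

```python
import re

UNANCHORED = r"[^0-9]*[0-9]{5}[^0-9]*"
ANCHORED = r"^[^0-9]*[0-9]{5}[^0-9]*$"

strings = [
    "abc12345abc", "12345abc", "abc12345", "12345",
    "1234567", "abc123def", "1ab23456de7",
]

# Unanchored: a match anywhere in the string is enough.
found_anywhere = [s for s in strings if re.search(UNANCHORED, s)]
# Anchored: the pattern must describe the string from start to end.
whole_string = [s for s in strings if re.search(ANCHORED, s)]

print(found_anywhere)
print(whole_string)
```

With the anchors in place, only the four strings that consist of nondigits, five digits, and nondigits are accepted; without them, every string containing a run of at least five digits is accepted.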
6.2.8 Selecting between Sequences
Now that we can control the exact number of repetitions that we expect to see, let us define a regular expression to match a zip code. If we consider a five-digit zip code, we can use [0-9]{5}. If we want to match a five-digit zip code that is in a string that contains other characters, we might use the previously defined expression ^[^0-9]*[0-9]{5}[^0-9]*$. In fact, if we know that the zip code will always follow a two-letter state abbreviation followed by a blank space, we could be more precise, as in [A-Z]{2} [0-9]{5}[^0-9]*$.
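The state-abbreviation form can be checked against a few address-like lines. The lines below are hypothetical and exist only for illustration; Python's re module is assumed as the matching engine.

```python
import re

# Two capital letters, a blank, five digits, then only nondigits up to the end.
pattern = re.compile(r"[A-Z]{2} [0-9]{5}[^0-9]*$")

# Hypothetical address lines.
lines = [
    "Anytown, OH 12345",
    "Anytown, OH 12345 USA",
    "Anytown, OH 1234",
    "Anytown, oh 12345",
    "Anytown, OH 123456",
]

results = {line: bool(pattern.search(line)) for line in lines}
for line, ok in results.items():
    print(f"{line!r}: {ok}")
```

Only the first two lines are accepted: the third has too few digits, the fourth uses a lower-case abbreviation, and the sixth digit in the last line violates the "nondigits up to the end" requirement.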
We also have nine-digit zip codes. These are zip codes that consist of five digits, a hyphen, and four digits, as in 12345-6789. We would define this sequence as [0-9]{5}-[0-9]{4}. Now we have a new problem. Which expression should we specify? If we specify both, as in
[0-9]{5} [0-9]{5}-[0-9]{4}
we are stating that the string must have a five-digit sequence followed by a space followed by a five-digit sequence, a hyphen, and a four-digit sequence. We need to be able to say “or.” Recall from earlier that we were expressing “or” using [list]. However, the items in [] indicated that any single character should match, not that we want to match an entire sequence.
We use another metacharacter to denote “or” in the sense that we have two or more expressions and we want to match either (any) expression against a string. This metacharacter is | (the vertical bar). Now we can express a zip code using the following.
[0-9]{5}|[0-9]{5}-[0-9]{4}
In the above expression, the | appears between the two definitions: the five-digit zip code 
and the nine-digit zip code. That is, the regex will match a five-digit number OR a five-digit 
number followed by a hyphen followed by a four-digit number.
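A small sketch of the alternation, assuming Python's re module. The helper name is_zip is made up for this example. Python tries the alternatives from left to right, so a plain re.search on a nine-digit code stops at the five-digit alternative; re.fullmatch is used to require that one alternative covers the whole string.

```python
import re

ZIP = r"[0-9]{5}|[0-9]{5}-[0-9]{4}"

def is_zip(s):
    """True if s is, in its entirety, a five-digit or nine-digit zip code."""
    return re.fullmatch(ZIP, s) is not None

checks = {s: is_zip(s) for s in
          ["12345", "12345-6789", "1234", "12345-678", "12345 12345-6789"]}
print(checks)

# Substring search: the first alternative that succeeds wins in Python.
first_alt = re.search(ZIP, "12345-6789").group()
print(first_alt)  # 12345
```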
Let us consider another example. The Cincinnati metropolitan region extends into three states, Ohio, Kentucky, and Indiana. If we want to define a regular expression that will match any of these three states’ abbreviations, our first idea might be to express this as [IKO][NYH]. This will match any of IN, KY, and OH, so it seems to solve the problem. However, there is no way to express the idea that “if the first character in the first list matches, then only use the first character in the second list.” So this expression could also match any of IY, IH, KN, KH, ON, or OY. By using the | we can avoid this problem through IN|KY|OH.
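The difference between the two approaches can be seen by enumerating every two-letter combination the lists allow. This is a minimal sketch assuming Python's re module; the variable names are illustrative.

```python
import re
from itertools import product

# Every pairing of a first letter from IKO with a second letter from NYH.
candidates = ["".join(pair) for pair in product("IKO", "NYH")]

by_lists = [c for c in candidates if re.fullmatch(r"[IKO][NYH]", c)]
by_alternation = [c for c in candidates if re.fullmatch(r"IN|KY|OH", c)]

print(by_lists)        # all nine combinations
print(by_alternation)  # ['IN', 'KY', 'OH']
```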


The final metacharacters are the parentheses, (). These are used when you want to encapsulate an entire pattern of metacharacters, literal characters, and enumerated lists inside another set of metacharacters. This allows you to state that the trailing metacharacter applies to the entire pattern rather than the single preceding character.
For instance, suppose we want to match against a list of words. Words will be either capitalized or lower case and will be separated by spaces. A single word is indicated using [A-Za-z][a-z]*, that is, any upper- or lower-case letter followed by 0 or more lower-case letters. To express the blank space, we will follow the above pattern with a blank space, giving us ‘[A-Za-z][a-z]* ’. The quote marks are shown to clearly indicate the blank space. Now, to express that there are several of these words in the string, we will want to add {2,}. However, if we place the {2,} after the blank space, it will only modify the blank space. We instead want the {2,} to modify the entire expression. Therefore, we place the expression in () giving us ([A-Za-z][a-z]* ){2,}. Now we have an expression that will match two or more sequences of “upper- or lower-case letter followed by 0 or more lower-case letters followed by a blank.”
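To see what the parentheses change, the grouped and ungrouped forms can be compared on two made-up strings. The sketch assumes Python's re module; the sample strings are hypothetical.

```python
import re

UNGROUPED = r"[A-Za-z][a-z]* {2,}"   # {2,} applies to the blank only
GROUPED = r"([A-Za-z][a-z]* ){2,}"   # {2,} applies to "word plus blank"

words = "Linux is fun "      # three words, each followed by one blank
double_blank = "Linux  "     # one word followed by two blanks

ungrouped_on_words = re.search(UNGROUPED, words)
grouped_on_words = re.search(GROUPED, words)
ungrouped_on_blanks = re.search(UNGROUPED, double_blank)
grouped_on_blanks = re.search(GROUPED, double_blank)

print(ungrouped_on_words)            # None
print(repr(grouped_on_words.group()))     # 'Linux is fun '
print(repr(ungrouped_on_blanks.group()))  # 'Linux  '
print(grouped_on_blanks)             # None
```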
If we expect to see between two and five words in the string, we would express this as
([A-Za-z][a-z]* ){2,5}
To ensure that the two to five words make up the entire string, we might enclose the expression within ^ and $ marks as in
^([A-Za-z][a-z]* ){2,5}$
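The anchored expression can be exercised on a few hypothetical strings, again assuming Python's re module. The last case has no blank after its final word and is rejected as a result.

```python
import re

pattern = re.compile(r"^([A-Za-z][a-z]* ){2,5}$")

# Hypothetical inputs; note the trailing blank on all but the last.
cases = [
    "Linux is fun ",   # three words
    "Linux ",          # one word: too few
    "a b c d e f ",    # six words: too many
    "Linux is fun",    # final word lacks the trailing blank
]

accepted = [s for s in cases if pattern.search(s)]
print(accepted)  # ['Linux is fun ']
```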
However, there is a flaw in our expression. We might assume that the final word in the 
string does not end in a blank space. How can we say “two to five words 
