Linux with Operating System Concepts



Linux-with-Operating-System-Concepts-Fox-Richard-CRC-Press-2014

awk '/pattern/ {print . . .}' filename
The pattern is a string, regular expression, or condition. The action, in this case, is a
print statement that specifies what to print from each matching line. The . . . would need
to be replaced with the content to output. Given such a command, awk examines filename
line by line, looking for pattern.
awk considers each line to be a series of data separated into fields. Each field is denoted by
$n, where n is the column of that field and the leftmost (first) field of a line is $1. The
notation $0 is reserved to indicate “all fields in the line,” which we could use in the print
statement.
Let us reconsider our file from Section 6.5, names.txt, which contained first names, middle
initials, and last names. We want to print out the full names of those with middle initials.
We could use /[A-Z]\./ for the pattern and {print $0} for the action (assuming that each line
only contains names; otherwise $0 would give us the full line, which might be more
information than we want to output). If we want to ensure that only the names are output,
and assuming that the names appear in the first three fields of each line, we could use
{print $1 $2 $3}. The full awk command is as follows.
awk '/[A-Z]\./ {print $1 $2 $3}' names.txt
FIGURE 6.4 awk operates on fields within rows of a file. (Figure labels: “awk matches
entries on these lines” and “awk operates on values in these fields.”)


Here, we see that $1 represents the first field (first name) of the row, $2 represents the 
second field (middle initial) of the row, and $3 represents the third field (last name) of the 
row. If we do not want to output the middle initial, we could use the following command.
awk '/[A-Z]\./ {print $1 $3}' names.txt
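As a rough illustration of pattern and field selection, the sketch below runs this kind of
command on a few made-up names. The actual contents of names.txt are not shown here, so the
data is an assumption. A comma is placed between $1 and $3, a small change from the command
above, so that the two names are printed with a space between them.

```shell
# Made-up sample lines standing in for names.txt (assumed data).
names='Ann B. Carter
Dan Evans
Fay G. Hill'

# Lines containing a capital letter followed by a period (a middle initial) match;
# for those lines, print the first and third fields.
with_initial=$(printf '%s\n' "$names" | awk '/[A-Z]\./ {print $1, $3}')
printf '%s\n' "$with_initial"
```

The line for Dan Evans has no middle initial, so it does not match and is not printed.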
What is a field? Fields are indicated by a delimiter (a separator). For awk, the default
delimiters are spaces and tabs (a tab is written as \t). Therefore, whether a file is clearly
in tabular form or is just text with spaces, awk can be used on it. However, if it is a text
file, we may not know the exact number of items (fields) per line, as the number of items
(words) can vary line by line.
The simple awk structure and example above only specify a single pattern. As with sed,
you are able to specify any number of patterns for awk, each with its own action. The
structure of a more elaborate awk command is
awk '/pattern1/ {action1}
/pattern2/ {action2}
/pattern3/ {action3}
. . .
/patternN/ {actionN}' filename
This instruction is interpreted much like a sequence of independent if statements in a
programming language. Working line by line in filename, if pattern1 matches, execute
action1; then, if pattern2 matches, execute action2; and so on through patternN and
actionN. awk tests every pattern against the current line, so a line that matches several
patterns triggers each of their actions. To obtain if-then-else behavior, where awk moves
on to the next line as soon as one pattern has matched and its action has executed, end
that action with the next statement.
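A small sketch of how several pattern/action pairs are applied to each line. The file name
words.txt and its contents are made up for illustration. awk tests every pattern against
every line, so a line that matches two patterns runs both actions; ending an action with
the next statement is what gives first-match-only behavior.

```shell
# Made-up sample data; the first line is chosen so that it matches both patterns.
tmp=$(mktemp -d)
printf '%s\n' 'alpha beta' 'beta only' 'gamma' > "$tmp/words.txt"

# Without next: every matching rule fires, so "alpha beta" is reported twice.
all_rules=$(awk '/alpha/ {print "rule1:", $0}
/beta/ {print "rule2:", $0}' "$tmp/words.txt")

# With next: after rule1 fires, awk skips the remaining rules for that line.
first_only=$(awk '/alpha/ {print "rule1:", $0; next}
/beta/ {print "rule2:", $0}' "$tmp/words.txt")

printf '%s\n---\n%s\n' "$all_rules" "$first_only"
rm -r "$tmp"
```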
Let us consider some more interesting uses of awk. Assume we have a text file, sales.txt,
which contains sales information. The file consists of rows of sales information using the
following fields:
Month   Salesman   Sales   Commission amount   Region
Jan     Zappa      3851    .15                 CA, OR, AZ
Aside from the first row, which is the file’s header, each entry gives the sales information
for a given employee. There may be multiple rows for the same month and/or the same
salesman. For instance, another row might contain
Feb     Zappa      6781    .20                 CA, OR, WA
First, let us compute the salary earned for each salesman whose region includes AZ.
Here, we have a single pattern, a regular expression to match AZ. In fact, since there is
no variability in the regular expression, our pattern is literally the two characters “AZ”.
For each matching line, we have a salesman who worked the Arizona region. To compute
the salary earned, we need to multiply the sales by the commission amount. These two
amounts are fields 3 and 4 respectively. We want to print their product. Our awk command
could be
awk '/AZ/ {print $3*$4}' sales.txt
Unfortunately, this will only provide us with a list of the amounts earned, but not who
earned them or in which month. We should instead have a more expressive output using
{print $1 $2 $3*$4}. This would give us JanZappa577.65! We need to explain to awk how
to format the output. There are two general choices for formatting. First, we can separate
the fields with commas, which forces awk to output the fields separated by a space. Second,
between each pair of fields, we can use "\t" or " " to indicate that a tab or a blank space
should be output.
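The formatting choices can be compared side by side. This is a minimal sketch that feeds the
single sample row from the text to awk through a pipe instead of reading a file; the variable
names (row, joined, spaced, tabbed) are only for illustration.

```shell
# One sample row in the layout described above (fields separated by spaces).
row='Jan Zappa 3851 .15 CA, OR, AZ'

# Fields written next to each other are concatenated with nothing between them.
joined=$(printf '%s\n' "$row" | awk '/AZ/ {print $1 $2 $3*$4}')

# Commas make print separate the values with a space.
spaced=$(printf '%s\n' "$row" | awk '/AZ/ {print $1, $2, $3*$4}')

# An explicit "\t" string places a tab between the values.
tabbed=$(printf '%s\n' "$row" | awk '/AZ/ {print $1 "\t" $2 "\t" $3*$4}')

printf '%s\n%s\n%s\n' "$joined" "$spaced" "$tabbed"
```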
Next, let us compute the total amount earned for Zappa. The awk command
awk '/Zappa/ {print $1 "\t" $3*$4}' sales.txt
will provide one output (line) for each Zappa entry in the file. This will not give us a grand
total, merely all of Zappa’s monthly sales results. What we need to do is accumulate each
value in some running total. Fortunately, awk allows us to define and use variables. Let us
use a variable named total. Our action will now be {total = total + $3*$4}. We can also
print out each month’s result if we wish, so we could use
{print $1 "\t" $3*$4; total = total + $3*$4;} or, if we want to be more efficient,
{temp = $3*$4; print $1 "\t" temp; total = total + temp}. Our new awk command is
awk '/Zappa/ {temp = $3*$4; print $1 "\t" temp;
total = total + temp}' sales.txt
Notice that we are not outputting total in the above awk statement. We will come back 
to this in the next subsection.
While awk is very useful in pulling out information from a file and performing
computations, we can also use awk to provide specific results from a Linux command. We
would do this by piping the result of an instruction to an awk statement. Let us consider a
couple of simple examples.
Let us output the permissions and filenames of all files in a directory. The ls -l long
listing begins each line with 10 characters that display the item’s file type and permissions.
For a regular file, the first of these characters is a hyphen. If we have a match, we then
want to output the first and last entries on the line ($1 and $9, assuming that the filename
contains no spaces). This can be accomplished as follows.
ls -l | awk '/^-/ {print $1, $9}'
Notice that the awk instruction does not have a filename after it because its input is
coming from the long listing. The regular expression used as our pattern, ^-, means that the
line starts with a hyphen.


In another example, we want to obtain process information using ps of all running bash
shells. This solution is even easier because our regex is simply bash. We print $0 to output
the full line including, for instance, the PID and statistics about each bash shell’s processor
usage.
ps aux | awk '/bash/ {print $0}'
6.6.2 BEGIN and END Sections
Our earlier example of computing Zappa’s total earnings computed his total pay but did
not print it out. We could change our action to be
{temp = $3*$4; print $1 "\t" temp; total = total + temp; print total}. This would then
explicitly output the value of total for each match. But this will have the unfortunate
effect of outputting the total for every row in which Zappa appears; in addition, the total
will increase with each of these outputs. What we want to do is hold off on printing total
until the very end of awk’s run.
Fortunately, awk does have this capability. We can enhance the awk command to
include a BEGIN section and/or an END section. The BEGIN section is executed
automatically before awk begins to search the file. The END section is executed automatically
after the search ends. The BEGIN section might be useful to output some header
information and to initialize variables if necessary. The END section might be useful to wrap
up the computations (for instance, by computing an average) and output any results. We
enhance our previous awk instruction to first output a report header and then, at the end,
output the result.
awk 'BEGIN {print "Sales results for Zappa"; total = 0}
/Zappa/ {temp = $3*$4; print $1 "\t" temp;
total = total + temp}
END {print "Zappa'\''s total sales is $" total}' sales.txt
The above instruction works as follows. First, the BEGIN statement executes, outputting
the header (“Sales results for Zappa”) and initializing the variable total to 0. This
initialization is not necessary as, in awk, any variable used is automatically initialized to 0
(or to an empty string when it is used as text). However, initializing all variables is a good
habit to get into. Next, awk scans the file line by line for the pattern Zappa. For each line
that matches, temp is set to the values of the third and fourth columns multiplied together.
Then, awk outputs $1 (the month), a tab, and the value of temp. Finally, temp is added to
the variable total. After completing its scan of the file, awk ends by outputting a closing
message of Zappa’s total. Because the awk program is enclosed in single quotes, the
apostrophe in the closing message is written as '\'' so that the shell does not treat it as the
end of the program. Note that if no lines contained Zappa, the output would be simply:
Sales results for Zappa
Zappa's total sales is $0
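To see the BEGIN and END sections at work end to end, here is a self-contained sketch. The
two Zappa rows come from the text; the Duke row, the temporary directory, and the variable
name report are invented for illustration. The closing message is worded without an
apostrophe to keep the shell quoting simple.

```shell
tmp=$(mktemp -d)
# Sample data: the Zappa rows are from the text, the Duke row is invented.
cat > "$tmp/sales.txt" <<'EOF'
Month Salesman Sales Commission Region
Jan Zappa 3851 .15 CA, OR, AZ
Feb Zappa 6781 .20 CA, OR, WA
Jan Duke 1000 .10 OH, KY
EOF

# BEGIN runs once before any input is read, END runs once after the last line.
report=$(awk 'BEGIN {print "Sales results for Zappa"; total = 0}
/Zappa/ {temp = $3*$4; print $1 "\t" temp; total = total + temp}
END {print "Total for Zappa is $" total}' "$tmp/sales.txt")

printf '%s\n' "$report"
rm -r "$tmp"
```

With this data the report has a header line, one line per Zappa row (577.65 and 1356.2), and
a final total of 1933.85.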
Now, let us combine the use of the BEGIN and END sections with a multipatterned
instruction. In this case, let us compute the total salaries for three employees. We want to
have, as output, each employee’s total earnings from sales commissions. This will require
maintaining three different totals, unlike the previous example with just the total for
Zappa. We will call these variables total1, total2, and total3.
awk 'BEGIN {total1 = 0; total2 = 0; total3 = 0}
/Zappa/ {total1 = total1 + $3*$4}
/Duke/ {total2 = total2 + $3*$4}
/Keneally/ {total3 = total3 + $3*$4}
END {print "Zappa $" total1 "\n" "Duke $" total2 "\n" "Keneally $" total3}' sales.txt
As with our previous examples, the regular expression exactly matches the string we are 
looking for, so it is not a very challenging set of code. However, the logic is slightly more 
involved because we are utilizing three different running totals.
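A quick check of the three running totals on a tiny data set. All four rows below are
invented for illustration, and the END section uses three separate print statements rather
than "\n", which is only a stylistic variation.

```shell
tmp=$(mktemp -d)
# Invented rows; the layout follows the sales file described earlier.
cat > "$tmp/sales.txt" <<'EOF'
Jan Zappa 1000 .10 CA
Jan Duke 2000 .10 OH
Feb Zappa 3000 .20 AZ
Feb Keneally 500 .50 KY
EOF

# Each rule adds sales * commission to the total for the matching salesman.
totals=$(awk '/Zappa/ {total1 = total1 + $3*$4}
/Duke/ {total2 = total2 + $3*$4}
/Keneally/ {total3 = total3 + $3*$4}
END {print "Zappa $" total1; print "Duke $" total2; print "Keneally $" total3}' "$tmp/sales.txt")

printf '%s\n' "$totals"
rm -r "$tmp"
```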
6.6.3 More Complex Conditions
Let us look at an example that requires a greater degree of sophistication with our patterns.
In this case, let us obtain the number of salesmen who operated in either OH or KY. To
specify “or,” we use /pattern1/||/pattern2/ where the notation || means “or.” If we
have a matching pattern, we want to increment a counter variable. In the END statement,
we will want to output this counter’s value. We omit the BEGIN statement because we do
not need a header in this case (the END statement outputs an explanation of the
information that the command computed for us, and the variable, counter, is automatically
initialized to 0).
awk '/OH/||/KY/ {counter = counter + 1}
END {print "Total number of employees who serve OH or KY: " counter}' sales.txt
If we wanted to count the number in both OH and KY, we would use && instead of ||.
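The difference between the two operators can be seen on a few rows. The data below is
invented for illustration. Note that these commands count matching rows, so a salesman who
appears in several matching rows would be counted once per row. Printing counter + 0 is a
small addition that ensures a 0 is printed if nothing matches.

```shell
tmp=$(mktemp -d)
# Invented rows; only the first one lists both OH and KY.
cat > "$tmp/sales.txt" <<'EOF'
Jan Zappa 1000 .10 OH, KY
Jan Duke 2000 .10 OH, IN
Feb Keneally 500 .50 KY, TN
Feb Zappa 3000 .20 CA, AZ
EOF

# || counts lines that contain OH or KY (or both).
either=$(awk '/OH/ || /KY/ {counter = counter + 1} END {print counter + 0}' "$tmp/sales.txt")
# && counts only lines that contain both OH and KY.
both=$(awk '/OH/ && /KY/ {counter = counter + 1} END {print counter + 0}' "$tmp/sales.txt")

printf 'either: %s\nboth: %s\n' "$either" "$both"
rm -r "$tmp"
```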
Let us consider a different file, courses.dat, to motivate additional examples. Imagine 
that this file contains a student’s schedule for several semesters. The file contains fields for 
semester (fall, spring, summer, and the year as in fall12 or summer14), the course which is a 
designator and a course number as in CSC 362.001 (this is divided into two separate fields, 
one for designator, one for course number), number of credit hours, location (building, 
room), and time. For instance, one entry might be
fall12 CSC 362.001 3 GH 314 MWF 9:00-10:00 am
Let us create an awk command to output the courses taken in a particular year, for
instance 2012. We would not want to use the pattern /12/ because the “12” could match the
year, the course number or section number, the classroom number, or the time. Instead,
we need to ensure that any 12 occurs near the beginning of the line. We could use the
expression /fall12/||/spring12/||/summer12/. A shorter regular expression is one
that finds 12 in the first field. Since the first field will be a string of letters representing
the season (fall, spring, summer), we can denote this as [a-z]+12. To indicate that this must
occur at the beginning of the line, we add ^ to the beginning of the expression. This gives
us the command
awk '/^[a-z]+12/ {print $0}' courses.dat
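The effect of anchoring the pattern can be checked on a few entries. The first row below is
the sample entry from the text; the other two are invented for illustration. Only the
semester field is printed to keep the output short.

```shell
tmp=$(mktemp -d)
# First row is from the text; the spring13 and summer12 rows are invented.
cat > "$tmp/courses.dat" <<'EOF'
fall12 CSC 362.001 3 GH 314 MWF 9:00-10:00 am
spring13 CSC 412.001 3 GH 120 TR 12:15-1:30 pm
summer12 MAT 128.002 4 ST 212 MTWR 8:00-9:30 am
EOF

# The loose pattern also matches the spring13 row (412, 120, and 12:15 all contain 12).
loose=$(awk '/12/ {print $1}' "$tmp/courses.dat")
# The anchored pattern requires lowercase letters followed by 12 at the start of the line.
anchored=$(awk '/^[a-z]+12/ {print $1}' "$tmp/courses.dat")

printf 'loose:\n%s\nanchored:\n%s\n' "$loose" "$anchored"
rm -r "$tmp"
```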
An awk command to output all of the 400-level courses should not just contain the
pattern /4/ nor /4[0-9][0-9]/ because these could potentially match other things like
a section number, a classroom number, or, in the case of /4/, credit hours. Instead, we
will assume that all course designators are three-letter combinations while all classroom
buildings are two-letter combinations. Therefore, the course number, written as 4[0-9][0-9],
should follow [A-Z][A-Z][A-Z]. Since we require three letters in a row, this would
not match a building. Our awk command then would look like this:
awk '/[A-Z][A-Z][A-Z] 4[0-9][0-9]/ {print $0}' courses.dat
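A short sketch of why the three-letter designator matters. The first row is the sample
entry from the text; the other two are invented, with one of them placed in room 412 of a
two-letter building so that the looser pattern matches it by accident.

```shell
tmp=$(mktemp -d)
# First row is from the text; the spring13 and fall13 rows are invented.
cat > "$tmp/courses.dat" <<'EOF'
fall12 CSC 362.001 3 GH 314 MWF 9:00-10:00 am
spring13 CSC 440.001 3 GH 120 TR 12:15-1:30 pm
fall13 MAT 128.002 4 ST 412 MTWR 8:00-9:30 am
EOF

# Matches the 440 course, but also the fall13 row because of room 412.
naive=$(awk '/4[0-9][0-9]/ {print $1}' "$tmp/courses.dat")
# Requires three capital letters and a space before the 4xx number.
strict=$(awk '/[A-Z][A-Z][A-Z] 4[0-9][0-9]/ {print $1}' "$tmp/courses.dat")

printf 'naive:\n%s\nstrict:\n%s\n' "$naive" "$strict"
rm -r "$tmp"
```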
We can compute the number of hours earned in any particular semester or year. Let us
compute the total hours for all of 2012. Again, we will use ^[a-z]+12 to indicate the
pattern as we match the “12” only after the season at the beginning of the line. But rather
than printing out the entry, we want to sum the value of hours, which is the fourth field ($4).
Our awk command will be as follows.
awk '/^[a-z]+12/ {sum = sum + $4}
END {print "Total hours earned in 2012 is " sum}' courses.dat
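The summation can be tried on a small file. As before, the first row is the sample entry
from the text and the remaining rows are invented for illustration; the variable name hours
is likewise only for this sketch.

```shell
tmp=$(mktemp -d)
# First row is from the text; the spring13 and summer12 rows are invented.
cat > "$tmp/courses.dat" <<'EOF'
fall12 CSC 362.001 3 GH 314 MWF 9:00-10:00 am
spring13 CSC 412.001 3 GH 120 TR 12:15-1:30 pm
summer12 MAT 128.002 4 ST 212 MTWR 8:00-9:30 am
EOF

# Add up the fourth field (credit hours) for semesters ending in 12.
hours=$(awk '/^[a-z]+12/ {sum = sum + $4}
END {print "Total hours earned in 2012 is " sum}' "$tmp/courses.dat")

printf '%s\n' "$hours"
rm -r "$tmp"
```

Here the fall12 and summer12 rows contribute 3 and 4 hours, and the spring13 row is skipped.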
The metacharacter ^ is used to denote that the regular expression must match at the
beginning of the line. To indicate “not,” we use ! before the pattern. We would use !/
