pattern/ to match if pattern does not occur on the line. For instance
awk '!/^fall/{print $0}' courses.dat
would output all of the lines that do not start with “fall.”
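As a small illustration of negating a pattern, the sketch below builds a throwaway courses.dat and runs both the plain and the negated pattern against it. The four-field layout of the sample lines is an assumption made for this example; the actual contents of courses.dat are not shown here.

```shell
# Hypothetical sample data: the real layout of courses.dat is an assumption.
tmp=$(mktemp -d)
printf '%s\n' \
  'fall CSC 260 3' \
  'spring CSC 360 3' \
  'fall MAT 128 4' \
  'summer CSC 480 1' > "$tmp/courses.dat"

# /^fall/ selects lines that start with "fall"; !/^fall/ selects all the others.
fall_lines=$(awk '/^fall/ {print $0}' "$tmp/courses.dat")
other_lines=$(awk '!/^fall/ {print $0}' "$tmp/courses.dat")

printf 'start with fall:\n%s\n\ndo not start with fall:\n%s\n' "$fall_lines" "$other_lines"
rm -r "$tmp"
```

Every input line lands in exactly one of the two results, since a line either matches the pattern or it does not.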
Aside from matching a /pattern/, awk allows for comparisons of field values against
other values. These comparisons use the relational operators (<, >, ==, !=, <=, >=). As
with the print statement, you reference a field’s value using $n where n is the field number.
For instance, to print all of the courses that are 3 credit hours from your courses.txt file,
you would use
awk '$4 == 3 {print $0}' courses.txt
In this instruction, awk examines the file line-by-line comparing each line’s fourth field
to the value 3. On a match, it outputs that entire row. We could have instead written a
pattern to find entries with a 3 in the fourth field. For instance, the credit hour will always
appear with a space before and after it, unlike any other occurrence of a number, so the
pattern / 3 / (with a space on each side of the 3) would have also worked.
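To see why the surrounding spaces matter in such a pattern, the sketch below compares three approaches on a few made-up course lines: the field comparison, a pattern with a space on each side of the 3, and a bare 3. The five-field layout (semester, department, number, credit hours, title) is an assumption for this example only.

```shell
tmp=$(mktemp -d)
# Hypothetical layout: semester, department, number, credit hours, title.
printf '%s\n' \
  'fall CSC 362 3 Architecture' \
  'fall CSC 360 4 Programming' \
  'spring MAT 330 3 Calculus' \
  'spring ENG 101 1 Writing' > "$tmp/courses.txt"

by_field=$(awk '$4 == 3 {print $0}' "$tmp/courses.txt")   # fourth field equals 3
by_spaces=$(awk '/ 3 / {print $0}' "$tmp/courses.txt")    # a 3 with a space on each side
by_bare=$(awk '/3/ {print $0}' "$tmp/courses.txt")        # a 3 anywhere on the line

printf 'field test:\n%s\n\nbare pattern:\n%s\n' "$by_field" "$by_bare"
rm -r "$tmp"
```

On this data the field test and the spaced pattern agree, while the bare pattern also picks up the 4 credit hour course because its course number contains a 3.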
While regex-based patterns are very expressive, there are limitations to what you can do,
at least easily. So the conditions that utilize relational operators are often easier. Returning
to obtaining a list of all 400-level courses, we could more easily accomplish this with the
following awk statement.
awk '$3 >= 400 {print $0}' courses.dat
Notice that we are comparing a field that includes a decimal point against an integer
number.
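The comparison works because awk treats a field that looks like a number as a number when it is compared against a numeric constant. The sketch below assumes, purely for illustration, that the third field holds a course number followed by a section suffix such as 430.002.

```shell
tmp=$(mktemp -d)
# Hypothetical layout: semester, department, course number with section, credit hours.
printf '%s\n' \
  'fall CSC 362.001 3' \
  'fall CSC 430.002 3' \
  'spring MAT 400.001 4' \
  'spring ENG 101.003 3' > "$tmp/courses.dat"

# 430.002 and 400.001 are compared numerically against the integer 400.
upper_level=$(awk '$3 >= 400 {print $2, $3}' "$tmp/courses.dat")

echo "$upper_level"
rm -r "$tmp"
```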
Let us consider a harder, more elaborate example. We have a payroll.dat file that lists the
person’s name, number of hours worked, and hourly wages as three fields on each line. We
could use the following to output the pay of anyone who earned overtime.
awk '$2 > 40 {print $1 "\t $" 40 * $3 + ($2 - 40) * $3 * 1.5}' payroll.dat
This statement compares the hours field ($2) with 40 and if greater, then computes the
pay, including overtime (overtime pay is normal wages for the first 40 hours plus the hours
over 40 times wages times 1.5 for time and a half). Notice that this will only output the pay
for people who have earned overtime. We will see below how to output everyone’s pay, no
matter if they worked normal hours or overtime.
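A worked example may help check the arithmetic. The names, hours, and wages below are made up. For someone who worked 45 hours at $20 per hour, the pay is 40 * 20 + (45 - 40) * 20 * 1.5 = 800 + 150 = 950.

```shell
tmp=$(mktemp -d)
# Hypothetical payroll data: name, hours worked, hourly wage.
printf '%s\n' 'Zappa 45 20' 'Smith 38 15' 'Jones 50 10' > "$tmp/payroll.dat"

# Only lines with more than 40 hours produce output.
overtime=$(awk '$2 > 40 {print $1 "\t $" 40 * $3 + ($2 - 40) * $3 * 1.5}' "$tmp/payroll.dat")

echo "$overtime"
rm -r "$tmp"
```

Smith does not appear in the result because 38 hours fails the $2 > 40 test.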
You can combine comparisons by using && or ||. For instance, to find any employee
who worked fewer than 35 hours and earns more than $20 per hour, you might use
awk '($2 < 35 && $3 > 20) {print $1}' payroll.dat
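The difference between && and || is easiest to see by running both on the same data. The payroll lines below are invented for this sketch.

```shell
tmp=$(mktemp -d)
# Hypothetical payroll data: name, hours worked, hourly wage.
printf '%s\n' 'Zappa 45 20' 'Smith 30 25' 'Jones 50 10' 'Lee 20 12' > "$tmp/payroll.dat"

# Both conditions must hold.
both=$(awk '($2 < 35 && $3 > 20) {print $1}' "$tmp/payroll.dat")
# At least one condition must hold.
either=$(awk '($2 < 35 || $3 > 20) {print $1}' "$tmp/payroll.dat")

printf 'and: %s\nor:\n%s\n' "$both" "$either"
rm -r "$tmp"
```

Lee worked fewer than 35 hours but earns only $12 per hour, so Lee appears in the || result and not in the && result.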
As another example, let us compute and output all of the entries of Zappa when he
worked overtime. This combines two different conditions: the name is Zappa (field 1) and
the hours are greater than 40 (field 2). We could achieve this in two ways, through the
combination of a pattern and a condition, /Zappa/ && $2 > 40, or through two conditions,
$1 == "Zappa" && $2 > 40. In either case, we output the full line with {print $0}. Here
are the two versions of this awk command.
awk '/Zappa/ && $2 > 40 {print $0}' payroll.dat
awk '$1 == "Zappa" && $2 > 40 {print $0}' payroll.dat
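The two versions are not always interchangeable. The pattern /Zappa/ matches the text anywhere on the line, while $1 == "Zappa" requires the first field to be exactly Zappa. The sketch below uses made-up data, including a hypothetical employee named Zappala, to show where they diverge.

```shell
tmp=$(mktemp -d)
# Hypothetical payroll data: name, hours worked, hourly wage.
printf '%s\n' 'Zappa 45 20' 'Zappala 42 18' 'Zappa 38 20' 'Smith 50 15' > "$tmp/payroll.dat"

by_pattern=$(awk '/Zappa/ && $2 > 40 {print $0}' "$tmp/payroll.dat")
by_field=$(awk '$1 == "Zappa" && $2 > 40 {print $0}' "$tmp/payroll.dat")

printf 'pattern version:\n%s\n\nfield version:\n%s\n' "$by_pattern" "$by_field"
rm -r "$tmp"
```

When names in the file can contain one another, the field comparison is the safer choice.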
What happens if you want to perform different operations for different patterns? There
are two possible approaches. First, we might have different actions for different patterns.
Second, we might use an if-else statement (covered in the next subsection).
238
◾
Linux with Operating System Concepts
We will want to compute either overtime pay or normal pay. Below is the awk command
for doing this.
awk '$2 > 40 {print $1 "\t $" ($2-40)*$3*1.5 + 40*$3}
     $2 <= 40 {print $1 "\t $" $2*$3}' payroll.dat
The first pattern searches for any line where the employee earned overtime pay. The
second pattern searches for any line where the employee earned normal pay.
Let us enhance our awk command to compute and output the average pay that was
earned by the group of employees. We will also output each person’s individual pay. To
compute an average, we need to amass the total pay of all employees and divide it by the
number of employees. To amass the total pay, we add each individual’s pay to the total after
we compute each individual’s pay. We will add two variables, total_pay, and count. We will
also use a third variable, current_pay, to shorten our statements.
We use BEGIN to initialize total_pay and count to 0 (recall that we do not need to
initialize variables to 0, but this is good programming form, so we will). We have two
patterns, one for normal pay and one for overtime pay. For either match, we compute and store
in current_pay that employee’s pay and add that value to total_pay while also outputting
the result. We also increment count. When we are done, we use an END statement to
compute the average pay and output the result.
awk 'BEGIN {total_pay = 0.0; count = 0}
     $2 > 40 {current_pay = ($2-40) * $3 * 1.5 + 40 * $3;
              total_pay += current_pay; count++;
              print $1 "\t $" current_pay}
     $2 <= 40 {current_pay = $2*$3; total_pay += current_pay;
               count++; print $1 "\t $" current_pay}
     END {print "Average pay is $" total_pay/count}' payroll.dat
Note that count++ is an increment instruction equivalent to count = count + 1. The ++
notation was popularized by C and is available in many languages such as C++, Perl, and Java.
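To trace the running total and the final average, the sketch below runs the average pay command on three made-up employees. Their individual pay works out to 950, 750, and 550, so the total is 2250 and the average is 2250 / 3 = 750.

```shell
tmp=$(mktemp -d)
# Hypothetical payroll data: name, hours worked, hourly wage.
printf '%s\n' 'Zappa 45 20' 'Smith 30 25' 'Jones 50 10' > "$tmp/payroll.dat"

report=$(awk 'BEGIN {total_pay = 0.0; count = 0}
  $2 > 40  {current_pay = ($2-40) * $3 * 1.5 + 40 * $3;
            total_pay += current_pay; count++;
            print $1 "\t $" current_pay}
  $2 <= 40 {current_pay = $2*$3; total_pay += current_pay;
            count++; print $1 "\t $" current_pay}
  END {print "Average pay is $" total_pay/count}' "$tmp/payroll.dat")

echo "$report"
rm -r "$tmp"
```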
Now that we have seen how to compute averages in awk, let us consider how to use
this idea where the input is coming from a pipe. Specifically, how can we find the average
file size for a given directory? As we saw earlier when piping ls -l to awk, we want to
search for files so our regex is ^-. In this case, we want to add up the file sizes, which is the
5th field in the ls -l output, as well as count the number of files. Thus, we need two variables;
we will use total and count. Finally, in an END clause, we will compute and output the
average.
ls -l | awk 'BEGIN {total = 0; count = 0}
             /^-/ {total += $5; count++}
             END {print total/count}'
There is one problem with our instruction. What if none of the items in the current
directory are files? We would wind up with no matches resulting in the computation of
0/0, which would yield an error. We can resolve this problem with additional logic as
described next.
6.6.4 Other Forms of Control
The awk instruction provides a number of other operations, making it like a full-fledged
programming language. These operations include control statements like selection
statements and loops. The selection statements are the if statement and the if-else statement.
These statements are similar in syntax to those found in Java and C. In Java and C, the
statements look like:
The if statement is: if(condition) statement;
The if-else statement is: if(condition) statement; else statement;
For awk, the entire statement is placed inside of {} marks and each clause (the if
statement and the else statement) is placed in {} as well. There is also a nested if-else statement
in Java and C and similarly in awk. In Java and C, the syntax is
if(condition1) statement1; else if(condition2) statement2; else statement3;
Again, place the entire statement and each clause (statement1, statement2, statement3)
in {}.
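As a sketch of the nested form in awk, the command below labels each employee by hours worked. The payroll lines are made up, and the 35 hour cutoff for full time is an assumption chosen only for this example.

```shell
tmp=$(mktemp -d)
# Hypothetical payroll data: name, hours worked, hourly wage.
printf '%s\n' 'Zappa 45 20' 'Smith 30 25' 'Jones 38 10' > "$tmp/payroll.dat"

# The whole statement sits inside {} and each clause has its own {}.
# The 35 hour threshold is an assumption, not a rule from the text.
labels=$(awk '{if ($2 > 40) {print $1, "overtime"}
               else if ($2 >= 35) {print $1, "full time"}
               else {print $1, "part time"}}' "$tmp/payroll.dat")

echo "$labels"
rm -r "$tmp"
```

The conditions are tested in order, so a line reaches the final else only when both earlier tests fail.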
The role of the if statement is the same as our previous /pattern/{action} pair. We
present a condition. The condition is tested line-by-line and if it is true for a given line, the
associated statement(s) executes on that line. If the statement is an if-else and the
condition is false for a given line, then the else action(s) executes. With the if-else structure, we
do not need any /pattern/{action} pairs. Here is an example illustrating our previous
payroll solution.
awk 'BEGIN {total_pay = 0.0; count = 0}
     {if ($2 > 40) {current_pay = ($2-40)*$3*1.5 + 40*$3;
                    total_pay += current_pay; count++;
                    print $1 "\t $" current_pay}
      else {current_pay = $2*$3; total_pay += current_pay;
            count++; print $1 "\t $" current_pay}
     }
     END {print "Average pay is $" total_pay/count}' payroll.dat
Here, in between the BEGIN and END sections is a statement {if … else …}. The
statement inside of the {} is executed on every line. There is no pattern to match but
instead the if-else statement always executes. The if-else statement has a condition to
determine whether to execute the if clause or the else clause.
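One way to convince yourself that the if-else form does the same job as a pair of pattern and action entries is to run both on the same input and compare the results. The payroll lines below are made up for this sketch, and the average is left out to keep the commands short.

```shell
tmp=$(mktemp -d)
# Hypothetical payroll data: name, hours worked, hourly wage.
printf '%s\n' 'Zappa 45 20' 'Smith 30 25' 'Jones 50 10' > "$tmp/payroll.dat"

# Version 1: two pattern/action pairs.
with_patterns=$(awk '$2 > 40  {print $1 "\t $" ($2-40)*$3*1.5 + 40*$3}
                     $2 <= 40 {print $1 "\t $" $2*$3}' "$tmp/payroll.dat")

# Version 2: one action with no pattern, containing an if-else.
with_if_else=$(awk '{if ($2 > 40) {print $1 "\t $" ($2-40)*$3*1.5 + 40*$3}
                     else {print $1 "\t $" $2*$3}}' "$tmp/payroll.dat")

if [ "$with_patterns" = "$with_if_else" ]; then
  echo "both versions produce the same output"
fi
rm -r "$tmp"
```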
We can also solve the earlier problem of computing the average file size of a directory if
the directory contains no files. In this case, we do not want to perform total / count if count
is 0. We add an if-else statement to the END clause.
ls -l | awk 'BEGIN {total = 0; count = 0}
             /^-/ {total += $5; count++}
             END {if (count > 0) {print total/count}
                  else {print "no files found!"}}'
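To exercise both branches without depending on the contents of a real directory, the sketch below feeds the awk program two hand-written listings in the style of ls -l output. The listing lines and the shell function name average_size are hypothetical, introduced only for this example.

```shell
# average_size is a hypothetical helper that wraps the guarded awk program.
average_size() {
  awk 'BEGIN {total = 0; count = 0}
       /^-/ {total += $5; count++}
       END {if (count > 0) {print total/count}
            else {print "no files found!"}}'
}

# A made-up listing with one directory and two regular files (100 and 300 bytes).
with_files=$(printf '%s\n' \
  'total 8' \
  'drwxr-xr-x 2 user group 4096 Jan  1 12:00 docs' \
  '-rw-r--r-- 1 user group 100 Jan  1 12:00 a.txt' \
  '-rw-r--r-- 1 user group 300 Jan  1 12:00 b.txt' | average_size)

# A made-up listing that contains only a directory.
without_files=$(printf '%s\n' \
  'total 4' \
  'drwxr-xr-x 2 user group 4096 Jan  1 12:00 docs' | average_size)

printf '%s\n%s\n' "$with_files" "$without_files"
```

In real use the function would be fed by a pipe, as in ls -l | average_size.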
There are many other features available in awk including loops, arrays, built-in
variables, and input statements to name a few. The use of loops is somewhat limited or
irrelevant because awk already loops by performing the code between the BEGIN and END
clauses for each line of the file. However, you might find a use for a loop in performing
some computation within one of the actions.
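As one possible use of a loop inside an action, the sketch below adds up every field on a line. It relies on NF, the built-in awk variable holding the number of fields on the current line. The input lines are made up.

```shell
# For each input line, visit fields 1 through NF and print their sum.
row_sums=$(printf '%s\n' '1 2 3' '10 20' '7' |
  awk '{sum = 0
        for (i = 1; i <= NF; i++) {sum += $i}
        print sum}')

echo "$row_sums"
```

The for loop runs once per field, while awk itself still takes care of moving from line to line.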
The other features flesh out awk so that it can be used as a command line programming
language. That is, features like input, arrays, and loops provide the power to use awk
in place of writing a program. And since awk is a command line program, it allows the
knowledgeable Linux user to write code on the command line that can operate in a similar
way as a program (or shell script).
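As a brief taste of arrays, the sketch below totals the hours worked by each employee using an array indexed by name. The data is made up. Because awk does not promise any particular order when looping over array indices, the result is passed through sort.

```shell
# hours[name] accumulates the second field for each distinct first field.
hours_by_name=$(printf '%s\n' 'Zappa 45 20' 'Jones 50 10' 'Zappa 38 20' |
  awk '{hours[$1] += $2}
       END {for (name in hours) {print name, hours[name]}}' | LC_ALL=C sort)

echo "$hours_by_name"
```

Doing the same job without an array would require knowing every employee name ahead of time.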
These two sections will hopefully whet your appetite to learn more about these tools.
The primary reason for covering them in this text was to provide further examples and
uses of regular expressions. As with anything Linux, check out the man pages for sed and
awk. You can also learn a great deal about these programs on the Internet where there are
numerous websites dedicated to sed and awk tutorials and user guides.
6.7 CHAPTER REVIEW
Concepts and terms introduced in this chapter:
• Character class—an abbreviated way to specify a list of all characters within a grouping
such as alphabetic ([:alpha:]), upper-case letters ([:upper:]), digits ([:digit:]), and
punctuation marks ([:punct:]).
• Character range—an abbreviated way to specify a list of characters that are expressed
as the smallest and largest separated by a hyphen. Ranges are permissible for letters
and digits but not other types of characters.
• Escape character—used to indicate that the given metacharacter should be treated
literally and not as a metacharacter, for instance by forcing the period to act as a period
and not “any character.”
• Enumerated list—specifying all of the options of characters by listing them in [].
• Metacharacters—special characters that are not interpreted literally but are used to
express how preceding character(s) should be matched against.
• Regular expression—a string comprising literal characters and metacharacters for
use in pattern matching.
• Stream editor—a program that can search for and replace strings without opening
the file in an editor.
• String—any set of characters (including the empty string); we refer to string as the
master string for a regular expression to compare against.
• Substring—any subset of consecutive characters of a string (including the empty
string).
Linux commands covered in this chapter:
• awk—powerful program that searches a file line-by-line for matching patterns and
applies operations to the items in that line. Operations include output and assignment
statements containing arithmetic operations.
• grep/egrep—tool to search for matches of a regular expression to every line in one or
multiple files.
• sed—a stream editor to search each line of a file for one or multiple matching strings
of a regular expression and replace all matched items with a replacement string.
REVIEW QUESTIONS
1. What is the difference between [0-9]+ and [0-9]*?
2. What is the difference between [0-9]? and [0-9]+?
3. What is the difference between [0-9] and [^0-9]?
4. How would .* be interpreted?
5. How would .+ be interpreted?
6. Is there any difference between [0-9] and [[:digit:]]?
7. We want to match against any sequence of exactly five digits. Why does [0-9]{5} not
work correctly?
8. Is there any difference between [0-9]{1,} and [0-9]+?
9. How does .?.?.?.?.? differ from .{1,5}?
10. Interpret [0-999]. If we truly wanted to match any number from 0 to 999, how would
we express it correctly?
11. Imagine that we want to match the fractional value of .10 (i.e., 10%). What is wrong
with the expression .10?
12. Write a regular expression to represent two vowels in a row.
13. Write a regular expression to represent two of the same vowel in a row.
14. We want to match against any arithmetic expression of the form X op Y = Z where
X, Y, and Z are any numbers and op is any of +, -, *, or /. Write the proper regular
expression.
15. To find four words in a sentence, we might use ([[:alpha:]]+){4}. Why is this
incorrect? How would you fix it?
For questions 16–19, imagine that we have a file that lists student information, one row
per student. Among the information for each student is every state that the student has
lived in. For instance, we might have one entry with OH and another with OH, MO, NY.
Further, assume that the only use of commas in the entire line will be to separate the states.
16. We want to find every student who has lived in either OH or KY. Write such a regular
expression.
17. Why does [^OK][^HY] match students who have not lived in either OH or KY?
18. We want to find a student who has lived in OH or NY. Why is [ON][HY] not proper?
19. We want to find all students who have lived in at least three states. Write such a
regular expression.
For questions 20–23, assume a file contains a list of information about people, row by
row where each row starts with a person’s first name, a comma, and the person’s last name
followed by a colon.
20. Write a regular expression to find anyone whose last name starts with either F, G, H, I,
or J.
21. Write a regular expression to find anyone whose first and last names are both exactly
six letters long.
22. Write a regular expression to find anyone whose first and last names do not contain
an ‘a.’ Both first and last names must not contain an ‘a.’
23. Write a regular expression to find anyone whose last name contains two upper-case
letters as in McCartney. The second upper-case letter can occur anywhere in the name.
For questions 24–30, use the Linux dictionary found in /usr/share/dict/words.
24. Write a regular expression to find words that have two consecutive punctuation marks.
25. Write a regular expression to find any words whose entries contain a digit with letters
on either side of it.
26. Write a regular expression to find all words that begin with the letter a and end with
the letter z.
27. Write a regular expression to find all five letter words that end with a c.
28. Write a regular expression to find any entries that contain a q followed by a non-u (as
in Iraqi). The q can be upper or lower case.
29. Write a regular expression to find all entries that contain two x’s somewhere in the
word.
30. Write a regular expression to find all words that contain the letters a, b, c, and d in
that order in the word. The letters do not have to appear consecutively but they must
appear in alphabetical order.
31. In this chapter, we saw how to write a dollar amount that could include dollars or dollars
and cents but we did not try to include commas. Provide a regular expression that can
contain any amount from $999,999.99 down to $0.00 with the comma properly included.
32. Following up on number 31, can you come up with a regular expression that permits
any number of digits to the left of the decimal point such that the comma is correctly
inserted?
33. What is wrong with the following grep instruction?
grep abc* foo.txt
34. What is the difference between using -H and -h in grep?
35. Assume we have a grep statement of the form grep 'someregex' *.txt. Should
we use the option -H? Explain.
36. When using the -c option, will grep output a 0 for a file that contains no matches?
37. Explain the difference between grep '[^abc]' somefile and grep -v '[abc]' somefile.
38. What regular expression metacharacters are not available in grep but are available in
egrep?
39. Write a sed command to remove every line break (new line, \n) in the file somefile.
These should be replaced by blank spaces.
40. Explain what the following sed command will do:
sed 's/ //' somefile
41. Explain what the following sed command will do:
sed 's/ //g' somefile
42. Explain what the following sed command will do:
sed 's/[[:digit:]]+\([[:digit:]]\)/0/g' somefile
43. Explain what the following sed command will do:
sed 's/^\([[:alpha:]]+\)\([[:alpha:]]\)\([[:space:]]\)/\2\1\3/g' somefile
44. What does the & represent in a sed command?
45. What is the difference between \U and \u when used in a sed command?
46. Write a sed command to reverse any capitalized word to start with a lower-case letter
and whose remaining characters are all upper case. For instance, Dog becomes dOG
while cat remains cat.
47. Write a sed command to replace every occurrence of ‘1’ with ‘one,’ ‘2’ with ‘two,’ and
‘3’ with ‘three’ in the file somefile.
For questions 48–53, assume the file payroll.dat contains employee wage information
where each row contains
first_name last_name hours wages week
where week is a number from 1 to 52 indicating the week of the year. Multiple employees
can occur in the file but they will have different weeks.
48. Write an awk command to output the first and last names of all employees who
worked during week 5.
49. Write an awk command to compute and output the total pay for employee Frank
Zappa assuming no overtime.
50. Revise #49 so that the computation includes overtime at 1.5 times the wages specified.
51. Write an awk command to compute the average number of hours worked for the
weeks that Frank Zappa appears.
52. Compute the number of times any employee worked overtime.
53. Compute the average wage of all records in the file.
Chapter 7
Shell Scripting
This chapter’s learning objectives are
• To be able to write Bash shell scripts
• To know how to use variables and parameters in shell scripts
• To be able to write input and output statements in shell scripts
• To understand and write conditional statements in Bash
• To be able to properly apply selection statements in shell scripts
• To be able to properly apply loop statements in shell scripts
• To understand how to use arrays and string operations in Bash
• To understand the role of the function and be able to write and call functions in Bash
• To understand the differences between Bash and C-shell scripts
7.1 INTRODUCTION
Most computer programs are written in a high-level language (e.g., C++, Java) and then
compiled into an executable program. Users run those executable programs. A script is
a program that is