606
◾
Linux with Operating System Concepts
solution to this problem; however, educating the employees can limit such a problem.
Additionally, monitoring access to sensitive files can help track employee movements. As
stated earlier in the chapter, you can obtain such information through log files, particularly
the audit logs.
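As a minimal sketch of the audit-log idea, the following Python snippet counts how often each user ID appears in audit records tagged with a given watch key. It assumes the `name=value` record layout that the Linux audit daemon writes; the key `sensitive-files`, the helper names, and the sample records are hypothetical and invented for illustration, and real records carry many more fields.

```python
import re
from collections import Counter

# Hypothetical sample records in the audit daemon's "name=value" style.
SAMPLE_LOG = """\
type=SYSCALL msg=audit(1700000000.101:501): syscall=257 success=yes uid=1001 comm="cat" key="sensitive-files"
type=SYSCALL msg=audit(1700000005.202:502): syscall=257 success=no uid=1002 comm="vim" key="sensitive-files"
type=SYSCALL msg=audit(1700000009.303:503): syscall=257 success=yes uid=1001 comm="cp" key="sensitive-files"
type=SYSCALL msg=audit(1700000012.404:504): syscall=59 success=yes uid=1003 comm="ls" key="other"
"""

# One field is a name, an equals sign, and a quoted or unquoted value.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_record(line):
    """Split one audit record into a dict, dropping surrounding quotes."""
    return {name: value.strip('"') for name, value in FIELD.findall(line)}


def accesses_by_uid(log_text, key):
    """Count SYSCALL records carrying the given watch key, per user ID."""
    counts = Counter()
    for line in log_text.splitlines():
        record = parse_record(line)
        if record.get("type") == "SYSCALL" and record.get("key") == key:
            counts[record.get("uid", "unknown")] += 1
    return counts


counts = accesses_by_uid(SAMPLE_LOG, "sensitive-files")
for uid, n in sorted(counts.items()):
    print(f"uid {uid}: {n} access attempt(s)")
```

On a live system the same kind of key-based lookup is normally done with the audit tools themselves (for example, `ausearch -k` followed by the key); the script only illustrates what such a report summarizes.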
Finally, there are physical threats to IT equipment. You might have theft or vandalism.
This is more common in settings where computers are available to the public (e.g., a
library) or in schools where computers may not be monitored. Solutions are to monitor
computer usage through lab monitors (personnel) or cameras and to lock equipment to the
desks. Physical threats also arise because of fire, flood, smoke, and heat damage.
One solution to a fire is to use sprinklers. However, water can damage the equipment
as quickly as the fire! Fire-retardant chemicals are sometimes used in place of sprinklers
as they may cause less damage to hardware although they tend to be far more expensive.
Smoke and heat will have less of an impact on computer hardware but a sufficient amount
of smoke or a hot enough environment can still lead to damage.
In addition, electrical power surges can destroy computer chips. A simple solution is to
make sure that all electronic equipment is plugged in through surge protectors.
To recover from disaster, a plan must specify how to proceed from the point that the
disaster ends. Such a plan must include how to restore data from backups and how to
replace damaged or destroyed equipment. The plan must clearly map out how to proceed
so that the organization’s IT can return to full capability. The plan will no doubt have
different steps based on the type and degree of disaster. As there are any number of possible
disasters that you might face, you will have different plan steps per disaster. Each disaster
plan will have its own team with a selected team leader. The personnel that make up the
teams might overlap (for instance, we might expect IT personnel, and perhaps the same
person, to be involved in all plans).
Let us consider a partial plan. First, to implement the plan, we need to make sure that
we have information available to enact the plan steps. We should keep electronic and
hard-copy versions of a variety of information at multiple sites. The information should
include at a minimum the following:
• Contact information for our personnel including home addresses and phone numbers,
especially for our emergency response team(s).
• A full inventory of our IT infrastructure (servers, workstations, network components,
software including versions installed, licenses, and a description of our data).
• A copy of our plan.
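One way to picture this minimum information set is as a small record that can be checked for completeness before copies are distributed to each site. The sketch below is only illustrative: the class name `RecoveryInfo`, its fields, and the sample values are assumptions made for the example, not part of any standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryInfo:
    """Hypothetical bundle of the information kept at every site."""
    contacts: list = field(default_factory=list)   # personnel contact entries
    inventory: list = field(default_factory=list)  # hardware, software, data
    plan_path: str = ""                            # where the plan copy lives

    def missing_items(self):
        """Return the names of required items that are still empty."""
        missing = []
        if not self.contacts:
            missing.append("personnel contact information")
        if not self.inventory:
            missing.append("IT infrastructure inventory")
        if not self.plan_path:
            missing.append("copy of the plan")
        return missing


# Invented sample values: contacts and inventory are filled in,
# but no copy of the plan has been recorded yet.
info = RecoveryInfo(
    contacts=[{"name": "A. Admin", "phone": "555-0100", "team": "fire"}],
    inventory=[{"item": "file server", "software": "Linux"}],
)
print(info.missing_items())
```

Running such a check at each site is one simple way to notice that a copy has drifted out of date or was never completed.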
Now, if you have a disaster, you follow your plan.
Let us consider as an example a fire that results in damage to the building housing some
of the organization’s employees, data-processing equipment, and storage. This is only one
site though and other sites have overlapping employees, equipment, and storage.
The first step is to make sure the building has been evacuated and that the fire department
is responding. Next, determine whether the disaster is real. If it is a false alarm, return to normal business
and possibly cancel the call to the fire department (this is optional as there may be a good
reason to have the fire department investigate the cause of an alarm). Otherwise, initiate
the fire disaster-recovery plan by contacting the fire disaster team’s leaders and members.
Now, the team takes over.
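The opening steps of this fire plan amount to a short decision procedure, which can be sketched in Python as follows. The function name, its parameters, and the wording of each step are hypothetical; the logic simply mirrors the paragraph above, including the optional cancellation of the fire department call.

```python
def initial_fire_response(alarm_is_real, cancel_call=False):
    """Return the ordered first steps of the fire plan (illustrative only)."""
    steps = [
        "confirm the building has been evacuated",
        "confirm the fire department is responding",
    ]
    if not alarm_is_real:
        steps.append("return to normal business")
        # Cancelling is optional: the fire department may still be
        # wanted to investigate what triggered the alarm.
        if cancel_call:
            steps.append("cancel the call to the fire department")
        return steps
    steps.append("contact the fire disaster team's leaders and members")
    steps.append("the disaster team takes over")
    return steps


for step in initial_fire_response(alarm_is_real=True):
    print("-", step)
```

Writing the steps down in this explicit form makes it easy to see that both branches begin with evacuation and that only a confirmed fire hands control to the disaster team.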
Their plan should include contacting other sites of the organization to report the affected
site’s status. These other sites should expect a heavier load and be prepared to handle calls
and emails intended for the site affected. Additionally, if this site has data that are not securely backed
up elsewhere, the team needs to work out how to recover that data or what to do in the
event of lost data.
The fire itself would presumably be put out within an hour or two. The next step is to
assess the damage. The team would not be permitted on the premises until the fire inspector
gives approval. The team, in conjunction with the fire inspector, could determine,
for instance, if the building will be safe to work in or will require extensive repair. This
would provide management with an estimate for how long the organization would have to
go without access to the building. If the team is allowed to move through the building, they
might collect items that would help recover from the disaster such as any storage media
that was held in a secure location (e.g., a safe) or was not damaged by the fire.
Now the focus shifts to recovery. The team will work to ensure that the off-site centers are
handling the load and that there was no loss of data. A press release would be issued to the
media and/or sent directly to clientele to indicate any ongoing problems such as a limitation
to processing or a reduction in website availability. If the building requires more than a few
days’ worth of repairs, then an alternate site should be established during this period.
The team will also explore whether new equipment must be purchased to replace anything
damaged. This new equipment may have to be housed separately during building
repair. Once the building is available, any new equipment must be moved in. Finally, the
disaster team, along with management, should review their plan and update it based on
any failures that may have been identified during the disaster.
Once disaster planning is complete, the organization should review its plans, perhaps
annually. New threats may arise as the organization changes its assets. Over time, threats
may change as the organization implements security schemes to defeat old threats.
You might wonder why this topic is included in a text on Linux. As a system
administrator you may not be responsible for disaster planning and recovery, but you
should certainly be knowledgeable about the process as management might (and should)
seek your input. Without the IT staff’s input, any disaster-recovery plan that covers IT will
no doubt be lacking.
14.7 TROUBLESHOOTING
You now have the tools available to troubleshoot your system. Which tools do you apply
and when? This section will examine a number of technical problems and discuss system
administration efforts to resolve these problems.
For each problem, we look at steps to further identify the cause of the problem followed
by easy or short-term solutions and then more involved or long-term solutions. The
solutions described in these scenarios are not intended to be complete but should illustrate
some of the types of efforts that you, the system administrator, should take when faced
with similar problems.