Beginning Anomaly Detection Using Python-Based Deep Learning

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
%matplotlib inline

Figure 2-10. Importing numpy, pandas, matplotlib.pyplot, and sklearn modules

Chapter 2   Traditional Methods of Anomaly Detection

Each data entry is large, with 42 columns per entry. The exact column names don't matter, except that "service" and "label" must stay the same, because the code that follows refers to them by name. The full list of column names is as follows:

•  duration
•  protocol_type
•  service
•  flag
•  src_bytes
•  dst_bytes
•  land
•  wrong_fragment
•  urgent
•  hot
•  num_failed_logins
•  logged_in
•  num_compromised
•  root_shell
•  su_attempted
•  num_root
•  num_file_creations
•  num_shells
•  num_access_files
•  num_outbound_cmds
•  is_host_login
•  is_guest_login
•  count
•  srv_count
•  serror_rate
•  srv_serror_rate
•  rerror_rate
•  srv_rerror_rate
•  same_srv_rate
•  diff_srv_rate
•  srv_diff_host_rate
•  dst_host_count
•  dst_host_srv_count
•  dst_host_same_srv_rate
•  dst_host_diff_srv_rate
•  dst_host_same_src_port_rate
•  dst_host_srv_diff_host_rate
•  dst_host_serror_rate
•  dst_host_srv_serror_rate
•  dst_host_rerror_rate
•  dst_host_srv_rerror_rate
•  label

columns = ["duration", "protocol_type", "service", "flag", "src_bytes",
           "dst_bytes", "land", "wrong_fragment", "urgent",
           "hot", "num_failed_logins", "logged_in", "num_compromised",
           "root_shell", "su_attempted", "num_root",
           "num_file_creations", "num_shells", "num_access_files",
           "num_outbound_cmds", "is_host_login",
           "is_guest_login", "count", "srv_count", "serror_rate",
           "srv_serror_rate", "rerror_rate", "srv_rerror_rate",
           "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
           "dst_host_count", "dst_host_srv_count",
           "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
           "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
           "dst_host_serror_rate", "dst_host_srv_serror_rate",
           "dst_host_rerror_rate", "dst_host_srv_rerror_rate", "label"]

# Adjust the path to wherever the data file lives on your machine.
df = pd.read_csv("datasets/kdd_cup_1999/kddcup.data/kddcup.data.corrected",
                 sep=",", names=columns, index_col=None)

Figure 2-11. You define all of the columns and save the data set as a variable named df
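To see what names=columns does in Figure 2-11, here is a minimal sketch that reads a tiny headerless CSV from an in-memory string rather than the real file. The three-column sample and the names toy_columns, raw, and toy_df are made up for illustration; only the pd.read_csv arguments mirror the figure.

```python
import io

import pandas as pd

# Hypothetical three-column sample; the real file has 42 columns.
toy_columns = ["duration", "service", "label"]
raw = "0,http,normal.\n2,smtp,normal.\n0,http,back.\n"

# The data has no header row, so names= supplies the column labels,
# and index_col=None tells pandas not to use any column as the index.
toy_df = pd.read_csv(io.StringIO(raw), sep=",", names=toy_columns, index_col=None)

print(toy_df.columns.tolist())
print(len(toy_df))
```

Without names=, pandas would treat the first data row as the header, which is why the column list has to be defined before the file is read.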

To get the dimensions of the table, or shape, as it's referred to in pandas, run

df.shape

or, if you're not in Jupyter, run

print(df.shape)

In Jupyter, you should see something like Figure 2-12 after running the code.

As you can see, this is a massive dataset.
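As a small illustration of how to read the tuple, the sketch below builds a toy data frame (the values and the name toy_df are invented for this example) and unpacks its shape into a row count and a column count.

```python
import pandas as pd

# Toy frame with 3 rows and 2 columns (illustrative values only).
toy_df = pd.DataFrame(
    {"duration": [0, 2, 0], "label": ["normal.", "normal.", "back."]}
)

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = toy_df.shape
print(f"{n_rows} rows, {n_cols} columns")
```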

Figure 2-12. The output is a tuple that describes the dimensions of the data frame

Next, filter the data frame so that it only includes entries whose service is HTTP, and drop the service column (Figure 2-13).

df = df[df["service"] == "http"]
df = df.drop("service", axis=1)
columns.remove("service")

Figure 2-13. Filtering df to only have HTTP entries and removing the service column from df

Just to make sure, check the shape of df again (Figure 2-14).
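The filtering step can be tried on a toy data frame first. This is only a sketch: the three rows and the names toy_columns, toy_df, is_http, and http_df are assumptions made for the example, not part of the data set.

```python
import pandas as pd

toy_columns = ["duration", "service", "label"]
toy_df = pd.DataFrame(
    [[0, "http", "normal."], [2, "smtp", "normal."], [0, "http", "back."]],
    columns=toy_columns,
)

# Boolean mask: True where the connection used the HTTP service.
is_http = toy_df["service"] == "http"
http_df = toy_df[is_http]

# Every remaining row has the same service value, so the column carries
# no information and can be dropped (axis=1 means "drop a column").
http_df = http_df.drop("service", axis=1)
toy_columns.remove("service")  # keep the name list in sync with the frame

print(http_df.shape)
```

Note that the mask keeps both normal and attack rows, as long as their service is HTTP.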


The number of rows has been drastically reduced, and the column count went down by one because you removed the service column, which you no longer need.


Next, check all the possible labels and the count for each one, just to get a feel for the data distribution.

Run the following:

df["label"].value_counts()

or

print(df["label"].value_counts())

You should see something like Figure 2-15.

The vast majority of the data set consists of normal data entries; only around 0.649% of the HTTP entries are actual intrusion attacks.
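A percentage like this can be derived from the label column itself. The sketch below uses a made-up series of eight labels and assumes that normal entries are labeled "normal." (with a trailing period); the name attack_pct is likewise just for illustration.

```python
import pandas as pd

# Hypothetical label column: 7 normal entries and 1 attack.
labels = pd.Series(["normal."] * 7 + ["back."], name="label")

counts = labels.value_counts()                # absolute count per label
shares = labels.value_counts(normalize=True)  # fraction per label

# Percentage of entries whose label is anything other than "normal."
attack_pct = 100 * (labels != "normal.").mean()

print(counts)
print(f"{attack_pct:.3f}% of entries are attacks")
```

Passing normalize=True is a convenient way to get the distribution as fractions without dividing by the row count by hand.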

Additionally, some of the columns hold categorical values, which the model cannot train on directly. To work around this, you use a scikit-learn utility called a label encoder (LabelEncoder), which maps each category to an integer.
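As a rough sketch of how a label encoder behaves, the snippet below encodes the non-numeric columns of a toy data frame. The sample values, the encoders dictionary, and the loop over non-numeric columns are assumptions made for this example, not necessarily how the encoder is applied to df later on.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

toy_df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp", "icmp"],
    "src_bytes": [181, 105, 239, 1032],
})

# Encode every non-numeric column; numeric columns are left untouched.
encoders = {}
for col in toy_df.columns:
    if not pd.api.types.is_numeric_dtype(toy_df[col]):
        encoder = LabelEncoder()
        toy_df[col] = encoder.fit_transform(toy_df[col])
        encoders[col] = encoder  # kept so codes can be mapped back later

print(toy_df)
print(list(encoders["protocol_type"].classes_))
```

LabelEncoder assigns integers in sorted order of the category values, so the codes carry no meaning beyond identifying the category.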

Figure 2-16 shows what you currently see if you run df.head(5), which displays the first five entries.

