MINISTRY FOR THE DEVELOPMENT OF INFORMATION TECHNOLOGIES AND COMMUNICATIONS OF THE REPUBLIC OF UZBEKISTAN
TASHKENT UNIVERSITY OF INFORMATION TECHNOLOGIES NAMED AFTER MUHAMMAD AL-KHWARIZMI
Department of Computer Systems
Course: Intelligent Data Analysis
LABORATORY WORK 2
Group: KIF 214-18
Completed by: Durnazarov Yorqin
Checked by: Ochilov Temur
TASHKENT – 2021
Laboratory 2
A) In each group, the students are divided into 3 subgroups in the order they appear in the class register. Dividing the number of students in the group by 3 gives the three subgroups. The first subgroup takes the file Salary_data.csv. The second subgroup takes the file housing_data.csv. The third subgroup takes the file brain_body.txt.
B) Read the data.
C) Display the data that was read.
D) Visualize the data with a scatter plot.
E) Use the StatsModels library to display statistics for the data.
F) Use the StatsModels library to build a linear regression on the given dataset.
G) Find the line that best summarizes the given data (the best-fit line).
H) Write this line in mathematical form (for example, y=x*0.0017+0.2750).
Carrying out the work:
Group 2: Salary_data.csv
Reading the data:
First, we import the required libraries and modules:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
from sklearn.cluster import KMeans  # needed for the clustering steps
sns.set()
We read the data from the file 'Mall_Customers.csv':
data = pd.read_csv('Mall_Customers.csv')
Displaying the data that was read:
data
Visualizing the data with a scatter plot:
plt.scatter(data['Annual Income (k$)'], data['Spending Score (1-100)'])
plt.xlim(0, 150)
plt.ylim(-10, 110)
plt.show()
Selecting the features:
x = data.iloc[:,3:5]
x
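The positional slice 3:5 picks the columns at positions 3 and 4, so it depends on the column order of the file. The sketch below compares it with selection by name on a small made-up table. The first three column names and all the values are assumptions for illustration; only the last two column names come from the code in this report.

```python
import pandas as pd

# Made-up rows; the first three column names are assumed, only the last two
# appear in the report's own code.
data = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Gender': ['Male', 'Female', 'Female'],
    'Age': [19, 35, 52],
    'Annual Income (k$)': [15, 60, 120],
    'Spending Score (1-100)': [39, 81, 6],
})

# Positional selection: columns at positions 3 and 4 (the end index 5 is excluded).
x_by_position = data.iloc[:, 3:5]

# Name-based selection: the same two columns, independent of column order.
x_by_name = data[['Annual Income (k$)', 'Spending Score (1-100)']]

print(list(x_by_position.columns))
print(x_by_position.equals(x_by_name))
```

Selecting by name keeps working if a column is later added or reordered, whereas the positional slice would silently pick different features.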
D) Clustering
kmeans = KMeans(5)
kmeans.fit(x)
Clustering results:
identified_clusters = kmeans.fit_predict(x)
identified_clusters
data_with_clusters = data.copy()
data_with_clusters['Cluster'] = identified_clusters
data_with_clusters
plt.scatter(data_with_clusters['Annual Income (k$)'], data_with_clusters['Spending Score (1-100)'], c = data_with_clusters['Cluster'], cmap = 'rainbow')
plt.xlim(0, 150)
plt.ylim(-10, 110)
plt.show()
Choosing the number of clusters
WCSS
kmeans.inertia_
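The value in inertia_ is the within-cluster sum of squares (WCSS): the sum of squared distances from every point to the centre of its own cluster. The sketch below computes the same quantity by hand with NumPy on four made-up points. The helper name wcss_by_hand and the data are hypothetical and are not part of the lab's dataset.

```python
import numpy as np

# Tiny illustrative data set: two obvious groups of 2-D points (made-up values).
points = np.array([[1.0, 1.0], [1.0, 3.0], [9.0, 9.0], [11.0, 9.0]])
labels = np.array([0, 0, 1, 1])  # assumed cluster assignment for each point


def wcss_by_hand(points, labels):
    """Sum of squared distances from each point to its cluster's centroid."""
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return float(total)


two_clusters = wcss_by_hand(points, labels)
one_cluster = wcss_by_hand(points, np.zeros(len(points), dtype=int))
print(two_clusters, one_cluster)
```

Splitting the points into their two natural groups gives a far smaller WCSS than treating them as one cluster, which is the effect the elbow method relies on.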
wcss = []
for i in range(1, 8):
    kmeans = KMeans(i)
    kmeans.fit(x)
    wcss_iter = kmeans.inertia_
    wcss.append(wcss_iter)
wcss
The Elbow Method
number_clusters = range(1, 8)
plt.plot(number_clusters, wcss)
plt.title("The Elbow Method")
plt.xlabel('Number of clusters')
plt.ylabel('Within-cluster Sum of Squares')
plt.show()
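To show how the elbow is read off, here is a minimal sketch of the same WCSS loop on synthetic data with three well-separated groups. The data comes from scikit-learn's make_blobs and is an assumption for illustration, not the Mall_Customers data. The n_init and random_state arguments are set explicitly so that repeated runs give the same numbers.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (an assumption for illustration).
centers = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
X, _ = make_blobs(n_samples=150, centers=centers, cluster_std=0.5, random_state=0)

wcss = []
for k in range(1, 8):
    # Fit k-means with k clusters and record its within-cluster sum of squares.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(model.inertia_)

# Relative drop in WCSS when going from k to k+1 clusters.
drops = [(wcss[i] - wcss[i + 1]) / wcss[i] for i in range(len(wcss) - 1)]
for k, d in zip(range(1, 7), drops):
    print(f"k={k} -> k={k + 1}: WCSS falls by {d:.0%}")
```

With data like this, the drops up to k=3 should be large and the later ones small. The point where the curve flattens is the "elbow", and it suggests the number of clusters to use.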