Python Projects for Beginners: A Ten-Week Bootcamp Approach to Python Programming




Outer Join


If we want to return every record and attach a rating only for the people who gave one, we need to perform an outer join. An outer join keeps every row from both DataFrames, so all records from our original DataFrame are preserved while the ratings column is added. To do this, we set the how parameter to “outer”:

# performing an outer join with our df and ratings DataFrames based on names, get all data
all_ratings = df.merge(ratings, on="names", how="outer")
all_ratings.head()

Go ahead and run the cell. This time the merged DataFrame holds all seven records (note that head() displays only the first five rows by default); those who didn’t give a rating have NaN in the ratings column. NaN stands for “Not a Number” and is how pandas marks a missing value. With this information combined, we could find the average age of those who gave a rating and of those who didn’t. From a marketing perspective, this helps identify who the target demographic should be.
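As a rough sketch of that comparison, the cell below rebuilds two small DataFrames and computes the average age of each group. The names, ages, and ratings are made-up sample values, not the data used earlier in the chapter; only the column names (names, ages, ratings) follow the text.

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "names": ["Jess", "Jordan", "Sam", "Ted", "Ryan", "Kim", "Alex"],
    "ages": [25, 35, 18, 44, 32, 22, 29],
})
ratings = pd.DataFrame({
    "names": ["Jess", "Sam", "Ryan"],
    "ratings": [10, 9, 6],
})

# outer join keeps all seven people; missing ratings become NaN
all_ratings = df.merge(ratings, on="names", how="outer")

# boolean mask: True where a rating exists, False where it is NaN
gave_rating = all_ratings["ratings"].notna()

# average age of each group
avg_age_rated = all_ratings.loc[gave_rating, "ages"].mean()
avg_age_unrated = all_ratings.loc[~gave_rating, "ages"].mean()

print("Average age, gave a rating:", avg_age_rated)
print("Average age, no rating:", avg_age_unrated)
```

Using notna() as a mask is one of several ways to split the groups; grouping on the mask with groupby would give the same two averages in a single call.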
Chapter 10: Introduction to Data Analysis

Dataset Pipeline


A dataset pipeline is the process of taking our data and cleaning it so that a model can use it to make predictions. This can be a lengthy process if the dataset you use is unclean. An unclean dataset may contain duplicate records, null values, or unfiltered information that leads to incorrect predictions. Here is the general process:

  1. Performing Exploratory Analysis

    • In this step you want to get to know your data very well. Take notes on what you see at a glance and on what you may want to clean or add. You essentially want to get a feel for what your data has to offer. Make note of the number of columns, the data types, outliers, null values, and columns that aren’t necessary. This is generally when you plot each column of data and look for possible correlations, uninformative features, etc.

  2. Data Cleaning

    • Improper cleaning can lead to poor predictions and bad datasets. Here, you’ll want to remove unwanted observations such as duplicates, fix structural errors such as columns that hold the same data under mistyped names, handle missing data, and filter out outliers. This is key for the next step.
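The first two steps above can be sketched on a small, deliberately messy DataFrame. Everything in this cell is an assumption made for illustration: the sample values, the mistyped column name nmaes, and the 0 to 120 age range used as a simple outlier filter.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data (assumed for illustration only)
raw = pd.DataFrame({
    "nmaes": ["Jess", "Jordan", "Jordan", "Sam", "Ted", "Ryan"],  # typo in column name
    "ages": [25, 35, 35, 18, np.nan, 230],  # one missing value, one implausible age
})

# Step 1: exploratory analysis, get a feel for the data
print(raw.shape)         # number of rows and columns
print(raw.dtypes)        # data type of each column
print(raw.isna().sum())  # null values per column
print(raw.describe())    # summary statistics help spot outliers

# Step 2: data cleaning
clean = (
    raw.rename(columns={"nmaes": "names"})  # fix the structural error
       .drop_duplicates()                   # remove the repeated Jordan row
       .dropna(subset=["ages"])             # drop rows with a missing age
)

# keep only plausible ages; the 0-120 range is an assumed cutoff
clean = clean[clean["ages"].between(0, 120)]

print(clean)
```

Dropping rows is only one way to handle missing data; filling with a mean or median via fillna is a common alternative when you can't afford to lose records.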

