Machine Learning: 2 Books in 1: Machine Learning for Beginners, Machine Learning Mathematics. An Introduction Guide to Understand Data Science Through the Business Application



Date: 22.06.2022

4. Data Segregation
The primary goal of a machine learning project is to develop a highly
accurate model, where accuracy is judged by the quality of the model's
forecasts and predictions on new input data that was not part of the
training dataset. The available labeled dataset is therefore used as a
"proxy" for future unknown input data by dividing it into training and
testing datasets. Many approaches are available for splitting the dataset;
some of the most widely used techniques are:
• Using either the default or a customized ratio to divide the dataset
  sequentially into two subsets, preserving the order in which the data
  appears in the source so that the two subsets do not overlap. For
  example, you could select the first 75% of the data to train the model
  and the subsequent 25% to test the accuracy of the model.
• Splitting the dataset into training and testing subsets using a
  default or custom ratio together with a random seed. For example, you
  could choose a random 75% of the dataset to train the model and use
  the remaining 25% to test the model.
• Using either of these techniques ("sequential" or "random") and
  then also shuffling the data within each subset.
• Using a customized, injected approach for splitting the data when
  extensive control over the segregation of the data is required.
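
The first three techniques listed above can be sketched in a few lines of
Python. This is only an illustrative sketch using the standard library; the
function names (sequential_split, random_split, shuffle_within), the 75%
default ratio and the sample data are assumptions chosen to mirror the
examples in the text, not part of any particular tool.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sequential_split(rows: Sequence[T], train_ratio: float = 0.75) -> Tuple[List[T], List[T]]:
    """Keep the source order: the first train_ratio of rows train, the rest test."""
    cut = int(len(rows) * train_ratio)
    return list(rows[:cut]), list(rows[cut:])


def random_split(rows: Sequence[T], train_ratio: float = 0.75, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows with a fixed seed, then cut at train_ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return sequential_split(shuffled, train_ratio)


def shuffle_within(subset: Sequence[T], seed: int = 0) -> List[T]:
    """Mix the rows inside one subset without moving rows between subsets."""
    mixed = list(subset)
    random.Random(seed).shuffle(mixed)
    return mixed


# Hypothetical dataset: 100 rows identified by their position in the source.
rows = list(range(100))

seq_train, seq_test = sequential_split(rows)        # first 75 rows / last 25 rows
rnd_train, rnd_test = random_split(rows, seed=42)   # reproducible random 75/25 split
mixed_train = shuffle_within(seq_train, seed=42)    # sequential split, then mixed
```

Fixing the seed makes the random split reproducible, so that a later
training run and its assessment see exactly the same subsets.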
Technically, the data segregation stage is not considered an independent
machine learning pipeline; however, an "API" or tool has to be provided to
support this stage. The next two stages ("model training" and "model
assessment") must be able to call this "API" to obtain the datasets they
require. As far as the organization of the code is concerned, a "strategy
pattern" is required so that the "caller service" can select the
appropriate splitting algorithm at run time, and it must be possible to
inject the split percentage or the random seed. The "API" must also be
able to return the data with labels, to train the model, or without
labels, to test it. A warning can be created and passed along with the
dataset to protect the "caller service" from defining parameters that
could produce an uneven distribution of the data.
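
One possible shape for such an "API" is sketched below in Python. Everything
in it is hypothetical: the DataSegregator class, the STRATEGIES registry, the
(features, label) row layout and the rule used to raise the warning (a label
that appears in only one of the two subsets) are assumptions made for
illustration, since the text does not prescribe a concrete design.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Tuple[Tuple[float, ...], int]  # (features, label): a hypothetical row layout
Splitter = Callable[[Sequence[Row], float, Optional[int]], Tuple[List[Row], List[Row]]]


def split_sequential(rows, train_ratio, seed=None):
    """Strategy 1: cut the rows in source order (the seed is ignored)."""
    cut = int(len(rows) * train_ratio)
    return list(rows[:cut]), list(rows[cut:])


def split_random(rows, train_ratio, seed=None):
    """Strategy 2: shuffle a copy with the injected seed, then cut."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return split_sequential(shuffled, train_ratio)


# Strategy pattern: the caller selects an algorithm by name at run time.
STRATEGIES: Dict[str, Splitter] = {"sequential": split_sequential, "random": split_random}


class DataSegregator:
    def __init__(self, rows, strategy="random", train_ratio=0.75, seed=None):
        if not 0.0 < train_ratio < 1.0:
            raise ValueError("train_ratio must be strictly between 0 and 1")
        self.train, self.test = STRATEGIES[strategy](rows, train_ratio, seed)
        self.warning = self._check_distribution()

    def _check_distribution(self) -> Optional[str]:
        """Assumed rule: warn when a label occurs in only one subset."""
        train_labels = {label for _, label in self.train}
        test_labels = {label for _, label in self.test}
        if train_labels != test_labels:
            missing = sorted(train_labels ^ test_labels)
            return "uneven split: labels %s appear in only one subset" % missing
        return None

    def get(self, subset: str, with_labels: bool = True):
        """Return the requested subset plus the warning (None if all is well)."""
        rows = self.train if subset == "train" else self.test
        data = rows if with_labels else [features for features, _ in rows]
        return data, self.warning


# Hypothetical dataset sorted by label: 50 rows of class 0, then 50 of class 1.
rows = [((float(i),), 0) for i in range(50)] + [((float(i),), 1) for i in range(50, 100)]

seq = DataSegregator(rows, strategy="sequential")
train_rows, seq_warning = seq.get("train")                    # labeled, for training

rnd = DataSegregator(rows, strategy="random", seed=7)
test_features, rnd_warning = rnd.get("test", with_labels=False)  # unlabeled, for testing
```

In this sketch the sequential split of label-sorted data leaves class 0 out
of the testing subset, so the caller receives a warning alongside the data
instead of silently training on a skewed split.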


