Machine Learning: 2 Books in 1: Machine Learning for Beginners, Machine Learning Mathematics. An Introduction Guide to Understand Data Science Through the Business Application

Date: 22.06.2022
K-Means clustering

Clustering
Clustering is a branch of unsupervised learning. It is the task of
grouping similar things together. When we use clustering, we can identify
characteristics and sort our data based on these characteristics. If we are
using machine learning for marketing, clustering can help us identify
similarities in groups of customers or potential clients. Unsupervised
learning can help us sort customers into categories that we might not have
created without the help of machine learning. It can also help us sort our
data when we are working with a large number of variables.
K-Means clustering
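K-means clustering resembles K-nearest neighbors in that both rely on the
distance between data points, although K-means is unsupervised and needs no
labeled examples. You pick a number for k to decide how many groups you want
to see. You then assign points to clusters and repeat until the clusters
stop changing.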

Your data is grouped around centroids, which are the points on your graph
that mark the center of each cluster. You start with k of them, chosen at
random. Once you introduce your data to the model, each data point is placed
in the category of its closest centroid, as measured by Euclidean distance.
Then you take the average value of the data points assigned to each centroid
and move the centroid to that average. Keep repeating these two steps until
your results stay the same and you have consistent clusters. Each data point
is assigned to only one cluster.
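The assign-and-average loop described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the book's own code: the function name kmeans, the six sample points, the max_iters cap, and the choice to start from k randomly picked data points are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its closest centroid
        # (Euclidean distance via math.dist).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels

# Hypothetical sample data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On this small dataset the first three points end up sharing one label and the last three the other, with one centroid at the mean of each group. On real data the result can depend on the random starting centroids, so it is common to run the procedure several times.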
In a two-dimensional dataset, the update step means finding the average
values for x and y within each cluster; that average point becomes the
cluster's new centroid. K-means clustering can help you identify previously
unknown or overlooked patterns in the data.
Choose the value for k that best matches the number of categories you
want to create. Ideally, you should have more than 3. However, the
advantage associated with adding more clusters diminishes the higher the
number of clusters you have. The higher the value for k that you choose, the
smaller and more specific the clusters are. You wouldn’t want to use a value
for k that is the same as the number of data points because each data point
would end up in its own cluster.
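One way to see the diminishing benefit of extra clusters is to measure how tightly each grouping fits the data. The sketch below computes the within-cluster sum of squared distances for several groupings of six made-up points. The groupings for k = 1, 2, 3, and 6 are written out by hand for illustration rather than produced by running K-means, and the helper names centroid and wcss are assumptions of this example.

```python
def centroid(cluster):
    """Mean x and mean y of a list of 2-D points."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

def wcss(clusters):
    """Sum of squared distances from each point to its cluster's centroid."""
    total = 0.0
    for cluster in clusters:
        cx, cy = centroid(cluster)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in cluster)
    return total

# Hypothetical sample data: two visibly separate groups.
low = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5)]
high = [(8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Hand-chosen groupings for each k (illustrative, not algorithm output).
partitions = {
    1: [low + high],                   # everything in one cluster
    2: [low, high],                    # the two natural groups
    3: [low, high[:1], high[1:]],      # split one natural group further
    6: [[p] for p in low + high],      # every point in its own cluster
}

scores = {k: wcss(clusters) for k, clusters in partitions.items()}
for k, score in scores.items():
    print(f"k={k}: within-cluster sum of squares = {score:.2f}")
```

Going from one cluster to two removes almost all of the spread, while going from two to three removes very little. With k equal to the number of data points the score falls to zero, but only because every point sits alone in its own cluster, which tells you nothing about the data.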
You will have to know your dataset well and use your intuition to guess
how many clusters are appropriate and what sort of differences will be
present. However, our intuition and knowledge of the data are less helpful
once we have more than just a few potential groups.
