MindMap Gallery K-means
Principles and extensions of the K-means clustering algorithm.
Edited at 2023-12-23 14:03:33
K-means
Introduction
Algorithm idea: Given a set of data objects, partition it into K clusters according to the distances between the objects, so that the points within each cluster are as close together as possible and the distance between clusters is as large as possible.
Algorithm steps
Step 1: Select the initial centers of the K clusters.
Step 2: Calculate the distance between each sample and each of the K centers, and assign the sample to the cluster whose center is closest.
Step 3: Recalculate the center of each cluster (the mean of the samples in the cluster).
Step 4: Repeat steps 2 and 3 until the cluster assignments of all samples no longer change.
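The four steps above can be sketched in plain Python. This is a minimal sketch, not a reference implementation: the function name `kmeans`, the `init_centers` and `max_iter` parameters, and the choice to keep an empty cluster's previous center are assumptions made for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, init_centers=None, max_iter=100, seed=0):
    """Return (centers, labels) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: choose K initial centers (random samples unless given).
    centers = [tuple(c) for c in (init_centers or rng.sample(points, k))]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels
```

Because the result depends on the starting centers, a random start can settle in a poor local optimum, so running the sketch several times with different `seed` values and keeping the best result is a common precaution.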
Several issues to consider with K-means
How is the number of clusters determined?
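Method 1: Elbow method (calculate the SSE, the sum of squared distances from each sample to its cluster center, for each K value, and select the K at the "elbow" of the curve, after which increasing K reduces the SSE only slightly)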
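The elbow method can be sketched as two small helpers: one computing the SSE of a fitted model, and one locating the bend in the SSE curve. The helper names and the use of the second difference to measure the bend are assumptions for illustration; in practice the elbow is often simply read off a plot.

```python
def sse(points, labels, centers):
    """Sum of squared distances from each sample to its assigned center."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )

def elbow_k(sse_by_k):
    """Pick the K at which the SSE curve bends most sharply.

    sse_by_k maps consecutive K values to the SSE of the model fitted
    with that K. The bend at K is measured by the second difference:
    (SSE drop just before K) minus (SSE drop just after K).
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (sse_by_k[prev_k] - sse_by_k[k]) - (sse_by_k[k] - sse_by_k[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

Since the first and last K values have no neighbor on one side, this sketch can only return an interior K and needs at least three K values.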
Method 2: Silhouette coefficient (calculate the silhouette coefficient of the model for each K value, and select the K value with the largest silhouette coefficient)
Idea: Evaluate the clustering by examining how compact each cluster is and how well separated it is from the other clusters
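As a sketch of how compactness and separation combine, the silhouette of a sample is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The function name and the convention of scoring single-member clusters as 0 are assumptions here; at least two clusters are required.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: compactness, mean distance to the other members of own cluster
        # (the sample's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: separation, mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that samples sit closer to another cluster than to their own.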
How is the initial center determined?
Method 1: Random selection
Method 2: Specify the location
Method 3: K-means++
Idea: When selecting the initial centers, keep them as far apart from one another as possible
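A minimal sketch of K-means++ style seeding follows. It assumes the usual formulation in which the first center is drawn uniformly and each later center is drawn with probability proportional to its squared distance from the nearest center already chosen; the function name is hypothetical, and the data must contain at least k distinct points.

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centers that tend to lie far apart."""
    rng = random.Random(seed)
    # The first center is a uniformly random sample.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        d2 = [
            min(sum((x - c) ** 2 for x, c in zip(p, center)) for center in centers)
            for p in points
        ]
        # Far-away points are more likely to become the next center;
        # points that coincide with a chosen center have weight 0.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

The sampling is probabilistic rather than a strict farthest-point rule, which makes the seeding less likely to pick an isolated outlier every time.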
Advantages and Disadvantages of K-means
Advantages
Simple and efficient, even on large data sets: each iteration costs O(n·K·d) time for n samples of dimension d, and memory use is low.
The algorithm has strong interpretability
Disadvantages
On very large data sets every iteration must scan all samples, so computation becomes slow, and the result easily converges to a local optimum.
K-means is sensitive to the choice of K and to the locations of the initial centers
K-means is very sensitive to noise and outliers
The mean cannot be calculated for data sets containing categorical attributes, so the algorithm cannot be applied to them directly.
K-means works well only for roughly spherical clusters
Optimization of K-means
For the problem of slow computation when the data set is too large
Method: Repeatedly draw random subsets (mini-batches) of the data set and update the cluster centers using K-means on each subset, until the cluster centers become stable (MiniBatchKMeans)
MiniBatchKMeans algorithm steps
Step 1: Draw a random sample (mini-batch) from the sample set.
Step 2: Run a K-means update (assignment and center update) on the sampled subset.
Step 3: Repeat steps 1 and 2 until the cluster centers become stable.
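The loop above might look like the following sketch. Several details are assumptions rather than part of the description: centers are passed in through `init_centers`, each center is updated as a running mean with a per-center step size of 1/count, and a fixed `n_iter` replaces an explicit stability check.

```python
import random

def mini_batch_kmeans(points, init_centers, batch_size=4, n_iter=50, seed=0):
    """Update centers from small random batches instead of the full data set."""
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)  # samples absorbed by each center so far
    for _ in range(n_iter):
        # Step 1: draw a random mini-batch (sampling with replacement).
        batch = rng.choices(points, k=batch_size)
        # Step 2a: assign each batch sample to its nearest current center.
        nearest = [
            min(range(len(centers)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in batch
        ]
        # Step 2b: pull each center toward its samples; the step size
        # shrinks as the center absorbs more samples (running mean).
        for p, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    # Step 3 is approximated here by the fixed number of iterations.
    return [tuple(c) for c in centers]
```

Each iteration touches only `batch_size` samples, so the cost per update no longer grows with the size of the data set, at the price of slightly noisier centers.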
For the problem that the mean cannot be calculated when attributes are categorical
Method: Replace the mean with the mode, i.e. the most frequent value of each attribute (K-modes)
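The two pieces that change relative to K-means can be sketched as follows: the cluster center becomes the per-attribute mode, and, as an assumption commonly paired with K-modes, distance becomes the number of mismatching attributes. Both helper names are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_center(records):
    """Cluster center: the most frequent value of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))
```

For example, a cluster holding the records `("red", "S")`, `("red", "M")` and `("blue", "M")` gets the center `("red", "M")`.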
For data sets where it is difficult to determine the number of clusters K
Method: Calculate the cluster center through the mean value of the samples in a given area, and continuously update the cluster center until the cluster center becomes stable (Mean-Shift)
Mean-Shift algorithm steps
Step 1: Randomly select a sample point x and calculate the mean vector, i.e. the mean of the offsets (x_i - x) from x to the other sample points x_i lying within a given radius of it.
Step 2: Move the sample point along the mean vector, then calculate the mean vector at the new position again, repeating until the norm of the mean vector is small enough or the point can no longer be moved.
Step 3: Repeat steps 1 and 2 until all sample points have been traversed.
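The steps above can be sketched with a flat (uniform) window. The names `bandwidth`, `tol` and `max_iter` are assumptions for illustration; `bandwidth` stands for the radius of the "given area" around the current position.

```python
import math

def mean_shift_point(x, points, bandwidth, tol=1e-6, max_iter=100):
    """Shift one starting point until it settles on a local density peak."""
    x = tuple(x)
    for _ in range(max_iter):
        # Samples inside the window of radius `bandwidth` around x.
        neighbors = [p for p in points if math.dist(p, x) <= bandwidth]
        # Moving x by the mean vector is the same as moving it to the
        # mean of its neighbors.
        mean = tuple(sum(col) / len(neighbors) for col in zip(*neighbors))
        shift = math.dist(mean, x)  # norm of the mean vector
        x = mean
        if shift < tol:
            break
    return x

def mean_shift(points, bandwidth):
    """Run the shift from every sample; equal end points share a cluster."""
    return [mean_shift_point(p, points, bandwidth) for p in points]
```

The number of clusters is not specified in advance: it falls out as the number of distinct end points, which in turn depends on the chosen `bandwidth`.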
Mean-Shift optimization
Problem: The plain mean vector weights every sample point in the area equally, so it ignores how much each point should contribute to the current sample point depending on its distance.
Solution: Use a Gaussian kernel function to weight the contribution of each sample point to the current sample point, so that nearer points count more.
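The original formula is not preserved here. One common way to write the Gaussian-weighted mean vector is given below as a sketch, where the bandwidth symbol h and the name M_h are assumptions and x_1, ..., x_n are the sample points:

```latex
K(u) = \exp\!\left(-\frac{\lVert u \rVert^{2}}{2h^{2}}\right),
\qquad
M_h(x) = \frac{\sum_{i=1}^{n} K(x_i - x)\,(x_i - x)}{\sum_{i=1}^{n} K(x_i - x)}
```

With this weighting, distant points contribute almost nothing, and replacing K by a constant inside a fixed radius recovers the unweighted mean vector.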