Hierarchical clustering
Hierarchical clustering is a clustering algorithm. Its basic idea is to treat all of the observations (samples) to be classified as one initial group, and then, according to a chosen clustering criterion, divide that group level by level into successively smaller subgroups until a termination condition is met.
hierarchical clustering
Introduction
Algorithmic idea: divide the data into hierarchical levels according to a chosen criterion until a termination condition is met.
Two hierarchical clustering methods
agglomerative method
Algorithm idea: bottom-up. First treat each object as its own cluster, then repeatedly merge clusters into larger and larger ones until all objects are in a single cluster or a termination condition is met.
Algorithm steps
Step 1: Compute the pairwise distances between all samples
Step 2: Merge the two samples with the smallest distance into one cluster, C1
Step 3: Compute the distances from the remaining samples to C1
Inter-cluster distance measures
Method 1: Shortest distance (single linkage): the minimum distance between any sample in cluster Ci and any sample in cluster Cj
Method 2: Longest distance (complete linkage): the maximum distance between any sample in cluster Ci and any sample in cluster Cj
Method 3: Class average (average linkage): the mean of the distances between all pairs of samples, one from cluster Ci and one from cluster Cj
Method 4: Center (centroid) method: the distance between the centroids of clusters Ci and Cj (the centroid being the mean of the samples in the cluster)
Step 4: Repeat steps 2 and 3 until all objects are in one cluster or a termination condition is met (a runnable sketch follows these steps)
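The whole procedure is available in SciPy; below is a minimal sketch (assuming NumPy and SciPy are installed), where the `method` argument selects the inter-cluster distance: 'single', 'complete', 'average', and 'centroid' correspond to Methods 1-4 above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# toy data: six 2-D samples forming two obvious groups
X = np.array([[1, 2], [1, 3], [2, 2],
              [8, 8], [8, 9], [9, 8]], dtype=float)

# bottom-up merging; method='single' is the shortest-distance measure
# ('complete', 'average', 'centroid' give Methods 2-4)
Z = linkage(X, method='single')

# one possible termination condition: cut the hierarchy into 2 flat clusters
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)  # e.g. [1 1 1 2 2 2]
```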
divisive (splitting) method
Algorithm idea: top-down. First place all objects in the same cluster, then progressively split it into smaller and smaller clusters until each object forms its own cluster or a termination condition is met.
Algorithm steps
Step 1: Put all samples into one cluster, compute the pairwise distances between samples, and select the two samples that are farthest apart.
Step 2: Make the two farthest samples the seeds of two clusters and compute the distances of the remaining samples to each.
The distance measures are exactly the same as in the agglomerative method
Step 3: Assign each remaining sample to the closer of the two clusters
Step 4: Repeat steps 2 and 3 until each object forms its own cluster or a termination condition is met (a sketch of one split step follows).
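Common libraries do not implement this divisive scheme directly, so here is a minimal hand-rolled sketch of a single split step (the function name and toy data are illustrative):

```python
import numpy as np

def divisive_split(X):
    """One split step: seed two clusters with the farthest pair of
    samples (Steps 1-2), then assign every sample to the nearer seed
    (Step 3)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)  # farthest pair
    to_i = np.linalg.norm(X - X[i], axis=1)
    to_j = np.linalg.norm(X - X[j], axis=1)
    return (to_j < to_i).astype(int)  # 0 -> seed i's cluster, 1 -> seed j's

X = np.array([[1, 2], [1, 3], [8, 8], [9, 8]], dtype=float)
print(divisive_split(X))  # [0 0 1 1]
```

Recursing on each resulting cluster yields the full top-down hierarchy.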
Advantages and Disadvantages of Hierarchical Clustering
advantage
Distance-based and rule-based similarity are easy to define
No need to specify the number of clusters in advance
Can reveal the hierarchical relationships between classes
shortcoming
High computational complexity, so it is impractical for large datasets
Sensitive to outliers
Clusters tend to take on chain-like shapes (especially with single linkage)
optimization
Addresses the problem that hierarchical clustering cannot handle large datasets
Method: use a multi-stage clustering technique that clusters incrementally, which greatly reduces clustering time: the BIRCH algorithm
Incremental: the clustering decision for each data point is based only on the data points processed so far, not on the whole dataset.
BIRCH algorithm
Algorithm principle: a clustering feature, a 3-tuple, summarizes the relevant information about a cluster. Clustering is performed by building a clustering feature tree that satisfies the branching-factor and cluster-diameter constraints; each leaf node corresponds to a cluster.
several concepts
Clustering Features (CF)
Definition: a CF is a triplet (N, LS, SS), where N is the number of samples in the CF, LS is the vector of per-dimension sums of the sample points in the CF, and SS is the sum of the squared values of those sample points over all feature dimensions.
Property: CFs are additive, i.e. CF1 + CF2 = (N1 + N2, LS1 + LS2, SS1 + SS2)
Example: Suppose a certain CF contains 5 two-dimensional feature samples (3,4), (2,6), (4,5), (4,7), (3,8)
N of this CF = 5
LS of this CF = (3+2+4+4+3, 4+6+5+7+8) = (16, 30)
SS of this CF = (3^2+2^2+4^2+4^2+3^2) + (4^2+6^2+5^2+7^2+8^2) = 54 + 190 = 244
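A quick check of this example with NumPy (variable names are illustrative), including the additivity property from above:

```python
import numpy as np

pts = np.array([[3, 4], [2, 6], [4, 5], [4, 7], [3, 8]])
N = len(pts)            # 5
LS = pts.sum(axis=0)    # [16 30]
SS = (pts ** 2).sum()   # 54 + 190 = 244
print(N, LS, SS)

# additivity: CF of the first 2 points + CF of the last 3 = CF of all 5
assert (pts[:2].sum(axis=0) + pts[2:].sum(axis=0) == LS).all()
assert (pts[:2] ** 2).sum() + (pts[2:] ** 2).sum() == SS
```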
Cluster feature tree (CF-tree)
Definition: leaf nodes hold the clusters; each non-leaf node stores the sum of the CFs of its children.
Parameters of CF Tree
Maximum number of CF entries (children) in each non-leaf node: B (branching factor)
Maximum number of CFs contained in each leaf node: L
Maximum radius threshold for each CF in a leaf node: T
CF-tree creation process
Step 1: Read in the first sample and place it in a new CF triplet, inside a new leaf node LN1
Step 2: Read in the second sample. If it lies within a hypersphere of radius T around the previous sample, put it into the same triplet; otherwise create a new triplet in a new leaf node LN2.
Step 3: If a new sample is closest to node LN1 but lies outside the radius-T hyperspheres of all of its CFs sc1, sc2, and sc3, and L = 3 (the leaf is full), then LN1 must be split.
Step 4: Among all the CF tuples in LN1, find the two farthest apart and make them the seed CFs of two new leaf nodes; then distribute all of LN1's CFs (sc1, sc2, sc3), together with the new sample's CF sc6, between the two new leaf nodes.
Step 5: Repeat steps 2, 3, and 4 until the termination condition is met (a usage sketch follows)
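scikit-learn ships a BIRCH implementation; below is a minimal usage sketch (assuming scikit-learn is installed). Its `threshold` plays the role of T, while `branching_factor` caps the number of CF subclusters per node (scikit-learn uses a single bound where the description above distinguishes B and L):

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# toy data: two well-separated 2-D blobs
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)),
               rng.normal(5, 0.5, size=(100, 2))])

# threshold ~ T (max CF radius); n_clusters adds a final global
# clustering step over the leaf CFs
model = Birch(threshold=0.5, branching_factor=50, n_clusters=2)
labels = model.fit_predict(X)
print(np.bincount(labels))  # roughly [100 100]
```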
Advantages and Disadvantages
advantage
Fast clustering, and noise points can be identified
Linear scalability, good clustering quality
shortcoming
Can only handle numerical data
Sensitive to data input order
Does not work well when clusters are non-spherical