Hierarchical Clustering
Hierarchical clustering is a clustering algorithm. Its basic idea is to treat all observations (samples) to be classified as one initial cluster, and then split this cluster level by level according to a chosen clustering criterion, decomposing it into successively smaller subgroups until a termination condition is met.
Edited at 2023-12-23 14:06:33
hierarchical clustering
Introduction
Algorithm idea: partition the data into a hierarchy of levels by some method until a termination condition is met.
Illustration:
Two hierarchical clustering methods
Agglomerative method
Algorithm idea: bottom-up. Start by treating each object as its own cluster, then repeatedly merge clusters into larger and larger ones until all objects are in a single cluster or a termination condition is met.
Algorithm steps
Step 1: Calculate the pairwise distances between all samples
Step 2: Merge the two samples with the smallest distance into one cluster, C1
Step 3: Calculate the distance from every other sample (or cluster) to C1
Distance measurement method between clusters
Method 1: Shortest-distance (single-linkage) method: the minimum distance between samples in cluster Ci and cluster Cj is used as the inter-cluster distance
Method 2: Longest-distance (complete-linkage) method: the maximum distance between samples in cluster Ci and cluster Cj is used as the inter-cluster distance
Method 3: Class-average (average-linkage) method: the mean of the distances between every sample in cluster Ci and every sample in cluster Cj is used as the inter-cluster distance
Method 4: Center (centroid) method: the distance between the center points of cluster Ci and cluster Cj (each center being the mean of the samples in that cluster) is used as the inter-cluster distance
Step 4: Repeat steps 2 and 3 until all objects are in one cluster or a termination condition is met
Illustration:
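The agglomerative steps above can be sketched with SciPy's hierarchical-clustering routines. The sample points below are made up for illustration; the `method` argument selects among the four inter-cluster distance measures listed above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D samples (hypothetical data for illustration)
X = np.array([[1.0, 1.0], [1.2, 0.9],
              [5.0, 5.1], [5.2, 4.8],
              [9.0, 9.0]])

# method maps to the four inter-cluster distances above:
# 'single' = shortest distance, 'complete' = longest distance,
# 'average' = class average, 'centroid' = center method
Z = linkage(X, method='single')

# Termination condition: stop merging once the merge distance exceeds 2.0
labels = fcluster(Z, t=2.0, criterion='distance')
print(labels)
```

With this cutoff, the two pairs of nearby points merge first and the isolated point at (9, 9) stays in its own cluster, giving three clusters in total.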
Divisive method
Algorithm idea: top-down. Start by placing all objects in the same cluster, then gradually split it into smaller and smaller clusters until each object forms a cluster of its own or a termination condition is met.
Algorithm steps
Step 1: Place all samples in one cluster, calculate the pairwise distances between samples, and select the two samples that are farthest apart.
Step 2: Split the two farthest samples into two clusters and calculate the distances from the other samples to the two clusters.
The inter-cluster distance measures are exactly the same as in the agglomerative method
Step 3: Assign each remaining sample to the nearer of the two clusters.
Step 4: Repeat steps 2 and 3 until every object forms its own cluster or a termination condition is met.
Illustration:
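A minimal sketch of one divisive split, assuming Euclidean distance. The points are made up for illustration, and `divisive_split` is a hypothetical helper (not a library function) implementing steps 1-3 above for a single split.

```python
import numpy as np

def divisive_split(X, idx):
    """Split one cluster (sample indices idx) around its two farthest samples."""
    pts = X[idx]
    # Step 1: pairwise distances inside the cluster; pick the farthest pair
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    seed_a, seed_b = pts[i], pts[j]          # Step 2: seeds of the two new clusters
    # Step 3: assign every sample to the nearer seed
    to_a = (np.linalg.norm(pts - seed_a, axis=1)
            <= np.linalg.norm(pts - seed_b, axis=1))
    return ([k for k, f in zip(idx, to_a) if f],
            [k for k, f in zip(idx, to_a) if not f])

X = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.0], [4.1, 3.9]])
left, right = divisive_split(X, list(range(len(X))))
print(left, right)
```

Repeating this split on each resulting cluster (step 4) yields the full top-down hierarchy.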
Advantages and Disadvantages of Hierarchical Clustering
Advantages
Distance and similarity rules are easy to define
No need to specify the number of clusters in advance
Can reveal the hierarchical relationships between classes
Disadvantages
High computational complexity; not applicable when the amount of data is large
Sensitive to outliers
Prone to producing chain-shaped clusters (the chaining effect)
Optimization
Addresses the problem that hierarchical clustering cannot handle large data sets
Method: use a multi-stage clustering technique that clusters incrementally, greatly reducing clustering time: the BIRCH algorithm
Incremental: each clustering decision is based only on the data points processed so far, not on the full data set.
BIRCH algorithm
Algorithm principle: a clustering feature, a 3-tuple, summarizes the information of a cluster. Clustering is performed by building a clustering feature tree that satisfies the branching-factor and cluster-diameter constraints; each leaf node is a cluster.
several concepts
Clustering Features (CF)
Definition: CF is a triplet, which can be represented by (N, LS, SS). Among them, N represents the number of samples in this CF; LS represents the sum vector of each feature dimension of the sample points in this CF, and SS represents the sum of squares of each feature dimension of the sample points in this CF.
Properties: CFs are additive (linear), that is, CF1 + CF2 = (N1 + N2, LS1 + LS2, SS1 + SS2)
Example: Suppose a certain CF contains 5 two-dimensional feature samples (3,4), (2,6), (4,5), (4,7), (3,8)
CF's N=5
LS of CF = (3+2+4+4+3, 4+6+5+7+8) = (16, 30)
SS of CF = (3^2+2^2+4^2+4^2+3^2) + (4^2+6^2+5^2+7^2+8^2) = 54 + 190 = 244
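The CF triplet for this example can be checked numerically with NumPy, using the five sample points given above:

```python
import numpy as np

# The five 2-D samples from the example above
pts = np.array([[3, 4], [2, 6], [4, 5], [4, 7], [3, 8]])

N = len(pts)            # number of samples in the CF
LS = pts.sum(axis=0)    # linear sum per feature dimension
SS = (pts ** 2).sum()   # scalar sum of squares over all dimensions

print(N, LS, SS)        # 5 [16 30] 244
```

The additivity property also follows directly: summing the (N, LS, SS) triplets of two disjoint subsets of these points component-wise gives the triplet of their union.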
Cluster feature tree (CF-tree)
Definition: Leaf nodes are clusters, and non-leaf nodes store the CF sum of their descendants.
Parameters of CF Tree
Maximum number of CF entries (children) in each non-leaf node: B (branching factor)
Maximum number of CFs contained in each leaf node: L
Maximum radius threshold for each CF in a leaf node: T
CF-tree creation process
Step 1: Read the first sample and place it in a new CF triplet in leaf node LN1
Illustration:
Step 2: Read the second sample. If it lies within the radius-T hypersphere of the previous sample's CF, add it to the same triplet; otherwise, generate a new triplet, LN2.
Illustration:
Step 3: If a new sample is closest to node LN1 but lies outside the radius-T hyperspheres of sc1, sc2, and sc3, a new CF is needed; since L = 3, LN1 is already full and must be split.
Illustration:
Step 4: Among all CF tuples in LN1, find the two farthest apart and use them as the seed CFs of two new leaf nodes; then distribute all the CFs in LN1 (sc1, sc2, sc3), together with the new sample's tuple sc6, between the two new leaf nodes.
Illustration:
Step 5: Repeat steps 2, 3, and 4 until the termination condition is met
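As a usage sketch, scikit-learn ships a `Birch` estimator whose `threshold` and `branching_factor` parameters correspond roughly to the T and B parameters above. The three Gaussian blobs here are synthetic illustration data, not from the text.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs of 50 points each
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# threshold ~ radius limit T; branching_factor ~ B
model = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X)
print(len(set(labels.tolist())))
```

Because BIRCH processes samples incrementally through the CF-tree, `Birch` also supports `partial_fit` for streaming data too large to hold in memory at once.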
Advantages and Disadvantages
Advantages
Clustering speed is fast and noise points can be identified
Linear scalability, good clustering quality
Disadvantages
Can only handle numerical data
Sensitive to data input order
Does not work well when clusters are non-spherical