MindMap Gallery DBSCAN
Density clustering algorithm, including definition, several concepts involved in DBSCAN, algorithm steps, Advantages and disadvantages of DBSCAN, etc.
Edited at 2023-12-23 14:05:37This Valentine's Day brand marketing handbook provides businesses with five practical models, covering everything from creating offline experiences to driving online engagement. Whether you're a shopping mall, restaurant, or online brand, you'll find a suitable strategy: each model includes clear objectives and industry-specific guidelines, helping brands transform traffic into real sales and lasting emotional connections during this romantic season.
This Valentine's Day map illustrates love through 30 romantic possibilities, from the vintage charm of "handwritten love letters" to the urban landscape of "rooftop sunsets," from the tactile experience of a "pottery workshop" to the leisurely moments of "wine tasting at a vineyard"—offering a unique sense of occasion for every couple. Whether it's cozy, experiential, or luxurious, love always finds the most fitting expression. May you all find the perfect atmosphere for your love story.
The ice hockey schedule for the Milano Cortina 2026 Winter Olympics, featuring preliminary rounds, quarterfinals, and medal matches for both men's and women's tournaments from February 5–22. All game times are listed in Eastern Standard Time (EST).
This Valentine's Day brand marketing handbook provides businesses with five practical models, covering everything from creating offline experiences to driving online engagement. Whether you're a shopping mall, restaurant, or online brand, you'll find a suitable strategy: each model includes clear objectives and industry-specific guidelines, helping brands transform traffic into real sales and lasting emotional connections during this romantic season.
This Valentine's Day map illustrates love through 30 romantic possibilities, from the vintage charm of "handwritten love letters" to the urban landscape of "rooftop sunsets," from the tactile experience of a "pottery workshop" to the leisurely moments of "wine tasting at a vineyard"—offering a unique sense of occasion for every couple. Whether it's cozy, experiential, or luxurious, love always finds the most fitting expression. May you all find the perfect atmosphere for your love story.
The ice hockey schedule for the Milano Cortina 2026 Winter Olympics, featuring preliminary rounds, quarterfinals, and medal matches for both men's and women's tournaments from February 5–22. All game times are listed in Eastern Standard Time (EST).
DBSCAN
Introduction
Algorithm idea: For each core point, if the density of its adjacent area is greater than the threshold, add it to a cluster close to it.
Several concepts involved in DBSCAN
Eps neighborhood: Given an object p and radius d, draw a ball with object p as the center and radius d:
Core point: Given an object p and a number minpts, the number of objects in its neighborhood is greater than minpts:
Boundary point: Given an object p and a number minpts, the number of objects in its neighborhood is less than minpts, but it is within the area of other core points.
Outlier point: Given an object p and a number minpts, the number of objects in its neighborhood is less than minpts, and it is not within the area of other core points.
Direct density reachability: The core point to any data point in its neighborhood is directly density reachable:
Density is reachable: from the core point p to a point q in its neighborhood, that is, p->q; from the core point q to a point n in its neighborhood, that is, q->n; then p->n is called density reachable
Density connected: If there is a core point o, o->p; o->q, then p and q are said to be density connected:
Algorithm steps
Step 1: Traverse and mark all sample points
Step 2: Select any point without a cluster label
Core point: Integrate all sample points with reachable density into a new cluster
Boundary point: Skip the boundary point and scan the next sample point
Step 3: Loop step 2 until all points are scanned
Advantages and Disadvantages of DBSCAN
advantage
Not sensitive to noise
Clusters of arbitrary shapes can be found
No need to manually set the number of clusters
shortcoming
The model is very sensitive to the parameters Eps and minpts
When the density of the data is uneven and the cluster spacing differs greatly, the clustering quality is poor.
optimization
For parameter sensitive issues
Method: By introducing core distance and reachable distance, the clustering algorithm is made insensitive to the input parameters. That is, the OPTICS algorithm
OPTICS
Algorithm idea: Calculate the reachable distance of all samples to offset the sensitivity of the Eps parameter
several concepts
Core distance: the minimum distance that satisfies minpts
Reachable distance: the smaller value of the Euclidean distance between the sample point and the core point and the core distance of the core point
Algorithm steps
Step 1: Given the data set D, create two queues, the ordered queue O and the result queue R (the ordered queue is used to store core objects and their density direct objects, and are arranged in ascending order by reachable distance; The result queue is used to store the output order of sample points. The ordered queue can be understood as the data to be processed, while the result queue contains the processed data.)
Step 2: If all points in D have been processed or there are no core points, the algorithm ends. Otherwise, select a sample point p that is unprocessed (that is, not in the result queue R) and is a core object, first put p into the result queue R, and delete p from D. Then find all the densities of p in D directly to the sample point x, and calculate the reachable distance from x to p. If x is not in the ordered queue O, put x and the reachable distance into O. If x is in O, then If the new reachable distance of x is smaller, update the reachable distance of x, and finally reorder the data in O according to the reachable distance from small to large.
Step 3: If the ordered queue O is empty, go back to step 2, otherwise take out the first sample point y in O (that is, the sample point with the smallest reachable distance), put it into R, and remove it from D and O delete y. If y is not a core object, repeat step 3 (that is, find the sample point with the smallest reachable distance of the remaining data in O); if y is a core object, find all the densities of y in D that reach the sample points, and calculate the reachable distance, and then follow step 2 to update the density of all y up to the sample points into O
Step 4: Repeat steps 2 and 3 until the algorithm ends, and finally get an ordered output result and the corresponding reachable distance.
for example
The known data set is shown in the figure:
Step 1: Calculate the reachable distance from the core point to other points
Step 2: Sort the reachable distance, select smaller sample points, and repeat step one:
Step 3: Output the core objects and their reachable distances, and divide them into clusters. Core objects: [0, 1, 3, 6, 5, 2, 4], reachable distances: [inf, 3.16227766, 4.12310563, 1.41421356, 1. ,3.60555128, 1.41421356]