Big Data Mining Technology Chapter 7

Chapter 7 mainly introduces the application of big data in economic management, including cluster analysis, classification analysis, association rules, supervised learning, and unsupervised learning.

Edited at 2021-11-27 09:40:03

PlotWizard

Outline

Big data mining technology

Cluster analysis

Concept: Cluster analysis divides a data set into different classes or clusters according to a certain criterion (such as distance), so that data objects in the same cluster are as similar as possible, while data objects in different clusters are as different as possible. That is, after clustering, data of the same type are gathered together as much as possible, and data of different types are separated as much as possible.

Techniques:

K-means

Advantages: 1. It is unsupervised learning and does not require a labeled training set; 2. The principle is simple and easy to implement; 3. The results are easy to interpret.

Disadvantages: 1. The number of clusters K must be supplied as an input parameter; 2. It is sensitive to abnormal points and outliers; 3. It only works on numerical data.
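As a rough illustration of the K-means points above, here is a minimal sketch of the usual assign-then-update loop in plain Python. The function name kmeans, the init_centroids parameter and the sample points are assumptions made for this example, not part of the chapter; K is simply the number of initial centroids the caller passes in, which reflects the first disadvantage.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Minimal K-means sketch. K is len(init_centroids), chosen by the caller."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = kmeans(points, init_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

Because each centroid is an arithmetic mean, a single extreme point can pull it far from the rest of its cluster, and the mean is only defined for numerical features. These are the second and third disadvantages listed above.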

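Hierarchical clustering

Agglomerative hierarchical clustering (bottom-up: each object starts as its own cluster, and the closest clusters are merged step by step)

Divisive hierarchical clustering (top-down: all objects start in one cluster, which is split step by step)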
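The agglomerative variant can be sketched in a few lines. This is only an illustration under stated assumptions: the function name agglomerative, the single-linkage distance (the distance between the two closest members), and the stopping rule (a target number of clusters) are choices made for this example, not something the chapter specifies.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering sketch: start with singletons, merge the closest pair."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Assumption: single linkage, i.e. distance between the closest members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical 2-D points.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerative(points, n_clusters=3))
```

Divisive clustering runs in the opposite direction: it starts from one cluster containing every object and splits repeatedly.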

Differences from classification:

Classification: The categories are known in advance. By training on data whose classes are already known, the model learns the characteristics of each category and then assigns unclassified data to one of them. It belongs to supervised learning.

Clustering: It is not known in advance which categories the data will fall into; cluster analysis aggregates the data into several groups. Clustering does not require labeled training data. It belongs to unsupervised learning.

Classification analysis

Purpose: Obtain a classification function or classification model (often called a classifier) that can map data items in the database to a given category

Techniques: KNN (k-nearest neighbors), SVM (support vector machine), decision tree (ID3)
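Of the techniques listed, KNN is the simplest to show in code. The sketch below is illustrative only: the function name knn_predict, the Euclidean distance, the default k=3 and the sample data are all assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (features, label) pairs whose labels are already known.
    """
    # Sort the labeled samples by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: two features per sample, two known categories.
train = [
    ((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
    ((8, 8), "high"), ((8, 9), "high"), ((9, 8), "high"),
]
print(knn_predict(train, (2, 2)))
```

Here the "classifier" is just the stored training set plus the voting rule, which maps any new data item to one of the given categories.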

Association rules

Association rules: Reflect the interdependence and correlation between one item and other items. Association rule mining is an important data mining technique, used to discover valuable correlations between data items in large amounts of data.
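The strength of an association rule is commonly measured by its support and confidence. The chapter does not name these measures, so the sketch below brings them in as standard definitions; the function names and the shopping-basket data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

A rule such as "bread implies milk" is usually kept only if both values exceed thresholds chosen by the analyst.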

Supervised learning

Definition: Given an existing data set in which the relationship between inputs and outputs is known, train an optimal model based on this known relationship.

Types: Regression, classification

Advantages: The category of each training sample is known exactly; the training samples are used to learn a classifier, which then classifies samples of unknown category; Key issues
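Regression, one of the two kinds of supervised learning listed above, can be illustrated with a straight-line fit. This is a minimal sketch assuming ordinary least squares on a single input variable; the function name fit_line and the data are invented for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept from known (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data in which the input-output relationship is known.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope * 5 + intercept)  # prediction for an unseen input, x = 5
```

The known outputs ys play the same role for regression that the known category labels play for classification.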

Unsupervised learning

Definition: The relationship between the data and its features is not known in advance (the data set is unlabeled), so the relationships within the data must be discovered through clustering or a certain model.

Advantages: Using unlabeled sample sets, we hope to learn some regularity in the data and construct a corresponding classifier.