MindMap Gallery data mining
The background, development history, basic concepts, steps, analysis methods and applications of data mining
Edited at 2020-04-23 11:04:40
Data mining
Background
Necessity is the mother of invention
Data explosion problem
We are inundated with data but lack knowledge
Solution: Data Warehousing and Data Mining
Data Warehousing and Online Analytical Processing (OLAP)
Data mining: extracting interesting knowledge (rules, regularities, patterns, constraints, etc.) from data in large databases
Development history
1960s
Data collection, database creation, IMS, and network DBMS
1970s
Relational data model, relational DBMS implementation
1980s
RDBMS, advanced data models (extended relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s-2000s
Data mining and data warehousing, multimedia databases, and web databases
Basic concepts
Data mining is the process of using algorithms to search for information hidden in large amounts of data.
Steps
Define the problem
The first and most important requirement before starting knowledge discovery is to understand the data and the business problem. You must define your goal clearly, that is, decide exactly what you want to do.
Build a data mining database
Building the data mining database involves the following steps: data collection, data description, data selection, data quality assessment and data cleaning, merging and integration, building metadata, loading the data mining database, and maintaining it.
Analyze the data
The purpose of the analysis is to find the data fields that most affect the predicted output and to decide whether derived fields need to be defined.
Prepare the data
This is the last data preparation step before building the model. It can be divided into four parts: selecting variables, selecting records, creating new variables, and transforming variables.
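The four parts of data preparation can be illustrated with a minimal sketch. The field names (age, income, spend, member) and the record values are hypothetical and chosen only for illustration; the source does not prescribe any particular fields or rules.

```python
def prepare_records(raw_records):
    """Apply the four preparation parts to a list of dict records."""
    prepared = []
    for record in raw_records:
        # Select records: skip rows whose income is missing or zero.
        if not record.get("income"):
            continue
        prepared.append({
            # Select variables: keep only the fields the model will use.
            "age": record["age"],
            "income": record["income"],
            # Create a new variable: share of income that is spent.
            "spend_ratio": record["spend"] / record["income"],
            # Transform a variable: encode a yes/no text field as 1/0.
            "is_member": 1 if record["member"] == "yes" else 0,
        })
    return prepared


# Hypothetical raw data, for illustration only.
raw = [
    {"name": "A", "age": 34, "income": 50000, "spend": 10000, "member": "yes"},
    {"name": "B", "age": 51, "income": None, "spend": 8000, "member": "no"},
    {"name": "C", "age": 27, "income": 40000, "spend": 10000, "member": "no"},
]

for row in prepare_records(raw):
    print(row)
```

The record with the missing income is dropped, the name field is left out, and the remaining records gain a derived ratio and a numeric membership flag.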
Modeling
Building a model is an iterative process. Different models need to be carefully examined to determine which model is most useful for the business problem faced. First use a portion of the data to build a model, and then use the remaining data to test and validate the resulting model.
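Building on one portion of the data and testing on the rest is commonly called a holdout split. The sketch below is one simple way to do it, assuming a 70/30 split and a fixed random seed; the function name and both defaults are assumptions, not part of the source.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and return (train, test)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(10), test_fraction=0.3)
print(len(train), len(test))  # 7 3
```

The model is fitted on train only; test is kept aside until the model is evaluated, so that the measured accuracy is not inflated by data the model has already seen.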
Evaluate the model
After the model is built, its results must be evaluated and its value explained. The accuracy measured on the test set is meaningful only for data similar to the data used to build the model.
Deploy
Once a model is built and validated, it can be used in two main ways: as a reference for analysts, or by applying it to different data sets.
Analysis methods
Classification
Classification first selects an already-classified training set from the data, applies data mining techniques to the training set to build a classification model, and then uses the model to classify unclassified data.
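The train-then-classify flow can be sketched with a nearest-centroid classifier. This is only one of many possible classification techniques, chosen here for brevity; the feature values and the labels "A" and "B" are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(training_set):
    """Build the model: one mean feature vector (centroid) per class label."""
    grouped = defaultdict(list)
    for features, label in training_set:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(model, features):
    """Assign the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical labeled training set: (features, label).
training_set = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.1), "B"),
    ((4.8, 5.3), "B"),
]

model = train_centroids(training_set)
print(classify(model, (1.1, 0.9)))  # A
print(classify(model, (5.2, 4.9)))  # B
```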
Estimation
Estimation is similar to classification, but its output is a continuous value, and the set of possible output values is not fixed in advance. Estimation can also serve as a preparatory step for classification.
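A continuous-valued output can be illustrated with a least-squares straight-line fit, one common technique among many. The data points below are hypothetical and chosen so that the fitted line is exactly y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
estimate = slope * 5 + intercept  # continuous estimate for an unseen x = 5
print(slope, intercept, estimate)
```

Unlike a classifier, the result is a number on a continuous scale rather than one of a fixed set of labels; thresholding that number is one way an estimate can feed into a later classification.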
Prediction
Prediction is carried out through classification or estimation: a model is trained by classification or estimation, and if it is sufficiently accurate on the test samples, it can be used to predict the unknown variables of new samples.
Affinity grouping or association rules
The aim is to discover which things tend to occur together.
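Co-occurrence is usually quantified with support (the fraction of transactions containing both items) and confidence (the fraction of transactions containing the first item that also contain the second). The sketch below covers item pairs only, with hypothetical baskets and thresholds; a full algorithm such as Apriori also handles larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for item pairs above both thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Check both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, bread -> milk and eggs -> milk pass both thresholds, while the reverse rules fail on confidence because milk also appears without them.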
Clustering
Clustering automatically finds and establishes grouping rules: it judges the similarity between samples and places similar samples in the same cluster.
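Similarity-based grouping can be sketched with k-means, a widely used clustering technique that the source does not name specifically. The sketch assumes Euclidean distance as the similarity measure and, for simplicity, starts from the first k points; the sample points are hypothetical.

```python
import math


def k_means(points, k, iterations=20):
    """Group points into k clusters by repeatedly reassigning and re-averaging."""
    centroids = list(points[:k])  # naive initialization: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical 2-D samples forming two visible groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
clusters, centroids = k_means(points, k=2)
print(clusters)
```

No labels are supplied; the groups emerge only from the distances between samples, which is what distinguishes clustering from classification.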
Applications
Market analysis and management
Where does the data for analysis come from?
Credit card transactions, loyalty cards, discount coupons, customer complaint calls, and (public) lifestyle studies
Target marketing
Find groups of customers who share the same characteristics: interests, income levels, spending habits, etc.
Determine customer buying patterns over time
For example, conversion of a single account to a joint account after marriage
Cross-market analysis
Associations and correlations between product sales
Prediction based on the association information
Customer profiling
Data mining can tell us what kind of customers buy what products (clustering or classification)
Identify customer needs
Identify the best products for different customers
Use prediction to find which factors attract new customers
Provide summary information
Various multi-dimensional summary reports
Statistical summary information (central tendency and variance of the data)
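Central tendency and variance can be computed directly with Python's standard statistics module. The values below are hypothetical, and the population variance is used on the assumption that the list is the complete data set rather than a sample.

```python
import statistics

# Hypothetical daily sales counts.
daily_sales = [12, 15, 11, 20, 12]

summary = {
    "mean": statistics.mean(daily_sales),          # central tendency
    "median": statistics.median(daily_sales),      # central tendency, robust to outliers
    "mode": statistics.mode(daily_sales),          # most frequent value
    "variance": statistics.pvariance(daily_sales), # spread around the mean
}
print(summary)
```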
Corporate analysis and risk management
Financial planning and asset evaluation
Cash flow analysis and forecasting
Contingent claim analysis to evaluate assets
Cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
Resource planning
Summarize and compare resources and spending
Competition
Monitor competitors and market directions
Group customers into classes and apply class-based pricing
Set pricing strategy in a highly competitive market
Fraud detection and management
Method
Use historical data to build models of fraudulent behavior, and use data mining to help identify similar instances
Examples
Auto insurance: detect people who stage accidents to collect insurance payouts
Money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
Medical insurance: detect professional patients and rings of doctors and referrals
Detect inappropriate medical treatment
The Australian Health Insurance Commission found that in many cases blanket screening tests were requested unnecessarily (saving $1 million a year).
Detect telephone fraud
Telephone call model: destination of the call, duration, and number of calls per day or week. Analyze patterns that deviate from the expected norm
British Telecom identified discrete groups of callers with frequent intra-group calls, especially on mobile phones, and broke a multimillion-dollar fraud
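Deviation from an expected calling pattern can be sketched with a simple z-score check on calls per day. This is a simplified illustration, not the method used by any operator named above; the threshold of 3 standard deviations and the call counts are assumptions.

```python
import statistics


def is_unusual(history, new_value, threshold=3.0):
    """Flag new_value if it lies more than `threshold` standard deviations from the historical mean."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # With no historical variation, any different value is a deviation.
        return new_value != mean
    return abs(new_value - mean) / spread > threshold


# Hypothetical number of calls per day for one subscriber.
calls_per_day = [10, 12, 11, 9, 13, 10, 12]

print(is_unusual(calls_per_day, 12))  # False
print(is_unusual(calls_per_day, 40))  # True
```

The same check could be applied to call duration or destination counts; in practice a flagged value would prompt review rather than prove fraud on its own.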
Retail
Analysts estimate that 38% of retail shrinkage is due to dishonest employees
Other applications
Sports
IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain a competitive advantage for the New York Knicks and Miami Heat
Astronomy
With the help of data mining, JPL and the Palomar Observatory discovered 22 quasars
Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to Web access logs for transaction-related pages to discover which pages customers prefer, analyze the effectiveness of Web sales, improve Web site organization, etc.