Mind map gallery: database
A data warehouse is a strategic collection of data that supports decision-making at every level of the enterprise. This mind map covers data warehouse system framework design, multi-dimensional analysis technology, data preprocessing technology, and the key points of bank data warehouse construction. It is organized to help clarify what a data warehouse is and why it is built.
Modified on 2024-01-19 15:42:49
Data preprocessing
Data quality assessment criteria
Accuracy
Completeness
Consistency
Timeliness
Credibility
Interpretability
Data preprocessing technology
1. Data cleaning
Purpose:
Resolve data errors and inconsistencies
Format standardization, discovery and processing of abnormal data, data error correction, discovery and removal of duplicate data
Missing value handling
(1) Ignore tuples
(2) Manually fill in missing values
(3) Fill with a global constant
(4) Fill with the attribute mean
(5) Fill with the attribute mean of the samples in the same group
(6) Fill with the most probable value
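A minimal sketch of strategies (3) to (5) from the list above, using only the Python standard library. The `customers` records, the `segment` and `balance` field names, and the three helper functions are hypothetical and exist only for illustration.

```python
from statistics import mean

def fill_constant(rows, attr, constant="UNKNOWN"):
    """Strategy (3): replace every missing value with one constant."""
    return [{**r, attr: constant if r[attr] is None else r[attr]} for r in rows]

def fill_mean(rows, attr):
    """Strategy (4): replace missing values with the mean of the observed values."""
    avg = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: avg if r[attr] is None else r[attr]} for r in rows]

def fill_group_mean(rows, attr, group_attr):
    """Strategy (5): replace missing values with the mean of the row's own group.

    Assumes every group has at least one observed value.
    """
    observed = {}
    for r in rows:
        if r[attr] is not None:
            observed.setdefault(r[group_attr], []).append(r[attr])
    means = {group: mean(values) for group, values in observed.items()}
    return [
        {**r, attr: means[r[group_attr]] if r[attr] is None else r[attr]}
        for r in rows
    ]

# Hypothetical sample data: None marks a missing value.
customers = [
    {"segment": "retail", "balance": 100.0},
    {"segment": "retail", "balance": None},
    {"segment": "retail", "balance": 300.0},
    {"segment": "corporate", "balance": 5000.0},
    {"segment": "corporate", "balance": None},
]

by_mean = fill_mean(customers, "balance")
by_group = fill_group_mean(customers, "balance", "segment")
```

With this data the overall mean pulls the missing retail balance toward the corporate value, while the grouped mean keeps it within the range of its own segment, which is the usual reason to prefer strategy (5) when groups differ strongly.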
Noisy data processing
(1) Binning
(2) Clustering
(3) Combined computer and manual inspection
(4) Regression
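Item (1) in the list above is usually known as binning. The sketch below shows one common variant, equal-frequency binning with smoothing by bin means; the function name and the sample `prices` are illustrative assumptions, not part of the original material.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` items,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Each value is pulled toward its neighbors, so isolated fluctuations inside a bin disappear while the overall ordering of the data is preserved.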
2. Data integration
Purpose: Integrate data from multiple data sources
3. Data reduction
Purpose: Obtain a more compact representation of the data
Data reduction strategies
(1) Data cube aggregation
(2) Dimensionality reduction
Wavelet transform
Principal component analysis
(3) Data compression
Lossless compression
Lossy compression
(4) Numerosity reduction
4. Data transformation
Operations that normalize data, discretize it, and organize it into concept hierarchies.
Data transformation method
(1) Aggregation: summarize and aggregate data
(2) Data generalization: the process of abstracting from a relatively low conceptual level to a higher conceptual level
(3) Normalization
(4) Attribute construction/feature derivation
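Item (3) above rescales an attribute so that attributes measured in different units become comparable. The sketch below shows two widely used forms, min-max scaling and z-score scaling; the function names and the `incomes` values are assumptions made for illustration.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map the values onto the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread to rescale.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]

def z_score(values):
    """Center the values on 0 and scale them to unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

incomes = [20000, 40000, 60000, 100000]
print(min_max(incomes))  # [0.0, 0.25, 0.5, 1.0]
```

Min-max scaling depends on the observed minimum and maximum, so a single extreme value compresses everything else; z-score scaling is often the safer choice when outliers are expected.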
Data governance
Establishing a complete data governance system requires improving data management capabilities in several areas, including policies, standards, monitoring, and processes, in order to address the following problems
Data standards
Business support provided by the data platform needs to be standardized
Data control system
Process specification document
Information item definition
Metadata management
Analyze data flows and dependency relationships to support impact analysis (what a change affects downstream) and lineage analysis (where a piece of data came from)
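Impact analysis and lineage (blood relationship) analysis can both be expressed as graph traversals over recorded data-flow dependencies. The following is a minimal sketch under that assumption; the table names in `LINEAGE` are hypothetical and do not come from any real system.

```python
from collections import deque

# Hypothetical data-flow edges: each key feeds the objects listed as its value.
LINEAGE = {
    "src.core_accounts": ["subject.account"],
    "src.crm_customers": ["subject.party"],
    "subject.account": ["summary.customer_monthly"],
    "subject.party": ["summary.customer_monthly"],
    "summary.customer_monthly": ["app.risk_report", "app.marketing_list"],
}

def downstream(table, edges=LINEAGE):
    """Impact analysis: every object affected if `table` changes."""
    seen, queue = set(), deque([table])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def upstream(table, edges=LINEAGE):
    """Lineage analysis: every object that `table` is derived from."""
    reverse = {}
    for parent, children in edges.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    # Walking the reversed edges forward is walking the original edges backward.
    return downstream(table, reverse)

print(sorted(downstream("subject.party")))
# ['app.marketing_list', 'app.risk_report', 'summary.customer_monthly']
```

In practice the edges would be harvested from ETL job definitions or SQL parsing rather than written by hand; the traversal itself stays the same.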
Data quality
Data quality requirements must be measurable. The data quality of the data platform needs to be managed comprehensively, with definable quality checks, analysis by quality dimension, and issue tracking.
Data service
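One way to make quality requirements measurable is to express each requirement as a named rule and report its pass rate. The sketch below assumes that approach; the `accounts` records, the rule names, and the 0.7 threshold are all hypothetical.

```python
def run_checks(rows, rules):
    """Return the pass rate (0.0 to 1.0) of each named rule over a non-empty list of rows."""
    return {
        name: sum(1 for r in rows if rule(r)) / len(rows)
        for name, rule in rules.items()
    }

# Hypothetical account records with typical defects.
accounts = [
    {"id": "A1", "balance": 120.0, "currency": "CNY"},
    {"id": "A2", "balance": None, "currency": "CNY"},   # missing balance
    {"id": "A3", "balance": -5.0, "currency": "cny"},   # negative, wrong case
    {"id": "A4", "balance": 80.0, "currency": "USD"},
]

# Each rule maps one record to True (pass) or False (fail).
RULES = {
    "balance_present": lambda r: r["balance"] is not None,
    "balance_non_negative": lambda r: r["balance"] is not None and r["balance"] >= 0,
    "currency_upper_case": lambda r: r["currency"].isupper(),
}

report = run_checks(accounts, RULES)
# Rules below the agreed threshold become tracked issues.
failing = [name for name, rate in report.items() if rate < 0.7]
print(report)
print(failing)
```

Because every rule yields a number, the same report can be stored per load and compared over time, which is what makes issue tracking possible.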
Provide service communication channels for the data platform for business users and application developers
Data warehouse multi-dimensional analysis technology
Basic concepts of data warehouse
Basic definition: A data warehouse is a subject-oriented, integrated, relatively stable data collection that reflects historical changes and is used to support decision-making in management.
Data warehouse technical characteristics
Subject-oriented
A subject is a goal or requirement of analysis and decision-making. It is defined by decision-makers according to their work needs, and the result ultimately serves those decision-makers.
Subject-oriented means that the data in the data warehouse is organized according to the subjects it must serve.
Topics applicable to banks generally include
party
internal organization
product
agreement
event
address
channel
marketing
finance
customer assets
Integrated
Integration is usually the most complex and critical step in data warehouse construction.
Analysis and decision-making require large amounts of data for analysis, comparison, and identification.
Data from multiple sources contains many duplicates and inconsistencies. It can be integrated only after systematic processing and cleaning.
Relatively stable (non-volatile)
After the data enters the warehouse, it needs to be stored relatively stably for a long time, which is the basic condition to ensure correct decision-making.
Most operations on the data warehouse are queries, with few modifications or deletions.
Reflect historical changes (Time Variant)
Data in the warehouse records the state at historical points in time and must be stored period by period in a defined time order.
Online analytical processing (OLAP)
1. Basic definition: software technology that uses multi-dimensional information to access, analyze, and verify online data for a specific problem
2. Basic concepts
(1) Dimension
(2) Dimension level
(3) Dimension members
(4) Measure
(5) Multidimensional data set (cube)
(6) Data cell
3. Technical features
(1) Speed
(2) Analyzability
(3) Multidimensionality
(4) Informativeness
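The basic OLAP concepts listed above can be made concrete with a tiny in-memory cube. This is only a sketch: the dimensions (`year`, `quarter`, `region`, `product`), the `amount` measure, and the sample facts are assumptions chosen for illustration.

```python
from collections import defaultdict

# Each record is one data cell: a combination of dimension members
# plus the value of the measure ("amount").
FACTS = [
    {"year": 2023, "quarter": "Q1", "region": "North", "product": "Loan", "amount": 100},
    {"year": 2023, "quarter": "Q2", "region": "North", "product": "Loan", "amount": 150},
    {"year": 2023, "quarter": "Q1", "region": "South", "product": "Deposit", "amount": 200},
    {"year": 2023, "quarter": "Q2", "region": "South", "product": "Loan", "amount": 50},
]

def roll_up(facts, dims, measure="amount"):
    """Aggregate the measure over the chosen dimensions; all other dimensions are summed out."""
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in dims)] += fact[measure]
    return dict(totals)

def slice_cube(facts, dim, member):
    """Fix one dimension to a single member and keep only the matching cells."""
    return [fact for fact in facts if fact[dim] == member]

by_region = roll_up(FACTS, ["region"])
# Rolling up from the quarter level to the year level of the time dimension.
by_year = roll_up(FACTS, ["year"])
# Slice on product = "Loan", then view the result by quarter.
loans_by_quarter = roll_up(slice_cube(FACTS, "product", "Loan"), ["quarter"])
```

Here `quarter` and `year` are two levels of the same time dimension, `"North"` and `"South"` are members of the region dimension, and every aggregated value is a cell of a coarser cube.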
Data warehouse system framework design
Data warehouse planning and preparation
1. User needs analysis
2. Feasibility analysis
technical feasibility
economic feasibility
operational feasibility
3. Construction coordination and resistance analysis
4. Formulation of project development plan
(1) "What to do"
Solve the task division of data warehouse construction
(2) "How to do it"
Task description and progress planning for data warehouse construction
(3) "What is needed"
Allocation and scheduling of key resources: personnel, hardware, software
Data warehouse data architecture
1. Data flow direction
Source-aligned layer: source system data is loaded as delivered
Subject layer: Through data processing, detailed historical data, customer information, account information, transaction data, etc. are stored by subject.
Summary layer: Regularly summarize according to account information and customer information
Application layer: Finally, the data required for application analysis is formed and stored.
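The four layers above can be pictured as a chain of transformations, each consuming the output of the previous one. The sketch below is a deliberately small illustration of that flow; the record fields, function names, and the threshold of 1000 are hypothetical.

```python
from collections import defaultdict

def load_source_layer(raw_rows):
    """Source-aligned layer: keep source records as delivered (copied, not altered)."""
    return [dict(r) for r in raw_rows]

def build_subject_layer(source_rows):
    """Subject layer: store detailed data by subject (here: customer and transaction)."""
    customers = {r["cust_id"]: {"cust_id": r["cust_id"], "name": r["name"]} for r in source_rows}
    transactions = [
        {"cust_id": r["cust_id"], "date": r["date"], "amount": r["amount"]}
        for r in source_rows
    ]
    return {"customer": customers, "transaction": transactions}

def build_summary_layer(subject):
    """Summary layer: total transaction amount per customer."""
    totals = defaultdict(float)
    for t in subject["transaction"]:
        totals[t["cust_id"]] += t["amount"]
    return dict(totals)

def build_application_layer(subject, summary, threshold):
    """Application layer: names of customers whose total exceeds the threshold."""
    return sorted(
        subject["customer"][cust_id]["name"]
        for cust_id, total in summary.items()
        if total > threshold
    )

raw = [
    {"cust_id": "C1", "name": "Li", "date": "2024-01-02", "amount": 500.0},
    {"cust_id": "C1", "name": "Li", "date": "2024-01-15", "amount": 700.0},
    {"cust_id": "C2", "name": "Wang", "date": "2024-01-03", "amount": 300.0},
]

subject = build_subject_layer(load_source_layer(raw))
summary = build_summary_layer(subject)
high_value = build_application_layer(subject, summary, threshold=1000)
```

Keeping each layer as a separate step means a change in an application requirement touches only the last function, while the detailed history in the subject layer stays untouched.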
2. Data model
As data warehouse construction experience accumulates, a mature data warehouse data model suited to the organization's own characteristics needs to be formed.
3. Data standards
data mapping
enforce rules
4. Data quality
(1) Definition and initial measurement
(2) Analyze and find errors
(3) Find the source of the problem
(4) Solve quality problems
(5) Monitor the improvement process
5. Data management and control
Unified data management system framework
6. Data retention policy and capacity
business analysis needs
Regulatory needs
The need to provide additional services to customers based on historical data
Multi-granularity data warehouse data organization structure
Whether the granularity is reasonable or not directly affects the amount of data stored in the data warehouse and the types of queries that the data warehouse can handle.
Granularity is a key measure of the degree of summarization of data in a data warehouse
The larger the granularity, the lower the level of detail and the higher the degree of summarization.
The smaller the granularity, the higher the level of detail and the lower the degree of summarization.
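The trade-off between granularity, storage volume, and answerable queries can be seen by summarizing the same data at two levels. In this sketch the `daily` transactions and the `to_monthly` helper are hypothetical examples.

```python
from collections import defaultdict

# Fine granularity: one row per account per day (account, ISO date, amount).
daily = [
    ("ACC1", "2024-01-03", 100.0),
    ("ACC1", "2024-01-17", 50.0),
    ("ACC1", "2024-02-01", 25.0),
    ("ACC2", "2024-01-09", 10.0),
]

def to_monthly(rows):
    """Coarser granularity: one row per account per month."""
    monthly = defaultdict(float)
    for account, date, amount in rows:
        monthly[(account, date[:7])] += amount  # "YYYY-MM" prefix of the date
    return dict(monthly)

monthly = to_monthly(daily)
print(len(daily), len(monthly))  # 4 3
```

The monthly table is smaller and answers "total per account per month" directly, but a question such as "what happened on 2024-01-17" can no longer be answered from it. This is why warehouses often keep several granularities side by side.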
Data warehouse architecture
Determine basic functionality and expansion capabilities
1. Top-down and bottom-up architectures
Top-down structure:
Advantages: centralization, unification and standardization
Disadvantages: It must be completed in one pass, with a long cycle and high cost; there is a risk of having to tear it down and rebuild it
Bottom-up structure: first build independently developed data marts, and then build the data warehouse on that basis
2. Pure data warehouse architecture
The structure is simple. Data obtained from the data source system are converted and loaded into the data warehouse, and then directly provided to the front-end data application through the data warehouse.
3. Pure data mart architecture
A global data warehouse does not exist. Data processing applications need to connect to one or more data marts to call data.
An intermediate form of data warehouse
4. Virtual data warehouse architecture
The unified data source connected to the data processing application is only an intermediary layer, which contains the rules and means for accessing and integrating data, and provides a virtual data warehouse view for users of the data warehouse.
Data integration only occurs when a user requests query data; implementation requirements are high
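The defining property of the virtual architecture, integration at query time rather than at load time, can be sketched as a thin mediation class. The two source lists, their field names, and the `VirtualWarehouse` class are hypothetical stand-ins for real source systems.

```python
# Two hypothetical source systems, each with its own field names.
CORE_BANKING = [{"cust_no": "C1", "bal": 1200.0}, {"cust_no": "C2", "bal": 300.0}]
CRM = [{"customer_id": "C1", "full_name": "Li"}, {"customer_id": "C2", "full_name": "Wang"}]

class VirtualWarehouse:
    """Intermediary layer: holds the access and mapping rules, stores no integrated data."""

    def __init__(self, core, crm):
        self._core = core
        self._crm = crm

    def customer_view(self, customer_id):
        """Integrate the sources only when a user asks for this customer."""
        name = next(
            (c["full_name"] for c in self._crm if c["customer_id"] == customer_id),
            None,
        )
        balance = sum(a["bal"] for a in self._core if a["cust_no"] == customer_id)
        return {"customer_id": customer_id, "name": name, "balance": balance}

warehouse = VirtualWarehouse(CORE_BANKING, CRM)
before = warehouse.customer_view("C1")

# A new source record is visible immediately: nothing was copied in advance.
CORE_BANKING.append({"cust_no": "C1", "bal": 800.0})
after = warehouse.customer_view("C1")
```

Every query pays the full cost of reaching and reconciling the sources, which is one reason the implementation requirements for this architecture are high.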
Key points of bank data warehouse construction
(1) The data warehouse system must first meet the requirements of the headquarters and local branches for data storage, query, statistics, analysis, etc.
(2) While constructing the data warehouse, it is necessary to build a unified data source and unified architecture.
Pay attention to the management and unified release of metadata
Pay attention to building standardized business indicators with unified standards and consistent definitions
Establish a data inspection mechanism, continuously improve data quality, and strengthen data governance in all aspects
(3) Considering the continuous growth of business, the data warehouse construction plan must be scalable
(4) Banking business has extremely high availability requirements, and the business information system cannot be shut down easily.