DAMA-CDGA Data Governance Engineer - 11. Data Warehousing and Business Intelligence
Data warehousing and business intelligence enable organizations to integrate data from different sources into a common data model. The integrated data can provide insights into business operations and open up new possibilities for corporate decision support and the creation of organizational value.
Edited at 2024-03-05 20:28:30
11. Data warehousing and business intelligence
Introduction
Data warehouse technology
Enables organizations to integrate data from different sources into a common data model. The integrated data can provide insights into business operations and open up new possibilities for corporate decision support and the creation of organizational value.
It is also a means of reducing the number of separate decision support systems an enterprise builds
Provides a method to reduce data redundancy, improve information consistency, and enable enterprises to use data to make better decisions
Business drivers
The main drivers for building a data warehouse are operational support functions, compliance requirements, and business intelligence activities
Business intelligence can provide insights into organizations, customers, and products
Organizations that can gain decision-making knowledge and take action through business intelligence can improve their operational efficiency and enhance their competitive advantage.
Business intelligence is evolving from retrospective assessment to predictive analytics
Goals
Support business intelligence activities
Empower business analysis and efficient decision-making
Identify innovative approaches based on data insights
Principles
Focus on business goals
Ensure the data warehouse serves the organization's highest priorities and solves business problems
Begin with the end in mind
Let business priorities and the scope of data ultimately delivered drive the creation of data warehouse content
Think globally, act locally
Let the final vision guide the architecture and quickly iterate through focused projects to build incremental delivery, resulting in a more immediate return on investment.
Summarize and optimize last, not first
Build on the atomic data: summarize and aggregate to meet requirements and ensure performance, but never replace the detailed data
Promote transparency and self-service
The richer the contextual information, the more value data consumers can get from the data
Expose integrated data and information about its processes to stakeholders
Build metadata along with the data warehouse
Critical to data warehouse success is the ability to explain the data accurately
Collaboration
Collaborate with other data activities, especially data governance, data quality and metadata management activities
One size does not fit all
Use the right tools and products for each group of data consumers
Basic concepts
Business Intelligence
First meaning
Refers to a type of data analysis aimed at understanding organizational needs and finding opportunities
Second meaning
Refers to the collection of technologies that support this type of data analysis activity
Data warehouse
Two important components
An integrated decision support database
Related software programs used to collect, clean, transform, and store data from a variety of operations and external sources
A data mart is a copy of a subset of the data in a data warehouse
In its broadest sense, a data warehouse includes any data stores or extracts used to supply data for business intelligence purposes
Data warehouse construction
Refers to the operational processes of extracting, cleaning, transforming, controlling, and loading data to maintain the contents of the data warehouse
The focus of the data warehouse construction process is to provide an integrated, historical business context on operational data by enforcing business rules and maintaining appropriate business data relationships
Traditional data warehouse construction
Mainly focused on structured data
Modern business intelligence and data warehousing
Also includes semi-structured and unstructured data
Unstructured data
Refers to data that has not been predefined by a data model
Various forms
Found in emails, free-form text, videos, web pages, and photos
Approaches to building a data warehouse
Bill Inmon's approach
Inmon defines a data warehouse as "a subject-oriented, integrated, time-variant, and nonvolatile collection of summary and detailed historical data used to support the strategic decision-making processes for the corporation."
Use a normalized relational model to store and manage data
Not affected by data volume
Ralph Kimball's approach
Kimball defines a data warehouse as "a copy of transaction data specifically structured for query and analysis"
Uses a multidimensional (dimensional) data model
Less suited to very large data volumes, where queries may slow to a crawl
Common ground
Data warehouse stores data from other systems
Storage activities include integrating data in a way that increases the value of the data
Data warehouses are easily accessible and analyzed
Organizations build data warehouses because they need to give authorized stakeholders access to reliable, integrated data
There are many purposes for building a data warehouse, covering workflow support, operations management and predictive analysis.
Focus
The focus is business intelligence, but a data warehouse can also support AI
Corporate Information Factory (CIF)
How a data warehouse differs from operational (business) systems
Subject-oriented
Data warehouses are organized based on major business entities without focusing on functionality or application
Integrated
The data in the data warehouse is unified and cohesive
Key structures, the encoding and decoding of structures, data definitions, and naming conventions are consistent throughout the data warehouse
Because the data is integrated, the data warehouse is not simply a copy of operational data
Instead, the data warehouse becomes a system of record for the data
Time-variant
The data warehouse stores data as it exists at given points in time
The data in the data warehouse is like a snapshot. Each snapshot reflects the data status at a certain point in time.
This means that a data query based on a certain time period will always get the same results, no matter when the query is performed
Nonvolatile (stable)
In a data warehouse, records are not updated in place as they are in operational systems
Instead, new data is appended to existing data
A set of records can represent different states of the same transaction
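The nonvolatile, time-variant behavior above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed design: the `Record` and `AppendOnlyHistory` names and the `(account_id, status, valid_from)` fields are hypothetical. New states are appended instead of overwriting old ones, so an as-of query for a past date keeps returning the same answer no matter when it runs.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """One appended state of a business entity (hypothetical schema)."""
    account_id: str
    status: str
    valid_from: date


class AppendOnlyHistory:
    """Nonvolatile store: new states are appended, existing records never change."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, record: Record) -> None:
        self._records.append(record)

    def as_of(self, account_id: str, when: date) -> Optional[str]:
        """Return the status in effect on `when`, or None if none existed yet."""
        candidates = [
            r for r in self._records
            if r.account_id == account_id and r.valid_from <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.valid_from).status


history = AppendOnlyHistory()
history.append(Record("A1", "opened", date(2023, 1, 1)))
history.append(Record("A1", "suspended", date(2023, 6, 1)))
history.append(Record("A1", "closed", date(2024, 1, 1)))

# The same as-of question gets the same answer even after later appends.
print(history.as_of("A1", date(2023, 3, 15)))  # opened
```

Three records here represent three states of the same account, which is the point made above about one set of records reflecting different states of the same transaction.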
Aggregated and detailed data
The data in the data warehouse includes atomic transaction details and aggregated data.
Operational systems rarely aggregate data
In a data warehouse, summary data can be persisted in a table or non-persistent and displayed in the form of a view.
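The difference between persisted and non-persistent summary data can be shown with Python's built-in `sqlite3` module. A minimal sketch, assuming a hypothetical `sales_detail` table: the view is recomputed from the atomic rows on every query, while the summary table is a stored snapshot that must be refreshed when the detail changes. Both are built on, and do not replace, the detail rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_detail (
        sale_id INTEGER PRIMARY KEY,
        region TEXT NOT NULL,
        amount REAL NOT NULL
    );
    INSERT INTO sales_detail (region, amount) VALUES
        ('North', 100.0), ('North', 50.0), ('South', 75.0);

    -- Non-persistent summary: recomputed from the atomic rows on every query.
    CREATE VIEW sales_by_region_v AS
        SELECT region, SUM(amount) AS total FROM sales_detail GROUP BY region;

    -- Persisted summary: a snapshot that must be refreshed when detail changes.
    CREATE TABLE sales_by_region_t AS
        SELECT region, SUM(amount) AS total FROM sales_detail GROUP BY region;
    """
)

# A late-arriving detail row shows up in the view but not in the stored table.
conn.execute("INSERT INTO sales_detail (region, amount) VALUES ('South', 25.0)")

view_rows = conn.execute(
    "SELECT region, total FROM sales_by_region_v ORDER BY region").fetchall()
table_rows = conn.execute(
    "SELECT region, total FROM sales_by_region_t ORDER BY region").fetchall()
print(view_rows)   # [('North', 150.0), ('South', 100.0)]
print(table_rows)  # [('North', 150.0), ('South', 75.0)]
```

The trade-off is the usual one: the table answers faster for large volumes, the view is always current.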
Historical
The focus of operational systems is current data
Data warehouse also includes historical data, which usually consumes a lot of storage space
CIF components
Applications
Applications perform operational business processes
Detailed data generated by applications flows to data warehouses and operational data stores where it can be used for analysis
Staging area
A database between the operational source databases and the target data warehouse
The staging area is used for extraction, transformation, and loading, and is transparent to end users
Most of the data in the staging area is saved temporarily, and usually only a relatively small part of the data is persistent data.
Integration and transformation
At the integration layer, data from different data sources are converted and integrated into standard enterprise models in data warehouses and ODS.
Operational Data Store (ODS)
An operational data store is an integrated database of operational data
The data may come from the application system or other databases
Operational data stores usually include current or recent (30 to 90 days) data, while data warehouses also include historical (usually many years) data.
Data in an operational data store changes rapidly, while data in a data warehouse is relatively stable.
Not all organizations build an operational data store; it exists to meet an enterprise's need for low-latency data.
An operational data store can serve as the main source for the data warehouse and can also be used to audit the data warehouse.
Data mart
Data marts provide the basis for further data analysis
The data is usually a subset of the data warehouse, used to support a specific analysis or a specific kind of consumer.
Operational Data Mart (OpDM)
Operational data marts are data marts focused on operational decision support
It fetches data directly from operational data stores rather than from data warehouses
Has the same characteristics as an operational data store
Contains current or recent data that changes frequently
Data warehouse
Data warehouse provides a unified and integrated entrance for enterprise data to support management decision-making, strategic analysis and planning
Data flows from application and operational data stores into the data warehouse and then into the data mart, often in one direction.
Data that needs correction (does not meet requirements) is rejected
Ideally, the correction is made in the source system and the data is then reloaded through the ETL process
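Rejecting data at the door can be sketched as a validation step that splits staged rows into loadable rows and rejects with reasons. A minimal sketch, assuming hypothetical rules (`customer_id` must be present, `amount` must be a non-negative number); real rules come from business requirements, and the reject list is what would be reported back to the source system for correction and reload.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def validate(row: Row) -> List[str]:
    """Apply hypothetical business rules; return the list of violations."""
    errors = []
    if not row.get("customer_id"):
        errors.append("missing customer_id")
    amount = row.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    return errors


def route(rows: List[Row]) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Split staged rows into rows to load and rejects (with reasons)."""
    accepted: List[Row] = []
    rejected: List[Tuple[Row, List[str]]] = []
    for row in rows:
        errors = validate(row)
        if errors:
            rejected.append((row, errors))  # send back to the source for correction
        else:
            accepted.append(row)
    return accepted, rejected


staged = [
    {"customer_id": "C1", "amount": 20.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "C3", "amount": -1},
]
accepted, rejected = route(staged)
print(len(accepted), len(rejected))  # 1 2
```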
Operational reports
Operational reports are output from the data store
Reference data, master data and external data
Data in data warehouses and data marts is different from data in applications
Data is organized by subject area rather than by functional need
Data is integrated rather than siloed
The data is a series over time, not just current values
Data has higher latency in the data warehouse than in the application
More historical data is available in the data warehouse than in the application
Dimensional (multidimensional) data warehouse
It is not organized according to the normalization rules of the relational model
A multidimensional model, often called a star schema, consists of fact tables and dimension tables
Each fact table joins to many dimension tables, so the diagram looks like a star
Multiple fact tables share common, or conformed, dimensions through a "bus", similar to the bus in a computer
Multiple data marts can be integrated at the enterprise level by plugging them into the bus of conformed dimensions
Kimball's data warehouse is more scalable than Inmon's data warehouse
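A star schema and a conformed dimension can be sketched with `sqlite3`. The table and column names below (`dim_date`, `dim_product`, `fact_sales`, `fact_returns`) are hypothetical. Each fact table sits at the center of its own star; because both facts share the same `dim_date`, their results can be combined month by month (a "drill-across"), which is what the bus of conformed dimensions makes possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables; dim_date is conformed (shared by both fact tables).
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);

    -- Two fact tables, standing in for two data marts on the same bus.
    CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date(date_key),
                             product_key INTEGER REFERENCES dim_product(product_key),
                             revenue REAL);
    CREATE TABLE fact_returns (date_key INTEGER REFERENCES dim_date(date_key),
                               product_key INTEGER REFERENCES dim_product(product_key),
                               refund REAL);

    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO fact_sales VALUES (20240101, 1, 100.0), (20240101, 2, 40.0),
                                  (20240201, 1, 60.0);
    INSERT INTO fact_returns VALUES (20240101, 1, 10.0), (20240201, 1, 5.0);
    """
)

# Star join: fact table in the middle, grouped by a dimension attribute.
sales_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category ORDER BY p.category
    """
).fetchall()

# Drill-across: summarize each fact by the conformed dimension, then combine.
net_by_month = conn.execute(
    """
    SELECT s.month, s.revenue - COALESCE(r.refund, 0)
    FROM (SELECT d.month, SUM(f.revenue) AS revenue
          FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
          GROUP BY d.month) AS s
    LEFT JOIN (SELECT d.month, SUM(f.refund) AS refund
               FROM fact_returns f JOIN dim_date d ON f.date_key = d.date_key
               GROUP BY d.month) AS r
      ON s.month = r.month
    ORDER BY s.month
    """
).fetchall()
print(sales_by_category)  # [('Books', 160.0), ('Toys', 40.0)]
print(net_by_month)       # [(1, 130.0), (2, 55.0)]
```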
Components
Operational source systems
Data staging area
Data presentation area
Data access tools
Data warehouse architecture components
Process
Data flows from source systems into a staging area, where it can be cleaned and enriched as it is integrated and stored in the data warehouse or an operational data store
In a data warehouse, data can be accessed through data marts or data cubes to generate various reports.
Big data process differences
While most data warehouses integrate data before landing it, big data solutions ingest data before integrating it
In addition to various traditional types of reports, big data business intelligence may also include predictive analysis and data mining.
Source systems
Include the operational systems and external data to be brought into the data warehouse/business intelligence environment
Including customer relationship management systems, financial systems, human resources systems, external DaaS (data-as-a-service) sources, etc.
Required for a data warehouse
Data integration
Includes extract, transform, and load (ETL), data virtualization, and other techniques for getting data into a common form and location
Required for a data warehouse
Data storage areas
Staging area
The staging area is an intermediate data storage area between the data source and the centralized data repository
This is where the data resides briefly so that it can be transformed, integrated, and prepared for loading into the data warehouse
Reference and master data conformed dimensions
Reference data and master data can be stored in separate repositories
The data warehouse feeds new master data to these repositories and, in turn, is fed conformed dimension content from them
Central warehouse
Required for a data warehouse
After the transformation and preparation process is completed, the data in the data warehouse is usually retained in a central or atomic layer
This layer retains all historical atomic data as well as the latest instance of the data after each batch run
Operational data store (ODS)
Supports lower latency and can therefore support operational applications
Because operational data stores contain a time window of data rather than the entire history, they can be refreshed at a faster rate than a data warehouse.
Data mart
It is usually used as the presentation layer of the data warehouse environment, and is also used to present department-level or functional-level subsets of the data warehouse for reporting, querying and analyzing historical information.
Data marts target a specific subject area, a single department, or a single business process
It can also be the basis for a virtualized data warehouse, with the combined data marts forming the final data warehouse entity.
Data cube
There are three classic approaches to implementing systems that support online analytical processing (OLAP)
Based on a relational database (ROLAP)
Based on a multidimensional database (MOLAP)
Hybrid (HOLAP)
Load processing
There are two main types of data integration processing involved
Historical data
Usually only needs to be loaded once, or a limited number of times to handle data issues, and then never loaded again.
Ongoing data updates
Ongoing scheduling and execution are required to keep the data warehouse current
Historical data
One advantage of a data warehouse is that it captures a detailed history of the data stored
Batch change data capture
Typically, the data warehouse is loaded in a nightly batch window
Timestamp-based and log-table-based loading are the most common techniques
Full loads are used for legacy systems without native timestamping capabilities, or under certain batch recovery conditions
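Timestamp-based change data capture can be sketched with a high-water mark: each run extracts only rows whose `updated_at` is later than the last value successfully loaded. A minimal sketch, assuming a hypothetical `orders` source table whose `updated_at` column is reliably maintained as ISO-8601 text (so string comparison orders correctly). When there is no mark yet, the function falls back to a full load, standing in for a first run or a batch recovery.

```python
import sqlite3
from typing import List, Optional, Tuple

source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT);
    INSERT INTO orders VALUES
        (1, 'new',     '2024-03-01T10:00:00'),
        (2, 'shipped', '2024-03-02T09:30:00'),
        (3, 'new',     '2024-03-03T08:15:00');
    """
)


def extract_changes(conn: sqlite3.Connection,
                    high_water_mark: Optional[str]) -> Tuple[List[tuple], Optional[str]]:
    """Return rows changed since the mark, plus the new mark to store."""
    if high_water_mark is None:
        # Full load: first run, or recovery when the mark cannot be trusted.
        rows = conn.execute(
            "SELECT order_id, status, updated_at FROM orders ORDER BY updated_at"
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT order_id, status, updated_at FROM orders "
            "WHERE updated_at > ? ORDER BY updated_at",
            (high_water_mark,),
        ).fetchall()
    new_mark = rows[-1][2] if rows else high_water_mark
    return rows, new_mark


# Night 1: nothing loaded yet, so everything is extracted.
rows, mark = extract_changes(source, None)
print(len(rows), mark)  # 3 2024-03-03T08:15:00

# During the day one order changes; night 2 picks up only that row.
source.execute("UPDATE orders SET status = 'shipped', "
               "updated_at = '2024-03-04T07:00:00' WHERE order_id = 1")
rows, mark = extract_changes(source, mark)
print(rows)  # [(1, 'shipped', '2024-03-04T07:00:00')]
```

A strict `>` comparison can miss rows that share the mark's timestamp but commit later; production loaders often add an overlap window or use log-based capture instead.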
Near real-time and real-time data loading
The emergence of operational business intelligence (or operational analytics) drives the need for lower latency and for integrating more real-time or near-real-time data into the data warehouse, and new architectural approaches have emerged for handling volatile data.
Alternatives to batch processing address the ever-shorter data latency requirements of data warehouses. There are three main alternatives: trickle feeds, messaging, and streaming. They differ in where data accumulates while it waits to be processed.
Trickle feeds (source accumulation)
Rather than loading in a nightly batch window, trickle feeds load on a more frequent schedule or when a threshold is reached
This allows some batch processing to run during the day instead of concentrating it in a dedicated overnight batch window
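A trickle feed can be sketched as a small buffer on the source side that flushes either when it holds enough rows or when its oldest row has waited too long. A minimal sketch; the `TrickleLoader` class, its `max_rows` and `max_age_s` parameters, and the injectable `clock` are hypothetical names chosen for illustration, and the `load` callback stands in for the actual warehouse load.

```python
import time
from typing import Callable, List, Optional


class TrickleLoader:
    """Accumulates changes at the source and loads small batches when either
    `max_rows` is reached or the oldest pending change is `max_age_s` old."""

    def __init__(self, load: Callable[[List[dict]], None], max_rows: int = 100,
                 max_age_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._pending: List[dict] = []
        self._first_at: Optional[float] = None

    def add(self, change: dict) -> None:
        if not self._pending:
            self._first_at = self._clock()
        self._pending.append(change)
        too_many = len(self._pending) >= self._max_rows
        too_old = self._clock() - self._first_at >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._load(self._pending)
            self._pending = []
            self._first_at = None


# Usage with a fake clock so the example is deterministic.
batches: List[List[dict]] = []
now = [0.0]
loader = TrickleLoader(batches.append, max_rows=3, max_age_s=60.0, clock=lambda: now[0])
for i in range(4):
    loader.add({"id": i})   # the third change reaches the row threshold
now[0] = 61.0
loader.add({"id": 4})       # the oldest pending change is now older than 60 s
print([len(b) for b in batches])  # [3, 2]
```

This sketch only checks the age threshold when a new change arrives; a real loader would also flush on a timer so a quiet source does not leave rows waiting indefinitely.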
Messaging (bus accumulation)
Real-time or near-real-time message interaction is useful when very small packets of data (messages, events, or transactions) are published to a message bus
Source and target systems are independent of each other
Often used in DaaS applications
Streaming (target accumulation)
Unlike timed or threshold-based loading at the source, the target system collects data in a buffer or queue and processes it in order
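Target-side accumulation can be sketched with the standard-library `queue` and `threading` modules: producers publish whenever data changes without waiting for the target, and a single consumer on the target side drains the buffer in arrival order. The `buffer`, `target_rows`, and `apply_events` names are hypothetical; `target_rows` stands in for a warehouse table, and the sentinel object is just a simple way to stop the consumer for this example.

```python
import queue
import threading
from typing import Dict, List

buffer: "queue.Queue[object]" = queue.Queue()  # accumulation happens here, at the target
target_rows: List[Dict[str, int]] = []         # stand-in for the warehouse table
_STOP = object()                                # sentinel that ends the consumer loop


def apply_events() -> None:
    """Target system: take events off the buffer and apply them sequentially."""
    while True:
        event = buffer.get()
        if event is _STOP:
            break
        target_rows.append(event)


worker = threading.Thread(target=apply_events)
worker.start()

# Sources publish as changes happen; they do not wait for the target to load.
for n in range(5):
    buffer.put({"seq": n})
buffer.put(_STOP)
worker.join()

print([row["seq"] for row in target_rows])  # [0, 1, 2, 3, 4]
```

With one consumer and a FIFO queue, events are applied in the order they arrived, which is the sequential processing described above.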
Activities
Understand requirements
Define and maintain data warehouse/business intelligence architecture
Determine data warehouse/business intelligence technology architecture
Identify data warehouse/business intelligence management processes
Develop data warehouses and data marts
Generally speaking, data warehouse/business intelligence projects have three concurrent development tracks
Data
Data necessary to support business analysis
This track involves identifying the best sources of data and designing rules for how to modify, transform, integrate, store and make the data available to applications.
Technology
Backend systems and processes to support data storage and migration
Business intelligence tools
The suite of applications necessary for data consumers to derive meaningful data insights from deployed data products
Map sources to targets
Remediate and transform data
Load data warehouse
Implement a business intelligence portfolio
Group users according to their needs
Match tools to user requirements
Maintain data products
Overview
A well-built data warehouse with customer-facing business intelligence tools is a data product
Enhancements (extensions, additions or modifications) to existing data warehouse platforms should be implemented incrementally
Maintaining the scope of an increment and the critical path for executing focused work items can be a challenge in an ever-changing work environment
Priorities should be set jointly with business partners, focusing on the enhancements that are actually required
Release management
Release management is critical to codifying the development process, adding new features, enhancing production deployments, and ensuring regular maintenance is provided for deployed assets
This process will keep the data warehouse up to date, clean, and running optimally
This is a continuous improvement work
Managing the data product development life cycle
Each iteration will expand on existing increments or incorporate new features proposed by the business team
Monitor and tune the loading process
Monitor load processing across the system and understand performance bottlenecks and performance dependency paths
Use database tuning techniques where and when needed, including partitioning, tuned backup, and recovery strategies
Monitor and tune business intelligence activities and performance
Transparency and visibility are the key principles driving data warehouse and business intelligence monitoring
The more details of data warehouse and business intelligence activities are exposed, the more data consumers can see and understand what is going on, and the less direct support end customers need.
Tools
Metadata repository
Data dictionary and glossary
Data dictionary is a necessary component to support the use of data warehouse
Dictionaries describe data in business terms, including other information needed to use the data
Typically, the contents of the data dictionary come directly from the logical data model
Data and data model lineage
Documented data lineage has multiple uses
Investigate the root cause of data issues
Perform impact analysis on system changes and data issues
Determine the reliability of data based on its source
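The first two uses of lineage amount to walking a graph in opposite directions: upstream from a suspect output for root-cause investigation, downstream from a changed source for impact analysis. A minimal sketch, assuming lineage is available as hypothetical `(upstream, downstream)` pairs; the object names are invented for illustration and are not the output of any particular metadata tool.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Iterable, Set, Tuple

# Hypothetical lineage edges: (upstream object, downstream object)
EDGES = [
    ("crm.customer", "staging.customer"),
    ("erp.orders", "staging.orders"),
    ("staging.customer", "dw.dim_customer"),
    ("staging.orders", "dw.fact_sales"),
    ("dw.dim_customer", "mart.sales_report"),
    ("dw.fact_sales", "mart.sales_report"),
]

Graph = DefaultDict[str, Set[str]]


def build(edges: Iterable[Tuple[str, str]]) -> Tuple[Graph, Graph]:
    """Index the edges both ways: downstream (impact) and upstream (root cause)."""
    down: Graph = defaultdict(set)
    up: Graph = defaultdict(set)
    for src, dst in edges:
        down[src].add(dst)
        up[dst].add(src)
    return down, up


def reachable(graph: Graph, start: str) -> Set[str]:
    """Breadth-first search: every node reachable from `start`."""
    seen: Set[str] = set()
    todo = deque([start])
    while todo:
        node = todo.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


downstream, upstream = build(EDGES)
# Root cause: where could a bad value in the report have come from?
print(sorted(reachable(upstream, "mart.sales_report")))
# Impact analysis: what is affected if the CRM customer table changes?
print(sorted(reachable(downstream, "crm.customer")))
```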
Data integration tools
Data integration tools are used to load the data warehouse
When selecting tools, you also need to consider the following functions of system management:
Process auditing, control, restart and scheduling
The ability to selectively extract data elements at execution time and audit them before passing them to downstream systems
Control which operations can or cannot be performed, and which failed or aborted processes can be restarted
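The auditing and restart requirements can be sketched as a job runner that records an audit entry for every step and, when rerun after a failure, skips the steps that already completed. A minimal sketch; the `RestartableJob` class, the step names, and the in-memory checkpoint are hypothetical (a real tool would persist the checkpoint and audit log so they survive a process crash).

```python
from typing import Callable, Dict, List, Set, Tuple

Step = Tuple[str, Callable[[], None]]


class RestartableJob:
    """Runs named steps in order, audits each one, and restarts after failures."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.completed: Set[str] = set()          # checkpoint of finished steps
        self.audit_log: List[Dict[str, str]] = []

    def run(self) -> bool:
        for name, step in self.steps:
            if name in self.completed:
                self.audit_log.append({"step": name, "status": "skipped"})
                continue
            try:
                step()
            except Exception as exc:
                self.audit_log.append({"step": name, "status": f"failed: {exc}"})
                return False  # abort; the next run() resumes at this step
            self.completed.add(name)
            self.audit_log.append({"step": name, "status": "ok"})
        return True


attempts = {"load": 0}


def extract() -> None:
    pass


def transform() -> None:
    pass


def load() -> None:
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("target unavailable")  # simulated first-run failure


job = RestartableJob([("extract", extract), ("transform", transform), ("load", load)])
first = job.run()   # extract and transform succeed, load fails
second = job.run()  # extract and transform are skipped, load is retried
print(first, second)  # False True
print([e["status"] for e in job.audit_log])
```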
Types of Business Intelligence Tools
Overview
Operational reports
Applies business intelligence tools to analyze short-term (month-to-month) and long-term (year-to-year) business trends
Operational reporting can also help uncover trends and patterns, using tactical business intelligence tools to support short-term business decisions
Business performance management
Includes formal assessment of metrics aligned with organizational goals, typically at the executive level
Descriptive self-service analysis
Provides business intelligence tools to the front office with analytical capabilities to guide operational decisions
Operational reports
Operational reporting refers to business users generating reports directly from transactional systems, applications, or the data warehouse
This is usually a function of an application
Ad hoc queries are usually used when the report is simple or is used to initiate a workflow
Business performance management
Performance management is an integrated set of organizational processes and applications designed to optimize the execution of business strategy
A distinct management practice has formed in this area: building scorecards and dashboards that keep information consistent and interactive between management and execution
Operational analytics applications
Overview
Online Analytical Processing (OLAP) is a method that provides fast performance for multidimensional analytical queries
The term OLAP was coined partly to draw a clear distinction from OLTP (online transaction processing)
Common OLAP operations include
Slice
A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset
Dice
A dice is a slice on more than two dimensions of a data cube, or more than two consecutive slices
Drill up/down
An analysis technique that lets users navigate between levels of data
Ranging from the most summarized (up) to the most detailed (down)
Roll-up
A roll-up computes all of the data relationships for one or more dimensions
Pivot
A pivot changes the dimensional orientation of a report or page display
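The OLAP operations above can be illustrated on a toy cube held in a plain dictionary. A minimal sketch under stated assumptions: the `CUBE` cells, the `DIMS` tuple, and the helper names are hypothetical, and `dice` follows the common reading (restricting several dimensions to chosen member sets at once) rather than any particular OLAP product's API. Drilling down is shown as rolling up to a finer set of dimensions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

DIMS = ("year", "quarter", "region")
Cells = Dict[tuple, int]

# Hypothetical fact cells: (year, quarter, region) -> sales amount
CUBE: Cells = {
    (2023, "Q1", "North"): 10, (2023, "Q2", "North"): 12,
    (2023, "Q1", "South"): 7,  (2023, "Q2", "South"): 9,
    (2024, "Q1", "North"): 15, (2024, "Q1", "South"): 8,
}


def slice_(cube: Cells, dim: str, value: object) -> Cells:
    """Fix one dimension to a single member."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube: Cells, selection: Dict[str, Set[object]]) -> Cells:
    """Keep cells whose members fall in the chosen set for each listed dimension."""
    idx = {DIMS.index(d): members for d, members in selection.items()}
    return {k: v for k, v in cube.items() if all(k[i] in m for i, m in idx.items())}


def roll_up(cube: Cells, keep: Iterable[str]) -> Cells:
    """Aggregate away every dimension not in `keep`; keeping more dims drills down."""
    idx = [DIMS.index(d) for d in keep]
    out: Dict[tuple, int] = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)


def pivot(cube2d: Cells, swap: bool = False) -> Dict[object, Dict[object, int]]:
    """Lay out a 2-D result as rows x columns; swap=True rotates the axes."""
    grid: Dict[object, Dict[object, int]] = defaultdict(dict)
    for (a, b), v in cube2d.items():
        row, col = (b, a) if swap else (a, b)
        grid[row][col] = v
    return dict(grid)


north = slice_(CUBE, "region", "North")
q1_north = dice(CUBE, {"quarter": {"Q1"}, "region": {"North"}})
by_year = roll_up(CUBE, ["year"])                      # roll up to years
by_year_quarter = roll_up(CUBE, ["year", "quarter"])   # drill down to quarters
by_year_region = roll_up(CUBE, ["year", "region"])
print(by_year)                           # {(2023,): 38, (2024,): 23}
print(pivot(by_year_region, swap=True))  # {'North': {2023: 22, 2024: 15}, 'South': {2023: 16, 2024: 8}}
```

The helper is named `slice_` only to avoid shadowing Python's built-in `slice`.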
OLAP implementation method
Relational Online Analytical Processing (ROLAP)
ROLAP supports OLAP by applying multidimensional techniques to the two-dimensional tables of a relational database management system (RDBMS)
Star schema is a commonly used database design technique in ROLAP environments
Multidimensional Online Analytical Processing (MOLAP)
MOLAP supports OLAP through the use of specialized multidimensional database technology
Hybrid Online Analytical Processing (HOLAP)
It is a combination of ROLAP and MOLAP
The HOLAP implementation allows part of the data to be stored in MOLAP and part to be stored in ROLAP
Methods
Prototypes to drive requirements
Profiling the data will aid in prototyping and reduce risks associated with unexpected data
Self-service business intelligence
Self-service is a fundamental delivery method for business intelligence products
It typically places user activities in a managed portal, providing various features based on the user's permissions
Including messaging, alerts, viewing scheduled production reports, dashboards, scorecards, etc.
Reports can be pushed to the portal on a standard schedule for users to retrieve at their leisure
Users can also extract data by executing reports in portals that share content across organizational boundaries
Queryable audit data
To maintain data lineage, all structures and processes should be able to create and store audit information and enable fine-grained tracking and reporting
Implementation Guide
Readiness Assessment/Risk Assessment
The data warehouse effort should address the following points
Clarify data sensitivity and security constraints
Select tools
Secure resources
Create an extraction process to evaluate and receive source data
Release roadmap
Configuration management
Organizational and cultural change
Data Warehousing/Business Intelligence Governance
Business acceptance
There are some very important architectural subcomponents and their supporting activities that need to be considered up front
conceptual data model
Data quality feedback loop
End-to-end metadata
End-to-end verifiable data lineage
Customer/User Satisfaction
Service level agreements
Reporting strategy
A reporting strategy includes standards, processes, guidelines, best practices and procedures that will ensure users receive clear, accurate and timely information
Reporting strategies should address the following issues
secure access
Ensure only authorized users have access to sensitive data
Describes access mechanisms for user interaction, reporting, inspection, or viewing of other data
Types of user communities and the appropriate tools for each
Report types (summary, detail, exception) and their frequency, timing, distribution, and storage format
Unleash the potential of visualization capabilities with graphical output
Trade-off between timeliness and performance
Center of excellence
Centers of excellence can provide training, start-up kits, design best practices, data source tips and tricks, and other solutions or artifacts to help business users adopt a self-service model
In addition to knowledge management, the center provides timely communication for developers, designers, analysts and subscriber organizations
Metrics
Usage metrics
Metrics used in data warehouses include number of registered users, number of connected users, or number of concurrent users
These metrics represent how many people in the organization are using the data warehouse
Subject area coverage
Subject area coverage percentage measures the extent to which each department has access to the warehouse and also highlights which data is shared across departments and which is not yet but might be.
Response time and performance metrics
Most query tools measure response time
Retrieve response time and performance metrics from these tools
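The usage and coverage metrics above can be computed from simple logs. A minimal sketch, assuming a hypothetical session log of `(user, login_minute, logout_minute)` tuples, a hypothetical set of registered users, and a hypothetical map of which subject areas each department can query. Peak concurrency uses a sweep over login/logout events; subject area coverage is the percentage of subject areas each department can access, and "shared" lists areas used by more than one department.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical session log: (user, login_minute, logout_minute)
SESSIONS: List[Tuple[str, int, int]] = [
    ("ann", 0, 30),
    ("bob", 10, 20),
    ("cai", 15, 45),
    ("ann", 40, 50),
]
REGISTERED: Set[str] = {"ann", "bob", "cai", "dee"}


def peak_concurrent(sessions: List[Tuple[str, int, int]]) -> int:
    """Sweep over events; at equal times a logout (-1) sorts before a login (+1)."""
    events = []
    for _, start, end in sessions:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


connected = {user for user, _, _ in SESSIONS}
print(len(REGISTERED), len(connected), peak_concurrent(SESSIONS))  # 4 3 3

# Hypothetical subject-area access: department -> subject areas it can query
ACCESS: Dict[str, Set[str]] = {
    "sales": {"customer", "order"},
    "finance": {"order", "ledger"},
    "hr": set(),
}
SUBJECT_AREAS = {"customer", "order", "ledger", "employee"}

coverage = {dept: 100 * len(areas) / len(SUBJECT_AREAS) for dept, areas in ACCESS.items()}
shared = {s for s in SUBJECT_AREAS if sum(s in areas for areas in ACCESS.values()) > 1}
print(coverage)  # {'sales': 50.0, 'finance': 50.0, 'hr': 0.0}
print(shared)    # {'order'}
```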