MindMap Gallery DAMA-CDGA Data Governance Engineer-5. Modeling and Design
Data modeling is the process of discovering, analyzing, and determining data requirements, expressing and communicating those data requirements in a precise form called a data model. Good data modeling reduces the cost of building new applications by reducing support costs and increasing the likelihood of reuse for future requirements.
Edited at 2024-03-05 20:20:03Avatar 3 centers on the Sully family, showcasing the internal rift caused by the sacrifice of their eldest son, and their alliance with other tribes on Pandora against the external conflict of the Ashbringers, who adhere to the philosophy of fire and are allied with humans. It explores the grand themes of family, faith, and survival.
This article discusses the Easter eggs and homages in Zootopia 2 that you may have discovered. The main content includes: character and archetype Easter eggs, cinematic universe crossover Easter eggs, animal ecology and behavior references, symbol and metaphor Easter eggs, social satire and brand allusions, and emotional storylines and sequel foreshadowing.
[Zootopia Character Relationship Chart] The idealistic rabbit police officer Judy and the cynical fox conman Nick form a charmingly contrasting duo, rising from street hustlers to become Zootopia police officers!
Avatar 3 centers on the Sully family, showcasing the internal rift caused by the sacrifice of their eldest son, and their alliance with other tribes on Pandora against the external conflict of the Ashbringers, who adhere to the philosophy of fire and are allied with humans. It explores the grand themes of family, faith, and survival.
This article discusses the Easter eggs and homages in Zootopia 2 that you may have discovered. The main content includes: character and archetype Easter eggs, cinematic universe crossover Easter eggs, animal ecology and behavior references, symbol and metaphor Easter eggs, social satire and brand allusions, and emotional storylines and sequel foreshadowing.
[Zootopia Character Relationship Chart] The idealistic rabbit police officer Judy and the cynical fox conman Nick form a charmingly contrasting duo, rising from street hustlers to become Zootopia police officers!
5. Modeling and Design
introduction
definition
Data modeling is the process of discovering, analyzing, and determining data requirements, expressing and communicating those data requirements in a precise form called a data model
significance
Modeling→Building tables→Storing data
This is important because it affects storage cost and performance
Classification
Modeling method
relationship model
Multidimensional model
object-oriented model
fact pattern
time series pattern
NoSQL mode
Sort by level of detail
conceptual model
logic model
1. It is the product of data architecture; 2. It should be done by business departments and architects
physical model
1. It is the product of data modeling; 2. It should be done by the technical department and DBA
Each model consists of a set of components such as entities, relationships, facts, keys, and properties
business drivers
Provide a common vocabulary about data
Obtain and record detailed information about data and systems within the organization
Serve as the main communication tool in the project
Provides a starting point for application customization, integration and even replacement
goals and principles
Good data modeling reduces the cost of building new applications by reducing support costs and increasing the likelihood of reuse for future needs.
Data model is an important form of metadata
Identifying and documenting understanding from different perspectives can help
format
A data model is a concise definition of data structures and data relationships.
Formatted definitions give data a standardized structure, reducing the probability of exceptions when accessing and saving data.
Define scope
Data models help understand data context boundaries
Knowledge retention records
Data models can preserve enterprise information for a system or project by capturing knowledge in written form
basic concept
Data modeling and data models
Modeling
The direct result of data modeling is not the database, but the understanding of organizational data.
Model
It is a representation of things in reality or a pattern of trying to create things.
A model can have one or more charts
Model diagrams can be quickly understood using standardized symbols.
A data model describes the data that an organization already understands or will need in the future
A model is a form of documentation used to record data requirements and data definitions generated during the data modeling process.
Modeled data types
The following 4 main types can be type of data to model
Category information
Data used to classify things and assign things to types
Such as customers classified by market category or business segment; products classified by color, model, size
Resource information
Basic data on resources required to implement operational processes
such as products, customers, suppliers, facilities, organizations, and accounts
Business event information
Data created during operations
Customer orders, supplier invoices, cash withdrawals, business meetings
Detailed transaction information
Detailed transaction information is generated through the sales system
1. It is static data; 2. Some "dynamic data" can also be modeled, such as system plans;
Data model components
entity
definition
is a thing that is different from other things
It is the carrier through which an organization collects information.
can be viewed as answers to some basic questions
what, why, who, when, where, how
Alias
common usage
entity
is an entity type
Entity instance
Is the reification or value of a specific entity
1. Employee is an entity; 2. Jane is an entity instance
Differentiate according to model
relational model
entity
Dimensional model
Fact tables, dimensions
object-oriented
class, object
time based
center, satellite, link
NoSQL
File, node
Distinguished by model abstraction level
conceptual model
concepts, terminology
logic model
entity
physical model
surface
Graphical representation of entities
Entities are usually represented by a rectangle (or a rectangle with rounded edges), with the name of the entity in the middle of the rectangle
Define standards
Belongs to core metadata
High-quality data-defining features
clear
precise
whole
relation
definition
is the relationship between entities
Alias
Differentiate according to model
relational model
relation
Dimensional model
Navigation path
NoSQL
borders, links
Distinguished by model abstraction level
concepts, logical models
relation
physical model
constraints, references
Graphical representation of relationships
Typically shown as lines on data modeling
cardinality of relationship
Cardinality describes the number of relationships an entity has with other entities involved in establishing relationships.
Cardinality is represented by the symbols that appear at both ends of the relationship line
For basics, you can only choose 0, 1 or more
Each side of the relationship can have any combination of 0, 1 or more
Specify 0 or 1 to indicate whether an entity instance is required in the relationship
1 or more entities representing the number of parties in a given relationship
relation
Unary relationship
Also known as recursive relationship or self-referential relationship
Contains only one entity
In a hierarchical relationship, an entity has at most one parent entity
In a relationship network, an entity can have multiple parent entities
Binary relationship
involves two entities
ternary relationship
three entities involved
foreign key
ForeignKey is usually expressed in the physical data model and is sometimes used in the logical data model.
Attributes
definition
An attribute is a property that defines, describes, or measures some aspect of an entity
An entity is physically represented as a column, field, tag, or node in a table, view, document, graph, or file
Graphical representation of properties
In a data model, attributes are typically described in lists within an entity matrix
identifier
definition
An identifier, also called a key, is a collection of one or more attributes that uniquely identify an entity instance.
key structure type
single key
Is an attribute that uniquely identifies an entity instance
Such as VIN
surrogate key
Also a single key
The surrogate key is a unique identifier for the table and is automatically generated by the system.
The surrogate key is an integer and its meaning has nothing to do with the numerical value
Surrogate keys have a technical function and should not be visible to end users of the database
It is saved in the background to help maintain uniqueness, allow more efficient navigation between structures, and facilitate integration across applications
composite key
Is a collection of two or more attributes that together uniquely identify an entity instance
Such as phone number (area code, exchange number, local number)
Mixed keys
Contains a composite key and at least one other single key, composite key, or non-key attribute
Like a key on a multidimensional fact table, it may contain several composite keys, a single key, and an optional load timestamp
Key function type
super key
is any set of attributes that uniquely identifies an entity instance
candidate key
Is the smallest set of attributes that identifies an entity instance, which may contain one or more attributes
Minimal means that no subset of candidate keys uniquely identifies the entity instance
An entity may have multiple candidate keys
Candidate keys are business keys
business key
Is used by business professionals to retrieve one or more attributes of a single entity instance
Mutually exclusive with surrogate keys
primary key
is the candidate key chosen as the entity's unique identifier
An entity may have multiple candidate keys, but only one candidate key can be used as the primary key of an entity
The primary key is a surrogate key
Alternate keys
is a candidate key
Although it is also unique, it is not selected as the primary key.
Used to find specific entity instances
Identification and non-identification relationships
entity
independent entity
means that its primary key only contains attributes that belong only to this entity
Relational data model: using rectangular symbols to represent independent entities
dependent entity
means that its primary key contains at least one attribute from other entities
Relational data model: using rounded rectangle symbols to represent dependent entities
identification relationship
It means that the primary key of the parent entity is inherited as a foreign key as part of the primary key of the child entity.
Dependent entities contain at least one identifying relationship
For example, the relationship between students and registrations, courses and registrations
non-identifying relationship
The primary key of the parent entity is only inherited as the non-primary foreign key attribute of the child entity
area
definition
Represents all possible values that can be assigned to an attribute
All values in the domain are valid values, values that are not in the domain are called invalid values
Attributes should not contain values outside the specified domain
Domains can be restricted with additional rules, called constraints.
Domains can be defined in many different ways
type of data
Data Format
list
scope
rule based
Data modeling methods
relational model
The purpose of the design is to accurately express the business and eliminate redundancy
It is particularly suitable for designing operational systems because such systems require rapid input of information and accurate storage of information.
Dimensional model
fact table
for specialized numeric measures
The fact table takes up 90% of the database space
dimension table
Represents important business objects and mainly contains text descriptions
Dimensions are entry points or links to fact tables and serve as the main source of "query" and "report" constraints.
Often highly anti-paradigm
Usually 10% of the total
Variety
Dimensions also have properties that change at different rates
Gradient class dimension SCD
ORC
The first category, coverage
New value overwrites old value
The second category, new lines
New values are written in new rows, old rows are marked as non-current values
The third category, new column
Multiple instances of a value are listed in different columns on the same row, with the last value being discarded
snowflake model
Contains fact tables and dimension tables
Dimension table outreach
star schema
Contains fact tables and dimension tables
Dimension table does not expand
granularity
It refers to the meaning or description of a single row of data in the fact table. This is the most detailed information that each row has.
Defining the granularity in a fact table is one of the key steps in dimensional modeling
The finer the granularity, the greater the flexibility
consistency dimension
Considered based on the organization as a whole rather than on a specific project
These dimensions are shared across different dimensional models due to consistent terms and values.
consistency fact
Use standardized terminology across multiple data marts
Different business users may use the same term in different ways
Many things have the same name, but the concepts in each organization are different. Or conversely, things have different names, but they actually express the same concept in each organization.
object-oriented
UML
time based
data vault
anchor modeling
based on facts
Object Role Modeling
Complete communication-oriented modeling
non-relational
document database
key-value database
column database
graph database
Levels of data modeling
conceptual data model
The conceptual data model only includes basic and key business entities in a given domain and function, and also gives a description of the entities and the relationships between them.
logical data model
A logical data model is a detailed description of data requirements
The logical data model is not subject to any technical or specific implementation conditions
Usually extended from the conceptual data model
In the relational logical data model, extend the conceptual data model by adding attributes
physical data model
entity
Attributes
rule
no spaces
The length cannot exceed 30Bytes
It cannot start with a number, it must start with a letter
physical data model
Describes a detailed technical solution, usually based on a logical data model, that matches a class of system hardware, software, and network tools
Physical data models are technology-specific
Because physical data models are subject to implementation technology constraints, retrieval performance is often improved by combining structures (denormalization)
canonical model
A canonical model is a type of physical model used to describe the movement of data between systems
The model describes the structure of data passed between systems as datagrams or messages
When sending data via a Web service, Enterprise Service Bus ESB, or Enterprise Application Integration EAI, the specification model describes the data structures that the sending service and the receiving service should use.
These structures should be designed to be as generic as possible to enable reuse and simplify interface requirements
view
is a virtual table that provides a way to view data from a variety of tables that contain or reference actual attributes
When a property in the view is requested, standard views run SQL to retrieve the data
Partition
refers to the process of splitting a table
To facilitate archiving and improve retrieval performance when performing partitioning
Partition
Can be vertical (grouped by column)
To reduce the result set returned by a query, subsets can be created for a table based on different columns.
For example, split the customer table into two tables, one based on static fields or relatively volatile fields (to improve loading and retrieval performance)
Or based on common and non-common fields in queries (to improve full table scan performance)
Can also be horizontal (grouped by rows)
To reduce the result set returned by the query, create a subset table using the value of a certain column as a distinction
For example, create a regional customers table that contains only customers in a specific region
denormalization
Brief description
Purpose: Sacrifice space (data overlap, improve performance)
Only exists in physical models
Suitable for OLAP
It is to convert the logical data model that conforms to the paradigm rules into some physical tables with redundant data after careful consideration.
In other words, denormalization intentionally places an attribute in multiple places
There are many reasons to denormalize your data, the most important being to provide performance
Combine data from multiple other tables ahead of time to avoid costly runtime joins
Create smaller, pre-filtered copies of data to reduce expensive runtime calculations or scans of large tables
Precompute and store the results of expensive data calculations to avoid runtime system resource contention
Denormalization introduces the risk of data errors due to data redundancy
Therefore, denormalization will be chosen only when the physical design using views or partitions still cannot meet efficiency requirements.
Dimensional model
Denormalization is called "folding" or "merging"
If each dimension is collapsed into a structure, the resulting data model is called a star schema
If the dimensions are not collapsed, the resulting data model is called a snowflake
Standardize
Normalization is the process of using rules to transform complex business into standardized data structures.
The goal is to ensure that each attribute appears in only one location to eliminate redundancy or inconsistencies caused by redundancy
Organize attributes according to primary and foreign keys
Normalization rules can be classified into different normalization levels, and a more granular approach and normalization can be applied to each level to search for the correct primary key.
Each level consists of an independent paradigm, and each successive level need not contain the previous level
level
first normal form
Ensure that each entity has a valid primary key, each attribute depends on the primary key, and eliminate redundant grouping to ensure that each attribute is atomic (more than one value cannot exist)
second normal form
Make sure every entity has a minimal primary key and every attribute depends on the full primary key
third normal form
Ensure that each entity does not have a hidden primary key and that each attribute does not depend on any attribute other than the key value
The standardization of the model usually requires reaching the third normal form level.
fourth paradigm
fifth paradigm
abstraction
is the removal of details so as to extend applicability to a wider range of situations while retaining the important and essential attributes of a concept or topic
Activity
Planning data modeling
Deliverables
chart
A data model contains multiple diagrams. Diagrams are a form of describing requirements in a precise way.
Requirements can describe the level of detail (conceptual, logical, physical), the data model used (relational, dimension, fact, time, object, NoSQL), and the representation method used in the instance (Unified Modeling Language, Object Role Modeling mold)
definition
The definition of entities, relationships, and attributes is crucial to maintaining the accuracy of the data model
Controversies and unresolved issues
Typically the documentation delivered by a data modeling effort should include current topics and open issues
blood relation
Blood relationship refers to the contextual relationship between where the data comes from, what kind of processing it undergoes, and what kind of results it becomes.
1. It helps data modelers understand data requirements and accurately locate attribute sources.
2. Determine the status of attributes in the source system. This is an effective tool to verify the accuracy of models and mapping relationships.
Build data model
forward engineering
definition
Refers to the process of building a new application starting from requirements
order
conceptual model
Understand the scope of requirements and core terminology by building a conceptual model
logic model
Then build a logical model to describe the business process in detail
physical model
Finally, the physical model is implemented through specific table creation statements.
Conceptual data model modeling
Select model type
Choose from relationship, dimension, fact, time, object, NoSQL modeling methods to model
Select representation method
Once the model type of modeling is selected, the next step is to consider which modeling representation method to use, such as information engineering method IE, object role modeling ORM
Complete initial conceptual model
Collect the most advanced concepts in the organization
Collect activities related to these concepts
Consolidated corporate terminology
Get signed
Logical data model modeling
The logical data model complements the requirements details in the conceptual model
step
Analyze information needs
Analyze existing documents
Add related entities
Association entities are used to describe many-to-many relationships
A related entity takes the identity properties from the entities involved in the relationship and puts them into a new entity
Add properties
Add attributes to conceptual entities
An attribute in a logical data model is atomic, it should contain one and only one data (fact) and cannot be split again
Specify domain
The role of the domain is to ensure the consistency of the format and value set in the model attributes.
Specify key
Properties assigned to entities can be key properties or non-key properties
The key attribute helps to identify a unique entity instance from all entity instances. It can be a single attribute as a key or a partial key combined with other key elements.
Physical data modeling
The logical data model needs to be modified and adjusted to form the physical data model and make the final design work well in the storage application
surface
surface
document
model
List
List
Field
element
OK
OK
Example
Record
step
Solving logical abstractions
Add property details
Add reference data object
Specify surrogate key
denormalization
Denormalizing or adding redundancy can greatly improve performance, far outweighing the cost of duplicate storage and copy processing
Dimensional models mainly use denormalization
index
Indexes are another option for optimizing query (data retrieval) performance when accessing database data.
Indexes can improve query performance
Indexes can be unique or non-unique, clustered or non-clustered, partitioned or non-partitioned, single column or multiple columns, etc.
Without appropriate indexes, the DBMS will read every row in the table (table scan) to retrieve all data. For large tables, this will cost a lot of money
To try to build an index on a large table, use the columns that are referenced most frequently (especially keys including primary keys, alternate keys, and foreign keys) for the most frequently run queries.
Partition
Partitioning strategies for the entire data model (dimensions) must be fully considered, especially when facts contain many optional dimension keys
Create view
Views are used to control access to certain data elements and can also be used to embed common join conditions or filters to standardize common objects or queries.
The view itself is requirements driven
Reverse Engineering
The first step, physical data modeling
The second step, logical data modeling
The third step, conceptual data modeling
Review data model
Model quality needs to be controlled through continuous improvement practices
Maintain data model
Data models need to be kept up to date
When requirements or processes change, the data model needs to be updated.
Generally speaking, in a specific project, when the model level needs to be changed, it also means that the corresponding higher-level model needs to be changed.
For example, if a new column needs to be added to the physical model, this column often needs to be added as an attribute to the corresponding logical data model.
At the end of a development iteration, a good practice is to reverse engineer the latest physical data model and ensure that it is consistent with the corresponding logical data model
Many data modeling tools can automatically compare physical and conceptual model differences
tool
Data modeling tools
Data modeling is software that automatically implements data modeling functions
More sophisticated data modeling tools support forward engineering of transformations from conceptual to logical models, from logical models to physical models, and from physical models to database structures.
Most also support reverse engineering from database to conceptual model
Data Lineage Tool
Is a tool that allows capturing and maintaining source structure changes for each attribute on the data model
These tools enable change impact analysis
They can be used to see whether changes in one system, or changes in one part of a system, have an impact on another system
Data analysis function
Can help explore data content, validate against current metadata, identify data quality and deficiencies in existing data artifacts
metadata repository
Used to store descriptive information about the data model
Sharing metadata is more important than storing metadata
The metadata repository must have a user-friendly way for people to query the contents of the repository
data model schema
basic mode
Are the “nuts and bolts” of data modeling
Kit mode
refers to a set of building blocks that spans business people and data modelers
Integrated model
Provides a framework for integrating suite patterns in common ways
Industry data model
Industry data model
are pre-built data models for entire industries that are often extensive and detailed and can contain thousands of entities and attributes
Any purchased data model will need to be customized to suit the characteristics of the organization
method
Best practices for naming conventions
Naming standards are especially important for entities, tables, properties, keys, views, indexes
Names should be unique and as descriptive as possible
Naming standards should minimize name changes across environments
Names should not be influenced by their specific environment, such as test, QA or production environments
Classifier, that is, the last term in attribute names such as quantity, name, and code, which can be used to distinguish attributes of entity and column names from table names.
logical name
Use complete words wherever possible and avoid using all but the most familiar abbreviations
Normally no delimiters are allowed to separate words.
physical name
Must conform to the maximum length allowed by the DBMS, so use abbreviations where necessary
Usually use underscore as word separator
Best practices in database design
DBAs should keep PRISM design principles in mind
Performance and Ease of usePerformance and Ease of use
Reusability
Integrity
SecuritySecurity
MaintainabilityMaintainability
Data modeling and design governance
Data modeling and design quality management
Develop data modeling and design standards
Review data model and database design quality
Manage data model versions and integrations
Metrics
To what extent does the model reflect business needs?
How complete is the model?
How well does the model match the pattern?
How is the model structured?
How general is the model?
How well does the model follow naming standards?
How readable is the model?
How is the model defined?
How consistent is the model with the enterprise data architecture?
How well does the model match the metadata?