Features
Features:

Product Tour >

Edraw AI >

Paid Plans:

Individuals >

Business >

Eduaction >
Resources
Blog

History

How-tos & Tips

Discovery

Biography

Business Analysis

Examples

AI concept Map

Free AI Mind Map Generator

Onenote Mind Map

Bcg Matrix Examples

Nike Marketing Strategy

Unilever SWOT Analysis

Make Mind Maps in Google Docs

Guide

FAQs

What's New

Resource Center
Templates
All Templates

Brain Storming Templates

Strategy and Planning Templates

Project Management Templates

Product Management Templates

Human Resources Templates

Agile Workflow Templates

Marketing Templates

Education Templates

Fun and Games Templates

User Gallery
Download
Pricing
Enterprise

MindMap Gallery DAMA-CDGA Data Governance Engineer-5. Modeling and Design

DAMA-CDGA Data Governance Engineer-5. Modeling and Design

Data modeling is the process of discovering, analyzing, and determining data requirements, expressing and communicating those data requirements in a precise form called a data model. Good data modeling reduces the cost of building new applications by reducing support costs and increasing the likelihood of reuse for future requirements.

Edited at 2024-03-05 20:20:03

PlotWizard

Recent works View more works>>

DAMA-CDGA Data Governance Engineer-5. Modeling and Design

PlotWizard

Recent works View more works>>

Recommended to you
Outline

DAMA-CDGA Data Governance Engineer-17. Data Management and Organizational Change Management
- 32
PlotWizard
DAMA-CDGA Data Governance Engineer-16. Data Management Organization and Role Expectations
- 55
PlotWizard
DAMA-CDGA Data Governance Engineer-15. Data Management Maturity Assessment
- 56
PlotWizard
DAMA-CDGA Data Governance Engineer-14. Big Data and Data Science
- 46
PlotWizard
DAMA-CDGA Data Governance Engineer-13. Data Quality
- 49
- 1
PlotWizard
DAMA-CDGA Data Governance Engineer-12. Metadata Management
- 42
PlotWizard
DAMA-CDGA Data Governance Engineer-11. Data Warehousing and Business Intelligence
- 41
PlotWizard
DAMA-CDGA Data Governance Engineer-10. Reference Data and Master Data
- 57
PlotWizard
DAMA-CDGA Data Governance Engineer-9. Document and Content Management
- 36
PlotWizard
DAMA-CDGA Data Governance Engineer-8. Data Integration and Interoperability
- 32
PlotWizard

5. Modeling and Design

introduction

definition

Data modeling is the process of discovering, analyzing, and determining data requirements, expressing and communicating those data requirements in a precise form called a data model

significance

Modeling→Building tables→Storing data

This is important because it affects storage cost and performance

Classification

Modeling method

relationship model

Multidimensional model

object-oriented model

fact pattern

time series pattern

NoSQL mode

Sort by level of detail

conceptual model

logic model

1. It is the product of data architecture; 2. It should be done by business departments and architects

physical model

1. It is the product of data modeling; 2. It should be done by the technical department and DBA

Each model consists of a set of components such as entities, relationships, facts, keys, and properties

business drivers

Provide a common vocabulary about data

Obtain and record detailed information about data and systems within the organization

Serve as the main communication tool in the project

Provides a starting point for application customization, integration and even replacement

goals and principles

Good data modeling reduces the cost of building new applications by reducing support costs and increasing the likelihood of reuse for future needs.

Data model is an important form of metadata

Identifying and documenting understanding from different perspectives can help

format

A data model is a concise definition of data structures and data relationships.

Formatted definitions give data a standardized structure, reducing the probability of exceptions when accessing and saving data.

Define scope

Data models help understand data context boundaries

Knowledge retention records

Data models can preserve enterprise information for a system or project by capturing knowledge in written form

basic concept

Data modeling and data models

Modeling

The direct result of data modeling is not the database, but the understanding of organizational data.

Model

It is a representation of things in reality or a pattern of trying to create things.

A model can have one or more charts

Model diagrams can be quickly understood using standardized symbols.

A data model describes the data that an organization already understands or will need in the future

A model is a form of documentation used to record data requirements and data definitions generated during the data modeling process.

Modeled data types

The following 4 main types can be type of data to model

Category information

Data used to classify things and assign things to types

Such as customers classified by market category or business segment; products classified by color, model, size

Resource information

Basic data on resources required to implement operational processes

such as products, customers, suppliers, facilities, organizations, and accounts

Business event information

Data created during operations

Customer orders, supplier invoices, cash withdrawals, business meetings

Detailed transaction information

Detailed transaction information is generated through the sales system

1. It is static data; 2. Some "dynamic data" can also be modeled, such as system plans;

Data model components

entity

definition

is a thing that is different from other things

It is the carrier through which an organization collects information.

can be viewed as answers to some basic questions

what, why, who, when, where, how

Alias

common usage

entity

is an entity type

Entity instance

Is the reification or value of a specific entity

1. Employee is an entity; 2. Jane is an entity instance

Differentiate according to model

relational model

entity

Dimensional model

Fact tables, dimensions

object-oriented

class, object

time based

center, satellite, link

NoSQL

File, node

Distinguished by model abstraction level

conceptual model

concepts, terminology

logic model

entity

physical model

surface

Graphical representation of entities

Entities are usually represented by a rectangle (or a rectangle with rounded edges), with the name of the entity in the middle of the rectangle

Define standards

Belongs to core metadata

High-quality data-defining features

clear

precise

whole

relation

definition

is the relationship between entities

Alias

Differentiate according to model

relational model

relation

Dimensional model

Navigation path

NoSQL

borders, links

Distinguished by model abstraction level

concepts, logical models

relation

physical model

constraints, references

Graphical representation of relationships

Typically shown as lines on data modeling

cardinality of relationship

Cardinality describes the number of relationships an entity has with other entities involved in establishing relationships.

Cardinality is represented by the symbols that appear at both ends of the relationship line

For basics, you can only choose 0, 1 or more

Each side of the relationship can have any combination of 0, 1 or more

Specify 0 or 1 to indicate whether an entity instance is required in the relationship

1 or more entities representing the number of parties in a given relationship

relation

Unary relationship

Also known as recursive relationship or self-referential relationship

Contains only one entity

In a hierarchical relationship, an entity has at most one parent entity

In a relationship network, an entity can have multiple parent entities

Binary relationship

involves two entities

ternary relationship

three entities involved

foreign key

ForeignKey is usually expressed in the physical data model and is sometimes used in the logical data model.

Attributes

definition

An attribute is a property that defines, describes, or measures some aspect of an entity

An entity is physically represented as a column, field, tag, or node in a table, view, document, graph, or file

Graphical representation of properties

In a data model, attributes are typically described in lists within an entity matrix

identifier

definition

An identifier, also called a key, is a collection of one or more attributes that uniquely identify an entity instance.

key structure type

single key

Is an attribute that uniquely identifies an entity instance

Such as VIN

surrogate key

Also a single key

The surrogate key is a unique identifier for the table and is automatically generated by the system.

The surrogate key is an integer and its meaning has nothing to do with the numerical value

Surrogate keys have a technical function and should not be visible to end users of the database

It is saved in the background to help maintain uniqueness, allow more efficient navigation between structures, and facilitate integration across applications

composite key

Is a collection of two or more attributes that together uniquely identify an entity instance

Such as phone number (area code, exchange number, local number)

Mixed keys

Contains a composite key and at least one other single key, composite key, or non-key attribute

Like a key on a multidimensional fact table, it may contain several composite keys, a single key, and an optional load timestamp

Key function type

super key

is any set of attributes that uniquely identifies an entity instance

candidate key

Is the smallest set of attributes that identifies an entity instance, which may contain one or more attributes

Minimal means that no subset of candidate keys uniquely identifies the entity instance

An entity may have multiple candidate keys

Candidate keys are business keys

business key

Is used by business professionals to retrieve one or more attributes of a single entity instance

Mutually exclusive with surrogate keys

primary key

is the candidate key chosen as the entity's unique identifier

An entity may have multiple candidate keys, but only one candidate key can be used as the primary key of an entity

The primary key is a surrogate key

Alternate keys

is a candidate key

Although it is also unique, it is not selected as the primary key.

Used to find specific entity instances

Identification and non-identification relationships

entity

independent entity

means that its primary key only contains attributes that belong only to this entity

Relational data model: using rectangular symbols to represent independent entities

dependent entity

means that its primary key contains at least one attribute from other entities

Relational data model: using rounded rectangle symbols to represent dependent entities

identification relationship

It means that the primary key of the parent entity is inherited as a foreign key as part of the primary key of the child entity.

Dependent entities contain at least one identifying relationship

For example, the relationship between students and registrations, courses and registrations

non-identifying relationship

The primary key of the parent entity is only inherited as the non-primary foreign key attribute of the child entity

area

definition

Represents all possible values that can be assigned to an attribute

All values in the domain are valid values, values that are not in the domain are called invalid values

Attributes should not contain values outside the specified domain

Domains can be restricted with additional rules, called constraints.

Domains can be defined in many different ways

type of data

Data Format

list

scope

rule based

Data modeling methods

relational model

The purpose of the design is to accurately express the business and eliminate redundancy

It is particularly suitable for designing operational systems because such systems require rapid input of information and accurate storage of information.

Dimensional model

fact table

for specialized numeric measures

The fact table takes up 90% of the database space

dimension table

Represents important business objects and mainly contains text descriptions

Dimensions are entry points or links to fact tables and serve as the main source of "query" and "report" constraints.

Often highly anti-paradigm

Usually 10% of the total

Variety

Dimensions also have properties that change at different rates

Gradient class dimension SCD

ORC

The first category, coverage

New value overwrites old value

The second category, new lines

New values are written in new rows, old rows are marked as non-current values

The third category, new column

Multiple instances of a value are listed in different columns on the same row, with the last value being discarded

snowflake model

Contains fact tables and dimension tables

Dimension table outreach

star schema

Contains fact tables and dimension tables

Dimension table does not expand

granularity

It refers to the meaning or description of a single row of data in the fact table. This is the most detailed information that each row has.

Defining the granularity in a fact table is one of the key steps in dimensional modeling

The finer the granularity, the greater the flexibility

consistency dimension

Considered based on the organization as a whole rather than on a specific project

These dimensions are shared across different dimensional models due to consistent terms and values.

consistency fact

Use standardized terminology across multiple data marts

Different business users may use the same term in different ways

Many things have the same name, but the concepts in each organization are different. Or conversely, things have different names, but they actually express the same concept in each organization.

object-oriented

UML

time based

data vault

anchor modeling

based on facts

Object Role Modeling

Complete communication-oriented modeling

non-relational

document database

key-value database

column database

graph database

Levels of data modeling

conceptual data model

The conceptual data model only includes basic and key business entities in a given domain and function, and also gives a description of the entities and the relationships between them.

logical data model

A logical data model is a detailed description of data requirements

The logical data model is not subject to any technical or specific implementation conditions

Usually extended from the conceptual data model

In the relational logical data model, extend the conceptual data model by adding attributes

physical data model

entity

Attributes

rule

no spaces

The length cannot exceed 30Bytes

It cannot start with a number, it must start with a letter

physical data model

Describes a detailed technical solution, usually based on a logical data model, that matches a class of system hardware, software, and network tools

Physical data models are technology-specific

Because physical data models are subject to implementation technology constraints, retrieval performance is often improved by combining structures (denormalization)

canonical model

A canonical model is a type of physical model used to describe the movement of data between systems

The model describes the structure of data passed between systems as datagrams or messages

When sending data via a Web service, Enterprise Service Bus ESB, or Enterprise Application Integration EAI, the specification model describes the data structures that the sending service and the receiving service should use.

These structures should be designed to be as generic as possible to enable reuse and simplify interface requirements

view

is a virtual table that provides a way to view data from a variety of tables that contain or reference actual attributes

When a property in the view is requested, standard views run SQL to retrieve the data

Partition

refers to the process of splitting a table

To facilitate archiving and improve retrieval performance when performing partitioning

Partition

Can be vertical (grouped by column)

To reduce the result set returned by a query, subsets can be created for a table based on different columns.

For example, split the customer table into two tables, one based on static fields or relatively volatile fields (to improve loading and retrieval performance)

Or based on common and non-common fields in queries (to improve full table scan performance)

Can also be horizontal (grouped by rows)

To reduce the result set returned by the query, create a subset table using the value of a certain column as a distinction

For example, create a regional customers table that contains only customers in a specific region

denormalization

Brief description

Purpose: Sacrifice space (data overlap, improve performance)

Only exists in physical models

Suitable for OLAP

It is to convert the logical data model that conforms to the paradigm rules into some physical tables with redundant data after careful consideration.

In other words, denormalization intentionally places an attribute in multiple places

There are many reasons to denormalize your data, the most important being to provide performance

Combine data from multiple other tables ahead of time to avoid costly runtime joins

Create smaller, pre-filtered copies of data to reduce expensive runtime calculations or scans of large tables

Precompute and store the results of expensive data calculations to avoid runtime system resource contention

Denormalization introduces the risk of data errors due to data redundancy

Therefore, denormalization will be chosen only when the physical design using views or partitions still cannot meet efficiency requirements.

Dimensional model

Denormalization is called "folding" or "merging"

If each dimension is collapsed into a structure, the resulting data model is called a star schema

If the dimensions are not collapsed, the resulting data model is called a snowflake

Standardize

Normalization is the process of using rules to transform complex business into standardized data structures.

The goal is to ensure that each attribute appears in only one location to eliminate redundancy or inconsistencies caused by redundancy

Organize attributes according to primary and foreign keys

Normalization rules can be classified into different normalization levels, and a more granular approach and normalization can be applied to each level to search for the correct primary key.

Each level consists of an independent paradigm, and each successive level need not contain the previous level

level

first normal form

Ensure that each entity has a valid primary key, each attribute depends on the primary key, and eliminate redundant grouping to ensure that each attribute is atomic (more than one value cannot exist)

second normal form

Make sure every entity has a minimal primary key and every attribute depends on the full primary key

third normal form

Ensure that each entity does not have a hidden primary key and that each attribute does not depend on any attribute other than the key value

The standardization of the model usually requires reaching the third normal form level.

fourth paradigm

fifth paradigm

abstraction

is the removal of details so as to extend applicability to a wider range of situations while retaining the important and essential attributes of a concept or topic

Activity

Planning data modeling

Deliverables

chart

A data model contains multiple diagrams. Diagrams are a form of describing requirements in a precise way.

Requirements can describe the level of detail (conceptual, logical, physical), the data model used (relational, dimension, fact, time, object, NoSQL), and the representation method used in the instance (Unified Modeling Language, Object Role Modeling mold)

definition

The definition of entities, relationships, and attributes is crucial to maintaining the accuracy of the data model

Controversies and unresolved issues

Typically the documentation delivered by a data modeling effort should include current topics and open issues

blood relation

Blood relationship refers to the contextual relationship between where the data comes from, what kind of processing it undergoes, and what kind of results it becomes.

1. It helps data modelers understand data requirements and accurately locate attribute sources.

2. Determine the status of attributes in the source system. This is an effective tool to verify the accuracy of models and mapping relationships.

Build data model

forward engineering

definition

Refers to the process of building a new application starting from requirements

order

conceptual model

Understand the scope of requirements and core terminology by building a conceptual model

logic model

Then build a logical model to describe the business process in detail

physical model

Finally, the physical model is implemented through specific table creation statements.

Conceptual data model modeling

Select model type

Choose from relationship, dimension, fact, time, object, NoSQL modeling methods to model

Select representation method

Once the model type of modeling is selected, the next step is to consider which modeling representation method to use, such as information engineering method IE, object role modeling ORM

Complete initial conceptual model

Collect the most advanced concepts in the organization

Collect activities related to these concepts

Consolidated corporate terminology

Get signed

Logical data model modeling

The logical data model complements the requirements details in the conceptual model

step

Analyze information needs

Analyze existing documents

Add related entities

Association entities are used to describe many-to-many relationships

A related entity takes the identity properties from the entities involved in the relationship and puts them into a new entity

Add properties

Add attributes to conceptual entities

An attribute in a logical data model is atomic, it should contain one and only one data (fact) and cannot be split again

Specify domain

The role of the domain is to ensure the consistency of the format and value set in the model attributes.

Specify key

Properties assigned to entities can be key properties or non-key properties

The key attribute helps to identify a unique entity instance from all entity instances. It can be a single attribute as a key or a partial key combined with other key elements.

Physical data modeling

The logical data model needs to be modified and adjusted to form the physical data model and make the final design work well in the storage application

surface

document

model

List

Field

element

Example

Record

step

Solving logical abstractions

Add property details

Add reference data object

Specify surrogate key

denormalization

Denormalizing or adding redundancy can greatly improve performance, far outweighing the cost of duplicate storage and copy processing

Dimensional models mainly use denormalization

index

Indexes are another option for optimizing query (data retrieval) performance when accessing database data.

Indexes can improve query performance

Indexes can be unique or non-unique, clustered or non-clustered, partitioned or non-partitioned, single column or multiple columns, etc.

Without appropriate indexes, the DBMS will read every row in the table (table scan) to retrieve all data. For large tables, this will cost a lot of money

To try to build an index on a large table, use the columns that are referenced most frequently (especially keys including primary keys, alternate keys, and foreign keys) for the most frequently run queries.

Partition

Partitioning strategies for the entire data model (dimensions) must be fully considered, especially when facts contain many optional dimension keys

Create view

Views are used to control access to certain data elements and can also be used to embed common join conditions or filters to standardize common objects or queries.

The view itself is requirements driven

Reverse Engineering

The first step, physical data modeling

The second step, logical data modeling

The third step, conceptual data modeling

Review data model

Model quality needs to be controlled through continuous improvement practices

Maintain data model

Data models need to be kept up to date

When requirements or processes change, the data model needs to be updated.

Generally speaking, in a specific project, when the model level needs to be changed, it also means that the corresponding higher-level model needs to be changed.

For example, if a new column needs to be added to the physical model, this column often needs to be added as an attribute to the corresponding logical data model.

At the end of a development iteration, a good practice is to reverse engineer the latest physical data model and ensure that it is consistent with the corresponding logical data model

Many data modeling tools can automatically compare physical and conceptual model differences

tool

Data modeling tools

Data modeling is software that automatically implements data modeling functions

More sophisticated data modeling tools support forward engineering of transformations from conceptual to logical models, from logical models to physical models, and from physical models to database structures.

Most also support reverse engineering from database to conceptual model

Data Lineage Tool

Is a tool that allows capturing and maintaining source structure changes for each attribute on the data model

These tools enable change impact analysis

They can be used to see whether changes in one system, or changes in one part of a system, have an impact on another system

Data analysis function

Can help explore data content, validate against current metadata, identify data quality and deficiencies in existing data artifacts

metadata repository

Used to store descriptive information about the data model

Sharing metadata is more important than storing metadata

The metadata repository must have a user-friendly way for people to query the contents of the repository

data model schema

basic mode

Are the “nuts and bolts” of data modeling

Kit mode

refers to a set of building blocks that spans business people and data modelers

Integrated model

Provides a framework for integrating suite patterns in common ways

Industry data model

are pre-built data models for entire industries that are often extensive and detailed and can contain thousands of entities and attributes

Any purchased data model will need to be customized to suit the characteristics of the organization

method

Best practices for naming conventions

Naming standards are especially important for entities, tables, properties, keys, views, indexes

Names should be unique and as descriptive as possible

Naming standards should minimize name changes across environments

Names should not be influenced by their specific environment, such as test, QA or production environments

Classifier, that is, the last term in attribute names such as quantity, name, and code, which can be used to distinguish attributes of entity and column names from table names.

logical name

Use complete words wherever possible and avoid using all but the most familiar abbreviations

Normally no delimiters are allowed to separate words.

physical name

Must conform to the maximum length allowed by the DBMS, so use abbreviations where necessary

Usually use underscore as word separator

Best practices in database design

DBAs should keep PRISM design principles in mind

Performance and Ease of usePerformance and Ease of use

Reusability

Integrity

SecuritySecurity

MaintainabilityMaintainability

Data modeling and design governance

Data modeling and design quality management

Develop data modeling and design standards

Review data model and database design quality

Manage data model versions and integrations

Metrics

To what extent does the model reflect business needs?

How complete is the model?

How well does the model match the pattern?

How is the model structured?

How general is the model?

How well does the model follow naming standards?

How readable is the model?

How is the model defined?

How consistent is the model with the enterprise data architecture?

How well does the model match the metadata?