DAMA-CDGA Data Governance Engineer - 6. Data Storage and Operations
Data storage includes the design, implementation, and support of stored data in order to maximize the value of data resources throughout the entire lifecycle, from data creation/acquisition to disposal. Data storage and operations represent the highly technical side of data management.
Edited at 2024-03-05 20:21:26
6. Data Storage and Operations
Introduction
Definition
Data storage includes the design, implementation and support of stored data to maximize the value of data resources throughout the entire life cycle from data creation/acquisition to disposal.
Sub-activities
Database operation support
Focuses on activities related to the data lifecycle: from initial establishment of the database environment, through data acquisition and backup, to data disposal
Also includes ensuring that database performance stays in good shape
Database technical support
Includes defining database technology requirements that meet the needs of the organization, defining the database's technical architecture, installing and managing database technology, and resolving database-related technical issues
Role
Database Administrator DBA
Plays an important role in both data storage and operation.
It is the most common and widely accepted role in the data profession.
DBAs also play a leading role when it comes to data security
Business drivers
Business continuity
If a system becomes unavailable, business operations may be impaired or cease altogether
Goals and principles
Goals
Manage the availability of data throughout the data lifecycle
Ensure the integrity of data assets
Manage the performance of data transactions
Data storage and operations represent the highly technical side of data management
Principles
Identify and act on automation opportunities
Shorten development cycle processes, reduce errors and rework, and minimize the impact on the development team
In this way, DBAs can adapt to a more agile and iterative approach to application development
Build with reuse in mind
Develop and promote abstract and reusable data objects without tightly coupling applications to data schemas
Understand and appropriately adopt best practices
DBAs should promote database standards and best practices as requirements
Connect database standards to support requirements
Set expectations for the DBA role in the project
Involving the DBA in the project definition phase helps ensure that the project methodology is used throughout the entire software development life cycle
Involve the DBA in the project analysis and design phases and clarify expectations for DBA tasks, standards, work results, and development work schedules
Basic concepts
Database terminology
Database
A database is any collection of stored data
Large databases are often referred to in terms of their instances and schemas
Instance
An instance is an execution of database software that controls access to a certain area of storage
An organization usually runs multiple instances simultaneously, using different storage areas
Each instance is independent of all other instances
Schema
A schema is a subset of the database objects within a database or an instance
Schemas are used to organize database objects into manageable collections
Usually a schema has an owner and an access list specifying who may use its contents
Common usage
Isolate objects containing sensitive data from the general user base
Isolate read-only views from underlying tables in relational databases
Can also represent a collection of similar database structures
Node
A single computer as part of a distributed database that processes or stores data
Database abstraction
A common application programming interface (API) is used to call database functions
This way, an application can connect to different databases without the developer needing to know the specifics of each database's function calls
Advantage
Portability: the application is not tied to a single DBMS
Shortcoming
Database-specific functions may be unavailable or difficult to use through the common interface
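As a sketch of this idea, Python's standard DB-API (PEP 249) defines one interface that many drivers (sqlite3, psycopg2, and others) implement, so application code can stay driver-agnostic; the `users` table and the `fetch_names` helper below are invented for the example:

```python
import sqlite3

def fetch_names(conn):
    """Works with any PEP 249 (DB-API 2.0) style connection: the caller
    chooses the driver, so this function is not tied to one database."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM users ORDER BY name")
    return [row[0] for row in cur.fetchall()]

# Demo with the stdlib sqlite3 driver
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("bob",), ("alice",)])
print(fetch_names(conn))  # ['alice', 'bob']
```

The portability caveat above applies here too: drivers differ in details such as parameter placeholder style (`?` vs `%s`), which is exactly where database-specific behavior leaks through a common interface.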
Data lifecycle management
DBAs are responsible for maintaining and ensuring data accuracy and consistency
The DBA is the steward of all database changes
When a change to the database is requested, the DBA defines the changes that need to be made, implements them, and controls the results
The DBA should use a controlled, documented, and auditable process to move application database changes into the QA and production environments
The DBA should have a rollback plan to undo changes if a change goes wrong
Administrators
DBA is the most common and widely accepted role in the data profession
DBA assumes leading role in data storage and operations
DBA also plays a key role in data security, physical model, and database design activities
DBA division of labor
Production DBA
Mainly responsible for data operation management
Application DBA
Application DBAs are typically responsible for one or more applications' databases across all environments (development, test, QA, and production), rather than for managing the database systems of a single environment
Procedural and Development DBA
The procedural DBA is responsible for reviewing and administering procedural database objects (such as stored procedures and triggers)
The development DBA focuses primarily on data design activities
Database architecture types
Databases can be divided into centralized databases and distributed databases
Centralized database management single database
Distributed database manages multiple databases on multiple systems
centralized database
Centralized databases store all data in one system in one place
All users connect to this system for data access
Centralization may be ideal for some data with limited access, but for data that requires large-scale, widespread use, centralized databases may be risky
Distributed database
federated database
Data federation provides data without requiring additional replication or persistence of the source data
A federated database system maps multiple autonomous database systems into a single federated database
The databases that make up the federation are sometimes dispersed across different geographic locations and linked together over a computer network
Because it is a federation, the federated database does not physically integrate the data; instead it uses data interoperability to manage the federation as if it were one large object
Blockchain database
A blockchain database is a type of federated database used to securely manage financial transactions
There are two structural types of blockchain databases
Individual records and blocks
Each transaction contains a record, and each block contains a set of timestamped transactions; the database as a whole is a chain of blocks, each of which includes information from the previous block in the chain
The transaction information stored in a block is hashed when the block is generated, and the newly generated block is appended to the end of the chain
Once a new block is generated, the hash value of the previous block no longer changes, which means the transaction information in that block can never change again
If the transaction information (or block) is altered in any way in transit (for example, tampered with), then rerunning the hash calculation will produce a value that does not match the original hash
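A minimal sketch of this hash-chaining idea (illustrative only, not a production blockchain; the block fields and function names are invented for the example):

```python
import hashlib
import json

def make_block(transactions, prev_hash, timestamp):
    """Build a block whose hash covers its timestamped transactions and
    the previous block's hash, forming a tamper-evident chain."""
    body = {"timestamp": timestamp, "transactions": transactions,
            "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def recompute(block):
    """Re-derive a block's hash from its contents for verification."""
    body = {k: block[k] for k in ("timestamp", "transactions", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

genesis = make_block(["alice pays bob 5"], prev_hash="0" * 64, timestamp=1)
block2 = make_block(["bob pays carol 2"], prev_hash=genesis["hash"], timestamp=2)

# Verify the chain: block2's stored prev_hash matches the recomputed
# hash of the block before it.
print(block2["prev_hash"] == recompute(genesis))   # True

# Any tampering with an earlier block breaks the link
genesis["transactions"] = ["alice pays bob 500"]
print(block2["prev_hash"] == recompute(genesis))   # False
```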
Virtualization/cloud computing platforms
Provide computing, software, data access, and storage services without requiring end users to know the physical location and configuration of the systems that deliver them
Can be deployed locally or remotely
Methods for implementing databases on the cloud
Virtual machine image
On these virtual machines, users can deploy databases
You can also upload the machine image with the database installed to the cloud.
Database as a Service
Application users do not need to install and maintain the database themselves, they only need to pay to use the database
Manage databases hosted on the cloud
The cloud vendor manages the database on behalf of the application owner
Data processing types
ACID
Atomicity
Either all operations are completed or none are completed
Consistency
Transactions must fully comply with the rules defined by the system at all times, and unfinished transactions must be rolled back
Isolation
Each transaction is independent
Durability
Once a transaction is completed, it cannot be undone
In relational database storage, ACID-based techniques are the most important tools, usually accessed through a SQL interface
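The atomicity and consistency behavior described above can be sketched with the stdlib `sqlite3` module, whose connection context manager wraps a block of statements in a transaction; the `accounts` table and `transfer` helper are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY,"
             " balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Atomicity: either both updates commit, or neither does."""
    try:
        with conn:  # the context manager wraps a transaction
            conn.execute("UPDATE accounts SET balance = balance - ?"
                         " WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ?"
                         " WHERE name = ?", (amount, dst))
    except sqlite3.IntegrityError:
        pass  # CHECK rule violated: the whole transaction rolled back

transfer(conn, "alice", "bob", 30)   # succeeds
transfer(conn, "alice", "bob", 999)  # would overdraw alice: rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 70, 'bob': 80}
```

The second transfer demonstrates consistency and rollback: the `CHECK (balance >= 0)` rule rejects the debit, so the paired credit never commits either.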
BASE
background
The need to record and store unstructured data, the need for read-optimized and load-optimized performance, and the subsequent need for greater flexibility in scale-out, design, processing, cost, and disaster recovery all pull in the direction exactly opposite to ACID. BASE was born to meet these needs.
Basically Available
Even if a node fails, the system can still guarantee a certain level of data availability. The data may be out of date, but the system will still respond.
Soft State
Data is in a state of continuous flow and is not guaranteed to be up to date when a response is given.
Eventual Consistency
The final state of the data is consistent on all nodes and all databases, but it is not consistent in every transaction at all times.
BASE-type systems are usually used in big data environments, such as large Internet companies and social media companies.
Record and store unstructured data
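A toy sketch of these three properties (purely illustrative; the `Replica` class and last-writer-wins versioning are assumptions, not any particular datastore's design): a write lands on one replica first, other replicas keep answering with stale data, and a background sync step makes all replicas converge.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        cur = self.data.get(key, (0, None))
        if version > cur[0]:          # last-writer-wins by version
            self.data[key] = (version, value)

replicas = [Replica(), Replica(), Replica()]
replicas[0].write("x", "v1", version=1)   # accepted on one node only

# Soft state / basic availability: other replicas still respond,
# but do not yet know about "x"
assert "x" not in replicas[1].data

# Anti-entropy sync: gossip every key to every replica
for src in replicas:
    for key, (ver, val) in list(src.data.items()):
        for dst in replicas:
            dst.write(key, val, ver)

# Eventual consistency: all replicas now agree
assert all(r.data["x"] == (1, "v1") for r in replicas)
```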
CAP (Brewer's theorem)
A theorem proposed as centralized systems evolved toward distributed systems
The CAP theorem states that a distributed system cannot satisfy all of the ACID requirements at the same time
The larger the system, the fewer of the requirements it can satisfy
Distributed systems must trade these properties (requirements) off against one another
Theorem
Consistency
Systems must always perform as designed and intended
Availability
The system remains available and responds to requests when they occur
Partition fault tolerance
In the event of occasional data loss or partial system failure, the system can still continue to operate and provide services.
The CAP theorem states that in any system that shares data, at most two of these three requirements can be satisfied.
Data storage media
Disk and storage area networks (SAN)
In-memory databases (IMDB)
Columnar compression solutions
Flash memory
Database environment
Production environment
The production environment is the technical environment in which all production business processes occur
The production environment is critical: if it stops running, all business processes stop, ultimately causing business losses and negatively affecting customers who cannot access services
Non-production environments
System changes need to be developed and tested in a non-production environment before they are actually deployed to the production environment.
In non-production environments, problems caused by changes can be checked and dealt with in advance without affecting normal business processes.
Classification
Development environment
Test environment
Performance Testing
High-complexity or high-volume testing can be considered at any time without having to wait until after hours or adversely impact the peak hours of production systems
Integration Testing
Test multiple modules developed or updated independently as a whole system
UAT User Acceptance Testing
System functional testing from a user perspective
QA Quality Assurance Testing
Tests the system against quality and functional requirements
Support and special purpose environments
Data sandbox and experimental environment
A data sandbox is an alternate environment that allows read-only connections to production data and can be managed by its users
Used for data experiments, or for testing hypotheses against the data
Users may also merge data they have developed themselves, or supplementary data obtained externally, with production data
The value of a data sandbox lies in proof-of-concept work
A sandbox environment can be a subset of the production system isolated from production processing, or a completely separate environment
Sandbox users are often granted CRUD permissions in their own space
Database organization models
Overview
Data storage systems provide a way to encapsulate the instructions needed to put data on disk and to manage its processing, so developers can simply use these instructions to manipulate data
Databases are usually organized in three forms: hierarchical, relational, and non-relational. These classifications are not completely mutually exclusive.
Some database systems can simultaneously read and write data organized in relational and non-relational structures.
Hierarchical databases can be mapped into relational table structures
Hierarchical database
Used in early large-scale database management systems; its structural requirements are the most stringent
Data is organized into a tree structure with enforced parent-child relationships
Each parent can have many children, but each child has only one parent
Relational Database
Relational database management system is called RDBMS
Relational databases are row-oriented
include
multidimensional database
This type of structure is most commonly used in data warehousing and business intelligence
temporal database
It is a relational database with built-in support for processing time-related data.
Time-oriented properties typically include validity time and transaction time
Non-relational database
Stores data as simple strings or complete files
Non-relational databases can be row-oriented, but they don't have to be
NoSQL databases are increasingly used in big data and real-time web applications
include
Column database
Ability to compress data, often used in business intelligence BI applications
Spatial database
Used to store and query data that represents objects defined in geometric space
Supports geometric objects such as points and lines
Operations that spatial databases can perform
Spatial measurements
Spatial functions
Spatial predicates
Geometry constructors
Observer functions
Object/multimedia databases
Flat file database
Key-value pair
Triplestore
A data entity composed of a subject, a predicate, and an object is called a triple; a store of such triples is a triplestore
Specialized databases
Computer Aided Design and Manufacturing CAD/CAM
Geographic Information System GIS
Shopping cart function
Common database processes
Data archiving
The process of migrating data from immediately accessible storage media to media with lower retrieval performance
Archived data can be restored to the original system for short-term use
Data that no longer needs to actively support application processing should be migrated to less expensive disk, tape, or CD/DVD storage for archiving
It is wise to test recovery of archives regularly, to avoid unwelcome surprises about unrecoverable data in an emergency
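A minimal sketch of the archive-and-restore flow described above, using in-memory lists to stand in for hot and archival storage (all names are invented for the example):

```python
from datetime import date

hot_store = [
    {"id": 1, "created": date(2019, 5, 1), "payload": "old order"},
    {"id": 2, "created": date(2024, 2, 1), "payload": "recent order"},
]
archive_store = []  # stands in for cheaper, slower media

def archive_older_than(cutoff):
    """Move records created before `cutoff` into the archive."""
    global hot_store
    stale = [r for r in hot_store if r["created"] < cutoff]
    archive_store.extend(stale)
    hot_store = [r for r in hot_store if r["created"] >= cutoff]
    return len(stale)

def restore(record_id):
    """Copy an archived record back to hot storage for short-term use."""
    for r in archive_store:
        if r["id"] == record_id:
            hot_store.append(r)
            return r
    return None

archive_older_than(date(2023, 1, 1))
print([r["id"] for r in hot_store])  # [2]
restore(1)
print([r["id"] for r in hot_store])  # [2, 1]
```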
Capacity and growth forecasts
Change Data Capture CDC
The process of detecting changes in data and ensuring that information related to the changes is appropriately recorded
Usually refers to log-based replication, which is a non-invasive method of replicating data changes to the target without affecting the source.
CDC only sends changed content (incremental information), so the receiving system can update appropriately.
Methods for detecting and collecting changes
Data version control
Evaluating flag or timestamp columns that identify changed rows
By reading the log
Changes are recorded in the log and can be replicated to secondary systems
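A toy sketch of log-based CDC (illustrative only; the log format and function names are invented): the source records every change in an ordered log, and the target replays only the entries past its last-seen position, so only incremental changes are shipped.

```python
source_log = []  # ordered change log: (lsn, op, key, value)

def apply_change(table, op, key, value):
    if op == "upsert":
        table[key] = value
    elif op == "delete":
        table.pop(key, None)

def write(source_table, key, value):
    """All source writes go through both the table AND the log."""
    source_log.append((len(source_log), "upsert", key, value))
    apply_change(source_table, "upsert", key, value)

def sync(target_table, last_lsn):
    """Replay only the incremental changes past last_lsn."""
    for lsn, op, key, value in source_log[last_lsn:]:
        apply_change(target_table, op, key, value)
    return len(source_log)

source, target = {}, {}
write(source, "a", 1)
write(source, "b", 2)
pos = sync(target, 0)
write(source, "a", 3)
pos = sync(target, pos)    # only the change to "a" is shipped
print(target)  # {'a': 3, 'b': 2}
```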
Data purging
The process of completely deleting data from storage media and rendering it unrecoverable
A primary goal of data management is that the cost of maintaining data should not exceed its value to the organization
Purging data reduces costs and risks
Data judged obsolete and unnecessary, even from a regulatory perspective
Becomes a burden if retained longer than necessary
Purging this data also reduces the risk of its misuse
Data replication
Active replication
There is no master replica; the same data is recreated and stored on every replica
Passive replication
Data is first created and stored on a primary replica, and changes in state are then propagated to the other replicas
Replication methods
Mirroring
Updates to the primary database are synchronized immediately to the secondary database
Transmission costs are higher than with the log shipping method
Typically used with a single secondary server
Log shipping
The secondary database regularly receives and applies copies of the transaction log from the primary database
Can be used to keep a larger number of secondary servers updated
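A toy sketch of log shipping (names and log format invented for the example): the primary accumulates a transaction log, and a periodic job ships the unapplied segment to the secondary, which therefore lags between shipments.

```python
primary_data, primary_log = {}, []
secondary_data, shipped = {}, 0

def primary_write(key, value):
    """Every primary write is recorded in the transaction log."""
    primary_log.append((key, value))
    primary_data[key] = value

def ship_logs():
    """Periodic job: send unapplied log records to the secondary."""
    global shipped
    for key, value in primary_log[shipped:]:
        secondary_data[key] = value
    shipped = len(primary_log)

primary_write("x", 1)
print(secondary_data)      # {} -- secondary lags until the next shipment
ship_logs()
primary_write("y", 2)
ship_logs()
print(secondary_data == primary_data)  # True
```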
Resilience and recovery
Database resilience is a measure of a system's tolerance for error conditions
A system is resilient if it can tolerate a high level of error handling and still work as expected
A system that crashes when it encounters unexpected conditions is not resilient
Recovery types
Immediate recovery
An outage is anticipated by design, with automatic failover to a backup system
Critical recovery
Recover as quickly as possible to minimize business delay or disruption
Non-critical recovery
Restoration of these functions can be delayed until more critical systems have been recovered
Data retention
Refers to how long the data remains available
Data retention planning should be part of the physical database design
Data retention needs also impact capacity planning
Failure to retain certain data for an appropriate period of time may result in legal consequences
Data can become a burden if kept longer than specified
Data sharding
Sharding is a process that isolates small chunks of a database so that they can be managed and updated independently
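A minimal sketch of hash-based sharding (an assumed scheme for illustration; real systems add rebalancing, replication, and routing): a stable hash of the key picks the shard, so each shard holds an independent chunk of the data.

```python
import hashlib

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]

def shard_for(key):
    """Stable hash so the same key always maps to the same shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

for i in range(100):
    put(f"user:{i}", {"id": i})

print(get("user:42"))            # {'id': 42}
print([len(s) for s in shards])  # keys spread across the 4 shards
```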
Activity
Manage database technology
Understand the technical characteristics of databases
Evaluate database technology
Manage and monitor database technology
Manage database operations
Understand the needs
Define storage requirements
Identify usage patterns
Define access requirements
Plan for business continuity
Back up data
Data recovery
Create database instance
Install and update DBMS software
Maintain installations in multiple environments
Install and manage relevant data technologies
Manage database performance
Manage test data sets
Manage data migration
The granularity of the mapping determines how quickly the metadata is updated, how much additional disk capacity is required during the migration process, and how quickly the previous location is marked as free.
Smaller granularity means faster updates, less space required, and faster release of old storage
Tools
Data modeling tools
Database monitoring tools
Database monitoring tools automatically monitor key indicators (such as capacity, availability, cache performance, etc.) and alert DBAs and network storage administrators of current database issues.
Database management tools
Development support tools
Methods
Test in a low-level environment
After testing in the lowest-level environment, continue to verify in the next-level environment, and finally install and deploy to the production environment.
Physical naming standards
Consistency in naming helps speed understanding
Data architects, database developers, and DBAs can use naming standards to define metadata or create rules for exchanging files between different organizations
All change operations are scripted
Implementation Guide
Readiness Assessment/Risk Assessment
Data loss
Technical preparations
Organizational and cultural changes
Data storage and operational governance
Metrics
Data storage metrics
Performance metrics
Operational metrics
Service Metrics
Information asset tracking
Ensure the database complies with all licensing agreements and regulatory requirements
Data audit and data validity
The purpose of the audit is to determine whether data is stored in compliance with contractual and methodological requirements
Data validation is the process of evaluating stored data against established acceptance criteria to determine its quality and usability