HCIA-GaussDB
HCIA-GaussDB is one of Huawei's certifications; its full name is Huawei Certified GaussDB Database Engineer. It is mainly intended for users, partner engineers, Huawei internal engineers, college students, and ICT practitioners who work with Huawei's GaussDB database products, and it is an engineer-level certification in the Huawei certification system.
HCIA-GaussDB
Database introduction
Database Technology Overview
Database Technology
Data
Record
Database (DB)
Database management system (DBMS)
Database system (DBS)
History of database technology development
Database technology emerges
The emergence and development of database technology
Comparison of three stages of data management
Database system advantages
Overall data structuring
Data is highly shareable, low in redundancy and easy to expand
High data independence
Physical independence: The application and the physical storage of data in the database are independent of each other
Logical independence: The logical structures of the application and the database are independent of each other
Unified management and control
Data security protection
Data integrity check
Concurrency control
Database recovery
Database system development characteristics
Hierarchical, network, relational models
Hierarchical model
There is one and only one node with no parent; this node is called the root node (root)
Every node other than the root has one and only one parent node
Network model
More than one node may have no parent
A node can have more than one parent
Relational model
Built on rigorous mathematical concepts
Relations must be normalized
Each component of a relation must be an atomic (indivisible) data item
Comparison of hierarchical, network, and relational models
Structured Query Language (SQL)
A high-level, non-procedural language that lets users work with high-level data structures
Does not require users to specify how data is stored
Does not require users to know the specific data storage method
Relational database systems with completely different underlying structures can all use the same SQL as their interface for data manipulation and management
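To make the declarative style concrete, here is a minimal sketch (table and column names are invented for illustration): the query states only what result is wanted, and the DBMS chooses how to store and retrieve the rows.

```sql
-- Ask for the result; the DBMS decides the access path.
SELECT dept_id, AVG(salary) AS avg_salary
FROM employee
WHERE hire_date >= DATE '2020-01-01'
GROUP BY dept_id
ORDER BY avg_salary DESC;
```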
Other data models
Object-oriented data model (OO model)
XML data model
Extensible Markup Language (XML)
RDF data model
Resource Description Framework (RDF)
New challenges in data management technology
5V characteristics
Volume (large amount of data)
Variety (diverse data types)
Veracity (truthfulness of data)
Velocity (high processing speed)
Value (value of the data)
NoSQL technology characteristics and types
NoSQL (Not Only SQL)
A class of non-relational, distributed data management systems that do not guarantee ACID properties
Technical features
Partition the data and use a large number of nodes to process it in parallel, achieving high performance and horizontal scalability (scale-out)
Relax the ACID consistency constraints: temporary inconsistency is allowed and eventual consistency is accepted, following the CAP theorem and the BASE principles
Each data partition is replicated (usually three copies) to cope with node failures and improve system availability
Introduction to major NoSQL databases
NoSQL is not meant to replace RDBMS
The advantages are obvious, but the disadvantages are also obvious
Build a complete database ecosystem with RDBMS
A brief discussion of NewSQL
NewSQL
Refers to relational database systems that pursue NoSQL-style scalability while still supporting the relational model (including ACID properties), mainly for OLTP scenarios
Supports SQL as the primary language
NewSQL classification
Rebuilding the product using a new architecture
Shared-Nothing, multi-node concurrency control, distributed processing, replication-based fault tolerance, flow control, and other architectural techniques
Google Spanner, H-Store, VoltDB, etc.
Using Transparent Sharding middleware technology
The process of data sharding is transparent to users, and users’ applications do not need to make changes.
Oracle MySQL Proxy, MariaDB MaxScale, etc.
DAAS (Database-as-a-Service)
Database products provided by cloud service providers that carry NewSQL features
Amazon Aurora, Alibaba Cloud's OceanBase, Tencent Cloud's CynosDB
Cloud database
Cloud database refers to a database that is optimized or deployed into a virtual computing environment
Traditional database vs. cloud database
Relational database architecture
Database architecture development
Stand-alone architecture
To avoid contention for resources between application services and database services, the stand-alone architecture evolved from the early single-host mode to a dedicated database host mode, separating application and data services. Application servers can be added and load-balanced to increase the system's concurrency capability.
Advantages
Centralized deployment and convenient operation and maintenance
Disadvantages
Limited scalability
The stand-alone architecture can only scale vertically (scale-up), improving performance by upgrading the hardware configuration, but the configurable resources of a single host have an upper limit
There is a single point of failure
Capacity expansion often requires stopping the machine and suspending the service
Hardware failure results in unavailability of the entire service or even data loss
A single machine encounters a performance bottleneck
Grouped architecture
Primary/standby
Primary-standby architecture
The database is deployed on two servers; the server responsible for data read and write services is called the primary
The other server copies the primary's data through a data synchronization mechanism and is called the standby
At any given time, only one of the servers provides data services externally
Advantages
Applications need no extra development to cope with database failures
Better data fault tolerance than the stand-alone architecture
Disadvantages
Resource waste: the standby has the same configuration as the primary, but its resources sit idle almost all of the time
Performance pressure is still concentrated on a single machine, so the performance bottleneck is not solved
When a failure occurs, switching between primary and standby requires some manual intervention or monitoring
Master-slave
Master-slave architecture
Deployment is similar to the primary/standby mode, but the standby is promoted to a slave (Slave) that also provides some data services externally
Pressure is distributed by read/write splitting
Write, update, and delete operations are executed on the write library (master)
Query requests are assigned to the read library (slave)
Advantages
Higher resource utilization; suitable for read-heavy, write-light application scenarios
Under heavy concurrent reads, load balancing can spread queries across multiple slaves
Slaves can be scaled out flexibly, and expansion does not affect the business
Disadvantages
Latency: synchronizing data to the slaves takes time, so the application must tolerate brief inconsistency; not suitable for scenarios with very high consistency requirements
The performance pressure of write operations is still concentrated on the master
If the master fails, a master-slave switchover is required; manual intervention needs response time, and automatic switchover is complex
Multi-master
Multi-master architecture
The database servers are masters and slaves of each other, and all of them provide complete data services externally at the same time
Advantages
Higher resource utilization and a reduced risk of single points of failure
Disadvantages
Both masters accept writes, so two-way data synchronization is required; bidirectional replication introduces latency and, in extreme cases, data loss
As the number of databases grows, data synchronization becomes extremely complex; in practice, the two-machine mode is the most common
Shared storage multi-active architecture
Shared-storage (Shared-Disk) multi-active architecture
A special multi-master architecture
Database servers share data storage, and multiple servers balance the load
Advantages
Multiple compute servers provide highly available services, avoiding the single point of failure of a server cluster
Convenient horizontal expansion increases the parallel processing capability of the overall system
Disadvantages
Technically difficult to implement
Sharding architecture
The main form of the sharding architecture is horizontal data sharding
Data is distributed across multiple nodes; each node holds a portion of the database, called a shard
All nodes have the same database structure, but the data on different shards is disjoint; the union of all shards constitutes the complete data set
Common sharding algorithms shard data by list values, range values, or hash values, as in the sketch after this block
Advantages
Data is scattered on various nodes in the cluster, and all nodes can work independently
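As a sketch of hash sharding, distributed products in the GaussDB family let the DDL declare a distribution column; the exact syntax varies by product, so treat this as illustrative only (table and column names invented):

```sql
-- Each row is routed to a node by hashing the distribution column.
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(12,2)
)
DISTRIBUTE BY HASH (order_id);
```

A query that filters on order_id can then be routed to a single shard, while queries on other columns must be sent to all shards.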
Shared-Nothing architecture
Each node (processing unit) in the cluster has its own independent CPU/memory/storage, and there are no shared resources.
Each node (processing unit) processes its own local data, and the processing results can be summarized to the upper layer or transferred between nodes through communication protocols.
Nodes are independent of each other and have strong scalability. The entire cluster has powerful parallel processing capabilities
MPP architecture (Massively Parallel Processing)
MPP distributes tasks to multiple servers and nodes in parallel. After the calculation is completed on each node, the results of each part are summarized together to obtain the final result.
Features
Task parallel execution, distributed computing
Common MPP products
No shared master: Vertica, Teradata
Shared Master: Greenplum, Netezza
Comparison of database architecture features
Mainstream application scenarios of relational databases
OLTP (On-Line Transaction Processing)
OLTP is the main application of traditional relational databases
For basic, daily transaction processing, such as bank deposit and withdrawal transactions, transfer transactions, etc.
Features
High throughput: a large number of short online transactions (inserts, updates, deletes), very fast query processing
High concurrency with (near) real-time response
Typical OLTP scenarios
Retail systems
Financial trading systems
Train ticketing systems
Flash-sale events
OLAP (On-Line Analytical Processing)
The concept of online analytical processing was first proposed by E.F. Codd in 1993, in contrast to OLTP systems
It refers to query and analysis operations on data, usually over large amounts of historical data spanning long periods; the multi-level aggregation and summarization involved make these operations more complex than transaction processing
Features
Mainly complex queries, answering "strategic" questions
Data processing centers on analytical operations such as aggregation, summarization, group calculations, and window calculations
Data is used and analyzed from multiple dimensions
Typical OLAP scenarios
Reporting systems, CRM systems
Financial risk prediction and early-warning systems, anti-money-laundering systems
Data marts, data warehouses
Comparative analysis of OLTP and OLAP
Database performance metrics
TPC (Transaction Processing Performance Council)
Its responsibilities are to define specifications, performance metrics, and price metrics for business application benchmarks (Benchmark), and to manage the publication of test results
TPC produces standard specifications, not code; any vendor may build its own optimal system for evaluation according to the specification
Many benchmark standards have been released, including specifications for both OLTP and OLAP
TPC-C specification
For OLTP systems; includes two main indicators
Traffic indicator: tpmC (transactions per minute, the number of transactions the system under test processes per minute)
Cost-effectiveness indicator: Price (price of the system under test) / tpmC
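A worked example with invented numbers: if a tested system costs 1,000,000 USD and sustains 500,000 transactions per minute, its traffic indicator is 500,000 tpmC and its cost-effectiveness indicator is 1,000,000 / 500,000 = 2 USD per tpmC; the lower this ratio, the better.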
TPC-H specification
For OLAP-type systems
Traffic indicator: QphH (queries per hour, the number of complex queries processed per hour)
The size of the test data set must be considered; tests are divided across different data-set scales. 22 query statements are specified, which may be fine-tuned for the product
Test scenarios: data loading, the Power test, and the Throughput test
Database basics
Introduction to database management
Database management and its scope of work
Database management
Database management is the work of managing and maintaining the database management system
The core goal is to ensure the database management system's:
Stability
Security
Data consistency
High performance
Database administrator (DBA)
The collective term for personnel engaged in managing and maintaining database management systems
Database management work scope
Database object management
Physical design work
Physical implementation work
Database security management
Prevent unauthorized access and avoid disclosure of protected information
Prevent security breaches and inappropriate data modification
Ensure data is only available to authorized users
Backup and recovery management
Develop a reasonable backup strategy to implement regular data backup functions
Ensure that the database system can achieve the fastest recovery and minimum loss when a disaster occurs
Database performance management
Monitor and optimize factors affecting database performance
Optimize the resources available to the database to increase system throughput, reduce contention, and maximize workload processing
Database environment management
Database operation and maintenance management, including installation, configuration, upgrade, migration, etc.
Ensure the normal operation of IT infrastructure including database systems
Object management
What is a database object
A general term for various concepts and structures used to store and point to data in a database
Object management is the management process of creating, modifying or deleting various database objects using object definition languages or tools.
Common basic database objects
Develop database object naming conventions
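A minimal sketch of such a convention in practice, with invented prefixes (t_ for tables, idx_ for indexes, v_ for views):

```sql
CREATE TABLE t_customer (
    customer_id BIGINT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);
CREATE INDEX idx_customer_name ON t_customer (name);
CREATE VIEW v_customer_names AS
    SELECT customer_id, name FROM t_customer;
```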
Backup and recovery management
Database backup
Backing up a database means saving the database's data and the information needed for its normal operation, so that the database can be restored after a system failure
Backup objects, including but not limited to
data itself
Database objects related to data
Users and permissions
Database environment, such as configuration files, scheduled tasks, etc.
Database recovery
Activities that restore a database system from a failed or paralyzed state to normal operation and restore data to an acceptable state
Disaster recovery
Enterprise-level disaster recovery
For enterprises, the database system together with other application systems constitutes a larger information-system platform, so database backup and recovery is not an isolated function point; the disaster-recovery capability of the entire platform must be considered together with the other application systems
Disaster backup
The process of backing up data, data processing systems, network systems, infrastructure, professional technical capabilities, and operational management capabilities for disaster recovery
Recovery Time Objective (RTO)
The requirement on how long, after a disaster occurs, an information system or business function may take to be restored from standstill to operation
Recovery Point Objective (RPO)
Requirements for the point in time to which systems and data must be restored after a disaster occurs
Disaster recovery level
The relationship between RTO/RPO and disaster recovery capability level in a certain industry
Backup method
Based on the scope of the backed up data collection
Full backup
Also known as a complete backup
A complete backup of all data and the corresponding structures at a specified point in time
Features
The most complete data
Highest security
Backup and recovery times increase significantly with data size
Very important, it is the basis of differential backup and incremental backup
The backup period will have a certain impact on system performance.
Differential backup
Differential backup refers to the backup of data that has changed since the last full backup.
Incremental backup
Incremental backup refers to the backup of data that has changed since the last backup.
Comparison chart
Based on whether the database is taken offline
Hot backup
Back up the database while it is running normally
During the backup, database reads and writes proceed normally
Warm backup
Availability is weaker than with hot backup: during the backup, the database can serve read operations but not write operations
Cold backup
During the backup, the application's read and write operations are unavailable
The backed-up data has the highest reliability
According to the backup content
Physical backup
Directly back up the data files corresponding to the database, or even the entire disk
Logical backup
Export data from the database and archive the exported data
Comparison chart
Security management
Database system security framework
Broadly speaking, the database security framework can be divided into three levels
Network system level security
From a technical perspective, network system level security method technologies mainly include encryption technology, digital signature technology, firewall technology and intrusion detection technology, etc.
Operating system level security
The core is to ensure the security of the server, which is mainly reflected in the server's user account, password, access rights, etc.
Data security is mainly reflected in encryption technology, data storage security, data transmission security, etc., such as Kerberos, IPsec, SSL and VPN technologies
Database management system level security
Database encryption
Data access control
Security audit
Data backup
Security control model
Security control
Provides protection against intentional and unintentional damage at different levels of the database application system, for example:
Encrypted data access -> guards against intentional illegal activities
User authentication and restricted operation permissions -> guard against intentional illegal operations
Improved system reliability and data backups -> guard against unintentional damaging behavior
Security control model
Authentication
Database user authentication is the outermost security protection measure provided by DBMS
Block access by unauthorized users
For database applications, username/password verification is currently the common mode, so password strength must be enforced
Use longer strings, such as 8-20 characters
Use passwords that mix numbers, letters, and symbols
Change passwords regularly
Do not reuse passwords
In developed code or scripts, database user passwords must not appear in clear text
Access control
Access control is the most effective method in database security but also the most prone to problems.
The basic principle
Different permissions are given to different users based on the classification requirements of sensitive data.
Principle of least privilege
Check key permissions
Check permissions on key database objects
Role-based permission management
For large database systems or systems with many users, role-based access control (RBAC) is mainly used for permission management, as in the sketch below
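A minimal RBAC sketch in standard SQL (role, table, and user names are invented): privileges are granted once to a role, and users then inherit them through role membership.

```sql
CREATE ROLE report_reader;                       -- define the role once
GRANT SELECT ON sales_orders TO report_reader;   -- attach privileges to the role
GRANT report_reader TO alice;                    -- users inherit via membership
GRANT report_reader TO bob;
```

Revoking the role from a user, or a privilege from the role, then takes effect for all members at once, which is what makes RBAC manageable at scale.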
Enable auditing
Auditing can help database administrators discover vulnerabilities in existing architecture and usage
Database audit levels
Access and authentication auditing: information related to database user login (logon) and logout (logoff), such as login/logout time, connection method and parameters, and login mode
User and administrator auditing: analysis of and reporting on activities performed by users and administrators
Security activity monitoring: record any unauthorized or suspicious activities in the database and generate audit reports
Vulnerability and Threat Audit: Discover possible vulnerabilities in the database and the "users" who want to exploit these vulnerabilities
Database encryption
Different levels of database encryption
DBMS kernel layer
Data is encrypted/decrypted before physical access
It is transparent and invisible to database users.
Encrypted storage is used and the encryption operations run on the server side, which increases the server's load to a certain extent
DBMS outer-layer encryption
Develop dedicated encryption/decryption tools, or define encryption/decryption methods
The granularity of encryption can be controlled, encrypting and decrypting at the table or field level
Users only need to pay attention to the scope of sensitive information
Performance management
Resources
Supply resources
This type of resource, also called a basic resource, corresponds to the computer hardware
Resources managed by the operating system
Processing speed: CPU > memory >> disk ≈ network
Concurrency control resources
Such resources include but are not limited to: locks, queues, caches, mutual exclusion signals, etc.
Resources managed by database systems
Basic Principles of Performance Management
Make full use of resources without wasting them
The meaning of performance management
Efficient use of resources
Databases in fact always operate in a resource-constrained environment
Effective management of resources ensures that the database system can meet user performance requirements for the system during peak periods
Detect system problems
Real-time system performance monitoring (real-time monitoring of system performance through logs or tools provided by the database)
System historical performance data tracking (analysis of historical performance data)
capacity planning
The data collected by performance management is the basis for system capacity planning and other forward-looking planning
Speak with facts rather than feelings
Performance management goals
Basic indicators of database system
Throughput
Response time
OLTP
Provide the highest possible throughput within acceptable response times
Reduce unit resource consumption, quickly pass through concurrent shared areas, and reduce bottleneck constraints
OLAP
Minimize response time within limited resources
A transaction should fully utilize resources to speed up processing time
Some scenarios for performance optimization work
Go-live optimization, or optimization when performance does not meet expectations
System optimization for gradually slower response times
System optimization (emergency treatment) when the system suddenly slows down during operation
The system slows down suddenly, then returns to normal after some time
System optimization based on reducing resource consumption
Preventive daily inspection work
Data that needs to be collected for performance management
The scope of data that needs to be collected for performance management includes but is not limited to
CPU usage data
space usage
Users and roles using the database system
Heartbeat query response time
SQL submitted to the database is the basic unit of performance data
Performance data for jobs submitted through database tools (such as loading, unloading, backup, and recovery)
Time ranges of concern
Daily scope: peak hours of the week; end of month; seasonal variation data
Within a day: time period when users intensively use the system; time period when system pressure is relatively high, etc.
Create performance reports
The database system has many built-in monitoring reports
Extract performance-related data to create regular performance reports (daily, weekly, monthly reports)
Establish performance trend analysis reports for common indicators, which can provide an intuitive display of current system performance.
Reports on specific trend types, including but not limited to
Reports based on abnormal events
SQL or jobs that consume a lot of resources
Resource consumption reports for specific users and user groups
Resource consumption reports for specific applications
Operation and maintenance management
Database installation
Database uninstallation
Database migration
Migration plans need to be designed based on the needs of different migration scenarios.
factors to consider
Time window available for migration
Tools you can use for migration
Whether the source system must stop write operations during the migration
The network conditions between the source system and the target system during the migration
The backup/restore time estimated from the amount of data to be migrated
Post-migration data consistency verification between the source and target database systems
Database expansion
The capacity of any database system is planned by estimating future data volume from a certain point in time. Capacity is not only data storage volume; the following aspects also need to be considered:
Insufficient computing power (average daily CPU busy rate of the whole system > 90%)
Insufficient response/concurrency capability (QPS and TPS drop significantly and cannot meet the SLA)
Insufficient data capacity (available data space below 15%)
Selection of expansion plans
Vertical scaling
Vertical scaling adds hardware to the database server, such as more memory, more storage, and more network bandwidth, improving the performance configuration of a single machine. This approach is relatively simple, but it eventually hits the hardware performance ceiling of a single machine
Horizontal scaling
Add servers horizontally and use the number of servers in the cluster to improve overall system performance
Downtime expansion
Simple, but the time window is limited, problems during expansion can cause it to fail, and if it takes too long it is hard for customers to accept
Smooth expansion
No impact on database services
The technical solution is relatively complex; as the number of database servers grows, the complexity of expansion rises sharply
Routine maintenance work
Database troubleshooting
Configure database monitoring indicators and alarm thresholds
Set the alarm notification process according to the level of fault events
After receiving alarm information, locate the fault based on the logs
For problems encountered, the original information should be recorded in detail
Strictly abide by operating procedures and industry safety regulations
For major operations, the feasibility of the operation must be confirmed before the operation, and corresponding backup, emergency and safety measures must be taken before the operation is performed by authorized operators.
Database health inspection
View health check tasks
Manage health check reports
Modify health check configuration
Important database concepts
Databases and database instances
Database
A collection of physical operating system files or disk data blocks
Such as data files, index files, structure files
Not all database systems are file-based, and there are also forms that write data directly to data storage.
DatabaseInstance
An instance refers to a series of processes in the operating system and the memory blocks allocated for these processes.
A database instance is a channel for accessing a database
Generally speaking, one database instance corresponds to one database
Multi-instance operation can make full use of hardware resources and maximize server performance.
Distributed cluster
A cluster is a group of independent servers that form a computer system through a high-speed network.
In a distributed cluster, each server may have a complete copy or a partial copy of the database. All servers are connected to each other through the network to form a complete, global, logically centralized, and physically distributed large-scale database.
Database connections and sessions
Database connection(Connection)
At the physical level, a connection is a network link established between a client and a dedicated server (Dedicated Server) or a dispatcher (Shared Server)
Specify connection parameters when establishing a connection, such as server host name or IP, port number, connection user name and password, etc.
Database session (Session)
Logical concept of communication between client and database
A context (Context) maintained between the communicating parties from the start to the end of communication. The context is a piece of memory on the server side that records the client machine of the connection, the corresponding application process number, the logged-in user, and other information
Database connection pool
Establishing a database connection comes at a cost
Frequently establishing and closing database connections will make the allocation and release of connection resources a bottleneck in the database, thereby reducing the performance of the database system.
Connection pool: reuse of database connections
Responsible for allocating, managing, and releasing database connections. It allows applications to reuse an existing database connection instead of establishing a new one.
Database connections can be reused efficiently and securely
Schema
A schema is a structure described in a formal database language: a collection of database objects
Allows multiple users to use one database without interfering with each other
Organize database objects into logical groups to make them easier to manage
Form a namespace to avoid object name conflicts
Schema includes tables and other database objects, data types, functions, operators, etc.
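For example (PostgreSQL/GaussDB-style syntax, names invented), two applications can each keep a table with the same name, because each table lives in its own schema namespace:

```sql
CREATE SCHEMA sales;
CREATE SCHEMA hr;
-- Same table name, no conflict: each lives in its own namespace.
CREATE TABLE sales.staff (id INT);
CREATE TABLE hr.staff (id INT);
```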
Tablespace
A tablespace consists of one or more data files
Tablespaces define the storage locations of database object files
All objects in the database are logically stored in tablespaces
Physically, they are stored in the data files belonging to their tablespace
Tablespace functions
Arrange the physical storage of data according to how database objects are used, to improve performance
Frequently used indexes are placed on disks that are stable and fast
Archived data and tables with low usage frequency and low access-performance requirements are stored on slower disks
Specify the physical disk space occupied by the data through the tablespace
Limit the upper bound of physical space usage through the tablespace, to avoid exhausting disk space
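A sketch in PostgreSQL/GaussDB-style syntax (paths and names invented): tablespaces map logical storage to physical locations, so hot objects can sit on fast disks and archives on cheap ones.

```sql
CREATE TABLESPACE fast_ts LOCATION '/mnt/ssd/tbs';   -- fast disk
CREATE TABLESPACE slow_ts LOCATION '/mnt/hdd/tbs';   -- slow, cheap disk
-- Hot table and index on the fast disk; archive table on the slow disk.
CREATE TABLE orders (order_id BIGINT) TABLESPACE fast_ts;
CREATE INDEX idx_orders_id ON orders (order_id) TABLESPACE fast_ts;
CREATE TABLE orders_archive (order_id BIGINT) TABLESPACE slow_ts;
```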
Table
Temporary tables
How tables are stored
Choice of storage method
Scenarios suited to column storage
Statistical analysis queries (scenarios with many groupings and joins)
Suitable for OLAP, data mining, and other large-scale query applications
Scenarios suited to row storage
Point queries (simple index-based queries returning few records)
Suitable for OLTP: lightweight transactions with many writes and frequent inserts, deletes, and updates
Partition
A partitioned table divides the data of a large table into many small data subsets, called partitions.
Range partitioned table
List partitioned table
Hash partitioned table
Interval partitioned table
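A range-partitioning sketch in GaussDB/Oracle-style DDL (names invented; exact syntax varies by product): each partition holds one year of data.

```sql
CREATE TABLE sales (
    sale_id  BIGINT,
    sale_day DATE
)
PARTITION BY RANGE (sale_day)
(
    PARTITION p2023 VALUES LESS THAN ('2024-01-01'),
    PARTITION p2024 VALUES LESS THAN ('2025-01-01')
);
```

A query filtered on sale_day then scans only the matching partition, which is the partition pruning described below.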
Partition table benefits
Improve query performance
Enhance availability
Easy maintenance
Balanced I/O
The principle of partition pruning
Partition pruning
When querying partition objects, you can only search the partitions you care about to improve retrieval efficiency.
Applicable scenarios for partitioning
Data distribution
Data strategy selection
Distribution column selection principles
Data types
Field design suggestions
Try to use efficient data types
Try to use data types with higher execution efficiency
Try to use short field data types
Use consistent data types
When multiple tables have logical relationships, fields with the same meaning should use the same data type.
For string data, it is recommended to use variable length string data type and specify the maximum length
View
A view is different from a base table: it does not physically exist; it is a virtual table
The role of views
Functions
Simplify operations: define frequently used data as views
Security: users can only query and modify the data visible to them
Logical independence: shields applications from changes in the structure of the underlying tables
Restrictions
Performance: a query on a view may look simple, but the statement the view encapsulates can be complex
Modification restrictions: for complex views, users cannot modify base-table data through the view
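A small sketch (names invented) of the security and simplification roles above: the view exposes only the columns a user may see, and users query the view instead of the base table.

```sql
-- Hide the salary column from general users.
CREATE VIEW v_employee_public AS
    SELECT employee_id, name, dept_id
    FROM employee;
GRANT SELECT ON v_employee_public TO report_reader;  -- role from the RBAC sketch
```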
Index
An index provides pointers to the data values stored in specified columns of a table, like a book's table of contents. It speeds up queries on the table, but it also increases the processing time of insert, update, and delete operations
When creating indexes, the following recommendations serve as a reference
Creating indexes on columns that are frequently searched can speed up searches
Create an index on a column that serves as the primary key to enforce the uniqueness of the column and organize the arrangement of the data in the table
Create indexes on columns that frequently need to be searched based on ranges because the index is already sorted and its specified range is contiguous
Create indexes on columns that often need to be sorted, because the index is already sorted, so that queries can take advantage of the sorting of the index to speed up sorting query times.
Create indexes on columns that frequently use WHERE clauses to speed up the judgment of conditions.
Create indexes for fields that often appear after the keywords ORDER BY, GROUP BY, and DISTINCT
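Putting several of the recommendations above together in one hedged sketch (table and column names invented): a single index on hire_date serves WHERE filtering, range searches, and ORDER BY on that column.

```sql
CREATE INDEX idx_employee_hire_date ON employee (hire_date);
```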
Effective indexes
Creating an index ≠ the index will necessarily be used
After an index is created, the system automatically decides when to use it; the index is used when the system judges an index scan to be faster than a sequential scan
Once created, an index must stay synchronized with its table so that new data can be found accurately, which adds load to data-modification operations
Useless indexes should be deleted regularly
How to check
Run the EXPLAIN statement to view the execution plan and determine whether an index is used, as in the sketch below
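For example (plan output varies by database), EXPLAIN shows whether the optimizer chose the index or a sequential scan; the names continue the index sketch above:

```sql
EXPLAIN
SELECT * FROM employee
WHERE hire_date >= DATE '2024-01-01';
-- An 'Index Scan using idx_employee_hire_date' line means the index is used;
-- a 'Seq Scan' line means the optimizer preferred a full table scan.
```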
Index mode
Constraints
Data integrity refers to the correctness and consistency of data. Integrity constraints can be defined when defining a table.
Integrity constraints are rules that do not occupy database space.
Integrity constraints are stored in the data dictionary along with the table structure definition
Common types of constraints
Uniqueness and primary key constraints (UNIQUE/PRIMARY KEY)
Foreign key constraints (FOREIGN KEY)
Check constraints (CHECK)
Not-null constraints (NOT NULL)
Default constraints (DEFAULT)
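One table definition can carry all five constraint types listed above; a sketch with invented names (it assumes an existing customer table for the foreign key):

```sql
CREATE TABLE account (
    account_id BIGINT PRIMARY KEY,                        -- primary key
    email      VARCHAR(200) UNIQUE,                       -- uniqueness
    owner_id   BIGINT REFERENCES customer (customer_id),  -- foreign key
    balance    DECIMAL(14,2) NOT NULL DEFAULT 0,          -- NOT NULL + default
    CHECK (balance >= 0)                                  -- check constraint
);
```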
Relationships between database objects
Transaction
A transaction is a user-defined series of data operations that are performed as a complete unit of work
Atomicity: A transaction is a logical unit of work in a database. All operations in a transaction must be done or none of them must be done.
Consistency: The execution result of a transaction must be to move the database from one consistency state to another.
Isolation: The execution of a transaction in the database cannot be interfered with by other transactions. That is, the internal operations and data used by a transaction are isolated from other transactions, and transactions executed concurrently cannot interfere with each other.
Durability: once a transaction is committed, its changes to the data in the database are permanent; subsequent operations or failures will not affect the transaction's result
There are two markers for the end of a transaction
Normal end: COMMIT (commit the transaction)
Abnormal end: ROLLBACK (roll back the transaction)
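A classic transfer sketch (account table invented): the two updates form one unit of work, so either both become durable at COMMIT or, on error, ROLLBACK undoes both; this is exactly the atomicity property above.

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE account_id = 1;
UPDATE account SET balance = balance + 100 WHERE account_id = 2;
COMMIT;   -- normal end: both changes become permanent
-- On any error, issue ROLLBACK instead and neither change takes effect.
```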
Transaction processing model
Commit levels
Transaction isolation levels
Correspondence between transaction isolation levels and concurrency problems