MindMap Gallery: Data Structure
This is only a framework outline for review and consolidation; detailed information requires reading the textbooks. It mainly covers: introduction, linear lists, stacks, queues, arrays and strings, trees and binary trees, graphs, searching, and sorting.
Edited at 2022-10-15 16:00:41
data structure
introduction
Basic concepts of data structure
Data: a carrier of information, a collection of symbols recognized and processed by computer programs
Data element: the basic unit of data
Data type: a collection of values and a set of operations defined on that collection
Atomic type: its values cannot be subdivided
Structured type: its values can be subdivided into several components
Abstract data type (ADT): an abstract data organization together with the operations defined on it
Data structure: including logical structure, storage structure, and data operations
Logical structure of data: set, linear structure, tree structure, graph structure or network structure
Data storage structure (physical structure): sequential structure, chain structure, index structure, hash structure
Data operations: operations are defined with respect to the logical structure and implemented with respect to the storage structure; the definition acts as the interface
Characteristics of algorithms: finiteness, definiteness, feasibility, input, output. Goals of a good algorithm: correctness, readability, robustness, efficiency, and low storage requirements
complexity
time complexity. Common orders of growth: O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(2^n) < O(n!) < O(n^n)
space complexity
linear table
Definition of linear table: a finite sequence of n elements with the same data type
Sequential representation of linear tables
Sequential storage of linear tables is also called sequential tables
Time complexity of basic operations: insertion-O(n), deletion-O(n), search (search by value)-O(n)
Linked representation of linear list
Linked storage of linear lists is also called singly linked lists
Insertion into singly linked list: head insertion method, tail insertion method
Time complexity of basic operations: search O(n); deletion O(n) overall (dominated by the search: find O(n), unlink O(1)); insertion: head insertion O(1); tail insertion O(1) if a tail pointer is kept, O(n) otherwise
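A minimal C sketch of head insertion into a singly linked list with a head node (the node type and function name are illustrative, not from the outline):
#include <stdlib.h>
typedef struct LNode {
    int data;
    struct LNode *next;
} LNode;
/* Head insertion: the new node becomes the first element after the head node. O(1). */
void list_head_insert(LNode *head, int value) {
    LNode *p = (LNode *)malloc(sizeof(LNode));
    p->data = value;
    p->next = head->next;
    head->next = p;
}
Note that inserting a sequence of values this way stores them in reverse order, which is why the head-insertion method reverses the input.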
Double linked list
There are both pointers to successor nodes and pointers to predecessor nodes.
Time complexity of basic operations: search O(n); deletion O(n) overall (find O(n), unlink O(1)); insertion at a known node O(1), insertion at a specific position O(n)
circular linked list
The successor pointer of the last node points back to the head node of the circular singly linked list
Circular doubly linked list
static linked list
Use arrays to describe the linked storage structure of linear lists
Comparison of sequence list and linked list
Access method: Sequential list can be accessed sequentially or randomly; linked list can only be accessed sequentially
Logical structure and physical structure: the elements of a sequence list are both logically and physically adjacent; the elements of a linked list are logically adjacent but need not be physically adjacent.
Search, insertion and deletion: both searches are O(n). On average, half of the elements are moved when inserting into a sequence list; inserting into or deleting from a linked list only requires modifying a few pointers once the position is known. The storage density of a linked list is lower, since each node stores pointers in addition to data.
Space allocation: under static storage allocation, the sequence list cannot grow once its storage space is full, and under dynamic storage allocation, expansion requires moving a large number of elements; a linked list can allocate nodes as long as memory is available, so its operations are flexible and efficient.
Stacks, Queues and Arrays
stack
A linear list that permits insertion and deletion only at one end (the top)
Mathematical properties: when n distinct elements are pushed onto a stack, the number of distinct pop sequences is the Catalan number C(2n, n)/(n + 1); e.g., n = 3 gives 5
Shared stack: more efficient use of storage space
Chained storage structure of the stack
Top-of-stack node -> node 1 -> node 2 -> ... -> bottom-of-stack node
The sequential storage structure of the stack
Array implementation; note that the algorithm differs depending on whether the empty stack is marked by top = -1 or top = 0
queue
A linear list that permits insertion only at one end (the rear) and deletion only at the other end (the front)
circular queue
Initially front=rear=0
Advance the front pointer: front = (front + 1) % MAXSIZE
Advance the rear pointer: rear = (rear + 1) % MAXSIZE
Queue length: (rear + MAXSIZE - front) % MAXSIZE
Queue-empty condition: front == rear
Queue-full condition: (rear + 1) % MAXSIZE == front (one slot is left unused to distinguish full from empty; see the sketch below)
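A minimal C sketch of the circular-queue operations above (MAXSIZE and the element type are illustrative):
#define MAXSIZE 8
typedef struct {
    int data[MAXSIZE];
    int front, rear;          /* front == rear == 0 means empty */
} SqQueue;
/* Enqueue; returns 0 when full (one slot is sacrificed to tell full from empty). */
int enqueue(SqQueue *q, int x) {
    if ((q->rear + 1) % MAXSIZE == q->front) return 0;
    q->data[q->rear] = x;
    q->rear = (q->rear + 1) % MAXSIZE;
    return 1;
}
/* Dequeue into *x; returns 0 when empty. */
int dequeue(SqQueue *q, int *x) {
    if (q->front == q->rear) return 0;
    *x = q->data[q->front];
    q->front = (q->front + 1) % MAXSIZE;
    return 1;
}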
The chained storage structure of the queue: In order to unify the insertion and deletion operations, the chained queue is usually designed as a singly linked list with the head node.
deque
A queue that allows enqueuing and dequeuing at both ends, comparable to Deque in Java
Output-restricted deque: one end allows both insertion and deletion; the other end allows insertion (input) only.
Input-restricted deque: one end allows both insertion and deletion; the other end allows deletion (output) only.
For questions asking whether a given output sequence is feasible, simply simulate the operations to verify it.
Applications of stacks and queues
The application of the stack in bracket matching: the key is to push left brackets onto the stack; when a right bracket is encountered, check whether it matches the bracket on top of the stack, and if so pop the top element (see the sketch after this list).
The application of the stack in expression evaluation: for a postfix expression, push the scanned operands onto the stack; when an operator is encountered, pop two operands (mind their order), apply the operator, push the result back, and continue until the expression is consumed.
Application of the stack in recursion: the key is to identify the state that must be restored on backtracking and save it on a stack. For example, with f(n) = f(n-1) + f(n-2), computing f(n) needs both f(n-1) and f(n-2), so while f(n-1) is being computed, the pending f(n-2) must be remembered for later.
The application of the queue in level-order traversal: first enqueue a node, then repeatedly dequeue a node and enqueue its children; because elements leave the queue in the order they entered, the traversal proceeds level by level.
The queue is also used as a buffer/cache and ensures fairness (first come, first served)
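A minimal C sketch of the bracket-matching check described above, using an array-based stack (the capacity and function name are illustrative):
/* Returns 1 if all brackets in s match, 0 otherwise. Assumes nesting depth < 256. */
int brackets_match(const char *s) {
    char stack[256];
    int top = -1;
    for (; *s; s++) {
        if (*s == '(' || *s == '[' || *s == '{') {
            stack[++top] = *s;                      /* push left bracket */
        } else if (*s == ')' || *s == ']' || *s == '}') {
            char want = (*s == ')') ? '(' : (*s == ']') ? '[' : '{';
            if (top < 0 || stack[top] != want) return 0;
            top--;                                  /* matched: pop */
        }
    }
    return top == -1;                               /* nothing left unmatched */
}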
Arrays and special matrices
The storage structure of the array: row-major storage: loc(a_i,j) = loc(a_0,0) + (i * number of columns + j) * element size; column-major storage: loc(a_i,j) = loc(a_0,0) + (j * number of rows + i) * element size
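A small C check of the row-major formula, assuming a 3 x 4 int array (the dimensions are illustrative; C itself stores arrays row-major):
#include <stdio.h>
int main(void) {
    int a[3][4];
    int i = 2, j = 1;
    /* loc(a[i][j]) = loc(a[0][0]) + (i * columns + j) * element size */
    printf("%d\n", (char *)&a[0][0] + (i * 4 + j) * sizeof(int) == (char *)&a[i][j]);  /* prints 1 */
    return 0;
}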
Compressed storage of special matrices
Symmetric matrix - only need to save the upper or lower triangle
Triangular matrix - stores the non-constant triangle plus the constant; requires n(n + 1)/2 + 1 storage units
Tridiagonal matrix - a_i,j (with |i - j| <= 1) is stored as element k = 3(i - 1) - 1 + (j - i + 2) = 2i + j - 2 (1-indexed)
sparse matrix
Sparse matrices can be stored using triplet or cross-linked list methods.
string
string pattern matching
Simple (brute-force) pattern matching - time complexity O(nm), where n is the length of the main string and m the length of the pattern string
KMP algorithm
Prefix, suffix and partial match values of a string: the partial match value of a prefix is the length of its longest proper prefix that is also a suffix. For the pattern abcac, the partial match table is a(0), b(0), c(0), a(1), c(0). Number of positions to slide = number of matched characters - partial match value of the last matched character
Construction of the next array: shift the partial match table one position to the right and add one to each entry: a(0), b(1), c(1), a(1), c(2). Meaning of the next array: when the j-th character of the pattern fails to match the main string, the pattern is realigned so that its position next[j] is compared against the current position of the main string.
next array construction code
void get_next(String T, int next[]) {
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T.length) {
        if (j == 0 || T.ch[i] == T.ch[j]) {
            ++i; ++j;
            next[i] = j;      /* record how far the pattern can realign */
        } else
            j = next[j];      /* fall back within the pattern */
    }
}
nextVal array construction
void get_nextVal(String T, int nextVal[]) {
    int i = 1, j = 0;
    nextVal[1] = 0;
    while (i < T.length) {
        if (j == 0 || T.ch[i] == T.ch[j]) {
            ++i; ++j;
            if (T.ch[i] == T.ch[j]) nextVal[i] = nextVal[j];  /* skip equal characters */
            else nextVal[i] = j;
        } else
            j = nextVal[j];
    }
}
Trees and Binary Trees
Basic concepts of trees
A tree is a finite set of n nodes; when n = 0 it is the empty tree. The root node has no predecessor, and every node except the root has exactly one predecessor. Every node in the tree may have zero or more successors.
basic terminology
The number of children of a node in a tree is called the degree of the node, and the maximum degree of a node in the tree is called the degree of the tree.
Nodes with degree greater than 0 are called branch nodes (non-terminal nodes)
The depth of a node is accumulated level by level from the root downward; the height of a node is accumulated level by level from the leaves upward. The height (depth) of a tree is the maximum level of any node in the tree.
Ordered and unordered trees: a tree in which the subtrees of each node are ordered from left to right and cannot be interchanged is called an ordered tree; otherwise it is unordered.
path and path length. The path between two nodes in the tree is composed of the sequence of nodes experienced between the two nodes, and the path length is the number of edges experienced on the path
Hereinafter, the left child node of a tree node is called the left node, and the right child node the right node.
properties of trees
The number of nodes in the tree is equal to the sum of the degrees of all nodes plus one
The i-th level of a tree with degree m has at most m^(i-1) nodes.
An m-ary tree with height h has at most (m^h - 1)/(m - 1) nodes.
The minimum height of an m-ary tree with n nodes is ceil(log_m(n(m - 1) + 1))
Binary tree concept
Each node of a binary tree has at most two subtrees
Several special binary trees
full binary tree
A binary tree with height h and 2^h-1 nodes is called a full binary tree
complete binary tree
A binary tree of height h with n nodes is complete if and only if its nodes correspond one-to-one to the nodes numbered 1 to n in the full binary tree of height h.
If i <= floor(n/2), node i is a branch node; otherwise it is a leaf node
Leaf nodes can appear only on the two bottom levels, and the leaves on the bottom level occupy the leftmost positions.
If there is a node with degree 1, there can only be one
If n is odd, every branch node has both left and right children; if n is even, only the last branch node lacks a right child (it has a left child only).
Binary sort tree (binary search tree)
The keywords of all nodes in the left subtree are smaller than the keyword of the root, and the keywords of all nodes in the right subtree are greater than the keyword of the root; the left and right subtrees are each binary sort trees.
Properties of binary trees
In a non-empty binary tree, the number of leaf nodes equals the number of nodes of degree 2 plus 1, i.e., n0 = n2 + 1
The kth level of a non-empty binary tree has at most 2^(k-1) nodes.
A binary tree of height h has at most 2^h-1 nodes.
In a complete binary tree numbered 1 to n from top to bottom and left to right: when i > 1, the parent of node i is floor(i/2); when 2i <= n, the left child is 2i; when 2i + 1 <= n, the right child of node i is 2i + 1 (see the sketch after this list)
The height h of a complete binary tree with n nodes satisfies 2^(h-1) - 1 < n <= 2^h - 1, i.e., h = ceil(log2(n + 1))
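A minimal C sketch of the 1-indexed navigation formulas above (assuming the tree is stored in an array tree[1..n] with index 0 unused):
int parent(int i)      { return i / 2; }        /* valid when i > 1        */
int left_child(int i)  { return 2 * i; }        /* valid when 2*i <= n     */
int right_child(int i) { return 2 * i + 1; }    /* valid when 2*i + 1 <= n */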
Binary tree storage structure
Sequential storage structure: a set of contiguous addresses stores the complete binary tree from top to bottom and left to right; the first node is usually stored at index 1.
Chain structure: Each node contains three fields - data field, left pointer field, right pointer field
Binary tree traversal and clue binary trees
Binary tree traversal
Preorder traversal: Visit the root node, visit the left subtree, and visit the right subtree in sequence.
In-order traversal: Visit the left subtree, root node, and right subtree in sequence
Post-order traversal: sequentially visit the left subtree, right subtree, and root node
Converting the recursive algorithms to non-recursive ones: use a stack to hold the nodes that must be revisited. Preorder and inorder are relatively simple; postorder requires each node to be pushed onto and popped off the stack twice.
Level-order traversal: after visiting a node, enqueue its children.
threaded binary tree
Utilize the null pointer fields of nodes to point to the direct predecessor (left pointer) or direct successor (right pointer)
Construction of an in-order threaded binary tree: during an in-order traversal, keep pre pointing to the predecessor of the current node. If the current node's left pointer is null, point it to the predecessor; if pre's right pointer is null, point it to the current node.
In-order threaded binary tree traversal: if the current node has a right subtree, its successor is the leftmost node of that right subtree; if it has no right subtree, the successor is the node pointed to by the right pointer (thread).
Preorder threaded binary tree: if there is a left child, the left child is the successor; if there is no left child but there is a right child, the right child is the successor; for a leaf node, the node pointed to by the right thread is the successor.
Postorder threaded binary tree: traversing it requires either a stack or a ternary (parent-pointer) node structure with tag bits.
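A minimal C sketch of the in-order threading construction described above (the node type and tag convention are illustrative):
typedef struct ThreadNode {
    int data;
    struct ThreadNode *lchild, *rchild;
    int ltag, rtag;                       /* 0: child pointer, 1: thread */
} ThreadNode;
/* In-order threading; *pre trails the node just visited. Call with *pre == NULL. */
void in_thread(ThreadNode *p, ThreadNode **pre) {
    if (p == NULL) return;
    in_thread(p->lchild, pre);
    if (p->lchild == NULL) {              /* empty left pointer -> predecessor thread */
        p->lchild = *pre;
        p->ltag = 1;
    }
    if (*pre != NULL && (*pre)->rchild == NULL) {  /* predecessor's empty right pointer -> successor thread */
        (*pre)->rchild = p;
        (*pre)->rtag = 1;
    }
    *pre = p;
    in_thread(p->rchild, pre);
}
After the call returns, the right pointer of the last node visited (the final pre) can also be marked as a thread.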
trees and forests
tree storage structures
Parent representation: an array stores each node's value and the array index of its parent. struct PTNode { TYPE data; int parent; }; struct PTree { PTNode nodes[MAX_SIZE]; int n; };
Child-sibling representation: also called the binary tree representation (left pointer to first child, right pointer to next sibling). struct CSNode { TYPE data; struct CSNode *firstchild, *nextsibling; };
Child representation: Link the children of each node with a singly linked list to form a linear structure
Conversion between trees, forests and binary trees: the left pointer of each node points to its first child, and the right pointer points to its next sibling in the tree. This rule is also called "left child, right sibling".
Tree and forest traversal
Root-first traversal: visit the root node first, and then visit the subtrees of the root node in sequence; the same as the pre-order traversal of the corresponding binary tree
Post-root traversal: first traverse each subtree of the root node in sequence, and then visit the root node; the same as the in-order traversal of the corresponding binary tree
Pre-order traversal of the forest: visit the root node in the first tree of the forest, pre-order traverse the sub-tree forest of the first tree, and pre-order traverse the remaining trees.
In-order traversal of the forest: In-order traverses the sub-tree forest of the first tree in the forest, visits the root node of the first tree, and in-order traverses the remaining forest after removing the first tree.
Applications of trees and binary trees
Huffman trees and Huffman coding
Definition of the Huffman tree: the product of a node's weight and the length of the path from the root to that node is the node's weighted path length; the sum over all leaf nodes is the weighted path length (WPL) of the tree. Among all binary trees with n weighted leaf nodes, the one with the smallest WPL is the Huffman tree.
Construction of the Huffman tree: from the initial set of n leaf nodes, repeatedly select the two nodes with the smallest weights and make them the left and right children of a new node whose weight is the sum of theirs; then put the new node back into the set. After n - 1 such steps the Huffman tree is obtained.
Huffman coding: a widely used data-compression code and a prefix code. Record the turns on the path from the root to each leaf, 0 for a left turn and 1 for a right turn; the recorded sequence is that leaf's code.
Union-find (disjoint-set union)
Uses the parent-pointer array representation of a tree as the disjoint-set structure
Can be used to merge sets, determine whether two elements belong to the same set, etc.
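A minimal C sketch of a parent-array union-find, without path compression or union by rank (the size and names are illustrative):
#define N 100
int parent[N];                       /* parent[i] == -1 marks a set root */
void uf_init(void) {
    for (int i = 0; i < N; i++) parent[i] = -1;
}
int uf_find(int x) {                 /* follow parent pointers up to the root */
    while (parent[x] >= 0) x = parent[x];
    return x;
}
void uf_union(int a, int b) {        /* merge the sets containing a and b */
    int ra = uf_find(a), rb = uf_find(b);
    if (ra != rb) parent[ra] = rb;
}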
graphs
Basic concepts of graphs
The graph consists of a vertex set V and an edge set E, where the vertex set in the graph is a finite non-empty set
Directed graph: When E is a finite set of directed edges (also called arcs), the graph G is a directed graph, <v,w> v is called the tail of the arc, and w is called the head of the arc.
Undirected graph: When E is a finite set of undirected edges, the graph G is an undirected graph. (v, w) or (w, v) can be said that v and w are adjacent points to each other.
Simple graph, multigraph: a graph with no repeated edges and no edges from a vertex to itself is a simple graph. If the number of edges between two vertices may exceed 1, and a vertex may be joined to itself by an edge, it is a multigraph.
Complete graph (simple complete graph): for undirected graphs, an undirected graph with n(n - 1)/2 edges is a complete graph; for directed graphs, a directed graph with n(n - 1) arcs is a directed complete graph.
Subgraph: given two graphs G = (V, E) and G' = (V', E'), if V' is a subset of V and E' is a subset of E, then G' is a subgraph of G. If V(G') = V(G), G' is called a spanning subgraph of G. Note: not every pair of subsets of V and E constitutes a subgraph of G, since such a pair may not form a graph.
Connected, connected graphs and connected components: In an undirected graph, if there is a path from v to w, v and w are said to be connected. If any two vertices in the graph are connected, then the graph G is a connected graph, otherwise it is a non-connected graph. The maximal connected subgraph of an undirected graph is called a connected component.
Strongly connected graph, strongly connected component: in a directed graph, if there are paths both from v to w and from w to v, the two vertices are said to be strongly connected. If any two vertices in the graph are strongly connected, the graph is a strongly connected graph. A maximal strongly connected subgraph of a directed graph is called a strongly connected component.
Spanning tree, spanning forest: the spanning tree of a connected graph is a minimal connected subgraph that contains all vertices. In a disconnected graph, the spanning trees of the connected components constitute the spanning forest of the graph.
Degree, in-degree and out-degree of a vertex: in an undirected graph, the degree of vertex v is the number of edges incident to v, and the sum of the degrees of all vertices is twice the number of edges. In a directed graph, the degree of v is the sum of its out-degree and in-degree; the out-degree is the number of arcs with v as their tail, and the in-degree is the number of arcs with v as their head. The sum of in-degrees equals the sum of out-degrees equals the number of arcs.
Edge weights and networks: each edge in a graph may carry a numerical value with some meaning, called the weight of the edge. A graph with weighted edges is also called a network.
Dense graph, sparse graph: generally, when graph G satisfies |E| < |V| log |V|, it can be regarded as a sparse graph.
Paths, path lengths and cycles: a path between vertex v0 and vertex vn is the sequence of vertices v0, v1, v2, ..., vn traversed between them; the number of edges on a path is called the path length.
Simple path and simple loop: A path whose vertices do not appear repeatedly is called a simple path. A circuit in which the vertices do not appear repeatedly except for the first vertex and the last vertex is called a simple circuit.
Distance: If the shortest path from vertex v to vertex w exists, then this path is called the distance from v to w. If it does not exist, it is recorded as ∞
Directed tree: a directed graph in which one vertex has in-degree 0 and all other vertices have in-degree 1 is called a directed tree.
Storage and basic operations of graphs
adjacency matrix method
Use a one-dimensional array to store vertex information in the graph, and use a two-dimensional array to store edge information in the graph.
If A[i][j] = 1: for an undirected graph, there is an edge between vertices i and j; for a directed graph, there is an arc from i to j.
The element A^n[i][j] of A^n is equal to the number of paths of length n from vertex i to vertex j.
The space complexity is O(N^2)
adjacency list method
Each vertex vi in the graph has a singly linked list of the edges incident to vi, called vi's edge list. The head pointers of the edge lists and the vertex data are stored sequentially in a vertex table. The adjacency list avoids wasted storage space when the graph is sparse.
Storing a directed graph: space complexity O(|V| + |E|). Storing an undirected graph: space complexity O(|V| + 2|E|)
cross list method
The cross (orthogonal) list is a chained storage structure for directed graphs, with one node per arc. Each arc node carries a pointer to the next arc with the same tail and a pointer to the next arc with the same head.
It makes it easy to find all incoming and outgoing arcs of vi.
adjacency multiple list
It is another chain storage structure of undirected graph.
The only difference between the adjacency multilist and the adjacency list is that an edge is represented by two nodes in the adjacency list but by a single node in the adjacency multilist.
Graph traversal
breadth first search
An algorithm similar to the level-order traversal of a binary tree. Basic idea: start by visiting vertex v1, then visit the unvisited neighbors v2, v3, v4, ... of v1, then the unvisited neighbors of v2, and so on; never visit a vertex twice.
Performance analysis: with adjacency-list storage, each vertex enters the queue once, so visiting vertices costs O(|V|) and scanning all edge lists for neighbors costs O(|E|) in total; time complexity O(|V| + |E|), space complexity O(|V|). With adjacency-matrix storage, finding the neighbors of each vertex costs O(|V|), so the total time complexity is O(|V|^2) and the space complexity is O(|V|).
Breadth-first spanning tree
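A minimal C sketch of BFS over an adjacency matrix (the representation and sizes are illustrative; an adjacency-list version would bring the cost down to O(|V| + |E|)):
#define MAXV 100
int g[MAXV][MAXV];     /* g[v][w] != 0 means an edge from v to w */
int n;                 /* number of vertices */
int visited[MAXV];
void bfs(int start) {
    int queue[MAXV], front = 0, rear = 0;
    visited[start] = 1;
    queue[rear++] = start;             /* each vertex is enqueued at most once */
    while (front < rear) {
        int v = queue[front++];        /* dequeue and "visit" v */
        for (int w = 0; w < n; w++)
            if (g[v][w] && !visited[w]) {
                visited[w] = 1;
                queue[rear++] = w;
            }
    }
}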
depth first traversal
Similar to the preorder traversal of a tree. Basic idea: visit v1 first, then an unvisited neighbor v2 of v1, then an unvisited neighbor v3 of v2, and so on; when a vertex has no unvisited neighbor, backtrack and try the next neighbor of the previous vertex.
Performance analysis: with adjacency-list storage, each vertex is visited once at a cost of O(|V|), and scanning all edge lists for neighbors costs O(|E|) in total, so the time complexity is O(|V| + |E|) and the space complexity is O(|V|) for the recursion stack. With adjacency-matrix storage, finding the neighbors of each vertex costs O(|V|), so the time complexity is O(|V|^2) and the space complexity is O(|V|).
Depth-first spanning trees and forests
Graph connectivity judgment: Graph traversal can be used to determine whether the graph is connected.
Applications of graphs
Minimum spanning tree
For a weighted connected undirected graph G = (V, E), the spanning tree T with the smallest total edge weight among all spanning trees is called the minimum spanning tree of G.
Prim's algorithm
The time complexity is O(V^2) and does not depend on E, so it is suitable for solving the minimum spanning tree of dense graphs.
Kruskal's algorithm
Time complexity: O(ElogE), suitable for graphs with sparse edges and many vertices.
shortest path
Dijkstra's algorithm for single source shortest path problem
Time complexity: O(|V|^2) with either an adjacency list or an adjacency matrix
Not applicable when edges have negative weights
Floyd's algorithm for finding the shortest path between vertices
A^k[i][j] = min(A^(k-1)[i][j], A^(k-1)[i][k] + A^(k-1)[k][j])
Time complexity: O(V^3)
Edges with negative weights are allowed in the graph, but cycles whose total weight is negative are not.
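A minimal C sketch of the Floyd recurrence above, updating a single matrix in place (INF and the sizes are illustrative; A must start as the weight matrix with 0 on the diagonal):
#define MAXV 100
#define INF  1000000000        /* "no edge"; small enough that INF + INF does not overflow int */
int A[MAXV][MAXV];
int n;
void floyd(void) {
    for (int k = 0; k < n; k++)              /* allow vertex k as an intermediate */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (A[i][k] + A[k][j] < A[i][j])
                    A[i][j] = A[i][k] + A[k][j];
}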
Describing expressions with directed acyclic graphs
A directed graph with no cycles is called a directed acyclic graph (DAG).
topological sort
AOV network: a DAG representing a project, with vertices representing activities; a directed edge <i,j> means activity vi must precede activity vj. Such a directed graph is called an activity-on-vertex (AOV) network.
Commonly used to check whether a directed graph contains a cycle and to derive dependency orders (see the sketch below)
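A minimal C sketch of topological sorting by repeatedly removing in-degree-0 vertices (adjacency-matrix form; the names are illustrative):
#define MAXV 100
int g[MAXV][MAXV];    /* g[i][j] != 0 means an arc from i to j */
int n;
/* Writes a topological order into order[]; returns 0 if the graph has a cycle. */
int topo_sort(int order[]) {
    int indeg[MAXV] = {0}, stack[MAXV], top = -1, cnt = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (g[i][j]) indeg[j]++;
    for (int i = 0; i < n; i++)
        if (indeg[i] == 0) stack[++top] = i;
    while (top >= 0) {
        int v = stack[top--];
        order[cnt++] = v;
        for (int w = 0; w < n; w++)           /* "remove" v's outgoing arcs */
            if (g[v][w] && --indeg[w] == 0) stack[++top] = w;
    }
    return cnt == n;                          /* cnt < n: a cycle remains */
}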
Critical Path
In a weighted directed graph where vertices represent events, directed edges represent activities, and edge weights represent the cost of completing the activities, the graph is called an activity-on-edge (AOE) network. AOE and AOV networks are both directed acyclic graphs; they differ in the meaning of their edges and vertices: edges in an AOE network carry weights, while edges in an AOV network carry no weights and merely represent precedence between vertices.
The earliest occurrence time of event vk: ve(k) = max(ve(j) + weight(j,k)) over all arcs <j,k>, where weight(j,k) is the weight of <j,k>
The latest occurrence time of event vk: vl(k) = min(vl(j) - weight(k,j)) over all arcs <k,j>, where weight(k,j) is the weight of <k,j>
The earliest start time of activity ai represented by arc <k,j>: e(i) = ve(k)
The latest start time of activity ai represented by arc <k,j>: l(i) = vl(j) - weight(k,j)
The difference between the latest and earliest start times of an activity: d(i) = l(i) - e(i). Activities with d(i) = 0 are the critical activities, and the critical path is obtained from the critical activities.
If a critical activity is shortened beyond a certain point, it may become non-critical.
When there are several critical paths, speeding up the critical activities on only one of them does not shorten the overall project duration; only speeding up the critical activities shared by all critical paths shortens it.
Searching
Sequential search and binary search
Sequential search, also called linear search, is applicable to both sequential lists and linked lists.
Sequential search of a general linear list: the average number of comparisons for a successful search is (n + 1)/2; for an unsuccessful search it is n + 1 (with a sentinel)
Sequential search of an ordered list: the average number of comparisons for a successful search is (n + 1)/2; for an unsuccessful search it is n/2 + n/(n + 1)
Binary search
Also called half-interval search; applicable to ordered sequential lists. It compares the target with the middle element and then narrows the interval by half, gradually closing in on the target element. Time complexity O(log n).
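A minimal C sketch of binary search on an ascending array (the names are illustrative):
/* Returns the index of key in a[0..n-1], or -1 if absent. */
int binary_search(const int a[], int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;     /* avoids overflow of lo + hi */
        if (a[mid] == key) return mid;
        else if (a[mid] < key) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}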
Block search
Also known as index search: first locate, in the index table, the block that must contain the record (by sequential or binary search), then search sequentially within that block.
When the number of blocks is sqrt(n), the average search length is minimized.
tree search
Binary sort tree (binary search tree)
It satisfies left-subtree node values < root node value < right-subtree node values, so an in-order traversal of a binary sort tree yields an increasing ordered sequence.
Deletion from a binary sort tree: if the node to delete is a leaf, delete it directly; otherwise replace it with its in-order predecessor or successor, and then delete that predecessor or successor node.
Search efficiency analysis of binary sort trees: the search efficiency depends mainly on the height of the tree. A balanced binary tree, where the heights of the left and right subtrees differ by at most 1, has an average search length of O(log n).
balanced binary tree
To prevent the height of the tree from growing too fast, it is required that after every insertion or deletion the heights of the left and right subtrees of every node differ by at most 1; such a binary tree is called a balanced binary tree (AVL tree).
Insertion into a balanced binary tree: the LL case requires a right rotation, the RR case a left rotation, the LR case a left rotation followed by a right rotation, and the RL case a right rotation followed by a left rotation.
Deletion of balanced binary tree: Deleting non-leaf nodes needs to be converted into deleting leaf nodes. The imbalance caused by deletion can be adjusted in the same way as insertion.
Searching a balanced binary tree: time complexity is O(log N)
Let Nh denote the minimum number of nodes in a balanced binary tree of height h. Then N0 = 0, N1 = 1, N2 = N1 + N0 + 1 = 2, ..., Nh = N(h-1) + N(h-2) + 1
Red-black trees
Properties
Each node is either red or black
The root node is black
Leaf nodes (fictitious external nodes, NULL nodes) are all black.
There are no two adjacent red nodes (that is, the parent node and child node of the red node are both black)
For each node, the simple path from this node to any leaf node contains the same number of black nodes.
Consequences
The longest root-to-leaf path is at most twice as long as the shortest
A red-black tree with n internal nodes has height h <= 2 log2(n + 1)
A newly inserted node in a red-black tree is initially colored red.
Insertion adjustment of red-black trees
Mainly examine the uncle of the newly inserted node
If the uncle is black, the tree can be rebalanced by rotating according to the LL, LR, RR, or RL case and then exchanging node colors.
If the uncle is red, color the parent and uncle black and the grandparent red, then treat the grandparent as a newly inserted node and iterate upward.
Deletion adjustment of red-black trees
Mainly examine the sibling of the (doubly black) node x at the deletion position
If the sibling is red, a left or right rotation (with recoloring) makes x's new sibling black.
If the sibling is black and its left child is red, rotate around the sibling so that the new sibling is black with a red right child (reducing to the next case).
If the sibling is black and its right child is red, color the sibling's right child black, rotate left around the parent, and swap colors; the doubly black node becomes singly black.
If the sibling is black and both of its children are black, remove one layer of black from x and its sibling, add one layer of black to the parent, and iterate from the parent.
B-trees and B+ trees
B-trees and their basic operations
The maximum number of children of all nodes in the B-tree is called the order of the B-tree.
m-order B-tree properties
Each node in the tree has at most m subtrees, that is, it contains at most m-1 keywords.
If the root node is not a terminal node, there are at least two subtrees.
All non-leaf nodes except the root have at least ceil(m/2) subtrees and contain at least ceil(m/2) - 1 keywords. (ceil() means rounding up, floor() means rounding down)
The structure of every non-leaf node is: n | P0 | K1 | P1 | K2 | ... | Kn | Pn, where n is the number of keywords in the node, Pi is a pointer to the root of a subtree, and Ki is a keyword.
Height of a B-tree (number of disk accesses)
Each node has at most m subtrees and m - 1 keywords, so an m-order B-tree of height h holds n <= m^h - 1 keywords; hence h >= log_m(n + 1)
If every node has the minimum number of keywords: level 1 has 1 node, level 2 has at least 2, level 3 at least 2 ceil(m/2), and level h at least 2 ceil(m/2)^(h-2) nodes. A B-tree with n keywords has n + 1 failure (NULL) nodes at level h + 1, so n + 1 >= 2 ceil(m/2)^(h-1), i.e., h <= log_ceil(m/2)((n + 1)/2) + 1
Insertion split of a B-tree: split at the middle keyword at position ceil(m/2); the keywords to its left stay in the original node, the keywords to its right go into a new node, and the middle keyword itself is inserted into the parent of the original node.
Deletion of B-tree: deletion of non-terminal nodes is converted into deletion of terminal nodes
If the terminal node has more than ceil(m/2) - 1 keywords, delete directly.
If the terminal node has only ceil(m/2) - 1 keywords but an adjacent sibling has >= ceil(m/2), borrow one keyword by adjusting through the parent.
If the adjacent siblings also have only ceil(m/2) - 1 keywords, a merge occurs. If merging reduces the root to no keywords, its two subtrees merge to become the new root (the tree height decreases by 1).
Basic concepts of B+ trees
Properties of m-order B+ trees
Each branch node has at most m subtrees
The non-leaf root node has at least two subtrees, and each other branch node has at least ceil(m/2) subtrees.
The number of subtrees of the node is equal to the number of keywords
All leaf nodes contain all keywords and pointers to corresponding records. The keywords are sorted by size in the leaf nodes, and adjacent leaf nodes are linked to each other in order of size.
All branch nodes (which can be regarded as indexes) contain only the maximum keyword of each of their child nodes and pointers to those children.
The main differences between an m-order B+ tree and an m-order B-tree
In a B+ tree, a node with n keywords has n subtrees; in a B-tree, a node with n keywords has n + 1 subtrees.
In a B+ tree, the number of keywords per node (non-root internal node) satisfies ceil(m/2) <= n <= m (root: 2 <= n <= m); in a B-tree it satisfies ceil(m/2) - 1 <= n <= m - 1 (root: 1 <= n <= m - 1)
In a B+ tree, only the leaf nodes contain the actual information; all non-leaf nodes serve purely as indexes. A non-leaf index entry holds only the maximum keyword of its subtree and a pointer to that subtree, not the storage address of the record for the keyword.
In a B+ tree, the leaf nodes contain all keywords; that is, keywords that appear in non-leaf nodes also appear in the leaves.
Every search in a B+ tree follows a path from the root down to a leaf.
hash table
The hash table establishes a direct mapping relationship between keywords and storage addresses.
How to build a hash function
Direct addressing method: H(key) = key or H(key) = a * key + b
Division with remainder method: H(key) = key % p
Digit analysis method: select the digit positions of the keys whose values are most evenly distributed
Mid-square method: take the middle digits of the square of the key
How to handle conflicts
open addressing method
Linear probing: Hi = (H(key) + di) % m, with di = 0, 1, 2, ..., m - 1
Quadratic probing: di = 1^2, -1^2, 2^2, -2^2, ...
Double hashing: Hi = (H(key) + i * Hash2(key)) % m
pseudorandom sequence method
Chaining (zipper) method: all keys that hash to the same address are kept in a singly linked list
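A minimal C sketch of insertion under the linear-probing scheme listed above (the table length, empty marker, and hash function are illustrative):
#define M 13                   /* table length, a prime */
#define EMPTY (-1)
int table[M];
void init_table(void) {
    for (int i = 0; i < M; i++) table[i] = EMPTY;
}
int hash(int key) { return key % M; }
/* Probe H(key), H(key)+1, ... ; returns the slot used, or -1 if the table is full. */
int insert_key(int key) {
    for (int di = 0; di < M; di++) {
        int h = (hash(key) + di) % M;
        if (table[h] == EMPTY) {
            table[h] = key;
            return h;
        }
    }
    return -1;
}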
Hash table performance analysis
The efficiency of hash table search depends on three factors: the hash function, the collision handling method, and the fill (load) factor. Fill factor: a = number of records in the table / length of the hash table
Sorting
Basic concepts of sorting
Sorting: the process of rearranging the elements of a list so that they are ordered by keyword
Algorithm stability: There are two elements Ri and Rj in the list to be sorted. Their corresponding keywords are the same, that is, keyi=keyj, and Ri is in front of Rj. After sorting, Ri is still in front of Rj. Then the sorting algorithm is stable, otherwise it is unstable.
insertion sort
Insert the elements to be sorted into the appropriate position in the sorted sequence
Time complexity O(n^2): O(n) in the best case, O(n^2) in the worst case; space complexity O(1)
Each element is inserted by comparing and moving from back to front; it is a stable sorting algorithm.
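A minimal C sketch of direct insertion sort as described above:
/* Sorts a[0..n-1] ascending. Stable: equal keys keep their relative order. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i], j = i - 1;
        while (j >= 0 && a[j] > key) {    /* shift larger elements one slot right */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;                   /* drop the element into the gap */
    }
}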
Binary insertion sort
Finding the insertion position by binary search reduces the number of comparisons, but the number of moves is unchanged; the time complexity remains O(n^2)
Shell sort
Divide the table to be sorted into several sub-tables, perform insertion sort on each sub-table, and perform insertion sort again after the elements in the entire table are basically in order.
Time complexity about O(n^1.3) on average, O(n^2) in the worst case; space complexity O(1)
It is an unstable sorting algorithm
swap sort
Bubble Sort
Time complexity O(n^2) Best case O(n) Worst case O(n^2)
It is a stable sorting algorithm
Quick sort
Time complexity O(nlog n) Best case O(nlogn) Worst case O(n^2) Space complexity O(log n) Worst case O(n)
It is an unstable sorting algorithm
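A minimal C sketch of quicksort with the textbook single-pivot partition (pivot = first element; the names are illustrative):
/* Partitions a[lo..hi] around a[lo]; returns the pivot's final position. */
static int partition(int a[], int lo, int hi) {
    int pivot = a[lo];
    while (lo < hi) {
        while (lo < hi && a[hi] >= pivot) hi--;
        a[lo] = a[hi];                    /* move an element smaller than the pivot left */
        while (lo < hi && a[lo] <= pivot) lo++;
        a[hi] = a[lo];                    /* move an element larger than the pivot right */
    }
    a[lo] = pivot;
    return lo;
}
void quick_sort(int a[], int lo, int hi) {
    if (lo < hi) {
        int p = partition(a, lo, hi);
        quick_sort(a, lo, p - 1);
        quick_sort(a, p + 1, hi);
    }
}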
selection sort
Simple selection sort
Time complexity O(n^2), best and worst case O(n^2), space complexity O(1)
It is an unstable sorting algorithm
Heap sort
The sift-down adjustment of the heap is O(log n), depending on the tree height. When building a heap of n elements, the total number of keyword comparisons does not exceed 4n, so heap construction is O(n).
Time complexity O(nlogn) Space complexity O(1)
It is an unstable sorting algorithm
Merge sort and radix sort
merge sort
Time complexity O(nlog n) Space complexity O(n)
It is a stable sorting algorithm
Radix sort
A method that sorts on a single logical keyword using the idea of multi-keyword sorting
Time complexity O(d(n + r)): d is the number of digits (passes), n the number of elements, and r the number of buckets (queues)
It is a stable sorting algorithm
Comparison and application of various internal sorting algorithms
From the perspective of time complexity: simple selection sort, direct insertion sort, bubble sort: O(n^2); direct insertion sort and bubble sort reach O(n) in the best case; heap sort, quick sort, merge sort: O(n log n)
From the perspective of space complexity: simple selection sort, insertion sort, bubble sort, Shell sort, heap sort: O(1); quick sort: O(log n) in the best case, O(n) in the worst case; merge sort: O(n)
From the perspective of stability: insertion sort, bubble sort, merge sort, radix sort are stable sorting algorithms; simple selection sort, quick sort, Shell sort, heap sort are unstable sorting algorithms
Factors to consider when choosing a sorting method: the number n of elements to sort, the amount of information per element, the structure and distribution of the keywords, stability requirements, and language tools, storage structure, and available auxiliary space
external sort
External sorting method: usually merge sort, in two stages: first produce sorted runs (merge segments), then merge the runs pass by pass. Total time of external sorting = internal sorting time + time to read/write external storage + internal merging time
Multi-way merge and loser tree
Loser tree: branch nodes record the loser of the comparison between their subtrees and pass the winner upward. A k-way loser tree has depth ceil(log2 k), so selecting the smallest keyword among k records requires at most ceil(log2 k) comparisons.
Replacement-selection sort (generating initial merge segments)
With plain internal sorting, all initial runs have the same length (except the last), limited by the size of the available memory work area; longer initial runs can be generated with the replacement-selection algorithm.
Let s be the current run, w the work area, and F the input file. Sketch of generating one run: initially s is empty and w is filled from F up to the work-area size; while w contains an element ai >= max(s), move the smallest such ai from w into s and read the next element of F into w; when no element of w is >= max(s), close the run and begin a new one.
Optimal merge tree
The initial runs have different lengths; during merging, a Huffman tree yields the optimal merge plan.
A k-ary Huffman tree satisfies n0 = (k - 1) * nk + 1, where nk is the number of nodes of degree k and n0 the number of nodes of degree 0
Adding virtual runs: the number of initial runs may not fit a strict k-way merge tree exactly, so virtual runs of length 0 are added. A strict k-way merge tree requires (n0 - 1) mod (k - 1) = 0; if the r initial runs leave a remainder u = (r - 1) mod (k - 1) != 0, then k - 1 - u virtual runs (a number less than k - 1) must be added.