MindMap Gallery data structure
An introduction to data structure knowledge, including the basic terminology of data structure, three elements of data structure, and three parts of algorithm. You can read it if you need it.
Edited at 2022-09-07 11:22:35
introduction
basic terminology
data
A carrier of information, a collection of information stored in a computer
data element
basic unit of data
Composed of data items, the data item is its smallest indivisible unit.
data object
A collection of data elements with the same properties, a subset of data
data type
A collection of values and a set of operations defined on this collection.
Atomic type: its value cannot be subdivided
Structural type: a data type that can be further decomposed into several components
Abstract data types: abstract data organization and operations related to it
data structure
A collection of data elements that have one or more specific relationships with each other
three types
logical structure
storage structure
Data operations
Three elements of data structure
logical structure
storage structure
sequential storage
logically and physically adjacent
Advantages: Implement random access
Disadvantage: Only an adjacent block of storage units can be used
chain storage
Logically adjacent, not necessarily physically adjacent. Use pointers to represent logical relationships.
Advantages: No fragmentation, full use of storage units
Disadvantages: Storing pointers takes up additional storage space and can only be accessed sequentially
Index storage
Stores element information and additional index tables.
Advantages: fast retrieval speed
Disadvantages: The additional index table takes up additional storage space, and it takes more time to modify the index table when inserting or deleting elements.
Hash storage
The storage address of the element is directly calculated based on the element's keyword, also known as hash storage.
Advantages: Data retrieval, addition and deletion are faster
Disadvantages: Conflicts in element storage units may occur, increasing time and space overhead.
Data operations
Operations applied to data include the implementation and definition of the operation
The definition is stated in terms of the logical structure and describes what the operation does.
The implementation is stated in terms of the storage structure and gives the concrete steps that carry out the operation.
algorithm
A description of the steps to solve a specific problem, a finite sequence of instructions, each instruction representing one or more operations
characteristic
Finiteness
certainty
feasibility
input
output
Design goals
correctness
readability
Robustness
Efficiency and low storage requirements
efficiency
time complexity
The number of times the statement is executed repeatedly
space complexity
Storage space consumed by the algorithm
linear table
basic definition
A finite sequence of n data elements of the same data type
Basic features
Each element except the first element has one and only one direct predecessor
Each element except the last element has exactly one direct successor
A linear table is a logical structure
sequential storage
Also called a sequence table, it is a storage structure in which data are logically and physically adjacent.
define structure
static structure
dynamic structure
insert operation
Element movement
Best: O(1)
Worst: O(n)
Average number of node moves: n/2
Insertion time complexity is O(n)
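A minimal C sketch of the insert operation on a sequence table, assuming a fixed MaxSize and 1-based logical positions; the names SqList and ListInsert are illustrative textbook-style conventions, not taken from this outline.

```c
#include <stdbool.h>

#define MaxSize 50

typedef struct {
    int data[MaxSize];  /* elements stored contiguously */
    int length;         /* current number of elements */
} SqList;

/* Insert e at logical position i (1 <= i <= length + 1).
   Elements from position i onward are shifted back one slot,
   so the worst case moves n elements: O(n). */
bool ListInsert(SqList *L, int i, int e) {
    if (i < 1 || i > L->length + 1) return false;  /* invalid position */
    if (L->length >= MaxSize) return false;        /* table full */
    for (int j = L->length; j >= i; j--)           /* shift right */
        L->data[j] = L->data[j - 1];
    L->data[i - 1] = e;                            /* place new element */
    L->length++;
    return true;
}
```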
Delete operation
Element movement
Best: O(1)
Worst: O(n)
Average number of node moves: (n-1)/2
The time complexity of deletion is O(n)
Find by value
Element comparisons
Best: O(1)
Worst: O(n)
Average number of comparisons: (n+1)/2
The time complexity of searching by value is O(n)
chain storage
Data elements are linked together by pointers, so only sequential access is possible (no random access); insertion and deletion only require modifying pointers. Also called a linked list, it is a storage structure.
Singly linked list
Store data elements in a linear table through an arbitrary set of storage units. Each linked list node also needs to store a pointer to the next node.
Advantages: Solve the fragmentation problem caused by continuous data storage
Disadvantages: There is a waste of space in the storage pointer field. Discrete data storage is a non-random access storage structure. To find a specific node, you need to start from the table header.
Head node: The node attached before the first node, the data field does not store information. (1) The operation on the first node is consistent with other nodes. (2) The operations on empty tables and non-empty tables are unified
Head pointer: The pointer pointing to the first node. When it is NULL, the linear list is an empty list. Regardless of whether there is a head node, the head pointer points to the first node of the linked list.
Operational realization
Head insertion method to create singly linked list
Insert new node into linked list header
Time complexity O(n)
Create singly linked list using tail insertion method
New nodes are inserted at the end of the linked list; a tail pointer is maintained that always points to the tail node.
Time complexity O(n)
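A sketch in C of building a singly linked list with a head node by tail insertion, assuming int data; LNode, LinkList and CreateListTail are illustrative names, not from this outline.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct LNode {
    int data;
    struct LNode *next;
} LNode, *LinkList;

/* Build a list with a head node from a[0..n-1] by tail insertion:
   a tail pointer r always points to the current last node, so each
   insertion is O(1) and the whole construction is O(n). */
LinkList CreateListTail(const int a[], int n) {
    LinkList L = malloc(sizeof(LNode));   /* head node, data unused */
    LNode *r = L;                         /* tail pointer */
    for (int i = 0; i < n; i++) {
        LNode *s = malloc(sizeof(LNode));
        s->data = a[i];
        r->next = s;                      /* link new node after tail */
        r = s;                            /* move tail pointer */
    }
    r->next = NULL;                       /* terminate the list */
    return L;
}

int main(void) {
    int a[] = {1, 2, 3, 4};
    LinkList L = CreateListTail(a, 4);
    for (LNode *p = L->next; p != NULL; p = p->next)
        printf("%d ", p->data);           /* prints 1 2 3 4 */
    return 0;
}
```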
Find node value by serial number
Starting from the first node, search along the next domain until the i-th node is found.
Time complexity O(n)
Find table nodes by value
Starting from the first node, search along the data field until the required value
Time complexity O(n)
insert node
Insert a new node with the given value at the i-th position.
Time complexity O(n)
delete node
Delete the i-th node at the corresponding position
Time complexity O(n)
Compute table length
Count the number of data nodes in a singly linked list.
Time complexity O(n)
Double linked list
A linked list in which each node contains two pointer fields: prior points to the predecessor node and next points to the successor node.
Operational realization
The rest of the operations are similar except insertion and deletion.
insert
Time complexity O(1)
delete
Time complexity O(1)
circular linked list
Circular singly linked list
The next pointer of the last node in the list points back to the head node instead of NULL.
Insertion and deletion are similar to a singly linked list, except that the test for the end of the list compares against the head pointer rather than NULL.
Set the tail pointer so that the time complexity of operating the head and tail of the table is O(1)
Circular doubly linked list
The prior pointer of the head node points to the last node of the list. When the list is empty, both prior and next of the head node equal L (the head pointer).
static linked list
An array is used to describe the linked storage structure of a linear list; the pointer field of each node stores the array subscript (cursor) of the next element.
Use requires pre-allocation of a contiguous storage space
The end flag is next == -1
Comparison between sequence list and linked list
Reading and writing methods
Sequence table
Sequential access, random access
linked list
Elements can only be accessed sequentially, starting from the head of the list
Logical structure and physical structure
Sequence table
Logically adjacent, physically adjacent
linked list
Logically adjacent, not necessarily physically adjacent
Find, insert and delete
Sequence table
Find by value
Unordered O(n)
Ordered: O(log₂n)
Search by serial number
O(1)
insert delete
Move half the table length elements on average
linked list
Search by serial number
O(n)
insert delete
Modify the corresponding node pointer field
space allocation
Sequence table
static
It cannot be expanded once it is full, and larger space needs to be pre-allocated. Too large wastes space; too small causes overflow.
dynamic
It can be expanded, but the operation efficiency is low. If there is no larger continuous storage space, the allocation will fail.
linked list
Convenient storage space allocation, flexible and efficient operation
stack,queue
stack
definition
A linear table that allows insertion and deletion only at one end; last in, first out
Top of stack: the end where insertion and deletion take place
Bottom of the stack: the other end, which is fixed and does not allow insertion or deletion
When n different elements are pushed onto a stack, the number of distinct pop (output) sequences is the Catalan number C(2n, n) / (n + 1)
sequential storage
A sequentially stored stack uses a set of storage units with consecutive addresses to store data elements from the bottom of the stack to the top of the stack, and sets a pointer to indicate the current position of the top element of the stack.
The stack top pointer S.top is initially set to S.top=-1
Top element of stack: S.data[S.top]
Push: when the stack is not full, first increment the top pointer by 1, then write the value into the position indicated by the top pointer.
Pop: when the stack is not empty, first read the top element, then decrement the top pointer by 1.
Stack empty judgment condition: S.top==-1
Stack full condition: S.top==MaxSize-1
Stack length: S.top + 1
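A minimal C sketch of a sequential stack with the top pointer initialized to -1, matching the full/empty conditions above; the names SqStack, Push and Pop are illustrative conventions, not taken from this outline.

```c
#include <stdbool.h>

#define MaxSize 50

typedef struct {
    int data[MaxSize];
    int top;              /* index of the top element, -1 when empty */
} SqStack;

void InitStack(SqStack *S) { S->top = -1; }

/* Push: fail when the stack is full (top == MaxSize - 1),
   otherwise advance top and store the value. */
bool Push(SqStack *S, int x) {
    if (S->top == MaxSize - 1) return false;
    S->data[++S->top] = x;
    return true;
}

/* Pop: fail when the stack is empty (top == -1),
   otherwise read the top value and retreat top. */
bool Pop(SqStack *S, int *x) {
    if (S->top == -1) return false;
    *x = S->data[S->top--];
    return true;
}
```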
shared stack
Two sequential stacks share a one-dimensional array space. The bottoms of the stacks are set at both ends of the shared space and extend toward the middle.
Storage space can be used more effectively, and the time complexity of accessing data is O(1).
chain storage
A stack using chained storage is called a chain stack.
Advantages: It is convenient for multiple stacks to share storage space and improve efficiency, and there is no stack overflow situation.
The chain stack has no head node, and Lhead points to the top element of the stack.
It facilitates the insertion and deletion of nodes. The operations are similar to those on a linked list; push and pop both take place at the head of the list.
queue
It is also a linear table with limited operations, which only allows insertion at one end and deletion at the other end.
Operating characteristics: first in, first out
Basic Information
Front of queue: the end where deletion is allowed
Rear of queue: the end where insertion is allowed
Empty queue: an empty list without any elements
sequential storage
Allocate a continuous address unit to store the data in the queue, with a head pointer front and a tail pointer rear.
Queue empty (initial state): Q.front == Q.rear == 0
Enqueue: when the queue is not full, write the value at the rear, then increment the rear pointer by 1
Dequeue: when the queue is not empty, take the front element, then increment the front pointer by 1
There will be false overflows.
circular queue
Think of the queue as a ring.
Initial: Q.front=Q.rear=0
The queue front pointer advances by 1: Q.front = (Q.front + 1) % MaxSize
The queue rear pointer advances by 1: Q.rear = (Q.rear + 1) % MaxSize
Queue length: (Q.rear + MaxSize - Q.front) % MaxSize
When leaving the queue or entering the queue, the pointer advances by 1 clockwise.
Distinguishing queue full from queue empty
Sacrifice a storage unit
When the queue head pointer is at the next position of the queue tail pointer, it is used as a sign that the queue is full.
Queue full: (Q.rear + 1) % MaxSize == Q.front
Queue empty: Q.front == Q.rear
Number of elements: (Q.rear - Q.front + MaxSize) % MaxSize
Add element count data member
Queue empty: Q.size == 0
Queue full: Q.size == MaxSize
Add tag data element
tag=0, deletion causes Q.front==Q.rear, the queue is empty
tag=1, insertion causes Q.front==Q.rear, the queue is full
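A minimal C sketch of a circular queue that sacrifices one storage unit to tell full from empty, matching the conditions above; SqQueue, EnQueue and DeQueue are illustrative names, not from this outline.

```c
#include <stdbool.h>

#define MaxSize 8

typedef struct {
    int data[MaxSize];
    int front, rear;   /* front: first element; rear: next free slot */
} SqQueue;

void InitQueue(SqQueue *Q) { Q->front = Q->rear = 0; }

bool QueueEmpty(const SqQueue *Q) { return Q->front == Q->rear; }

/* Full when the slot after rear is front: one storage unit is sacrificed
   so that "full" and "empty" can be distinguished. */
bool EnQueue(SqQueue *Q, int x) {
    if ((Q->rear + 1) % MaxSize == Q->front) return false;  /* full */
    Q->data[Q->rear] = x;
    Q->rear = (Q->rear + 1) % MaxSize;
    return true;
}

bool DeQueue(SqQueue *Q, int *x) {
    if (Q->front == Q->rear) return false;                  /* empty */
    *x = Q->data[Q->front];
    Q->front = (Q->front + 1) % MaxSize;
    return true;
}
```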
chain storage
The chain representation of the queue is a chain queue, a singly linked list with a head pointer and a tail pointer.
Q.front==Q.rear==NULL, the queue is empty
It is often set as a singly linked list with the head node, which can simplify the insertion and deletion operations of the queue.
Using a chain structure to store queues avoids the problems of unreasonable storage allocation and "overflow"
deque
A queue that can perform enqueue and dequeue operations on both ends of the queue.
Enqueuing operation: The elements enqueued on the front end are arranged in front of the elements enqueued on the back end.
Output restricted deque: one end allows insertions and deletions, the other only allows insertions
Input restricted deque: one end allows insertions and deletions, the other only allows deletions
application
stack application
bracket matching
(1) Set an empty stack and read parentheses sequentially
(2) Left parenthesis: pushed onto the stack as a new, more urgent expectation
(3) Right bracket: eliminates the most urgent expectation, or is illegal
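A short C sketch of the bracket-matching idea above, assuming three bracket kinds and an expression shorter than the fixed stack; BracketsMatch is an illustrative name, not from this outline.

```c
#include <stdbool.h>
#include <string.h>
#include <stdio.h>

/* Returns true when every bracket in expr is properly matched.
   A left bracket is pushed as a pending "expectation"; a right bracket
   must satisfy (pop) the most recent expectation on the stack. */
bool BracketsMatch(const char *expr) {
    char stack[256];
    int top = -1;
    for (size_t i = 0; i < strlen(expr); i++) {
        char c = expr[i];
        if (c == '(' || c == '[' || c == '{') {
            stack[++top] = c;                      /* new expectation */
        } else if (c == ')' || c == ']' || c == '}') {
            if (top == -1) return false;           /* nothing to match */
            char open = stack[top--];
            if ((c == ')' && open != '(') ||
                (c == ']' && open != '[') ||
                (c == '}' && open != '{'))
                return false;                      /* wrong bracket kind */
        }
    }
    return top == -1;                              /* no expectation left */
}

int main(void) {
    printf("%d\n", BracketsMatch("{[()]}"));   /* 1 */
    printf("%d\n", BracketsMatch("([)]"));     /* 0 */
    return 0;
}
```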
Expression evaluation
Scan the expression: push operands onto the stack; when an operator op is scanned, pop the two operands Y and X, compute X op Y, push the result back onto the stack, and continue until the scan is complete.
Fibonacci Sequence
queue application
Level traversal
(1) Root node joins the queue
(2) If the queue is empty, end the traversal; otherwise go to (3)
(3) Dequeue the front node and visit it. If it has a left child, enqueue the left child; if it has a right child, enqueue the right child. Return to (2)
computer system
Use the queue as a data buffer to solve the problem of data mismatch between the host and external devices
Use queues to store various request information to solve resource competition problems caused by multiple users
array
definition
An array is a finite sequence of n elements of the same data type.
Arrays are a generalization of linear tables. Once an array is defined, its dimensions and bounds will not change.
storage structure
row-major storage
column-major storage
special matrix
Symmetric matrix
triangular matrix
Lower triangle data storage
Upper triangle data storage
tridiagonal matrix
sparse matrix
Store the non-zero elements of the matrix as triples (row, column, value), ordered by row and column
string
basic concept
String: A finite sequence of zero or more characters. The content in the sequence can be numbers, letters and other characters.
The length of the string is n. When n=0, it is an empty string.
Substring: a sequence of any number of consecutive characters taken from a string
Main string: the string that contains the substring
Location
The serial number of a certain character in a string
The position of the string in the main string is represented by the position of the first character of the string in the main string.
Space string: a string consisting of one or more spaces
storage
Fixed-length sequential storage
Allocate a fixed-length storage area for each string variable, that is, a fixed-length array
string length representation
Set an additional variable to store the length of the string
Add an end mark character "\0" after the string that is not included in the length of the string.
Heap allocated storage
It is a group of storage units with consecutive addresses, but their storage space is dynamically allocated during program execution.
In use, malloc() and free() functions are usually used to achieve dynamic management.
Block-chain (linked-block) storage
Use linked list to store string values
You can use one node to store one character, or one node to store multiple characters.
Each node is called a block, and the entire linked list is called a block chain structure.
pattern matching algorithm
Simple pattern matching algorithm
When a match fails, the main string pointer backtracks to the character after the position where the current attempt started, and the pattern string pointer returns to the first character of the pattern.
Time complexity O(mn)
KMP algorithm
Compute the next array of the pattern string
When a mismatch occurs, the next array determines how far the pattern slides; the main string pointer never backtracks. Time complexity O(m+n)
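A sketch in C of the next (failure) array and KMP matching, using a 0-indexed convention (textbooks often use an equivalent 1-indexed one); GetNext and KmpIndex are illustrative names, not from this outline.

```c
#include <stdio.h>
#include <string.h>

/* next[j] = length of the longest proper prefix of p[0..j]
   that is also a suffix of it. */
void GetNext(const char *p, int next[]) {
    int m = (int)strlen(p);
    next[0] = 0;
    int k = 0;                      /* length of the current border */
    for (int j = 1; j < m; j++) {
        while (k > 0 && p[j] != p[k])
            k = next[k - 1];        /* fall back to a shorter border */
        if (p[j] == p[k])
            k++;
        next[j] = k;
    }
}

/* KMP matching: the main-string pointer i never backtracks. */
int KmpIndex(const char *s, const char *p) {
    int n = (int)strlen(s), m = (int)strlen(p);
    int next[64];
    GetNext(p, next);
    for (int i = 0, k = 0; i < n; i++) {
        while (k > 0 && s[i] != p[k]) k = next[k - 1];
        if (s[i] == p[k]) k++;
        if (k == m) return i - m + 1;   /* match found */
    }
    return -1;
}

int main(void) {
    printf("%d\n", KmpIndex("ababcabcacbab", "abcac"));  /* 5 */
    return 0;
}
```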
Trees and Binary Trees
Tree
definition
A finite set of n nodes. When n=0, it is an empty tree.
There is only one specific root node
When n>1, the remaining nodes can be divided into m disjoint finite sets. Each set itself is a tree and a subtree of the root.
Features
The root node has no predecessor, and all nodes except the root node have one and only one predecessor.
All nodes can have zero or more successors
the term
ancestors, descendants, parents, children, brothers
Degree
The number of children of the node
The maximum degree of a node is the degree of the tree
Branch node: node with degree>0
Leaf node: node with degree=0
Node depth: starting from the root node and accumulating layer by layer from top to bottom.
Node height: starting from leaf nodes and accumulating layer by layer from bottom to top.
Ordered tree: the left and right order of nodes cannot be interchanged
Unordered tree: the left and right order of nodes can be interchanged
Path: the sequence of nodes passed between two nodes; the path length is the number of edges on the path.
nature
Number of nodes = sum of the degrees of all nodes + 1
storage structure
Parent representation
A set of continuous spaces is used to store each node, and a pseudo pointer is added to each node to represent the position of the node's parents in the array.
You can quickly find the parent nodes of each node, but you need to traverse the entire structure to find the children of the node.
Child representation
The children of each node are connected into a linear structure using a singly linked list.
Finding children is very simple, but finding a node's parent requires traversing the child linked lists of all n nodes.
Child-sibling representation
Also called binary tree representation. Each node consists of three parts: the node value, the node's pointer to the first child node, and the pointer to the next sibling node.
Finding the parent node in this structure is more complicated; it can be sped up by adding a parent pointer field to each node.
Traverse
Tree
preorder (root first)
Visit the root node first, then traverse each subtree; each subtree is traversed in the same root-first order.
postorder (root last)
Traverse the subtrees first and then visit the root; each subtree is handled in the same way.
forest
preorder
Visit the root node of the first tree
Preorder-traverse the subtrees of the first tree's root
Preorder-traverse the forest consisting of the remaining trees (excluding the first tree)
inorder
Inorder-traverse the forest formed by the subtrees of the first tree's root
Visit the root node of the first tree
Inorder-traverse the forest consisting of the remaining trees (excluding the first tree)
Tree, forest and binary tree traversal correspondence
Binary tree
definition
Each node has at most two subtrees; the subtrees are distinguished as left and right and their order cannot be reversed. This differs from a tree of degree 2.
special binary tree
full binary tree
complete binary tree
Compared with a full binary tree, only the last two levels may be incomplete; at most one node has degree 1, and that node has only a left child and no right child.
Binary sorting tree
The keys of all nodes in the left subtree are less than the key of the root node, and the keys of all nodes in the right subtree are greater than the key of the root node; the left and right subtrees are themselves binary sorting trees.
balanced binary tree
The depth difference between the left subtree and the right subtree of any node in the tree does not exceed 1
nature
complete binary tree
storage structure
order
Use a set of continuous storage units to store the nodes on the binary tree from top to bottom and from left to right at a time, using 0 to represent non-existent empty nodes.
chain
The binary linked list contains three fields: data field, left pointer field lchild, and right pointer field rchild.
In a binary linked list of n nodes, there are n + 1 empty link fields.
Traverse
preorder
Root, left subtree, right subtree
inorder
Left subtree, root, right subtree
postorder
Left subtree, right subtree, root
level order
Visit each node in order from top to bottom, left to right
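A minimal C sketch of the three recursive traversals listed above, assuming a standard binary linked list node with char data; BiTNode and the function names are illustrative conventions, not from this outline.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct BiTNode {
    char data;
    struct BiTNode *lchild, *rchild;
} BiTNode, *BiTree;

/* Preorder: root, left subtree, right subtree */
void PreOrder(BiTree T) {
    if (T == NULL) return;
    printf("%c ", T->data);
    PreOrder(T->lchild);
    PreOrder(T->rchild);
}

/* Inorder: left subtree, root, right subtree */
void InOrder(BiTree T) {
    if (T == NULL) return;
    InOrder(T->lchild);
    printf("%c ", T->data);
    InOrder(T->rchild);
}

/* Postorder: left subtree, right subtree, root */
void PostOrder(BiTree T) {
    if (T == NULL) return;
    PostOrder(T->lchild);
    PostOrder(T->rchild);
    printf("%c ", T->data);
}
```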
threaded binary tree
concept
It is to arrange the nodes in the binary tree into a linear sequence according to certain rules.
In the linked representation, a tag of 0 means the pointer points to a child; a tag of 1 means the pointer is a thread (ltag = 1: lchild points to the predecessor; rtag = 1: rchild points to the successor).
Construction of a threaded binary tree
Threading means changing the null pointers in the linked list into threads pointing to the predecessor or successor; in essence it is a single traversal of the binary tree.
inorder threading
preorder and postorder threading
Trees, forests and binary trees (conversion)
Convert tree to binary tree
Rule: Left child, right brother
Drawing steps
Add a line between sibling nodes
Each node only retains the connection to the first child
Take the root of the tree as the axis and rotate it 45 degrees clockwise
Convert forest to binary tree
Convert each tree into its corresponding binary tree
Treat the roots of the trees as siblings and add connection lines between the tree roots
Rotate 45 degrees clockwise around the root of the first tree
Huffman tree
The nodes of the tree are assigned a numerical value that represents a certain meaning, which is called the weight of the node.
Weighted path length: the product of the path length from the root of the tree to any node and the weight of the node
Among the binary trees containing n weighted leaf nodes, the one with the smallest weighted path length is called a Huffman tree.
Construction method
Among the node values, select the two minimum values as the left and right subtrees of the new node, and the sum of their weights is used as the value of the root node.
Choose the two smallest among the new weights as the next left and right children.
Features
Each initial node eventually becomes a leaf node. The smaller the weight, the greater the path length from the node to the root node.
A total of n-1 new nodes are created, and the total number of nodes is 2n-1.
Each construction selects 2 trees as the children of the new node, and there is no node with degree 1.
coding
Each edge is encoded as 1 on the left and 0 on the right, or 1 on the right and 0 on the left.
Prefix code: no code is a prefix of any other code
Graph
definition
Composed of vertex set V and edge set E
A graph cannot be empty: the vertex set V must be nonempty, but the edge set E may be empty.
basic concept
directed graph
An edge is an ordered pair of vertices, represented by <v,w>
Strongly connected: there is a path in each direction between the two vertices
Strongly connected graph: every pair of vertices is strongly connected
Strongly connected components: maximal strongly connected subgraphs in directed graphs
Undirected graph
Edges are unordered pairs of vertices, represented by (v,w)
Connected: There is a path between two vertices
Connected graph: any two vertices are connected; otherwise it is a disconnected graph.
Connected components: maximal connected subgraphs in undirected graphs
Simple graph: There are no duplicate edges and no edges from vertices to itself.
Multigraph: The number of edges between two vertices in the graph is greater than 1, and the vertices are allowed to be associated with themselves through an edge.
complete graph
For an undirected graph, there is an edge between any two vertices.
For a directed graph, there are two arcs in opposite directions between any two vertices.
Subgraph: a graph whose vertex set and edge set are subsets of those of the original graph
Spanning tree: It is a minimal connected subgraph that contains all the vertices in the graph.
Spanning forest: In a non-connected graph, it is composed of spanning trees of connected components.
Degree
The degree of a vertex in an undirected graph is the number of edges incident to it.
directed graph
In-degree: the number of edges that end at the vertex
Out-degree: the number of edges that start from the vertex
Degree = in-degree + out-degree
Edge weight and network: Each edge can be marked with a value with a certain meaning, called a weight. This graph with weights on the edges is called a weighted graph, also called a network.
A graph with a small number of edges is called a sparse graph, and vice versa is a dense graph. The difference between the two is vague and relative
Path: a sequence of vertices connecting two vertices
Path length: the number of edges on the path
Circuit: a path where the first vertex and the last vertex are the same
Simple path: a path with no repeated vertices. Simple cycle: a cycle in which no vertices repeat except that the first and last vertices are the same.
Distance: the length of the shortest path from one vertex to another; if no path exists, the distance is recorded as infinity.
Directed tree: a directed graph in which the in-degree of one vertex is zero and the in-degree of the other vertices is 1.
storage
adjacency matrix
Use a one-dimensional array to store the vertex information in the graph, and use a two-dimensional array to store the edge information in the graph.
Unweighted graph
weighted graph
Features
The adjacency matrix of an undirected graph is symmetrical, and compressed storage can be used for larger scales.
Space complexity O(n²) (n = number of vertices)
directed graph
Out-degree of vertex i: the number of non-zero elements in row i
In-degree of vertex i: the number of non-zero elements in column i
Suitable for representing dense graphs
adjacency list
Create a singly linked list for each vertex in the graph; its nodes represent the edges attached to that vertex, so it is called the vertex's edge list. The head pointers of the edge lists and the data of the vertices are stored sequentially in a vertex table.
Features
storage
Undirected graph: O(|V| + 2|E|)
Directed graph: O(|V| + |E|)
Suitable for representing sparse graphs
directed graph
Out-degree: the number of nodes in the vertex's edge list
In-degree: traverse the entire adjacency list
The representation is not unique: the order of the nodes in each vertex's edge list can be interchanged.
cross linked list
A chain storage structure of directed graph
arc node
tailvex: the position of the arc tail in the graph's vertex table
headvex: the position of the arc head in the graph's vertex table
hlink: points to the next arc with the same arc head
tlink: points to the next arc with the same arc tail
vertex node
data: Vertex-related data information
firstin: the first arc node whose vertex is the arc head
firstout: the first arc node whose vertex is the tail of the arc
Vertex nodes are stored sequentially
adjacency multiple list
Chain storage structure of undirected graph
edge node
mark: mark whether the edge has been searched
ivex jvex: the position of the two vertices attached to the edge in the graph
ilink: the next edge attached to the vertex ivex
jlink: the next edge attached to the vertex jvex
In the adjacency multilist, each edge is represented by exactly one node attached to its two vertices (whereas the adjacency list of an undirected graph stores each edge twice).
Traverse
Breadth First Search (BFS)
Starting from a vertex, visit, from near to far, the vertices connected to it by paths of length 1, 2, ...
It is a hierarchical search process that uses an auxiliary queue to store the next layer of vertices of the vertex being accessed.
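A minimal C sketch of BFS over an adjacency matrix with the auxiliary queue described above; the array-based queue and the names G, BFS are assumptions for illustration, not from this outline.

```c
#include <stdio.h>
#include <stdbool.h>

#define MaxV 100

/* Breadth-first search over an adjacency matrix, using an auxiliary
   queue so vertices are visited layer by layer (near to far). */
void BFS(int n, const int G[MaxV][MaxV], int start) {
    bool visited[MaxV] = {false};
    int queue[MaxV], front = 0, rear = 0;

    visited[start] = true;
    queue[rear++] = start;
    while (front < rear) {
        int v = queue[front++];          /* dequeue */
        printf("%d ", v);                /* visit */
        for (int w = 0; w < n; w++)      /* enqueue unvisited neighbours */
            if (G[v][w] && !visited[w]) {
                visited[w] = true;
                queue[rear++] = w;
            }
    }
}
```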
Performance analysis
Space complexity: O(|V|) for the auxiliary queue
time complexity
Adjacency list: O(|V| + |E|)
Adjacency matrix: O(|V|²)
Find the shortest path
Find the shortest path (fewest edges) between two vertices of an unweighted graph
breadth-first spanning tree
Breadth-first spanning tree of adjacency matrix is unique
The breadth-first spanning tree of the adjacency list is not unique
Depth First Search (DFS)
Visit a certain starting vertex v in the graph, then start from v and visit any unvisited vertex w adjacent to v, then visit any unvisited vertex q adjacent to w, and repeat the process.
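A minimal C sketch of recursive DFS over an adjacency matrix, including the wrapper for disconnected graphs; the global visited array and names are illustrative assumptions.

```c
#include <stdio.h>
#include <stdbool.h>

#define MaxV 100

static bool visited[MaxV];

/* Recursive depth-first search: visit v, then recurse into any
   unvisited vertex adjacent to v; backtrack when none remains. */
void DFS(int n, const int G[MaxV][MaxV], int v) {
    visited[v] = true;
    printf("%d ", v);
    for (int w = 0; w < n; w++)
        if (G[v][w] && !visited[w])
            DFS(n, G, w);
}

/* For a disconnected graph, start a DFS from every unvisited vertex. */
void DFSTraverse(int n, const int G[MaxV][MaxV]) {
    for (int v = 0; v < n; v++) visited[v] = false;
    for (int v = 0; v < n; v++)
        if (!visited[v]) DFS(n, G, v);
}
```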
Performance analysis
Space complexity: O(|V|) for the recursion stack
time complexity
Adjacency matrix: O(|V|²)
Adjacency list: O(|V| + |E|)
depth first spanning tree
With adjacency-list storage the depth-first spanning tree is not unique; with adjacency-matrix storage it is unique.
Traversal and connectivity
Undirected graph
If connected, one traversal starting from any vertex visits every vertex exactly once.
Non-connected, one traversal can only access all the vertices of the connected component where the vertex is located.
Directed graph: If there is a path from the initial point to every vertex in the graph, all vertices in the graph can be accessed, otherwise all vertices cannot be accessed.
Related applications
minimum spanning tree
For weighted connected undirected graphs, the spanning trees are different, and the weight of each tree may also be different. Let W be the set of all spanning trees in the graph, then among the sets, the tree with the smallest sum of weights is called the minimum spanning tree of the graph.
nature
Not unique in general; if a connected undirected graph has exactly one fewer edge than vertices (the graph is itself a tree), the minimum spanning tree is the graph itself.
The sum of weights is unique
Number of edges of the minimum spanning tree = number of vertices - 1
Prim's algorithm
Initially select any vertex to form the tree; repeatedly add the vertex that is closest to the current tree until all vertices are included.
Time complexity: O(|V|²)
Suitable for graphs with dense edges
Kruskal algorithm
Constructs a minimum spanning tree by examining edges in increasing order of weight and selecting each edge that does not form a cycle.
Time complexity: O(|E| log|E|)
Suitable for graphs with few edges and many vertices (sparse graphs)
shortest path
In a weighted graph, the sum of the edge weights on a path from one vertex to another is called the weighted path length of that path; the path with the smallest weighted length is the shortest path.
The solution relies on the property that any sub-path of a shortest path is itself a shortest path (optimal substructure).
Dijkstra's algorithm
Solve the shortest path from a single source, that is, the shortest path from a certain vertex to other vertices
Maintain a set S of vertices whose shortest paths have been found, initially containing only the source v0. Each time a vertex Vi is added to S, update the dist values of the vertices outside S that are adjacent to Vi.
Auxiliary array
dist[]: record the shortest path length from source point v0 to other points
path[]: the predecessor node of the shortest path from the source point to vertex i
time complexity
Adjacency matrix or weighted adjacency list: O(|V|²)
Not applicable to graphs with negative edge weights
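A sketch in C of Dijkstra's algorithm on an adjacency matrix, with the dist[] and path[] arrays described above; INF as a stand-in for "no edge" and the function name are assumptions for illustration.

```c
#include <stdbool.h>

#define MaxV 100
#define INF  1000000000   /* stands in for "no edge" / infinity */

/* Single-source shortest paths on an adjacency matrix.
   dist[i]: current shortest distance from src to i.
   path[i]: predecessor of i on that shortest path (-1 if none). */
void Dijkstra(int n, const int G[MaxV][MaxV], int src,
              int dist[], int path[]) {
    bool inS[MaxV] = {false};
    for (int i = 0; i < n; i++) {
        dist[i] = G[src][i];
        path[i] = (G[src][i] < INF && i != src) ? src : -1;
    }
    dist[src] = 0;
    inS[src] = true;
    for (int round = 1; round < n; round++) {
        int u = -1, best = INF;
        for (int i = 0; i < n; i++)        /* pick nearest vertex not in S */
            if (!inS[i] && dist[i] < best) { best = dist[i]; u = i; }
        if (u == -1) break;                /* remaining vertices unreachable */
        inS[u] = true;
        for (int w = 0; w < n; w++)        /* relax edges leaving u */
            if (!inS[w] && G[u][w] < INF && dist[u] + G[u][w] < dist[w]) {
                dist[w] = dist[u] + G[u][w];
                path[w] = u;
            }
    }
}
```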
Floyd algorithm
Find the shortest path between each pair of vertices
Implementation: an iterative process. A sequence of n-order square matrices is generated by recursion; in each round one more vertex is allowed as an intermediate vertex on the path between i and j. After n rounds the shortest paths between every pair of vertices are obtained.
Time complexity: O(|V|³)
Edges with negative weights are allowed, but cycles containing edges with negative weights are not allowed.
Directed acyclic graph description expression
Directed acyclic graph: There is no cycle in a directed graph
Share the same subexpressions in expressions to save storage space
topological sort
AOV network: a DAG that represents a project, in which vertices represent activities and a directed edge <Vi, Vj> means activity Vi must be completed before activity Vj can start. Such a directed graph is called an Activity-On-Vertex network.
definition
A sequence of vertices of a directed acyclic graph
Such that if there is a path from vertex B to vertex A, then vertex A appears after vertex B in the sequence.
fulfil requirements
Each vertex appears only once
If A is in front of B in the topological sequence, there is no edge from B to A in the graph.
sort by
Select a vertex without a predecessor and output
Delete the vertex and all directed edges starting from it
Repeat the first two steps until no vertices remain in the network. If vertices remain but none has in-degree 0, the graph contains a cycle.
time complexity
Adjacency list: O(|V| + |E|)
Adjacency matrix: O(|V|²)
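A minimal C sketch of topological sorting by repeatedly removing in-degree-0 vertices (the procedure above), over an adjacency matrix and using a stack of ready vertices; the names and the matrix representation are assumptions for illustration.

```c
#include <stdbool.h>

#define MaxV 100

/* Fills order[0..n-1] with a topological order and returns true;
   returns false when a cycle prevents a complete ordering. */
bool TopologicalSort(int n, const int G[MaxV][MaxV], int order[]) {
    int indegree[MaxV] = {0};
    int stack[MaxV], top = -1, count = 0;

    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            indegree[j] += G[i][j];           /* count incoming edges */
    for (int v = 0; v < n; v++)
        if (indegree[v] == 0) stack[++top] = v;

    while (top != -1) {
        int v = stack[top--];                 /* vertex with no predecessor */
        order[count++] = v;
        for (int w = 0; w < n; w++)           /* "delete" its outgoing edges */
            if (G[v][w] && --indegree[w] == 0)
                stack[++top] = w;
    }
    return count == n;                        /* false: a cycle remains */
}
```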
reverse topological sort
Select vertices with no successors (out-degree 0) and output
Delete the vertex and all directed edges ending with it
Critical Path
AOE network
In a weighted directed graph, events are represented by vertices, activities are represented by directed edges, and the cost of completing the activity (the time required to complete the activity) is represented by the weight on the edge. This is called a network that uses edges to represent activities.
nature
Only after the event represented by a vertex has occurred can the activities represented by the directed edges leaving that vertex begin.
Only when the activities represented by the directed edges entering a vertex have ended, the event represented by the vertex can occur.
In this graph, there is only one vertex with an in-degree of 0, which is called the start vertex (source point) and represents the beginning of the entire project. There is only one vertex with an out-degree of 0, which is called the end vertex (sink) and represents the end of the entire project.
key activities
Among all paths from the source to the sink, the path with the largest path length is called the critical path, and the activities on the critical path are called critical activities.
The shortest time to complete the entire project is the length of the critical path, i.e., the total time of the critical activities on it.
Key activities affect the time of the entire project. If the key activities cannot be completed on time, the completion time of the entire project will be extended. As long as the key activities are found, the critical path can be found, and the shortest completion time can be obtained.
Parameter definition
The earliest occurrence time of event Vk ve(k)
The longest path length from the source point to the vertex Vk. The earliest occurrence time of Vk determines the earliest time that all activities starting from Vk can start.
The calculation of ve() is performed from front to back, and can be performed in the order of topological structure.
The latest occurrence time vl(k) of event Vk
Under the premise of not delaying the completion of the entire project, it is guaranteed that its subsequent event Vj can occur at its latest occurrence time vl(j), the latest time that the event must occur
vl(sinking point)=ve(sinking point)
When computing this value, a stack recording the topological order is needed so that the vertices can be processed in reverse topological order.
The earliest start time e(i) of activity ai
The earliest occurrence time of the time represented by the starting point of the activity arc
The latest start time l(i) of activity ai
The difference between the latest occurrence time of the event represented by the end point of the activity arc and the time required for the activity
Time margin d(i) of activity ai: the difference between the latest start time and the earliest start time, d(i) = l(i) - e(i)
It is the slack of the activity: the amount of time the activity can be delayed without changing the overall completion time of the project.
If the time margin of an activity is zero, the activity must start on time; it is a critical activity.
Solution steps
Starting from the source point, let ve (source point) = 0, and find the earliest occurrence time ve() of the remaining vertices in topological order.
Starting from the sink, let vl(sink) = ve(sink), and compute the latest occurrence time vl() of the remaining vertices in reverse topological order.
Find the earliest starting time e() of all arcs based on ve() of each vertex
Find the latest start time l() of all arcs based on vl() of each vertex
Compute d() = l() - e() for all activities in the AOE network; the activities with d() = 0 are the critical activities, and together they form the critical path.
Notice
All activities on the critical path are critical activities and determine the duration of the entire project. The project duration can be shortened by speeding up critical activities, but not arbitrarily: if a critical activity is shortened too much, it may become non-critical.
The critical path is not unique. Speeding up critical activities on only one critical path cannot shorten the overall duration; the critical activities shared by all critical paths must be sped up.
Find
basic concept
Search: The process of finding data elements that meet certain conditions in a data set. There are two situations: success and failure.
Lookup table: A data collection used for lookup, consisting of data elements of the same type, which can be an array or linked list and other data types.
Static lookup table: no need to modify the lookup table dynamically
Dynamic lookup table: a lookup table that requires dynamic insertion or deletion
Keyword: The value of a data item in a data element that uniquely identifies the data element. Based on keyword search, the search result is unique.
Average search length: the average number of comparisons of keywords in all search processes (an important indicator to measure the efficiency of the search algorithm)
static search
sequential search
Also called linear search, suitable for linear lists and linked lists
General linear table
Basic idea: Starting from one end of the linear list, check whether the keywords meet the given conditions one by one.
"Sentinels" can be introduced into the algorithm to avoid unnecessary judgment statements
For a table with n elements, if a given value key is equal to the i-th element, n - i + 1 key comparisons are performed.
Find the average length
success
The probabilities are not equal:
The probabilities are equal: ASL_success = (n + 1) / 2
fail
Normally the search probabilities are not equal. If the search probability of each record is known in advance, the records can be arranged in descending order of search probability to reduce the average search length.
Advantages and Disadvantages
Advantages: no requirement on how the data elements are stored; both sequential and linked storage work.
Disadvantages: When n is large, the average search length is large
There is no requirement for the order of records in the table. Linear linked lists can only perform sequential searches.
ordered linear table
If the table is known to be ordered by keyword before searching, an unsuccessful search can stop early without comparing all the way to the other end of the table, which reduces the average search length of failed searches.
Basic idea: scan the table; as soon as the current keyword is greater than the target key (for an ascending table), the search can stop and report failure.
Find the average length
success
The probabilities are not equal:
The probabilities are equal: ASL_success = (n + 1) / 2
fail: ASL_fail = n / 2 + n / (n + 1)
Binary search (half-interval search)
Applies only to ordered sequence lists
Basic idea: compare the given key with the element in the middle of the table. If they are equal, the search succeeds; otherwise continue searching only in the first half or the second half (excluding the middle element), depending on the comparison, and repeat.
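A minimal C sketch of binary search on an ascending sequential table; the array contents in main are sample data for illustration only.

```c
#include <stdio.h>

/* Binary search on an ascending ordered table a[0..n-1].
   Returns the index of key, or -1 when the search fails.
   Each comparison halves the search range: O(log2 n). */
int BinarySearch(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;       /* middle element */
        if (a[mid] == key) return mid;          /* success */
        else if (a[mid] < key) low = mid + 1;   /* search right half */
        else high = mid - 1;                    /* search left half */
    }
    return -1;                                  /* failure */
}

int main(void) {
    int a[] = {7, 10, 13, 16, 19, 29, 32, 33, 37, 41, 43};
    printf("%d\n", BinarySearch(a, 11, 29));  /* 5 */
    printf("%d\n", BinarySearch(a, 11, 30));  /* -1 */
    return 0;
}
```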
decision tree
The process of binary search can be described by a binary tree
Each circular node in the tree represents a record, and the value in the node represents the key value of the record. The square node at the bottom of the tree represents an unsuccessful search.
Find the average length
success
fail
Successful searches end at circular nodes, each with probability 1/n; unsuccessful searches end at square nodes, each with probability 1/(n + 1).
Time complexity: O(log₂n)
On average, binary search is more efficient than sequential search.
The method needs to locate the middle of the search range quickly, so the linear list must support random access; it is only suitable for ordered sequential tables.
Block search
Also known as index sequential search, it absorbs the advantages of sequential search and binary search. It has a dynamic structure and is suitable for fast search.
Basic idea: Divide the lookup table into several blocks. The elements within the blocks can be unordered, but the blocks are ordered. The largest keyword of the previous block < the lowest keyword of the next block. Create an index table again. Each element in the index table contains the maximum keyword of each block and the address of the first element in each block, arranged in order according to the keywords.
step
Determine the block where the record to be searched is located in the index table. You can search sequentially or in half.
Search sequentially within a block
Find the average length
success
The index table uses sequential search: ASL = (b + 1)/2 + (s + 1)/2, where the table of n elements is divided into b blocks of s elements each
The index table uses binary search: ASL = ⌈log₂(b + 1)⌉ + (s + 1)/2
dynamic search
Binary sorting tree
definition
If the left subtree is not empty, the values of all nodes in the left subtree are less than the value of the root node.
If the right subtree is not empty, the values of all nodes in the right subtree are greater than the value of the root node.
The left and right subtrees are also a binary sorting tree.
Find
Starting from the root node, the process of comparing downwards layer by layer along a certain branch
If the tree is not empty, compare the key with the root node first: if equal, the search succeeds; if the key is smaller, search the left subtree; if larger, search the right subtree.
insert
If the original binary tree is empty, the node is inserted directly as the root. A node smaller than the root is inserted into the left subtree, and one larger is inserted into the right subtree. The newly inserted node is always a leaf and becomes the left or right child of the last node visited on the search path when the search fails.
Construction: Starting from an empty tree, input elements one at a time and insert them into appropriate positions in the binary sorted tree.
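A minimal C sketch of binary-sorting-tree insertion and search, assuming distinct int keys (duplicates rejected); BSTNode, BSTInsert and BSTSearch are illustrative names and the keys in main are sample data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct BSTNode {
    int key;
    struct BSTNode *lchild, *rchild;
} BSTNode, *BSTree;

/* Insert key: smaller keys go left, larger go right; the new node
   always becomes a leaf on the failed search path. */
bool BSTInsert(BSTree *T, int key) {
    if (*T == NULL) {
        BSTNode *s = malloc(sizeof(BSTNode));
        s->key = key;
        s->lchild = s->rchild = NULL;
        *T = s;
        return true;
    }
    if (key == (*T)->key) return false;
    if (key < (*T)->key) return BSTInsert(&(*T)->lchild, key);
    return BSTInsert(&(*T)->rchild, key);
}

/* Search: compare with the root and descend into one subtree. */
BSTNode *BSTSearch(BSTree T, int key) {
    while (T != NULL && key != T->key)
        T = (key < T->key) ? T->lchild : T->rchild;
    return T;
}

int main(void) {
    BSTree T = NULL;
    int keys[] = {45, 24, 53, 12, 28, 90};
    for (int i = 0; i < 6; i++) BSTInsert(&T, keys[i]);
    printf("%s\n", BSTSearch(T, 28) ? "found" : "not found");   /* found */
    printf("%s\n", BSTSearch(T, 40) ? "found" : "not found");   /* not found */
    return 0;
}
```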
delete
When deleting a node you must not remove the whole subtree rooted at it; the node is removed from the linked structure and the broken parts of the binary linked list are re-linked so that the binary sorting tree property is preserved.
Condition
Leaf nodes, delete them directly
If node z has only a left subtree or only a right subtree, let that subtree replace z as the corresponding child of z's parent.
If z has both left and right subtrees, replace z with its direct (in-order) successor, then delete that successor from the tree, which reduces to one of the first two cases.
efficiency analysis
Mainly depends on the height of the tree
If the absolute height difference of the left and right subtrees is at most 1 (balanced), the average search length is O(log₂n)
If each node has only a left or only a right subtree (a single chain), the average search length is O(n)
In the worst case, if the input sequence of the binary sorting tree is ordered, a single chain will be formed, and the performance will be significantly worse.
Insertion and deletion are completed by modifying pointers, without moving nodes; the average execution time is O(log₂n)
Binary search works on an ordered sequential list, where deleting and inserting a node costs O(n) because elements must be moved
If the ordered table is static, use a sequential table with binary search; if it is dynamic, use a binary sorting tree.
binary balanced tree
definition
The absolute value of the height difference between the left and right subtrees of any node does not exceed 1
The height difference between the left and right subtrees is called the balance factor of the node (1,0,-1)
insert
Basic idea: first check whether any node on the insertion path has become unbalanced because of this insertion. If so, adjust the subtree rooted at the unbalanced node closest to the inserted node (the lowest node whose balance factor has absolute value greater than 1) to restore balance.
Adjustment method
LL rotation: a new node is inserted into the left subtree of the left child B of node A. Rotate right around A: B replaces A as the root of the subtree, and A becomes B's right child.
RR rotation: a new node is inserted into the right subtree of the right child C of node A. Rotate left around A: C replaces A as the root of the subtree, and A becomes C's left child.
LR rotation: a new node is inserted into the right subtree (rooted at D) of the left child B of node A. First rotate left around B, then rotate right around A: D becomes the root of the subtree, with B as its left child and A as its right child.
RL rotation: a new node is inserted into the left subtree (rooted at E) of the right child C of node A. First rotate right around C, then rotate left around A: E becomes the root of the subtree, with A as its left child and C as its right child.
delete
Starting from node w, trace upward to find the first unbalanced node z
Then rebalance the subtree rooted at z, where y is the higher child of z and x is the higher child of y:
y is the left child of z, x is the left child of y (LL, right single rotation)
y is the left child of z, and x is the right child of y (LR, double rotation first left and then right)
y is the right child of z, x is the right child of y (RR, left single rotation)
y is the right child of z, and x is the left child of y (RL, right and then left double rotation)
Unlike the adjustment after insertion, after the subtree rooted at z has been rebalanced its ancestor nodes may also need adjustment, possibly all the way back to the root.
Find
As with binary sorting trees, during the search process, the number of keys compared with a given value does not exceed the depth of the tree.
average search length O(log₂n)
B-tree
Definition: Also called a multi-way balanced search tree, the maximum number of children of all nodes is called the order of the B-tree.
nature
Number of keywords per node: between ⌈m/2⌉ - 1 and m - 1, where m is the order
Number of subtrees per non-leaf node: between ⌈m/2⌉ and m
If the root node is not a leaf (terminal) node, it has at least two subtrees
All leaf nodes are at the same level and carry no information
A multi-way balanced search tree with the balance factors of all nodes equal to 0
Height
Excluding the layer where the last leaf node without any information is located
Find
Performing a search is very similar to a binary search tree
After locating a node in the B-tree, search the ordered keyword list inside the node; if the keyword is found, return success, otherwise continue the search in the corresponding subtree.
insert
Positioning, the position must be a non-leaf node in the lowest layer
During insertion, the number of keywords in every non-failure node must stay in the range [⌈m/2⌉ - 1, m - 1]. After inserting a keyword, check the node: if its keyword count does not exceed m - 1, the insertion is done; if it exceeds m - 1, the node must be split.
Split method: allocate a new node and split the keywords of the overflowing node around the middle position ⌈m/2⌉. The left half stays in the original node, the right half moves to the new node, and the middle keyword is promoted into the parent of the original node; if the parent overflows, it is split in the same way.
delete
Since the number of keywords after deletion may be less than [m/2]-1, the nodes need to be merged.
Two situations
If a sibling has keywords to spare (the adjacent right or left sibling has more than ⌈m/2⌉ - 1 keywords), borrow one by adjusting that sibling, this node, and their parent to restore balance.
If no sibling has keywords to spare, delete the keyword and merge this node with the right or left sibling and the separating keyword from the parent.
During merging, the parent loses one keyword. If the parent is the root and its keyword count drops to 0, the root is removed and the merged node becomes the new root. If the parent is not the root and its keyword count drops to ⌈m/2⌉ - 2, it must in turn borrow from or merge with its own siblings, and the adjustment may propagate upward.
B+ tree
nature
Number of keywords per node: between ⌈m/2⌉ and m
Number of subtrees per node: between ⌈m/2⌉ and m
The number of subtrees of a node is equal to the number of keywords
All leaf nodes contain all keys and pointers to corresponding records
All branch nodes only contain the maximum value of the keywords in each of its child nodes and pointers to its child nodes.
Differences from B-tree
A B+ tree node with n keywords has n subtrees; a B-tree node with n keywords has n + 1 subtrees.
In a B+ tree the number of keywords per node is in [⌈m/2⌉, m]; in a B-tree it is in [⌈m/2⌉ - 1, m - 1].
In a B+ tree, only the leaf nodes contain full information; the non-leaf nodes serve only as an index. Each index entry in a non-leaf node contains only the maximum keyword of the corresponding subtree and a pointer to that subtree, not the storage address of the record.
In a B+ tree, the leaf nodes contain all the keywords, so keywords appearing in non-leaf nodes also appear in the leaves; in a B-tree, the keywords in the leaf level and those in the other nodes do not repeat.
In a B+ tree, every search is a path from the root to a leaf node, whether it succeeds or fails.
Hash table
basic concept
Hash function: a function that maps a keyword in a lookup table to the address corresponding to the keyword
conflict
Mapping two or more different keywords to the same address, the different keywords that collide are called synonyms
A well-designed hash function minimizes such collisions
Such conflicts are inevitable, and methods for handling them must be designed.
Hash table: A data structure that is directly accessed based on keywords, establishing a direct mapping relationship between keywords and storage addresses.
Construction method
Notice
The domain of the hash function must include all the keys that need to be stored, and the range of the value domain depends on the size of the hash table or the address range.
The addresses calculated by the hash function should be evenly distributed in the entire address space with equal probability, thereby reducing the occurrence of conflicts.
The hash function should be as simple as possible and can calculate the hash address corresponding to any keyword in a short time.
direct addressing
This method is simple to calculate and does not cause conflicts, and is suitable for situations where the distribution of keywords is basically continuous. If the distribution of keywords is discontinuous and there are many empty spaces, storage space will be wasted.
Division (modulo) method
H(key) = key % p, where the hash table length is m and p is a prime number not greater than m but closest to m.
The key to this method is to choose p so that each keyword is mapped to any address in the hash space with equal probability after being converted by this function, thereby reducing the possibility of conflicts as much as possible.
digital analysis
Select a number of bits in the keyword that are more evenly analyzed as the hash address
This method is suitable for a consistent set of keywords. If the keywords are changed, a new hash function must be reconstructed.
Mid-square method
Take the middle digits of the square value of the keyword as the hash address. The specific number of digits depends on the actual situation.
The hash address obtained by this method is related to each bit of the keyword, making the distribution of the hash address even. It is suitable for situations where the values of each bit of the keyword are not even enough or are smaller than the number of digits required for the hash address.
How to deal with conflicts
open addressing method
The free addresses that can hold a new entry are open both to its synonyms and to non-synonyms. The probing formula is Hi = (H(key) + di) % m, where di is the increment sequence.
Four ways to obtain di increment sequence
Linear probing
When a conflict occurs, the next unit in the table is viewed sequentially until a free unit is found or the entire table is searched.
This method may cause a large number of elements to accumulate at adjacent hash addresses, thereby reducing search efficiency.
Quadratic probing
This method can better resolve conflicts and avoid pile-up situations.
Disadvantages: Not all cells on the hash table can be detected, but at least half of the cells can be detected
double hashing
This method requires the use of two hash functions. When the address obtained by the first hash function H(key) conflicts, the second hash function Hash2(key) is used to calculate the address increment of the keyword.
With double hashing, after at most m - 1 probes all positions in the table have been visited and the probe sequence returns to position H0.
pseudorandom sequence
When di=pseudo-random number sequence, it is called pseudo-random sequence method
With open addressing you cannot simply delete an element from the table: physically deleting it would cut off the probe path of other elements with the same hash address. Instead, mark the element with a deletion flag (logical deletion). Drawback: after many deletions the table looks full while many cells are actually free, so periodic maintenance is required to physically remove the logically deleted elements.
Chaining (zipper method)
Store all synonyms in a linear linked list uniquely identified by their hash address
Search, insertion and deletion all take place within the same synonym chain
This method is suitable for situations where insertion and deletion are frequent
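A minimal C sketch of a chained (zipper-method) hash table with the division hash function, assuming a small prime table length P = 13; the keys in main are sample data for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define P 13   /* table length / modulus, ideally a prime */

typedef struct HNode {
    int key;
    struct HNode *next;
} HNode;

/* Division-method hash function. */
static int Hash(int key) { return key % P; }

/* Insert at the head of the synonym chain for the key's bucket. */
void HashInsert(HNode *table[], int key) {
    int addr = Hash(key);
    HNode *s = malloc(sizeof(HNode));
    s->key = key;
    s->next = table[addr];
    table[addr] = s;
}

/* Search only the chain for the key's bucket (its synonyms). */
HNode *HashSearch(HNode *table[], int key) {
    for (HNode *p = table[Hash(key)]; p != NULL; p = p->next)
        if (p->key == key) return p;
    return NULL;
}

int main(void) {
    HNode *table[P] = {NULL};
    int keys[] = {19, 14, 23, 1, 68, 20, 84, 27};
    for (int i = 0; i < 8; i++) HashInsert(table, keys[i]);
    printf("%s\n", HashSearch(table, 68) ? "found" : "missing");  /* found */
    printf("%s\n", HashSearch(table, 55) ? "found" : "missing");  /* missing */
    return 0;
}
```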
Find and performance analysis
Find steps
Initialization: Addr = Hash(key)
Check whether there is a record at address Addr. If not, return search failure; if there is, compare its keyword with key: if they are equal, return search success, otherwise go to the next step.
Compute the "next hash address" using the given collision-handling method, set Addr to this address, and go back to the previous step.
performance
The hash table establishes a direct mapping between the keyword and the storage location of the record, but because of collisions the search process still involves comparing a given value with keywords. Therefore the average search length is still used to measure the search efficiency of a hash table.
depending on
loading factor
Defined as α = n / m (number of stored records / table length), a measure of how full the table is
The average search length depends on the load factor α, not directly on n or m
The larger the load factor, the fuller the table and the greater the chance of collisions.
hash function
handling conflicts
sort
basic concept
Sorting: The process of rearranging the elements in a table so that the elements in the table are ordered by keywords
Input: n records R1, R2...Rn, corresponding keywords k1, k2....kn
Output: a reordering of the input sequence R1', R2', ..., Rn' such that k1' ≤ k2' ≤ ... ≤ kn'
Algorithm stability
If the table to be sorted contains two elements with equal keywords and their relative order is unchanged after sorting, the algorithm is stable.
Stability does not measure how good an algorithm is; it merely describes a property of the algorithm.
Classification
Internal sorting
During sorting, all elements are stored in memory
Comparison: Determine the context of elements by comparing the sizes of two keywords
Move: achieve the ordering by moving elements
Performance depends on the time complexity (compare and move) and space complexity of the algorithm
external sort
During sorting, all elements cannot be stored in memory at the same time, and elements must be moved between main memory and external memory.
insertion sort
It is a simple and intuitive sorting method. Each time a record to be sorted is inserted into the previously arranged subsequence according to its key size, until all records are inserted.
direct insertion sort
process
Find the insertion position k of L(i) in L[1...i-1]
Move all elements in L[k...i-1] one position later
Copy L(i) to L(k)
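A minimal C sketch of direct insertion sort following the process above, using 0-based array indexing instead of the textbook's 1-based positions; the array in main is sample data.

```c
#include <stdio.h>

/* Direct insertion sort: element i is inserted into the already
   ordered prefix a[0..i-1], shifting larger elements back one slot.
   Best case O(n) (already sorted), worst and average case O(n^2). */
void InsertionSort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int tmp = a[i], j;
        for (j = i - 1; j >= 0 && a[j] > tmp; j--)
            a[j + 1] = a[j];      /* shift right */
        a[j + 1] = tmp;           /* place the element */
    }
}

int main(void) {
    int a[] = {49, 38, 65, 97, 76, 13, 27};
    InsertionSort(a, 7);
    for (int i = 0; i < 7; i++) printf("%d ", a[i]);  /* 13 27 38 49 65 76 97 */
    return 0;
}
```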
Performance analysis
Space complexity O(1)
time complexity
Best: The elements are already in order, and insertion only requires one comparison without moving elements O(n)
Average: about n²/4 comparisons and moves in total
Worst: the list is in exactly reverse order; the numbers of comparisons and moves reach their maximum, on the order of n²
The overall time complexity is O(n²)
Stability: comparisons and shifts proceed from back to front, so equal elements never pass each other; it is a stable sorting method
Applicability: both sequential storage and linked storage
Binary insertion sort (half insertion sort)
Separates comparison from movement: first use binary (half-interval) search to find the insertion position, then shift all elements after that position back in one pass (see the sketch below)
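A possible C sketch of binary insertion sort on a zero-based array; the binary search only locates the insertion position, while the number of moves is the same as in direct insertion sort:

```c
/* Binary (half) insertion sort on a[0..n-1]. */
void binary_insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int tmp = a[i];
        int lo = 0, hi = i - 1;
        /* Binary search for the insertion position in the sorted prefix a[0..i-1]. */
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] > tmp) hi = mid - 1;
            else              lo = mid + 1;    /* insert after equal keys to stay stable */
        }
        /* Shift a[lo..i-1] back by one position, then insert. */
        for (int j = i - 1; j >= lo; j--)
            a[j + 1] = a[j];
        a[lo] = tmp;
    }
}
```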
Performance analysis
time complexity
The number of comparisons is independent of the initial order of the list and depends only on the number of elements n; it is reduced to O(nlog₂n)
The number of moves is unchanged and still depends on the initial order of the list to be sorted
The overall time complexity is still O(n²)
Applicability: Sequential storage of linear tables with a small amount of data
Stability: It is a stable sorting method
Shell sort (diminishing increment sort)
Basic idea: split the list to be sorted into several special sublists of the form L[i, i+d, i+2d, ..., i+kd], i.e., records separated by a fixed "increment" d form one sublist; perform direct insertion sort within each sublist, and when the elements of the whole list are basically in order, perform one final direct insertion sort on all records
Process: take a step size d1 smaller than n and divide the records into d1 groups, putting records whose positions differ by a multiple of d1 into the same group; perform direct insertion sort within each group; then take a second step size d2 < d1 and repeat, until the step size reaches 1, at which point all records are in one group and a final direct insertion sort is performed (see the sketch below)
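A compact sketch of Shell sort using the simple halving increment sequence d = n/2, n/4, ..., 1; this increment sequence is an assumption, since many others are in use:

```c
/* Shell sort on a[0..n-1] with the halving increment sequence. */
void shell_sort(int a[], int n) {
    for (int d = n / 2; d >= 1; d /= 2) {          /* shrinking increment */
        /* Direct insertion sort within each sublist of stride d. */
        for (int i = d; i < n; i++) {
            int tmp = a[i];
            int j = i - d;
            while (j >= 0 && a[j] > tmp) {
                a[j + d] = a[j];
                j -= d;
            }
            a[j + d] = tmp;
        }
    }
}
```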
Performance analysis
Space complexity O(1)
time complexity
The complexity is a function of the chosen increment sequence; no exact general formula is known
When n is in a certain range, the time complexity is about O(n^1.3)
Worst-case time complexity: O(n²)
Stability: records with equal keywords may be assigned to different sublists, so their relative order can change; it is an unstable sorting algorithm
Only applicable when the linear table is stored sequentially
Exchange sort (swap sort)
Exchange: swap the positions of two records in the sequence according to the result of comparing their keywords
Bubble Sort
Basic idea: compare adjacent elements from back to front (or from front to back) and swap them if they are in reverse order, continuing until the whole sequence has been scanned; each pass places the smallest (or largest) remaining element in its final position, and at most n-1 passes are needed to sort all elements (see the sketch below)
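A minimal bubble sort sketch with front-to-back passes and an early-exit flag, which is what makes the best case O(n) on an already-ordered list:

```c
#include <stdbool.h>

/* Bubble sort on a[0..n-1]: each pass floats the largest remaining element to the end. */
void bubble_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;
        for (int j = 0; j < n - 1 - i; j++) {
            if (a[j] > a[j + 1]) {               /* reverse order: swap */
                int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                swapped = true;
            }
        }
        if (!swapped) break;                     /* no swaps: the list is already ordered */
    }
}
```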
Performance analysis
Space complexity O(1)
time complexity
Number of comparisons: at most n(n-1)/2 (reverse-ordered list), at least n-1 (already ordered)
Number of moves: at most 3n(n-1)/2, at least 0
Average time complexity: O(n²); best case O(n) when the list is already ordered
Stability: when i > j and A[i] = A[j], no exchange occurs, so equal elements keep their relative order; it is a stable sorting method
The ordered subsequence produced is globally ordered: each pass places one element in its final position
Quick sort
Basic idea: take any element of the list to be sorted as the pivot; one partitioning pass divides the list into two independent parts and places the pivot in its final position, with every element to its left no greater than it and every element to its right no smaller than it; this process is then applied recursively to each part until every sublist contains at most one element (see the sketch below)
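A sketch of quick sort using the common "first element as pivot, scan from both ends" partition; this pivot choice is only one of several reasonable options:

```c
/* Partition a[low..high] around a[low]; returns the pivot's final position. */
static int partition(int a[], int low, int high) {
    int pivot = a[low];
    while (low < high) {
        while (low < high && a[high] >= pivot) high--;
        a[low] = a[high];                  /* move a smaller element to the left part */
        while (low < high && a[low] <= pivot) low++;
        a[high] = a[low];                  /* move a larger element to the right part */
    }
    a[low] = pivot;                        /* the pivot lands in its final position */
    return low;
}

/* Recursively sort the two parts on either side of the pivot. */
void quick_sort(int a[], int low, int high) {
    if (low < high) {
        int p = partition(a, low, high);
        quick_sort(a, low, p - 1);
        quick_sort(a, p + 1, high);
    }
}
```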
Performance analysis
space efficiency
Quick sort is recursive and requires a recursive work stack to save the necessary information for each level of recursive calls.
Best case: O(log₂n)
Worst case: O(n)
Average: O(log₂n)
time efficiency
Worst case: O(n²), when the list is already ordered or reverse-ordered and each partition is maximally unbalanced
Best and average cases: O(nlog₂n)
It is the algorithm with the best average performance among all internal sorting algorithms.
Stability: if two equal keywords lie in the right part and are both smaller than the pivot, their relative order may change when they are moved to the left part; it is an unstable sorting method
No ordered subsequence is produced during the sorting process, but each pass places the pivot element in its final position
Selection sort
On the i-th pass, select the element with the smallest keyword among the n-i+1 elements not yet sorted and append it as the i-th element of the ordered subsequence; after n-1 passes only one element remains to be sorted, and the list is ordered
Simple selection sort
Basic idea: the list to be sorted is L[1...n]; the i-th pass selects the element with the smallest keyword in L[i...n] and swaps it with L(i); each pass fixes one element in its final position, so after n-1 passes the entire list is ordered (see the sketch below)
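A direct transcription of this idea into C on a zero-based array:

```c
/* Simple selection sort on a[0..n-1]: pass i fixes the i-th smallest element. */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)     /* find the smallest keyword in a[i..n-1] */
            if (a[j] < a[min]) min = j;
        if (min != i) {                     /* swap it into position i */
            int t = a[i]; a[i] = a[min]; a[min] = t;
        }
    }
}
```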
Performance analysis
Space efficiency O(1)
Time complexity: the number of comparisons is always n(n-1)/2, independent of the initial order; the number of element moves is at most 3(n-1); overall O(n²) in all cases
Stability: It is an unstable sorting method
Heap sort
Heap definition: L(i) >= L(2i) and L(i) >= L(2i+1) (big-root / max heap), or L(i) <= L(2i) and L(i) <= L(2i+1) (small-root / min heap), for 1 <= i <= n/2
Sorting idea: store the n elements in L[1...n] and build an initial heap; taking a big-root heap as an example, the top of the heap is the maximum; after the top element is output, the bottom (last) element of the heap is moved to the top, which violates the heap property, so the new top element is sifted down to restore the heap, and the top is output again, repeating until only one element remains in the heap
Key step: constructing the initial heap; in a complete binary tree with n nodes, the last node is a child of node ⌊n/2⌋, so sift down the subtree rooted at node ⌊n/2⌋ to make it a heap, then continue sifting down at nodes ⌊n/2⌋-1, ..., 1 in turn (see the sketch below)
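A sketch of heap sort with a big-root (max) heap; to keep the L(2i)/L(2i+1) parent-child relation literal, the array is treated as 1-based, i.e. the heap lives in a[1..n] and a[0] is unused, which is an assumption of this sketch:

```c
/* Sift the element at position k down until the subtree rooted at k is a max-heap.
 * The heap occupies a[1..len]. */
static void sift_down(int a[], int k, int len) {
    int tmp = a[k];
    for (int i = 2 * k; i <= len; i *= 2) {
        if (i < len && a[i + 1] > a[i]) i++;   /* pick the larger child */
        if (tmp >= a[i]) break;                /* heap property already holds */
        a[k] = a[i];                           /* move the child up */
        k = i;
    }
    a[k] = tmp;
}

/* Heap sort on a[1..n]: build the initial heap, then repeatedly move the top to the end. */
void heap_sort(int a[], int n) {
    for (int i = n / 2; i >= 1; i--)           /* build the initial heap from node n/2 backwards */
        sift_down(a, i, n);
    for (int i = n; i > 1; i--) {
        int t = a[1]; a[1] = a[i]; a[i] = t;   /* output the top: swap with the last heap element */
        sift_down(a, 1, i - 1);                /* restore the max-heap on a[1..i-1] */
    }
}
```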
Performance analysis
Space complexity O(1)
Time complexity: building the initial heap takes O(n), and each of the n-1 sift-down adjustments takes O(log₂n), so the best, worst, and average time complexity is O(nlog₂n)
Stability: It is an unstable sorting method
Suitable when the number of keywords (elements) is large
Merge sort
Basic idea: merge two or more ordered lists into one new ordered list; if the list to be sorted contains n records, regard it as n ordered sublists each of length 1, then merge them pairwise repeatedly until a single ordered list of length n is obtained (see the sketch below)
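A 2-way merge sort sketch; the auxiliary buffer of n elements is what accounts for the O(n) space complexity noted below, and the helper names are illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Merge the two ordered runs a[lo..mid] and a[mid+1..hi] using buffer b. */
static void merge(int a[], int b[], int lo, int mid, int hi) {
    memcpy(b + lo, a + lo, (size_t)(hi - lo + 1) * sizeof(int));
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        a[k++] = (b[i] <= b[j]) ? b[i++] : b[j++];   /* <= keeps the sort stable */
    while (i <= mid) a[k++] = b[i++];
    while (j <= hi)  a[k++] = b[j++];
}

static void merge_sort_rec(int a[], int b[], int lo, int hi) {
    if (lo >= hi) return;                    /* a run of length 1 is already ordered */
    int mid = lo + (hi - lo) / 2;
    merge_sort_rec(a, b, lo, mid);
    merge_sort_rec(a, b, mid + 1, hi);
    merge(a, b, lo, mid, hi);
}

/* Merge sort on a[0..n-1]. */
void merge_sort(int a[], int n) {
    int *b = malloc((size_t)n * sizeof(int));    /* O(n) auxiliary space */
    if (b == NULL) return;
    merge_sort_rec(a, b, 0, n - 1);
    free(b);
}
```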
Performance analysis
Space complexity O(n)
Time complexity: each merge pass takes O(n) and ⌈log₂n⌉ passes are needed, so the total is O(nlog₂n)
Stability: It is a stable sorting method
When k-way merge sort is performed on N elements (each initially an ordered run of length 1), the number of merge passes m satisfies k^m >= N, i.e., m = ⌈log_k N⌉
Radix sort
A method that does not rely on comparing and moving whole keywords; instead it sorts according to the values of the individual digits (components) of each keyword
Implementation
Most significant digit (MSD) first: partition the keywords into progressively smaller subsequences level by level, in order of decreasing digit weight, and finally concatenate the results
Least significant digit (LSD) first: perform distribution and collection passes in order of increasing digit weight, finally obtaining an ordered sequence (see the sketch below)
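A least-significant-digit-first sketch for non-negative integer keywords, using counting per decimal digit to play the role of the r = 10 queues; the radix of 10 and the restriction to non-negative keys are assumptions of this sketch:

```c
#include <string.h>

/* LSD radix sort on non-negative ints a[0..n-1]; buf must also hold n ints. */
void radix_sort_lsd(int a[], int buf[], int n) {
    if (n <= 1) return;
    int max = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > max) max = a[i];              /* the largest key decides the number of passes */
    for (long exp = 1; max / exp > 0; exp *= 10) {   /* one distribution/collection pass per digit */
        int count[10] = {0};
        for (int i = 0; i < n; i++)              /* distribution: count keys per digit "queue" */
            count[(int)(a[i] / exp) % 10]++;
        for (int d = 1; d < 10; d++)             /* prefix sums give each queue's end position */
            count[d] += count[d - 1];
        for (int i = n - 1; i >= 0; i--)         /* stable collection, back to front */
            buf[--count[(int)(a[i] / exp) % 10]] = a[i];
        memcpy(a, buf, (size_t)n * sizeof(int));
    }
}
```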
Performance analysis
Space complexity: O(r), where r is the number of queues used (the radix)
Time complexity: d passes of distribution and collection; one distribution pass takes O(n) and one collection pass takes O(r), so the total is O(d(n + r)), independent of the initial order of the sequence
Stability: It is a stable sorting algorithm
Performance analysis and applications
Algorithm selection
If n is small, use direct insertion sort or simple selection sort; when the records themselves are large, simple selection sort is preferable because it moves fewer records
If the keywords are basically ordered initially: direct insertion sort or bubble sort
If n is large: quick sort (best average time), heap sort (least auxiliary space), or merge sort (stable)
When the n keywords of a file are randomly distributed, any comparison-based sorting algorithm needs at least ⌈log₂(n!)⌉ ≈ nlog₂n comparisons
If n is very large and the keywords are short and can be decomposed into digits, use radix sort
The record itself contains a large amount of information. To avoid spending a lot of time moving records, a linked list can be used as a storage structure.