MindMap Gallery data structure
An introduction to data structure knowledge, including the basic terminology of data structure, three elements of data structure, and three parts of algorithm. You can read it if you need it.
Edited at 2022-09-07 11:22:35
introduction
basic terminology
data
A carrier of information, a collection of information stored in a computer
data element
basic unit of data
Composed of data items, the data item is its smallest indivisible unit.
data object
A collection of data elements with the same properties, a subset of data
data type
A collection of values and a set of operations defined on this collection.
Atomic type: its value cannot be subdivided
Structural type: a data type that can be further decomposed into several components
Abstract data types: abstract data organization and operations related to it
data structure
A collection of data elements that have one or more specific relationships with each other
three types
logical structure
storage structure
Data operations
Three elements of data structure
logical structure
storage structure
sequential storage
logically and physically adjacent
Advantages: Implement random access
Disadvantage: Only an adjacent block of storage units can be used
chain storage
Logically adjacent, not necessarily physically adjacent. Use pointers to represent logical relationships.
Advantages: No fragmentation, full use of storage units
Disadvantages: Storing pointers takes up additional storage space and can only be accessed sequentially
Index storage
Stores element information and additional index tables.
Advantages: fast retrieval speed
Disadvantages: The additional index table takes up additional storage space, and it takes more time to modify the index table when inserting or deleting elements.
Hash storage
The storage address of the element is directly calculated based on the element's keyword, also known as hash storage.
Advantages: Data retrieval, addition and deletion are faster
Disadvantages: Conflicts in element storage units may occur, increasing time and space overhead.
Data operations
Operations applied to data include the implementation and definition of the operation
The definition is stated in terms of the logical structure and describes what the operation does.
The implementation is stated in terms of the storage structure and gives the concrete steps that carry out the operation.
algorithm
A description of the steps to solve a specific problem, a finite sequence of instructions, each instruction representing one or more operations
characteristic
Finiteness
certainty
feasibility
input
output
Design goals
correctness
readability
Robustness
Efficiency and low storage requirements
efficiency
time complexity
The number of times the statement is executed repeatedly
space complexity
Storage space consumed by the algorithm
linear table
basic definition
A finite sequence of n data elements of the same data type
Basic features
Each element except the first element has one and only one direct predecessor
Each element except the last element has exactly one direct successor
A linear table is a logical structure
sequential storage
Also called a sequence table, it is a storage structure in which data are logically and physically adjacent.
define structure
static structure
dynamic structure
insert operation
Element movement
Best: O(1)
Worst: O(n)
Average number of node moves: n/2
Insertion time complexity is O(n)
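A minimal C sketch of the insert operation on a sequence table, assuming a fixed MaxSize and 1-based logical positions; the names SqList and ListInsert are illustrative textbook-style conventions, not taken from this outline.

```c
#include <stdbool.h>

#define MaxSize 50

typedef struct {
    int data[MaxSize];  /* elements stored contiguously */
    int length;         /* current number of elements */
} SqList;

/* Insert e at logical position i (1 <= i <= length + 1).
   Elements from position i onward are shifted back one slot,
   so the worst case moves n elements: O(n). */
bool ListInsert(SqList *L, int i, int e) {
    if (i < 1 || i > L->length + 1) return false;  /* invalid position */
    if (L->length >= MaxSize) return false;        /* table full */
    for (int j = L->length; j >= i; j--)           /* shift right */
        L->data[j] = L->data[j - 1];
    L->data[i - 1] = e;                            /* place new element */
    L->length++;
    return true;
}
```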
Delete operation
Element movement
Best: O(1)
Worst: O(n)
Average number of node moves: (n-1)/2
The time complexity of deletion is O(n)
Find by value
Element comparisons
Best: O(1)
Worst: O(n)
Average number of comparisons: (n+1)/2
The time complexity of searching by value is O(n)
chain storage
Data elements are linked together by pointers, so only sequential access is possible (no random access); insertion and deletion only require modifying pointers. Also called a linked list, it is a storage structure.
Singly linked list
Store data elements in a linear table through an arbitrary set of storage units. Each linked list node also needs to store a pointer to the next node.
Advantages: Solve the fragmentation problem caused by continuous data storage
Disadvantages: There is a waste of space in the storage pointer field. Discrete data storage is a non-random access storage structure. To find a specific node, you need to start from the table header.
Head node: The node attached before the first node, the data field does not store information. (1) The operation on the first node is consistent with other nodes. (2) The operations on empty tables and non-empty tables are unified
Head pointer: The pointer pointing to the first node. When it is NULL, the linear list is an empty list. Regardless of whether there is a head node, the head pointer points to the first node of the linked list.
Operational realization
Head insertion method to create singly linked list
Insert new node into linked list header
Time complexity O(n)
Create singly linked list using tail insertion method
New nodes are inserted at the end of the linked list; a tail pointer is maintained that always points to the tail node.
Time complexity O(n)
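A sketch in C of building a singly linked list with a head node by tail insertion, assuming int data; LNode, LinkList and CreateListTail are illustrative names, not from this outline.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct LNode {
    int data;
    struct LNode *next;
} LNode, *LinkList;

/* Build a list with a head node from a[0..n-1] by tail insertion:
   a tail pointer r always points to the current last node, so each
   insertion is O(1) and the whole construction is O(n). */
LinkList CreateListTail(const int a[], int n) {
    LinkList L = malloc(sizeof(LNode));   /* head node, data unused */
    LNode *r = L;                         /* tail pointer */
    for (int i = 0; i < n; i++) {
        LNode *s = malloc(sizeof(LNode));
        s->data = a[i];
        r->next = s;                      /* link new node after tail */
        r = s;                            /* move tail pointer */
    }
    r->next = NULL;                       /* terminate the list */
    return L;
}

int main(void) {
    int a[] = {1, 2, 3, 4};
    LinkList L = CreateListTail(a, 4);
    for (LNode *p = L->next; p != NULL; p = p->next)
        printf("%d ", p->data);           /* prints 1 2 3 4 */
    return 0;
}
```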
Find node value by serial number
Starting from the first node, search along the next domain until the i-th node is found.
Time complexity O(n)
Find table nodes by value
Starting from the first node, search along the data field until the required value
Time complexity O(n)
insert node
Insert a new node with the given value at the i-th position.
Time complexity O(n)
delete node
Delete the i-th node at the corresponding position
Time complexity O(n)
Compute table length
Count the number of data nodes in a singly linked list.
Time complexity O(n)
Double linked list
A linked list in which each node contains two pointer fields: prior points to the predecessor node and next points to the successor node.
Operational realization
The rest of the operations are similar except insertion and deletion.
insert
Time complexity O(1)
delete
Time complexity O(1)
circular linked list
Circular singly linked list
The next pointer of the last node in the list points back to the head node instead of NULL.
Insertion and deletion are similar to a singly linked list, except that the test for the end of the list compares against the head pointer rather than NULL.
Set the tail pointer so that the time complexity of operating the head and tail of the table is O(1)
Circular doubly linked list
The prior pointer of the head node points to the last node of the list. When the list is empty, both prior and next of the head node equal L (the head pointer).
static linked list
An array is used to describe the linked storage structure of a linear list; the pointer field of each node stores the array subscript (cursor) of the next element.
Use requires pre-allocation of a contiguous storage space
The end flag is next == -1
Comparison between sequence list and linked list
Reading and writing methods
Sequence table
Sequential access, random access
linked list
Elements can only be accessed sequentially, starting from the head of the list
Logical structure and physical structure
Sequence table
Logically adjacent, physically adjacent
linked list
Logically adjacent, not necessarily physically adjacent
Find, insert and delete
Sequence table
Find by value
Unordered O(n)
Ordered: O(log₂n)
Search by serial number
O(1)
insert delete
Move half the table length elements on average
linked list
Search by serial number
O(n)
insert delete
Modify the corresponding node pointer field
space allocation
Sequence table
static
It cannot be expanded once it is full, and larger space needs to be pre-allocated. Too large wastes space; too small causes overflow.
dynamic
It can be expanded, but the operation efficiency is low. If there is no larger continuous storage space, the allocation will fail.
linked list
Convenient storage space allocation, flexible and efficient operation
stack,queue
stack
definition
A linear table that allows insertion and deletion only at one end; last in, first out
Top of stack: the end where insertion and deletion take place
Bottom of the stack: the other end, which is fixed and does not allow insertion or deletion
When n different elements are pushed onto a stack, the number of distinct pop (output) sequences is the Catalan number C(2n, n) / (n + 1)
sequential storage
A sequentially stored stack uses a set of storage units with consecutive addresses to store data elements from the bottom of the stack to the top of the stack, and sets a pointer to indicate the current position of the top element of the stack.
The stack top pointer S.top is initially set to S.top=-1
Top element of stack: S.data[S.top]
Push: when the stack is not full, first increment the top pointer by 1, then write the value into the position indicated by the top pointer.
Pop: when the stack is not empty, first read the top element, then decrement the top pointer by 1.
Stack empty judgment condition: S.top==-1
Stack full condition: S.top==MaxSize-1
Stack length: S.top + 1
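A minimal C sketch of a sequential stack with the top pointer initialized to -1, matching the full/empty conditions above; the names SqStack, Push and Pop are illustrative conventions, not taken from this outline.

```c
#include <stdbool.h>

#define MaxSize 50

typedef struct {
    int data[MaxSize];
    int top;              /* index of the top element, -1 when empty */
} SqStack;

void InitStack(SqStack *S) { S->top = -1; }

/* Push: fail when the stack is full (top == MaxSize - 1),
   otherwise advance top and store the value. */
bool Push(SqStack *S, int x) {
    if (S->top == MaxSize - 1) return false;
    S->data[++S->top] = x;
    return true;
}

/* Pop: fail when the stack is empty (top == -1),
   otherwise read the top value and retreat top. */
bool Pop(SqStack *S, int *x) {
    if (S->top == -1) return false;
    *x = S->data[S->top--];
    return true;
}
```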
shared stack
Two sequential stacks share a one-dimensional array space. The bottoms of the stacks are set at both ends of the shared space and extend toward the middle.
Storage space can be used more effectively, and the time complexity of accessing data is O(1).
chain storage
A stack using chained storage is called a chain stack.
Advantages: It is convenient for multiple stacks to share storage space and improve efficiency, and there is no stack overflow situation.
The chain stack has no head node, and Lhead points to the top element of the stack.
It facilitates the insertion and deletion of nodes. The operations are similar to those on a linked list; push and pop both take place at the head of the list.
queue
It is also a linear table with limited operations, which only allows insertion at one end and deletion at the other end.
Operating characteristics: first in, first out
Basic Information
Front of queue: the end where deletion is allowed
Rear of queue: the end where insertion is allowed
Empty queue: an empty list without any elements
sequential storage
Allocate a continuous address unit to store the data in the queue, with a head pointer front and a tail pointer rear.
Queue empty (initial state): Q.front == Q.rear == 0
Enqueue: when the queue is not full, write the value at the rear, then increment the rear pointer by 1
Dequeue: when the queue is not empty, take the front element, then increment the front pointer by 1
There will be false overflows.
circular queue
Think of the queue as a ring.
Initial: Q.front=Q.rear=0
The queue front pointer advances by 1: Q.front = (Q.front + 1) % MaxSize
The queue rear pointer advances by 1: Q.rear = (Q.rear + 1) % MaxSize
Queue length: (Q.rear + MaxSize - Q.front) % MaxSize
When leaving the queue or entering the queue, the pointer advances by 1 clockwise.
Distinguishing queue full from queue empty
Sacrifice a storage unit
When the queue head pointer is at the next position of the queue tail pointer, it is used as a sign that the queue is full.
Queue full: (Q.rear + 1) % MaxSize == Q.front
Queue empty: Q.front == Q.rear
Number of elements: (Q.rear - Q.front + MaxSize) % MaxSize
Add element count data member
Queue empty: Q.size == 0
Queue full: Q.size == MaxSize
Add tag data element
tag=0, deletion causes Q.front==Q.rear, the queue is empty
tag=1, insertion causes Q.front==Q.rear, the queue is full
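A minimal C sketch of a circular queue that sacrifices one storage unit to tell full from empty, matching the conditions above; SqQueue, EnQueue and DeQueue are illustrative names, not from this outline.

```c
#include <stdbool.h>

#define MaxSize 8

typedef struct {
    int data[MaxSize];
    int front, rear;   /* front: first element; rear: next free slot */
} SqQueue;

void InitQueue(SqQueue *Q) { Q->front = Q->rear = 0; }

bool QueueEmpty(const SqQueue *Q) { return Q->front == Q->rear; }

/* Full when the slot after rear is front: one storage unit is sacrificed
   so that "full" and "empty" can be distinguished. */
bool EnQueue(SqQueue *Q, int x) {
    if ((Q->rear + 1) % MaxSize == Q->front) return false;  /* full */
    Q->data[Q->rear] = x;
    Q->rear = (Q->rear + 1) % MaxSize;
    return true;
}

bool DeQueue(SqQueue *Q, int *x) {
    if (Q->front == Q->rear) return false;                  /* empty */
    *x = Q->data[Q->front];
    Q->front = (Q->front + 1) % MaxSize;
    return true;
}
```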
chain storage
The chain representation of the queue is a chain queue, a singly linked list with a head pointer and a tail pointer.
Q.front==Q.rear==NULL, the queue is empty
It is often set as a singly linked list with the head node, which can simplify the insertion and deletion operations of the queue.
Using a chain structure to store queues avoids the problems of unreasonable storage allocation and "overflow"
deque
A queue that can perform enqueue and dequeue operations on both ends of the queue.
Enqueuing operation: The elements enqueued on the front end are arranged in front of the elements enqueued on the back end.
Output restricted deque: one end allows insertions and deletions, the other only allows insertions
Input restricted deque: one end allows insertions and deletions, the other only allows deletions
application
stack application
bracket matching
(1) Set an empty stack and read parentheses sequentially
(2) Left parenthesis: pushed onto the stack as a new, more urgent expectation
(3) Right bracket: eliminates the most urgent expectation, or is illegal
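A short C sketch of the bracket-matching idea above, assuming three bracket kinds and an expression shorter than the fixed stack; BracketsMatch is an illustrative name, not from this outline.

```c
#include <stdbool.h>
#include <string.h>
#include <stdio.h>

/* Returns true when every bracket in expr is properly matched.
   A left bracket is pushed as a pending "expectation"; a right bracket
   must satisfy (pop) the most recent expectation on the stack. */
bool BracketsMatch(const char *expr) {
    char stack[256];
    int top = -1;
    for (size_t i = 0; i < strlen(expr); i++) {
        char c = expr[i];
        if (c == '(' || c == '[' || c == '{') {
            stack[++top] = c;                      /* new expectation */
        } else if (c == ')' || c == ']' || c == '}') {
            if (top == -1) return false;           /* nothing to match */
            char open = stack[top--];
            if ((c == ')' && open != '(') ||
                (c == ']' && open != '[') ||
                (c == '}' && open != '{'))
                return false;                      /* wrong bracket kind */
        }
    }
    return top == -1;                              /* no expectation left */
}

int main(void) {
    printf("%d\n", BracketsMatch("{[()]}"));   /* 1 */
    printf("%d\n", BracketsMatch("([)]"));     /* 0 */
    return 0;
}
```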
Expression evaluation
Scan the expression: push operands onto the stack; when an operator op is scanned, pop the two operands Y and X, compute X op Y, push the result back onto the stack, and continue until the scan is complete.
Fibonacci Sequence
queue application
Level traversal
(1) Root node joins the queue
(2) If the queue is empty, end the traversal; otherwise go to (3)
(3) Dequeue the front node and visit it. If it has a left child, enqueue the left child; if it has a right child, enqueue the right child. Return to (2)
computer system
Use the queue as a data buffer to solve the problem of data mismatch between the host and external devices
Use queues to store various request information to solve resource competition problems caused by multiple users
array
definition
An array is a finite sequence of n elements of the same data type.
Arrays are a generalization of linear tables. Once an array is defined, its dimensions and bounds will not change.
storage structure
row-major storage
column-major storage
special matrix
Symmetric matrix
triangular matrix
Lower triangle data storage
Upper triangle data storage
tridiagonal matrix
sparse matrix
Store the non-zero elements of the matrix as triples (row, column, value), ordered by row and column
string
basic concept
String: A finite sequence of zero or more characters. The content in the sequence can be numbers, letters and other characters.
The length of the string is n. When n=0, it is an empty string.
Substring: a sequence of any number of consecutive characters taken from a string
Main string: the string that contains the substring
Location
The serial number of a certain character in a string
The position of the string in the main string is represented by the position of the first character of the string in the main string.
Space string: a string consisting of one or more spaces
storage
Fixed-length sequential storage
Allocate a fixed-length storage area for each string variable, that is, a fixed-length array
string length representation
Set an additional variable to store the length of the string
Add an end mark character "\0" after the string that is not included in the length of the string.
Heap allocated storage
It is a group of storage units with consecutive addresses, but their storage space is dynamically allocated during program execution.
In use, malloc() and free() functions are usually used to achieve dynamic management.
Block-chain (linked-block) storage
Use linked list to store string values
You can use one node to store one character, or one node to store multiple characters.
Each node is called a block, and the entire linked list is called a block chain structure.
pattern matching algorithm
Simple pattern matching algorithm
When a match fails, the main string pointer backtracks to the character after the position where the current attempt started, and the pattern string pointer returns to the first character of the pattern.
Time complexity O(mn)
KMP algorithm
Compute the next array of the pattern string
When a mismatch occurs, the next array determines how far the pattern slides; the main string pointer never backtracks. Time complexity O(m+n)
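A sketch in C of the next (failure) array and KMP matching, using a 0-indexed convention (textbooks often use an equivalent 1-indexed one); GetNext and KmpIndex are illustrative names, not from this outline.

```c
#include <stdio.h>
#include <string.h>

/* next[j] = length of the longest proper prefix of p[0..j]
   that is also a suffix of it. */
void GetNext(const char *p, int next[]) {
    int m = (int)strlen(p);
    next[0] = 0;
    int k = 0;                      /* length of the current border */
    for (int j = 1; j < m; j++) {
        while (k > 0 && p[j] != p[k])
            k = next[k - 1];        /* fall back to a shorter border */
        if (p[j] == p[k])
            k++;
        next[j] = k;
    }
}

/* KMP matching: the main-string pointer i never backtracks. */
int KmpIndex(const char *s, const char *p) {
    int n = (int)strlen(s), m = (int)strlen(p);
    int next[64];
    GetNext(p, next);
    for (int i = 0, k = 0; i < n; i++) {
        while (k > 0 && s[i] != p[k]) k = next[k - 1];
        if (s[i] == p[k]) k++;
        if (k == m) return i - m + 1;   /* match found */
    }
    return -1;
}

int main(void) {
    printf("%d\n", KmpIndex("ababcabcacbab", "abcac"));  /* 5 */
    return 0;
}
```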
Trees and Binary Trees
Tree
definition
A finite set of n nodes. When n=0, it is an empty tree.
There is only one specific root node
When n>1, the remaining nodes can be divided into m disjoint finite sets. Each set itself is a tree and a subtree of the root.
Features
The root node has no predecessor, and all nodes except the root node have one and only one predecessor.
All nodes can have zero or more successors
the term
ancestors, descendants, parents, children, brothers
Degree
The number of children of the node
The maximum degree of a node is the degree of the tree
Branch node: node with degree>0
Leaf node: node with degree=0
Node depth: starting from the root node and accumulating layer by layer from top to bottom.
Node height: starting from leaf nodes and accumulating layer by layer from bottom to top.
Ordered tree: the left and right order of nodes cannot be interchanged
Unordered tree: the left and right order of nodes can be interchanged
Path: the sequence of nodes passed between two nodes; the path length is the number of edges on the path.
nature
Number of nodes = sum of the degrees of all nodes + 1
storage structure
Parent representation
A set of continuous spaces is used to store each node, and a pseudo pointer is added to each node to represent the position of the node's parents in the array.
You can quickly find the parent nodes of each node, but you need to traverse the entire structure to find the children of the node.
Child representation
The children of each node are connected into a linear structure using a singly linked list.
Finding children is very simple, but finding a node's parent requires traversing the child linked lists of all n nodes.
Child-sibling representation
Also called binary tree representation. Each node consists of three parts: the node value, the node's pointer to the first child node, and the pointer to the next sibling node.
Finding the parent node in this structure is more complicated; it can be sped up by adding a parent pointer field to each node.
Traverse
Tree
preorder (root first)
Visit the root node first, then traverse each subtree; each subtree is traversed in the same root-first order.
postorder (root last)
Traverse the subtrees first and then visit the root; each subtree is handled in the same way.
forest
preorder
Visit the root node of the first tree
Preorder-traverse the subtrees of the first tree's root
Preorder-traverse the forest consisting of the remaining trees (excluding the first tree)
inorder
Inorder-traverse the forest formed by the subtrees of the first tree's root
Visit the root node of the first tree
Inorder-traverse the forest consisting of the remaining trees (excluding the first tree)
Tree, forest and binary tree traversal correspondence
Binary tree
definition
Each node has at most two subtrees; the subtrees are distinguished as left and right and their order cannot be reversed. This differs from a tree of degree 2.
special binary tree
full binary tree
complete binary tree
Compared with a full binary tree, only the last two levels may be incomplete; at most one node has degree 1, and that node has only a left child and no right child.
Binary sorting tree
The keys of all nodes in the left subtree are less than the key of the root node, and the keys of all nodes in the right subtree are greater than the key of the root node; the left and right subtrees are themselves binary sorting trees.
balanced binary tree
The depth difference between the left subtree and the right subtree of any node in the tree does not exceed 1
nature
complete binary tree
storage structure
order
Use a set of continuous storage units to store the nodes on the binary tree from top to bottom and from left to right at a time, using 0 to represent non-existent empty nodes.
chain
The binary linked list contains three fields: data field, left pointer field lchild, and right pointer field rchild.
In a binary linked list of n nodes, there are n + 1 empty link fields.
Traverse
preorder
Root, left subtree, right subtree
inorder
Left subtree, root, right subtree
postorder
Left subtree, right subtree, root
level order
Visit each node in order from top to bottom, left to right
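A minimal C sketch of the three recursive traversals listed above, assuming a standard binary linked list node with char data; BiTNode and the function names are illustrative conventions, not from this outline.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct BiTNode {
    char data;
    struct BiTNode *lchild, *rchild;
} BiTNode, *BiTree;

/* Preorder: root, left subtree, right subtree */
void PreOrder(BiTree T) {
    if (T == NULL) return;
    printf("%c ", T->data);
    PreOrder(T->lchild);
    PreOrder(T->rchild);
}

/* Inorder: left subtree, root, right subtree */
void InOrder(BiTree T) {
    if (T == NULL) return;
    InOrder(T->lchild);
    printf("%c ", T->data);
    InOrder(T->rchild);
}

/* Postorder: left subtree, right subtree, root */
void PostOrder(BiTree T) {
    if (T == NULL) return;
    PostOrder(T->lchild);
    PostOrder(T->rchild);
    printf("%c ", T->data);
}
```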
threaded binary tree
concept
It is to arrange the nodes in the binary tree into a linear sequence according to certain rules.
In the linked representation, a tag of 0 means the pointer points to a child; a tag of 1 means the pointer is a thread (ltag = 1: lchild points to the predecessor; rtag = 1: rchild points to the successor).
Construction of a threaded binary tree
Threading means changing the null pointers in the linked list into threads pointing to the predecessor or successor; in essence it is a single traversal of the binary tree.
inorder threading
preorder and postorder threading
Trees, forests and binary trees (conversion)
Convert tree to binary tree
Rule: Left child, right brother
Drawing steps
Add a line between sibling nodes
Each node only retains the connection to the first child
Take the root of the tree as the axis and rotate it 45 degrees clockwise
Convert forest to binary tree
Convert each tree into its corresponding binary tree
Treat the roots of the trees as siblings and add connection lines between the tree roots
Rotate 45 degrees clockwise around the root of the first tree
Huffman tree
The nodes of the tree are assigned a numerical value that represents a certain meaning, which is called the weight of the node.
Weighted path length: the product of the path length from the root of the tree to any node and the weight of the node
Among the binary trees containing n weighted leaf nodes, the one with the smallest weighted path length is called a Huffman tree.
Construction method
Among the node values, select the two minimum values as the left and right subtrees of the new node, and the sum of their weights is used as the value of the root node.
Choose the two smallest among the new weights as the next left and right children.
Features
Each initial node eventually becomes a leaf node. The smaller the weight, the greater the path length from the node to the root node.
A total of n-1 new nodes are created, and the total number of nodes is 2n-1.
Each construction selects 2 trees as the children of the new node, and there is no node with degree 1.
coding
Each edge is encoded as 1 on the left and 0 on the right, or 1 on the right and 0 on the left.
Prefix code: no code is a prefix of any other code
Graph
definition
Composed of vertex set V and edge set E
A graph cannot be empty: the vertex set V must be nonempty, but the edge set E may be empty.
basic concept
directed graph
An edge is an ordered pair of vertices, represented by <v,w>
Strongly connected: there is a path in each direction between the two vertices
Strongly connected graph: every pair of vertices is strongly connected
Strongly connected components: maximal strongly connected subgraphs in directed graphs
Undirected graph
Edges are unordered pairs of vertices, represented by (v,w)
Connected: There is a path between two vertices
Connected graph: any two vertices are connected; otherwise it is a disconnected graph.
Connected components: maximal connected subgraphs in undirected graphs
Simple graph: There are no duplicate edges and no edges from vertices to itself.
Multigraph: The number of edges between two vertices in the graph is greater than 1, and the vertices are allowed to be associated with themselves through an edge.
complete graph
For an undirected graph, there is an edge between any two vertices.
For a directed graph, there are two arcs in opposite directions between any two vertices.
Subgraph: a graph whose vertex set and edge set are subsets of those of the original graph
Spanning tree: It is a minimal connected subgraph that contains all the vertices in the graph.
Spanning forest: In a non-connected graph, it is composed of spanning trees of connected components.
Degree
The degree of a vertex in an undirected graph is the number of edges incident to it.
directed graph
In-degree: the number of edges that end at the vertex
Out-degree: the number of edges that start from the vertex
Degree = in-degree + out-degree
Edge weight and network: Each edge can be marked with a value with a certain meaning, called a weight. This graph with weights on the edges is called a weighted graph, also called a network.
A graph with a small number of edges is called a sparse graph, and vice versa is a dense graph. The difference between the two is vague and relative
Path: a sequence of vertices connecting two vertices
Path length: the number of edges on the path
Circuit: a path where the first vertex and the last vertex are the same
Simple path: a path with no repeated vertices. Simple cycle: a cycle in which no vertices repeat except that the first and last vertices are the same.
Distance: the length of the shortest path from one vertex to another; if no path exists, the distance is recorded as infinity.
Directed tree: a directed graph in which the in-degree of one vertex is zero and the in-degree of the other vertices is 1.
storage
adjacency matrix
Use a one-dimensional array to store the vertex information in the graph, and use a two-dimensional array to store the edge information in the graph.
Unweighted graph
weighted graph
Features
The adjacency matrix of an undirected graph is symmetrical, and compressed storage can be used for larger scales.
Space complexity O(n²) (n = number of vertices)
directed graph
Out-degree of vertex i: the number of non-zero elements in row i
In-degree of vertex i: the number of non-zero elements in column i
Suitable for representing dense graphs
adjacency list
Create a singly linked list for each vertex in the graph; its nodes represent the edges attached to that vertex, so it is called the vertex's edge list. The head pointers of the edge lists and the data of the vertices are stored sequentially in a vertex table.
Features
storage
Undirected graph: O(|V| + 2|E|)
Directed graph: O(|V| + |E|)
Suitable for representing sparse graphs
directed graph
Out-degree: the number of nodes in the vertex's edge list
In-degree: traverse the entire adjacency list
The representation is not unique: the order of the nodes in each vertex's edge list can be interchanged.
cross linked list
A chain storage structure of directed graph
arc node
tailvex: the position of the arc tail in the graph's vertex table
headvex: the position of the arc head in the graph's vertex table
hlink: points to the next arc with the same arc head
tlink: points to the next arc with the same arc tail
vertex node
data: Vertex-related data information
firstin: the first arc node whose vertex is the arc head
firstout: the first arc node whose vertex is the tail of the arc
Vertex nodes are stored sequentially
adjacency multiple list
Chain storage structure of undirected graph
edge node
mark: mark whether the edge has been searched
ivex jvex: the position of the two vertices attached to the edge in the graph
ilink: the next edge attached to the vertex ivex
jlink: the next edge attached to the vertex jvex
In the adjacency multilist, each edge is represented by exactly one node attached to its two vertices (whereas the adjacency list of an undirected graph stores each edge twice).
Traverse
Breadth First Search (BFS)
Starting from a vertex, visit, from near to far, the vertices connected to it by paths of length 1, 2, ...
It is a hierarchical search process that uses an auxiliary queue to store the next layer of vertices of the vertex being accessed.
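A minimal C sketch of BFS over an adjacency matrix with the auxiliary queue described above; the array-based queue and the names G, BFS are assumptions for illustration, not from this outline.

```c
#include <stdio.h>
#include <stdbool.h>

#define MaxV 100

/* Breadth-first search over an adjacency matrix, using an auxiliary
   queue so vertices are visited layer by layer (near to far). */
void BFS(int n, const int G[MaxV][MaxV], int start) {
    bool visited[MaxV] = {false};
    int queue[MaxV], front = 0, rear = 0;

    visited[start] = true;
    queue[rear++] = start;
    while (front < rear) {
        int v = queue[front++];          /* dequeue */
        printf("%d ", v);                /* visit */
        for (int w = 0; w < n; w++)      /* enqueue unvisited neighbours */
            if (G[v][w] && !visited[w]) {
                visited[w] = true;
                queue[rear++] = w;
            }
    }
}
```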
Performance analysis
Space complexity: O(|V|) for the auxiliary queue
time complexity
Adjacency list: O(|V| + |E|)
Adjacency matrix: O(|V|²)
Find the shortest path
Find the shortest path (fewest edges) between two vertices of an unweighted graph
breadth-first spanning tree
Breadth-first spanning tree of adjacency matrix is unique
The breadth-first spanning tree of the adjacency list is not unique
Depth First Search (DFS)
Visit a certain starting vertex v in the graph, then start from v and visit any unvisited vertex w adjacent to v, then visit any unvisited vertex q adjacent to w, and repeat the process.
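A minimal C sketch of recursive DFS over an adjacency matrix, including the wrapper for disconnected graphs; the global visited array and names are illustrative assumptions.

```c
#include <stdio.h>
#include <stdbool.h>

#define MaxV 100

static bool visited[MaxV];

/* Recursive depth-first search: visit v, then recurse into any
   unvisited vertex adjacent to v; backtrack when none remains. */
void DFS(int n, const int G[MaxV][MaxV], int v) {
    visited[v] = true;
    printf("%d ", v);
    for (int w = 0; w < n; w++)
        if (G[v][w] && !visited[w])
            DFS(n, G, w);
}

/* For a disconnected graph, start a DFS from every unvisited vertex. */
void DFSTraverse(int n, const int G[MaxV][MaxV]) {
    for (int v = 0; v < n; v++) visited[v] = false;
    for (int v = 0; v < n; v++)
        if (!visited[v]) DFS(n, G, v);
}
```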
Performance analysis
Space complexity: O(|V|) for the recursion stack
time complexity
Adjacency matrix: O(|V|²)
Adjacency list: O(|V| + |E|)
depth first spanning tree
With adjacency-list storage the depth-first spanning tree is not unique; with adjacency-matrix storage it is unique.
Traversal and connectivity
Undirected graph
If connected, one traversal starting from any vertex visits every vertex exactly once.
Non-connected, one traversal can only access all the vertices of the connected component where the vertex is located.
Directed graph: If there is a path from the initial point to every vertex in the graph, all vertices in the graph can be accessed, otherwise all vertices cannot be accessed.
Related applications
minimum spanning tree
For weighted connected undirected graphs, the spanning trees are different, and the weight of each tree may also be different. Let W be the set of all spanning trees in the graph, then among the sets, the tree with the smallest sum of weights is called the minimum spanning tree of the graph.
nature
Not unique in general; if a connected undirected graph has exactly one fewer edge than vertices (the graph is itself a tree), the minimum spanning tree is the graph itself.
The sum of weights is unique
Number of edges of the minimum spanning tree = number of vertices - 1
Prim's algorithm
Initially select any vertex to form the tree; repeatedly add the vertex that is closest to the current tree until all vertices are included.
Time complexity: O(|V|²)
Suitable for graphs with dense edges
Kruskal algorithm
Constructs a minimum spanning tree by examining edges in increasing order of weight and selecting each edge that does not form a cycle.
Time complexity: O(|E| log|E|)
Suitable for graphs with few edges and many vertices (sparse graphs)
shortest path
In a weighted graph, the sum of the edge weights on a path from one vertex to another is called the weighted path length of that path; the path with the smallest weighted length is the shortest path.
The solution relies on the property that any sub-path of a shortest path is itself a shortest path (optimal substructure).
Dijkstra's algorithm
Solve the shortest path from a single source, that is, the shortest path from a certain vertex to other vertices
Maintain a set S of vertices whose shortest paths have been found, initially containing only the source v0. Each time a vertex Vi is added to S, update the dist values of the vertices outside S that are adjacent to Vi.
Auxiliary array
dist[]: record the shortest path length from source point v0 to other points
path[]: the predecessor node of the shortest path from the source point to vertex i
time complexity
Adjacency matrix or weighted adjacency list: O(|V|²)
Not applicable to graphs with negative edge weights
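A sketch in C of Dijkstra's algorithm on an adjacency matrix, with the dist[] and path[] arrays described above; INF as a stand-in for "no edge" and the function name are assumptions for illustration.

```c
#include <stdbool.h>

#define MaxV 100
#define INF  1000000000   /* stands in for "no edge" / infinity */

/* Single-source shortest paths on an adjacency matrix.
   dist[i]: current shortest distance from src to i.
   path[i]: predecessor of i on that shortest path (-1 if none). */
void Dijkstra(int n, const int G[MaxV][MaxV], int src,
              int dist[], int path[]) {
    bool inS[MaxV] = {false};
    for (int i = 0; i < n; i++) {
        dist[i] = G[src][i];
        path[i] = (G[src][i] < INF && i != src) ? src : -1;
    }
    dist[src] = 0;
    inS[src] = true;
    for (int round = 1; round < n; round++) {
        int u = -1, best = INF;
        for (int i = 0; i < n; i++)        /* pick nearest vertex not in S */
            if (!inS[i] && dist[i] < best) { best = dist[i]; u = i; }
        if (u == -1) break;                /* remaining vertices unreachable */
        inS[u] = true;
        for (int w = 0; w < n; w++)        /* relax edges leaving u */
            if (!inS[w] && G[u][w] < INF && dist[u] + G[u][w] < dist[w]) {
                dist[w] = dist[u] + G[u][w];
                path[w] = u;
            }
    }
}
```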
Floyd algorithm
Find the shortest path between each pair of vertices
Implementation: an iterative process. A sequence of n-order square matrices is generated by recursion; in each round one more vertex is allowed as an intermediate vertex on the path between i and j. After n rounds the shortest paths between every pair of vertices are obtained.
Time complexity: O(|V|³)
Edges with negative weights are allowed, but cycles containing edges with negative weights are not allowed.
Directed acyclic graph description expression
Directed acyclic graph: There is no cycle in a directed graph
Share the same subexpressions in expressions to save storage space
topological sort
AOV network: a DAG that represents a project, in which vertices represent activities and a directed edge <Vi, Vj> means activity Vi must be completed before activity Vj can start. Such a directed graph is called an Activity-On-Vertex network.
definition
A sequence of vertices of a directed acyclic graph
Such that if there is a path from vertex B to vertex A, then vertex A appears after vertex B in the sequence.
fulfil requirements
Each vertex appears only once
If A is in front of B in the topological sequence, there is no edge from B to A in the graph.
sort by
Select a vertex without a predecessor and output
Delete the vertex and all directed edges starting from it
Repeat the first two steps until no vertices remain in the network. If vertices remain but none has in-degree 0, the graph contains a cycle.
time complexity
Adjacency list: O(|V| + |E|)
Adjacency matrix: O(|V|²)
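A minimal C sketch of topological sorting by repeatedly removing in-degree-0 vertices (the procedure above), over an adjacency matrix and using a stack of ready vertices; the names and the matrix representation are assumptions for illustration.

```c
#include <stdbool.h>

#define MaxV 100

/* Fills order[0..n-1] with a topological order and returns true;
   returns false when a cycle prevents a complete ordering. */
bool TopologicalSort(int n, const int G[MaxV][MaxV], int order[]) {
    int indegree[MaxV] = {0};
    int stack[MaxV], top = -1, count = 0;

    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            indegree[j] += G[i][j];           /* count incoming edges */
    for (int v = 0; v < n; v++)
        if (indegree[v] == 0) stack[++top] = v;

    while (top != -1) {
        int v = stack[top--];                 /* vertex with no predecessor */
        order[count++] = v;
        for (int w = 0; w < n; w++)           /* "delete" its outgoing edges */
            if (G[v][w] && --indegree[w] == 0)
                stack[++top] = w;
    }
    return count == n;                        /* false: a cycle remains */
}
```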
reverse topological sort
Select vertices with no successors (out-degree 0) and output
Delete the vertex and all directed edges ending with it
Critical Path
AOE network
In a weighted directed graph, events are represented by vertices, activities are represented by directed edges, and the cost of completing the activity (the time required to complete the activity) is represented by the weight on the edge. This is called a network that uses edges to represent activities.
nature
Only after the event represented by a vertex has occurred can the activities represented by the directed edges leaving that vertex begin.
Only when the activities represented by the directed edges entering a vertex have ended, the event represented by the vertex can occur.
In this graph, there is only one vertex with an in-degree of 0, which is called the start vertex (source point) and represents the beginning of the entire project. There is only one vertex with an out-degree of 0, which is called the end vertex (sink) and represents the end of the entire project.
key activities
Among all paths from the source to the sink, the path with the largest path length is called the critical path, and the activities on the critical path are called critical activities.
The shortest time to complete the entire project is the length of the critical path, i.e., the total time of the critical activities on it.
Key activities affect the time of the entire project. If the key activities cannot be completed on time, the completion time of the entire project will be extended. As long as the key activities are found, the critical path can be found, and the shortest completion time can be obtained.
Parameter definition
The earliest occurrence time of event Vk ve(k)
The longest path length from the source point to the vertex Vk. The earliest occurrence time of Vk determines the earliest time that all activities starting from Vk can start.
The calculation of ve() is performed from front to back, and can be performed in the order of topological structure.
The latest occurrence time vl(k) of event Vk
Under the premise of not delaying the completion of the entire project, it is guaranteed that its subsequent event Vj can occur at its latest occurrence time vl(j), the latest time that the event must occur
vl(sinking point)=ve(sinking point)
When computing this value, a stack recording the topological order is needed so that the vertices can be processed in reverse topological order.
The earliest start time e(i) of activity ai
The earliest occurrence time of the time represented by the starting point of the activity arc
The latest start time l(i) of activity ai
The difference between the latest occurrence time of the event represented by the end point of the activity arc and the time required for the activity
Time margin d(i) of activity ai: the difference between the latest start time and the earliest start time, d(i) = l(i) - e(i)
It is the slack of the activity: the amount of time the activity can be delayed without changing the overall completion time of the project.
If the time margin of an activity is zero, the activity must start on time; it is a critical activity.
Solution steps
Starting from the source point, let ve (source point) = 0, and find the earliest occurrence time ve() of the remaining vertices in topological order.
Starting from the sink, let vl(sink) = ve(sink), and compute the latest occurrence time vl() of the remaining vertices in reverse topological order.
Find the earliest starting time e() of all arcs based on ve() of each vertex
Find the latest start time l() of all arcs based on vl() of each vertex
Compute d() = l() - e() for all activities in the AOE network; the activities with d() = 0 are the critical activities, and together they form the critical path.
Notice
All activities on the critical path are critical activities and determine the duration of the entire project. The project duration can be shortened by speeding up critical activities, but not arbitrarily: if a critical activity is shortened too much, it may become non-critical.
The critical path is not unique. Speeding up critical activities on only one critical path cannot shorten the overall duration; the critical activities shared by all critical paths must be sped up.
Find
basic concept
Search: The process of finding data elements that meet certain conditions in a data set. There are two situations: success and failure.
Lookup table: A data collection used for lookup, consisting of data elements of the same type, which can be an array or linked list and other data types.
Static lookup table: no need to modify the lookup table dynamically
Dynamic lookup table: a lookup table that requires dynamic insertion or deletion
Keyword: The value of a data item in a data element that uniquely identifies the data element. Based on keyword search, the search result is unique.
Average search length: the average number of comparisons of keywords in all search processes (an important indicator to measure the efficiency of the search algorithm)
static search
sequential search
Also called linear search, suitable for linear lists and linked lists
General linear table
Basic idea: Starting from one end of the linear list, check whether the keywords meet the given conditions one by one.
"Sentinels" can be introduced into the algorithm to avoid unnecessary judgment statements
For a table with n elements, if a given value key is equal to the i-th element, n - i + 1 key comparisons are performed.
Find the average length
success
The probabilities are not equal:
The probabilities are equal: ASL_success = (n + 1) / 2
fail
Normally the search probabilities are not equal. If the search probability of each record is known in advance, the records can be arranged in descending order of search probability to reduce the average search length.
Advantages and Disadvantages
Advantages: no requirement on how the data elements are stored; both sequential and linked storage work.
Disadvantages: When n is large, the average search length is large
There is no requirement for the order of records in the table. Linear linked lists can only perform sequential searches.
ordered linear table
If the table is known to be ordered by keyword before searching, an unsuccessful search can stop early without comparing all the way to the other end of the table, which reduces the average search length of failed searches.
Basic idea: scan the table; as soon as the current keyword is greater than the target key (for an ascending table), the search can stop and report failure.
Find the average length
success
The probabilities are not equal:
The probabilities are equal: ASL_success = (n + 1) / 2
fail: ASL_fail = n / 2 + n / (n + 1)
Binary search (half-interval search)
Applies only to ordered sequence lists
Basic idea: compare the given key with the element in the middle of the table. If they are equal, the search succeeds; otherwise continue searching only in the first half or the second half (excluding the middle element), depending on the comparison, and repeat.
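A minimal C sketch of binary search on an ascending sequential table; the array contents in main are sample data for illustration only.

```c
#include <stdio.h>

/* Binary search on an ascending ordered table a[0..n-1].
   Returns the index of key, or -1 when the search fails.
   Each comparison halves the search range: O(log2 n). */
int BinarySearch(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;       /* middle element */
        if (a[mid] == key) return mid;          /* success */
        else if (a[mid] < key) low = mid + 1;   /* search right half */
        else high = mid - 1;                    /* search left half */
    }
    return -1;                                  /* failure */
}

int main(void) {
    int a[] = {7, 10, 13, 16, 19, 29, 32, 33, 37, 41, 43};
    printf("%d\n", BinarySearch(a, 11, 29));  /* 5 */
    printf("%d\n", BinarySearch(a, 11, 30));  /* -1 */
    return 0;
}
```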
decision tree
The process of binary search can be described by a binary tree
Each circular node in the tree represents a record, and the value in the node represents the key value of the record. The square node at the bottom of the tree represents an unsuccessful search.
Find the average length
success
fail
Successful searches end at circular nodes, each with probability 1/n; unsuccessful searches end at square nodes, each with probability 1/(n + 1).
Time complexity: O(log₂n)
On average, binary search is more efficient than sequential search.
The method needs to locate the middle of the search range quickly, so the linear list must support random access; it is only suitable for ordered sequential tables.
Block search
Also known as index sequential search, it absorbs the advantages of sequential search and binary search. It has a dynamic structure and is suitable for fast search.
Basic idea: Divide the lookup table into several blocks. The elements within the blocks can be unordered, but the blocks are ordered. The largest keyword of the previous block < the lowest keyword of the next block. Create an index table again. Each element in the index table contains the maximum keyword of each block and the address of the first element in each block, arranged in order according to the keywords.
step
Determine the block where the record to be searched is located in the index table. You can search sequentially or in half.
Search sequentially within a block
Find the average length
success
The index table uses sequential search: ASL = (b + 1)/2 + (s + 1)/2, where the table of n elements is divided into b blocks of s elements each
The index table uses binary search: ASL = ⌈log₂(b + 1)⌉ + (s + 1)/2
dynamic search
Binary sorting tree
definition
If the left subtree is not empty, the values of all nodes in the left subtree are less than the value of the root node.
If the right subtree is not empty, the values of all nodes in the right subtree are greater than the value of the root node.
The left and right subtrees are also a binary sorting tree.
Find
Starting from the root node, the process of comparing downwards layer by layer along a certain branch
If the tree is not empty, compare the key with the root node first: if equal, the search succeeds; if the key is smaller, search the left subtree; if larger, search the right subtree.
insert
If the original binary tree is empty, the node is inserted directly as the root. A node smaller than the root is inserted into the left subtree, and one larger is inserted into the right subtree. The newly inserted node is always a leaf and becomes the left or right child of the last node visited on the search path when the search fails.
Construction: Starting from an empty tree, input elements one at a time and insert them into appropriate positions in the binary sorted tree.
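A minimal C sketch of binary-sorting-tree insertion and search, assuming distinct int keys (duplicates rejected); BSTNode, BSTInsert and BSTSearch are illustrative names and the keys in main are sample data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct BSTNode {
    int key;
    struct BSTNode *lchild, *rchild;
} BSTNode, *BSTree;

/* Insert key: smaller keys go left, larger go right; the new node
   always becomes a leaf on the failed search path. */
bool BSTInsert(BSTree *T, int key) {
    if (*T == NULL) {
        BSTNode *s = malloc(sizeof(BSTNode));
        s->key = key;
        s->lchild = s->rchild = NULL;
        *T = s;
        return true;
    }
    if (key == (*T)->key) return false;
    if (key < (*T)->key) return BSTInsert(&(*T)->lchild, key);
    return BSTInsert(&(*T)->rchild, key);
}

/* Search: compare with the root and descend into one subtree. */
BSTNode *BSTSearch(BSTree T, int key) {
    while (T != NULL && key != T->key)
        T = (key < T->key) ? T->lchild : T->rchild;
    return T;
}

int main(void) {
    BSTree T = NULL;
    int keys[] = {45, 24, 53, 12, 28, 90};
    for (int i = 0; i < 6; i++) BSTInsert(&T, keys[i]);
    printf("%s\n", BSTSearch(T, 28) ? "found" : "not found");   /* found */
    printf("%s\n", BSTSearch(T, 40) ? "found" : "not found");   /* not found */
    return 0;
}
```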
delete
When deleting a node you must not remove the whole subtree rooted at it; the node is removed from the linked structure and the broken parts of the binary linked list are re-linked so that the binary sorting tree property is preserved.
Condition
Leaf nodes, delete them directly
If node z has only a left subtree or only a right subtree, let that subtree replace z as the corresponding child of z's parent.
If z has both left and right subtrees, replace z with its direct (in-order) successor, then delete that successor from the tree, which reduces to one of the first two cases.
efficiency analysis
Mainly depends on the height of the tree
If the absolute height difference of the left and right subtrees is at most 1 (balanced), the average search length is O(log₂n)
If each node has only a left or only a right subtree (a single chain), the average search length is O(n)
In the worst case, if the input sequence of the binary sorting tree is ordered, a single chain will be formed, and the performance will be significantly worse.
Insertion and deletion are completed by modifying pointers, without moving nodes; the average execution time is O(log₂n)
Binary search works on an ordered sequential list, where deleting and inserting a node costs O(n) because elements must be moved
If the ordered table is static, use a sequential table with binary search; if it is dynamic, use a binary sorting tree.
binary balanced tree
definition
The absolute value of the height difference between the left and right subtrees of any node does not exceed 1
The height difference between the left and right subtrees is called the balance factor of the node (1,0,-1)
insert
Basic idea: first check whether any node on the insertion path has become unbalanced because of this insertion. If so, adjust the subtree rooted at the unbalanced node closest to the inserted node (the lowest node whose balance factor has absolute value greater than 1) to restore balance.
Adjustment method
LL rotation: a new node is inserted into the left subtree of the left child B of node A. Rotate right around A: B replaces A as the root of the subtree, and A becomes B's right child.
RR rotation: a new node is inserted into the right subtree of the right child C of node A. Rotate left around A: C replaces A as the root of the subtree, and A becomes C's left child.
LR rotation: a new node is inserted into the right subtree (rooted at D) of the left child B of node A. First rotate left around B, then rotate right around A: D becomes the root of the subtree, with B as its left child and A as its right child.
RL rotation: a new node is inserted into the left subtree (rooted at E) of the right child C of node A. First rotate right around C, then rotate left around A: E becomes the root of the subtree, with A as its left child and C as its right child.
delete
Starting from node w, trace upward to find the first unbalanced node z
Then rebalance the subtree rooted at z, where y is the higher child of z and x is the higher child of y:
y is the left child of z, x is the left child of y (LL, right single rotation)
y is the left child of z, and x is the right child of y (LR, double rotation first left and then right)
y is the right child of z, x is the right child of y (RR, left single rotation)
y is the right child of z, and x is the left child of y (RL, right and then left double rotation)
Unlike the adjustment after insertion, after the subtree rooted at z has been rebalanced its ancestor nodes may also need adjustment, possibly all the way back to the root.
Find
As with binary sorting trees, during the search process, the number of keys compared with a given value does not exceed the depth of the tree.
average search length O(log₂n)
B-tree
Definition: Also called a multi-way balanced search tree, the maximum number of children of all nodes is called the order of the B-tree.
nature
Number of keywords per node: between ⌈m/2⌉ - 1 and m - 1, where m is the order
Number of subtrees per non-leaf node: between ⌈m/2⌉ and m
If the root node is not a leaf (terminal) node, it has at least two subtrees
All leaf nodes are at the same level and carry no information
A multi-way balanced search tree with the balance factors of all nodes equal to 0
Height
Excluding the layer where the last leaf node without any information is located
Find
Performing a search is very similar to a binary search tree
After locating a node in the B-tree, search the ordered keyword list inside the node; if the keyword is found, return success, otherwise continue the search in the corresponding subtree.
insert
Positioning, the position must be a non-leaf node in the lowest layer
During insertion, the number of keywords in every non-failure node must stay in the range [⌈m/2⌉ - 1, m - 1]. After inserting a keyword, check the node: if its keyword count does not exceed m - 1, the insertion is done; if it exceeds m - 1, the node must be split.
Split method: allocate a new node and split the keywords of the overflowing node around the middle position ⌈m/2⌉. The left half stays in the original node, the right half moves to the new node, and the middle keyword is promoted into the parent of the original node; if the parent overflows, it is split in the same way.
delete
Since the number of keywords after deletion may be less than [m/2]-1, the nodes need to be merged.
Two situations
If a sibling has keywords to spare (the adjacent right or left sibling has more than ⌈m/2⌉ - 1 keywords), borrow one by adjusting that sibling, this node, and their parent to restore balance.
If no sibling has keywords to spare, delete the keyword and merge this node with the right or left sibling and the separating keyword from the parent.
During merging, the parent loses one keyword. If the parent is the root and its keyword count drops to 0, the root is removed and the merged node becomes the new root. If the parent is not the root and its keyword count drops to ⌈m/2⌉ - 2, it must in turn borrow from or merge with its own siblings, and the adjustment may propagate upward.
B+ tree
nature
Number of keywords per node: between ⌈m/2⌉ and m
Number of subtrees per node: between ⌈m/2⌉ and m
The number of subtrees of a node is equal to the number of keywords
All leaf nodes contain all keys and pointers to corresponding records
All branch nodes only contain the maximum value of the keywords in each of its child nodes and pointers to its child nodes.
Differences from B-tree
A B+ tree node with n keywords has n subtrees; a B-tree node with n keywords has n + 1 subtrees.
In a B+ tree the number of keywords per node is in [⌈m/2⌉, m]; in a B-tree it is in [⌈m/2⌉ - 1, m - 1].
In a B+ tree, only the leaf nodes contain full information; the non-leaf nodes serve only as an index. Each index entry in a non-leaf node contains only the maximum keyword of the corresponding subtree and a pointer to that subtree, not the storage address of the record.
In a B+ tree, the leaf nodes contain all the keywords, so keywords appearing in non-leaf nodes also appear in the leaves; in a B-tree, the keywords in the leaf level and those in the other nodes do not repeat.
In a B+ tree, every search is a path from the root to a leaf node, whether it succeeds or fails.
Hash table
basic concept
Hash function: a function that maps a keyword in a lookup table to the address corresponding to the keyword
conflict
Mapping two or more different keywords to the same address, the different keywords that collide are called synonyms
A well-designed hash function minimizes such collisions
Such conflicts are inevitable, and methods for handling them must be designed.
Hash table: A data structure that is directly accessed based on keywords, establishing a direct mapping relationship between keywords and storage addresses.
Construction method
Notice
The domain of the hash function must include all the keys that need to be stored, and the range of the value domain depends on the size of the hash table or the address range.
The addresses calculated by the hash function should be evenly distributed in the entire address space with equal probability, thereby reducing the occurrence of conflicts.
The hash function should be as simple as possible and can calculate the hash address corresponding to any keyword in a short time.
direct addressing
This method is simple to calculate and does not cause conflicts, and is suitable for situations where the distribution of keywords is basically continuous. If the distribution of keywords is discontinuous and there are many empty spaces, storage space will be wasted.
Division (modulo) method
H(key) = key % p, where the hash table length is m and p is a prime number not greater than m but closest to m.
The key to this method is to choose p so that each keyword is mapped to any address in the hash space with equal probability after being converted by this function, thereby reducing the possibility of conflicts as much as possible.
digital analysis
Select a number of bits in the keyword that are more evenly analyzed as the hash address
This method is suitable for a consistent set of keywords. If the keywords are changed, a new hash function must be reconstructed.
Mid-square method
Take the middle digits of the square value of the keyword as the hash address. The specific number of digits depends on the actual situation.
The hash address obtained by this method is related to each bit of the keyword, making the distribution of the hash address even. It is suitable for situations where the values of each bit of the keyword are not even enough or are smaller than the number of digits required for the hash address.
How to deal with conflicts
open addressing method
The free addresses that can hold a new entry are open both to its synonyms and to non-synonyms. The probing formula is Hi = (H(key) + di) % m, where di is the increment sequence.
Four ways to obtain di increment sequence
Linear probing
When a conflict occurs, the next unit in the table is viewed sequentially until a free unit is found or the entire table is searched.
This method may cause a large number of elements to accumulate at adjacent hash addresses, thereby reducing search efficiency.
Quadratic probing
This method can better resolve conflicts and avoid pile-up situations.
Disadvantages: Not all cells on the hash table can be detected, but at least half of the cells can be detected
double hashing
This method requires the use of two hash functions. When the address obtained by the first hash function H(key) conflicts, the second hash function Hash2(key) is used to calculate the address increment of the keyword.
With double hashing, after at most m - 1 probes all positions in the table have been visited and the probe sequence returns to position H0.
pseudorandom sequence
When di=pseudo-random number sequence, it is called pseudo-random sequence method
With open addressing you cannot simply delete an element from the table: physically deleting it would cut off the probe path of other elements with the same hash address. Instead, mark the element with a deletion flag (logical deletion). Drawback: after many deletions the table looks full while many cells are actually free, so periodic maintenance is required to physically remove the logically deleted elements.
Chaining (zipper method)
Store all synonyms in a linear linked list uniquely identified by their hash address
Search, insertion and deletion all take place within the same synonym chain
This method is suitable for situations where insertion and deletion are frequent
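A minimal C sketch of a chained (zipper-method) hash table with the division hash function, assuming a small prime table length P = 13; the keys in main are sample data for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define P 13   /* table length / modulus, ideally a prime */

typedef struct HNode {
    int key;
    struct HNode *next;
} HNode;

/* Division-method hash function. */
static int Hash(int key) { return key % P; }

/* Insert at the head of the synonym chain for the key's bucket. */
void HashInsert(HNode *table[], int key) {
    int addr = Hash(key);
    HNode *s = malloc(sizeof(HNode));
    s->key = key;
    s->next = table[addr];
    table[addr] = s;
}

/* Search only the chain for the key's bucket (its synonyms). */
HNode *HashSearch(HNode *table[], int key) {
    for (HNode *p = table[Hash(key)]; p != NULL; p = p->next)
        if (p->key == key) return p;
    return NULL;
}

int main(void) {
    HNode *table[P] = {NULL};
    int keys[] = {19, 14, 23, 1, 68, 20, 84, 27};
    for (int i = 0; i < 8; i++) HashInsert(table, keys[i]);
    printf("%s\n", HashSearch(table, 68) ? "found" : "missing");  /* found */
    printf("%s\n", HashSearch(table, 55) ? "found" : "missing");  /* missing */
    return 0;
}
```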
Find and performance analysis
Find steps
Initialization: Addr = Hash(key)
Check whether there is a record at address Addr. If not, return search failure; if there is, compare its keyword with key: if they are equal, return search success, otherwise go to the next step.
Compute the "next hash address" using the given collision-handling method, set Addr to this address, and go back to the previous step.
performance
The hash table establishes a direct mapping between the keyword and the storage location of the record, but because of collisions the search process still involves comparing a given value with keywords. Therefore the average search length is still used to measure the search efficiency of a hash table.
depending on
loading factor
Defined as α = n / m (number of stored records / table length), a measure of how full the table is
The average search length depends on the load factor α, not directly on n or m
The larger the load factor, the fuller the table and the greater the chance of collisions.
hash function
handling conflicts
sort
basic concept
Sorting: The process of rearranging the elements in a table so that the elements in the table are ordered by keywords
Input: n records R1, R2...Rn, corresponding keywords k1, k2....kn
Output: a reordering of the input sequence R1', R2', ..., Rn' such that k1' ≤ k2' ≤ ... ≤ kn'
Algorithm stability
If the table to be sorted contains two elements with equal keywords and their relative order is unchanged after sorting, the algorithm is stable.
Stability does not measure how good an algorithm is; it merely describes a property of the algorithm.
Classification
Internal sorting
During sorting, all elements are stored in memory
Comparison: Determine the context of elements by comparing the sizes of two keywords
Move: achieve the ordering by moving elements
Performance depends on the time complexity (compare and move) and space complexity of the algorithm
external sort
During sorting, all elements cannot be stored in memory at the same time, and elements must be moved between main memory and external memory.
insertion sort
It is a simple and intuitive sorting method. Each time a record to be sorted is inserted into the previously arranged subsequence according to its key size, until all records are inserted.
direct insertion sort
process
Find the insertion position k of L(i) in L[1...i-1]
Move all elements in L[k...i-1] one position later
Copy L(i) to L(k)
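A minimal C sketch of direct insertion sort following the process above, using 0-based array indexing instead of the textbook's 1-based positions; the array in main is sample data.

```c
#include <stdio.h>

/* Direct insertion sort: element i is inserted into the already
   ordered prefix a[0..i-1], shifting larger elements back one slot.
   Best case O(n) (already sorted), worst and average case O(n^2). */
void InsertionSort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int tmp = a[i], j;
        for (j = i - 1; j >= 0 && a[j] > tmp; j--)
            a[j + 1] = a[j];      /* shift right */
        a[j + 1] = tmp;           /* place the element */
    }
}

int main(void) {
    int a[] = {49, 38, 65, 97, 76, 13, 27};
    InsertionSort(a, 7);
    for (int i = 0; i < 7; i++) printf("%d ", a[i]);  /* 13 27 38 49 65 76 97 */
    return 0;
}
```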
Performance analysis
Space complexity O(1)
time complexity
Best: The elements are already in order, and insertion only requires one comparison without moving elements O(n)
Average: about n²/4 comparisons and moves in total
Worst: the list is in exactly reverse order; the numbers of comparisons and moves reach their maximum, on the order of n²
The overall time complexity is O(n²)
Stability: comparisons and shifts proceed from back to front, so equal elements never pass each other; it is a stable sorting method
Applicability: both sequential storage and linked storage
Binary insertion sort (half insertion sort)
Separates comparison from movement: first use binary (half-interval) search to find the insertion position, then shift all elements after that position back in one pass (see the sketch below)
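A possible C sketch of binary insertion sort on a zero-based array; the binary search only locates the insertion position, while the number of moves is the same as in direct insertion sort:

```c
/* Binary (half) insertion sort on a[0..n-1]. */
void binary_insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int tmp = a[i];
        int lo = 0, hi = i - 1;
        /* Binary search for the insertion position in the sorted prefix a[0..i-1]. */
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] > tmp) hi = mid - 1;
            else              lo = mid + 1;    /* insert after equal keys to stay stable */
        }
        /* Shift a[lo..i-1] back by one position, then insert. */
        for (int j = i - 1; j >= lo; j--)
            a[j + 1] = a[j];
        a[lo] = tmp;
    }
}
```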
Performance analysis
time complexity
The number of comparisons is independent of the initial order of the list and depends only on the number of elements n; it is reduced to O(nlog₂n)
The number of moves is unchanged and still depends on the initial order of the list to be sorted
The overall time complexity is still O(n²)
Applicability: Sequential storage of linear tables with a small amount of data
Stability: It is a stable sorting method
Shell sort (diminishing increment sort)
Basic idea: split the list to be sorted into several special sublists of the form L[i, i+d, i+2d, ..., i+kd], i.e., records separated by a fixed "increment" d form one sublist; perform direct insertion sort within each sublist, and when the elements of the whole list are basically in order, perform one final direct insertion sort on all records
Process: take a step size d1 smaller than n and divide the records into d1 groups, putting records whose positions differ by a multiple of d1 into the same group; perform direct insertion sort within each group; then take a second step size d2 < d1 and repeat, until the step size reaches 1, at which point all records are in one group and a final direct insertion sort is performed (see the sketch below)
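A compact sketch of Shell sort using the simple halving increment sequence d = n/2, n/4, ..., 1; this increment sequence is an assumption, since many others are in use:

```c
/* Shell sort on a[0..n-1] with the halving increment sequence. */
void shell_sort(int a[], int n) {
    for (int d = n / 2; d >= 1; d /= 2) {          /* shrinking increment */
        /* Direct insertion sort within each sublist of stride d. */
        for (int i = d; i < n; i++) {
            int tmp = a[i];
            int j = i - d;
            while (j >= 0 && a[j] > tmp) {
                a[j + d] = a[j];
                j -= d;
            }
            a[j + d] = tmp;
        }
    }
}
```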
Performance analysis
Space complexity O(1)
time complexity
The complexity is a function of the chosen increment sequence; no exact general formula is known
When n is in a certain range, the time complexity is about O(n^1.3)
Worst-case time complexity: O(n²)
Stability: records with equal keywords may be assigned to different sublists, so their relative order can change; it is an unstable sorting algorithm
Only applicable when the linear table is stored sequentially
Exchange sort (swap sort)
Exchange: swap the positions of two records in the sequence according to the result of comparing their keywords
Bubble Sort
Basic idea: compare adjacent elements from back to front (or from front to back) and swap them if they are in reverse order, continuing until the whole sequence has been scanned; each pass places the smallest (or largest) remaining element in its final position, and at most n-1 passes are needed to sort all elements (see the sketch below)
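A minimal bubble sort sketch with front-to-back passes and an early-exit flag, which is what makes the best case O(n) on an already-ordered list:

```c
#include <stdbool.h>

/* Bubble sort on a[0..n-1]: each pass floats the largest remaining element to the end. */
void bubble_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;
        for (int j = 0; j < n - 1 - i; j++) {
            if (a[j] > a[j + 1]) {               /* reverse order: swap */
                int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                swapped = true;
            }
        }
        if (!swapped) break;                     /* no swaps: the list is already ordered */
    }
}
```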
Performance analysis
Space complexity O(1)
time complexity
Number of comparisons: at most n(n-1)/2 (reverse-ordered list), at least n-1 (already ordered)
Number of moves: at most 3n(n-1)/2, at least 0
Average time complexity: O(n²); best case O(n) when the list is already ordered
Stability: when i > j and A[i] = A[j], no exchange occurs, so equal elements keep their relative order; it is a stable sorting method
The ordered subsequence produced is globally ordered: each pass places one element in its final position
Quick sort
Basic idea: take any element of the list to be sorted as the pivot; one partitioning pass divides the list into two independent parts and places the pivot in its final position, with every element to its left no greater than it and every element to its right no smaller than it; this process is then applied recursively to each part until every sublist contains at most one element (see the sketch below)
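A sketch of quick sort using the common "first element as pivot, scan from both ends" partition; this pivot choice is only one of several reasonable options:

```c
/* Partition a[low..high] around a[low]; returns the pivot's final position. */
static int partition(int a[], int low, int high) {
    int pivot = a[low];
    while (low < high) {
        while (low < high && a[high] >= pivot) high--;
        a[low] = a[high];                  /* move a smaller element to the left part */
        while (low < high && a[low] <= pivot) low++;
        a[high] = a[low];                  /* move a larger element to the right part */
    }
    a[low] = pivot;                        /* the pivot lands in its final position */
    return low;
}

/* Recursively sort the two parts on either side of the pivot. */
void quick_sort(int a[], int low, int high) {
    if (low < high) {
        int p = partition(a, low, high);
        quick_sort(a, low, p - 1);
        quick_sort(a, p + 1, high);
    }
}
```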
Performance analysis
space efficiency
Quick sort is recursive and requires a recursive work stack to save the necessary information for each level of recursive calls.
Best case: O(log₂n)
Worst case: O(n)
Average: O(log₂n)
time efficiency
Worst case: O(n²), when the list is already ordered or reverse-ordered and each partition is maximally unbalanced
Best and average cases: O(nlog₂n)
It is the algorithm with the best average performance among all internal sorting algorithms.
Stability: if two equal keywords lie in the right part and are both smaller than the pivot, their relative order may change when they are moved to the left part; it is an unstable sorting method
No ordered subsequence is produced during the sorting process, but each pass places the pivot element in its final position
Selection sort
On the i-th pass, select the element with the smallest keyword among the n-i+1 elements not yet sorted and append it as the i-th element of the ordered subsequence; after n-1 passes only one element remains to be sorted, and the list is ordered
Simple selection sort
Basic idea: the list to be sorted is L[1...n]; the i-th pass selects the element with the smallest keyword in L[i...n] and swaps it with L(i); each pass fixes one element in its final position, so after n-1 passes the entire list is ordered (see the sketch below)
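A direct transcription of this idea into C on a zero-based array:

```c
/* Simple selection sort on a[0..n-1]: pass i fixes the i-th smallest element. */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)     /* find the smallest keyword in a[i..n-1] */
            if (a[j] < a[min]) min = j;
        if (min != i) {                     /* swap it into position i */
            int t = a[i]; a[i] = a[min]; a[min] = t;
        }
    }
}
```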
Performance analysis
Space efficiency O(1)
Time complexity: the number of comparisons is always n(n-1)/2, independent of the initial order; the number of element moves is at most 3(n-1); overall O(n²) in all cases
Stability: It is an unstable sorting method
Heap sort
Heap definition: L(i) >= L(2i) and L(i) >= L(2i+1) (big-root / max heap), or L(i) <= L(2i) and L(i) <= L(2i+1) (small-root / min heap), for 1 <= i <= n/2
Sorting idea: store the n elements in L[1...n] and build an initial heap; taking a big-root heap as an example, the top of the heap is the maximum; after the top element is output, the bottom (last) element of the heap is moved to the top, which violates the heap property, so the new top element is sifted down to restore the heap, and the top is output again, repeating until only one element remains in the heap
Key step: constructing the initial heap; in a complete binary tree with n nodes, the last node is a child of node ⌊n/2⌋, so sift down the subtree rooted at node ⌊n/2⌋ to make it a heap, then continue sifting down at nodes ⌊n/2⌋-1, ..., 1 in turn (see the sketch below)
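A sketch of heap sort with a big-root (max) heap; to keep the L(2i)/L(2i+1) parent-child relation literal, the array is treated as 1-based, i.e. the heap lives in a[1..n] and a[0] is unused, which is an assumption of this sketch:

```c
/* Sift the element at position k down until the subtree rooted at k is a max-heap.
 * The heap occupies a[1..len]. */
static void sift_down(int a[], int k, int len) {
    int tmp = a[k];
    for (int i = 2 * k; i <= len; i *= 2) {
        if (i < len && a[i + 1] > a[i]) i++;   /* pick the larger child */
        if (tmp >= a[i]) break;                /* heap property already holds */
        a[k] = a[i];                           /* move the child up */
        k = i;
    }
    a[k] = tmp;
}

/* Heap sort on a[1..n]: build the initial heap, then repeatedly move the top to the end. */
void heap_sort(int a[], int n) {
    for (int i = n / 2; i >= 1; i--)           /* build the initial heap from node n/2 backwards */
        sift_down(a, i, n);
    for (int i = n; i > 1; i--) {
        int t = a[1]; a[1] = a[i]; a[i] = t;   /* output the top: swap with the last heap element */
        sift_down(a, 1, i - 1);                /* restore the max-heap on a[1..i-1] */
    }
}
```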
Performance analysis
Space complexity O(1)
Time complexity: building the initial heap takes O(n), and each of the n-1 sift-down adjustments takes O(log₂n), so the best, worst, and average time complexity is O(nlog₂n)
Stability: It is an unstable sorting method
Suitable when the number of keywords (elements) is large
Merge sort
Basic idea: merge two or more ordered lists into one new ordered list; if the list to be sorted contains n records, regard it as n ordered sublists each of length 1, then merge them pairwise repeatedly until a single ordered list of length n is obtained (see the sketch below)
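A 2-way merge sort sketch; the auxiliary buffer of n elements is what accounts for the O(n) space complexity noted below, and the helper names are illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Merge the two ordered runs a[lo..mid] and a[mid+1..hi] using buffer b. */
static void merge(int a[], int b[], int lo, int mid, int hi) {
    memcpy(b + lo, a + lo, (size_t)(hi - lo + 1) * sizeof(int));
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        a[k++] = (b[i] <= b[j]) ? b[i++] : b[j++];   /* <= keeps the sort stable */
    while (i <= mid) a[k++] = b[i++];
    while (j <= hi)  a[k++] = b[j++];
}

static void merge_sort_rec(int a[], int b[], int lo, int hi) {
    if (lo >= hi) return;                    /* a run of length 1 is already ordered */
    int mid = lo + (hi - lo) / 2;
    merge_sort_rec(a, b, lo, mid);
    merge_sort_rec(a, b, mid + 1, hi);
    merge(a, b, lo, mid, hi);
}

/* Merge sort on a[0..n-1]. */
void merge_sort(int a[], int n) {
    int *b = malloc((size_t)n * sizeof(int));    /* O(n) auxiliary space */
    if (b == NULL) return;
    merge_sort_rec(a, b, 0, n - 1);
    free(b);
}
```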
Performance analysis
Space complexity O(n)
Time complexity: each merge pass takes O(n) and ⌈log₂n⌉ passes are needed, so the total is O(nlog₂n)
Stability: It is a stable sorting method
When k-way merge sort is performed on N elements (each initially an ordered run of length 1), the number of merge passes m satisfies k^m >= N, i.e., m = ⌈log_k N⌉
Radix sort
A method that does not rely on comparing and moving whole keywords; instead it sorts according to the values of the individual digits (components) of each keyword
Implementation
Most significant digit (MSD) first: partition the keywords into progressively smaller subsequences level by level, in order of decreasing digit weight, and finally concatenate the results
Least significant digit (LSD) first: perform distribution and collection passes in order of increasing digit weight, finally obtaining an ordered sequence (see the sketch below)
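A least-significant-digit-first sketch for non-negative integer keywords, using counting per decimal digit to play the role of the r = 10 queues; the radix of 10 and the restriction to non-negative keys are assumptions of this sketch:

```c
#include <string.h>

/* LSD radix sort on non-negative ints a[0..n-1]; buf must also hold n ints. */
void radix_sort_lsd(int a[], int buf[], int n) {
    if (n <= 1) return;
    int max = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > max) max = a[i];              /* the largest key decides the number of passes */
    for (long exp = 1; max / exp > 0; exp *= 10) {   /* one distribution/collection pass per digit */
        int count[10] = {0};
        for (int i = 0; i < n; i++)              /* distribution: count keys per digit "queue" */
            count[(int)(a[i] / exp) % 10]++;
        for (int d = 1; d < 10; d++)             /* prefix sums give each queue's end position */
            count[d] += count[d - 1];
        for (int i = n - 1; i >= 0; i--)         /* stable collection, back to front */
            buf[--count[(int)(a[i] / exp) % 10]] = a[i];
        memcpy(a, buf, (size_t)n * sizeof(int));
    }
}
```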
Performance analysis
Space complexity: O(r), where r is the number of queues used (the radix)
Time complexity: d passes of distribution and collection; one distribution pass takes O(n) and one collection pass takes O(r), so the total is O(d(n + r)), independent of the initial order of the sequence
Stability: It is a stable sorting algorithm
Performance analysis and applications
Algorithm selection
If n is small, use direct insertion sort or simple selection sort; when the records themselves are large, simple selection sort is preferable because it moves fewer records
If the keywords are basically ordered initially: direct insertion sort or bubble sort
If n is large: quick sort (best average time), heap sort (least auxiliary space), or merge sort (stable)
When the n keywords of a file are randomly distributed, any comparison-based sorting algorithm needs at least ⌈log₂(n!)⌉ ≈ nlog₂n comparisons
If n is very large and the keywords are short and can be decomposed into digits, use radix sort
The record itself contains a large amount of information. To avoid spending a lot of time moving records, a linked list can be used as a storage structure.