Complete knowledge of algorithm data structures
Basics Part 2
Why does Redis use skip lists to implement sorted sets?
Skip list: a linked list plus multiple levels of indexes; a dynamic data structure that supports fast insertion, deletion, and search.
The multi-level index allows a binary-search-like lookup over a linked list, with O(log n) time complexity.
Trading space for time yields higher query efficiency; the space overhead can be reduced by lowering the density of index nodes, and when the stored elements are large objects the index overhead is negligible.
Efficient dynamic insertion and deletion
Insertion: O(log n) to find the position plus O(1) to link the node in; deletion: O(log n) to find the node, then remove it together with its index nodes.
Dynamic update of the skip list index
In extreme cases, the skip list may degenerate into a singly linked list.
When inserting data, a random function generates a level k, and the new node is also added to the first k levels of the index to keep the skip list roughly "balanced" (see the sketch below).
Why use skip lists instead of red-black trees?
Range queries are more efficient on a skip list than on a red-black tree.
Skip lists are also more flexible: the index-building strategy can be tuned to balance execution efficiency against memory consumption.
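As a rough illustration (not Redis's actual code), the sketch below shows the usual way a skip list chooses how many index levels a newly inserted node gets; the class name, the MAX_LEVEL cap of 16, and the 1/2 promotion probability are assumptions for the example.

```java
import java.util.Random;

// Minimal sketch: deciding the index level of a newly inserted skip-list node.
// Each node is promoted to one more index level with probability 1/2, so the
// expected number of index nodes halves per level and the height stays O(log n).
public class SkipListLevel {
    private static final int MAX_LEVEL = 16;   // assumed cap on index levels
    private static final Random RANDOM = new Random();

    static int randomLevel() {
        int level = 1;
        while (RANDOM.nextBoolean() && level < MAX_LEVEL) {
            level++;                           // promote to one more index level
        }
        return level;                          // insert the node into levels 1..level
    }

    public static void main(String[] args) {
        System.out.println("new node goes into " + randomLevel() + " index level(s)");
    }
}
```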
Hash table: How to implement the spell-check feature for words in a document editor?
Hash collision
open addressing method
Compute the slot from the hash function; if it is already occupied, probe the following slots in sequence until an empty one is found.
Lookup works the same way: start at the hashed slot and probe in sequence until the key or an empty slot is found.
Deleting elements: mark the slot as "deleted"; probing skips over such markers instead of stopping at them.
Disadvantages: with a lot of data there are many collisions, probe sequences become long, and in the extreme case a lookup degrades to O(n).
The above is linear probing (sketched below); there are also quadratic probing (probe offsets grow as squares: 1, 4, 9, ...) and double hashing.
Measuring collisions: load factor = number of stored elements / hash table length.
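A minimal, illustrative sketch of open addressing with linear probing; the class and method names are made up for the example, and it assumes the table never fills up completely.

```java
// Open addressing with linear probing: on a collision, walk forward one slot
// at a time until an empty slot (or the key itself) is found.
public class LinearProbingDemo {
    private final String[] keys = new String[8];     // small fixed capacity for illustration

    void put(String key) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;               // probe the next slot on collision
        }
        keys[i] = key;                               // assumes the table never becomes full
    }

    boolean contains(String key) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        while (keys[i] != null) {
            if (keys[i].equals(key)) return true;
            i = (i + 1) % keys.length;               // keep probing until an empty slot
        }
        return false;
    }
}
```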
linked list method
Rehashing
Construct several hash functions; when one produces a collision, compute the slot again with the next one.
Create a public overflow area
Split storage into a primary table and an overflow table: elements that collide in the primary table are placed in the overflow table.
How to design an efficient hash function
The hash function should not be too complex and the generated values should be as random and evenly distributed as possible
Approach: analyze the characteristics of the data and choose a suitably random part of it as the key.
Common constructions: direct addressing, mid-square method, folding method, random number method, ASCII-code based methods.
The load factor must not grow too large: expand the table when the threshold is exceeded, and shrink it when the data volume becomes small.
When the data volume is large, all of it cannot be migrated in one go:
When the load factor exceeds the threshold, only allocate the new, larger table.
On each subsequent insert, put the new entry into the new table and also move one entry from the old table over to the new table (incremental rehashing, sketched below).
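A minimal sketch of this incremental-migration idea, with illustrative names and Java's HashMap standing in for the two underlying tables; it is not how any particular library implements it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Incremental rehashing sketch: instead of moving everything at once when the
// threshold is crossed, each subsequent insert migrates one old entry, spreading
// the O(n) migration cost over n inserts.
public class IncrementalRehash {
    private final Map<String, String> oldTable = new HashMap<>();
    private final Map<String, String> newTable = new HashMap<>();
    private final Deque<String> pendingKeys = new ArrayDeque<>();  // keys still in the old table

    void startResize() {
        pendingKeys.addAll(oldTable.keySet());
    }

    void put(String key, String value) {
        newTable.put(key, value);                    // new data always goes to the new table
        String k = pendingKeys.poll();
        if (k != null) {                             // migrate one old entry per insert
            newTable.put(k, oldTable.remove(k));
        }
    }

    String get(String key) {                         // look in both tables during migration
        String v = newTable.get(key);
        return v != null ? v : oldTable.get(key);
    }
}
```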
Choose a conflict resolution
Open addressing applicable scenarios
Suitable when the amount of data is small and the load factor is low; Java's ThreadLocalMap uses open addressing (linear probing) to resolve hash collisions.
linked list method
More suitable for storing large objects and large amounts of data; it is also more flexible and supports more optimizations, such as replacing the linked list with a red-black tree.
Design requirements
Supports fast query, insertion, and deletion operations
Memory usage must be reasonable and must not waste too much space.
Performance must be stable: even in extreme cases the hash table must not degrade to an unacceptable level.
Solution
Design a suitable hash function
Define load factor thresholds and design dynamic expansion
Choosing an appropriate hash collision resolution method
Hash tables and linked lists used together
LinkedHashMap
Implemented with a hash table plus linked lists: each node sits both in a bucket's collision chain (singly linked) and in a doubly linked list that records ordering.
How to find, how to delete, how to add
With access ordering enabled, the add and access operations behave just like the LRU cache-eviction algorithm.
A hash table alone gives efficient insertion, deletion, and lookup of dynamic data, but it cannot support fast traversal in a meaningful order; chaining the entries into a linked list adds that ability (see the LRU sketch below).
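A minimal sketch of an LRU cache built on Java's LinkedHashMap, which combines exactly these two structures; the capacity value passed in is an arbitrary example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache sketch: the hash table gives O(1) lookup and the internal doubly
// linked list keeps entries in access order, so the eldest entry is the least
// recently used one and can be evicted when the capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);                      // accessOrder = true: reads move entries to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;                    // evict the least recently used entry
    }
}
```

Usage is just `new LruCache<String, String>(100)` followed by ordinary `put`/`get` calls.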
hash algorithm
The hash algorithm needs to satisfy
The original data cannot be derived back from the hash value (one-way).
Very sensitive to the input: changing even one bit produces a completely different hash value.
The probability of hash collisions must be very small.
Execution must be efficient.
Applications
Security encryption, data verification, hash function, load balancing, data sharding, distributed storage
What are the applications of hash in distributed systems?
Load balancing: hash the client IP address or session ID and take it modulo the size of the server list, so requests from the same client are always routed to the same server.
Data sharding: e.g. counting how often each "search keyword" appears in a 1 TB log file, or checking whether an image already exists in a huge image library, by hashing keys to distribute the work across machines.
Distributed storage: the consistent hashing algorithm, which limits how much data must move when nodes are added or removed.
Binary tree
definition
Node height, node depth, node level, tree height, complete binary tree, full binary tree
storage
Chained storage, sequential storage
Arrays can be used to store a complete binary tree, which is the most space-saving
Binary tree recursive traversal
Search, delete, add in binary search tree
Deletion has three cases: the deleted node is a leaf, it has one child, or it has two children (replace it with the smallest node of its right subtree).
How to handle duplicate keys in a binary search tree
Store equal values on one node via a linked list or a dynamically expanding array.
Or treat a duplicate as greater than the existing node and keep inserting it into the right subtree.
Why use binary search trees rather than hash tables?
1. Hash table data is stored unordered, whereas a binary search tree yields sorted data with a simple in-order traversal.
2. Hash table expansion is time-consuming and performance is unstable under heavy collisions; a balanced binary search tree keeps the time complexity stably at O(log n).
3. Hash tables are complicated to design well, with many cases to consider, and the load factor wastes some space.
red black tree
definition
Balanced binary tree (strict definition): for any node, the height difference between its left and right subtrees is at most 1.
A red-black tree is only approximately balanced: the subtree height difference is not guaranteed to be 1; it only guarantees that the tree height stays O(log n).
The color of the root node is black
Each leaf is a black empty node, and the leaf node does not store data.
There cannot be two consecutive red nodes from a node to a leaf node.
For every node, all paths from that node to its reachable leaf nodes contain the same number of black nodes.
Balance: Make the tree look symmetrical and balanced on the left and right, ensuring that the height of the entire tree is relatively low, and the efficiency of inserting, deleting, and searching is high.
Recursion tree
Used to analyze the time complexity of recursive code.
Heap sort
Definition: A heap is a complete binary tree
Property: in a max-heap every node's value is greater than or equal to its children's values (in a min-heap, less than or equal).
Two steps of heap sorting: building heap and sorting
Insert an element: heapify (sift up) from bottom to top.
Delete the heap top: move the last element to the top, then heapify (sift down) from top to bottom.
Complexity: building the heap is O(n), sorting is O(n log n), so heap sort is O(n log n) overall (see the sift-down sketch below).
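A minimal heap-sort sketch (0-based array indexing, max-heap) showing the sift-down step used both for the O(n) heap build and the O(n log n) sorting phase; names are illustrative.

```java
// Heap sort sketch: siftDown pushes the node at i down until it is >= both children.
public class HeapSort {
    static void siftDown(int[] a, int i, int size) {
        while (true) {
            int largest = i, left = 2 * i + 1, right = 2 * i + 2;
            if (left < size && a[left] > a[largest]) largest = left;
            if (right < size && a[right] > a[largest]) largest = right;
            if (largest == i) return;                          // heap property restored
            int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
            i = largest;
        }
    }

    static void heapSort(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) siftDown(a, i, a.length); // build heap, O(n)
        for (int end = a.length - 1; end > 0; end--) {
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;       // move current max to the end
            siftDown(a, 0, end);                               // restore the heap on the prefix
        }
    }
}
```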
Typical applications
priority queue
The queue is dequeued successively according to the priority. You can use the heap to implement the priority queue.
Huffman coding, shortest path of graph, minimum spanning tree
timer
Store scheduled tasks in a heap and just wait for the interval until the task at the heap top is due, instead of polling on a fixed schedule.
Get top 100
Maintain a min-heap (small-top heap) of 100 elements.
Static data: compare each element with the heap top; if it is smaller, skip it; if it is larger, remove the heap top, insert the element, and re-heapify.
Dynamic data: with interleaved "add data" and "query current top 100" operations, simply keep maintaining the same heap (see the sketch below).
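A minimal top-100 sketch using Java's PriorityQueue (a min-heap by default); K = 100 and the class name are assumptions of the example.

```java
import java.util.PriorityQueue;

// Keep the 100 largest values seen so far: only values larger than the current
// heap top (the smallest of the kept 100) displace it.
public class Top100 {
    private static final int K = 100;
    private final PriorityQueue<Integer> heap = new PriorityQueue<>();

    void offer(int value) {
        if (heap.size() < K) {
            heap.add(value);
        } else if (value > heap.peek()) {   // larger than the smallest of the current top 100
            heap.poll();                    // drop the current minimum
            heap.add(value);                // heap stays at exactly K elements
        }
        // values <= the heap top are ignored
    }
}
```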
Computing the median of a data stream, or the 99th-percentile response time of a program interface, can also be done with heaps.
Representation of graph
Like a binary tree, it is a nonlinear structure
Vertices, edges, degrees, in-degrees, out-degrees, weighted graphs, adjacency matrices, adjacency lists
Advantages and disadvantages of adjacency matrix and adjacency list
Adjacency matrix: It wastes space in storage, has high query efficiency, and facilitates matrix operations.
Adjacency list: each vertex stores a linked list of its neighbors, which saves space but makes lookups slower than an adjacency matrix; the lists can be upgraded to balanced binary trees, skip lists, or hash tables.
Depth-first and breadth-first search (the "six degrees of separation" problem)
Breadth-first search (BFS)
Time complexity: O(E), where E is the number of edges
Space complexity: O(V), where V is the number of vertices
Depth-first search (DFS)
Same complexities as above
Breadth-first search is implemented with a queue; depth-first search with a stack (or recursion), as sketched below.
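A minimal BFS sketch over an adjacency-list graph; the adjacency-list representation and the "visit" print action are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Breadth-first search: a queue holds the frontier, a visited[] array prevents
// revisiting, giving O(V + E) work overall.
public class Bfs {
    static void bfs(List<List<Integer>> adj, int start) {
        boolean[] visited = new boolean[adj.size()];
        Queue<Integer> queue = new ArrayDeque<>();
        visited[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.poll();
            System.out.println("visit " + v);
            for (int next : adj.get(v)) {
                if (!visited[next]) {
                    visited[next] = true;   // mark when enqueued, not when dequeued
                    queue.add(next);
                }
            }
        }
    }
}
```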
string matching
BF (Brute Force violent matching)
Align the pattern string with every position of the main string and compare character by character (sketched below).
Time complexity: O(n*m)
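A minimal sketch of the BF approach; names are illustrative.

```java
// Brute-force matching: try every starting position in the main string and compare
// the pattern character by character, O(n * m) in the worst case.
public class BruteForceMatch {
    static int indexOf(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m && text.charAt(i + j) == pattern.charAt(j)) j++;
            if (j == m) return i;      // full match starting at position i
        }
        return -1;                     // no occurrence
    }
}
```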
RK (Rabin-Karp)
Use the hash algorithm to calculate the hash value of all substrings of the string and compare it with the hash value of the pattern string
Time complexity is O(n): the substring hashes are computed incrementally in O(n), and comparing them with the pattern's hash also takes O(n).
BM (Boyer-Moore)
Relatively complex string matching, but has high matching efficiency and is widely used in text editors
The core idea is that when the pattern string does not match the main string, the pattern string slides back a few more places to reduce unnecessary character matching.
There are two main construction rules: bad character rules and good suffix rules
KMP
It is very similar to the BM algorithm. When bad characters are encountered, the pattern string is moved a few more places.
Time complexity: O(n + m)
AC automaton
An improvement built on the Trie tree for multi-pattern matching; each node carries a failure pointer (analogous to KMP's next array).
Build steps
Building an AC automaton
Build Trie tree
Build failure pointer
Match main string in AC automaton
Trie tree
Also called a dictionary tree: a tree data structure specialized for string matching, used to quickly look up whether a string exists in a set.
Time complexity of searching a string of length k in a Trie: O(k)
The Trie trades space for time and can consume a lot of memory.
Mitigation: store each node's children in a hash table or red-black tree instead of a fixed array.
The Trie has fairly strict requirements on the strings it handles:
The character set must not be too large.
The strings should share prefixes heavily.
You usually have to implement the Trie logic yourself (a sketch follows below).
Because nodes are linked by pointers, data blocks are not contiguous in memory, which is unfriendly to the CPU cache and hurts performance.
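A minimal Trie sketch that assumes lowercase a-z input, which is exactly the 26-way child array / space-for-time trade-off mentioned above.

```java
// Trie over lowercase a-z; other character sets would need a map per node.
public class Trie {
    private static class Node {
        Node[] children = new Node[26];
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            int idx = c - 'a';
            if (cur.children[idx] == null) cur.children[idx] = new Node();
            cur = cur.children[idx];
        }
        cur.isWord = true;                 // mark the end of a complete word
    }

    boolean search(String word) {          // O(k) in the length of the query string
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.children[c - 'a'];
            if (cur == null) return false;
        }
        return cur.isWord;
    }
}
```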
application
Automatic input completion, input method automatic completion, IDE compiler automatic completion
greedy algorithm
Huffman compression coding, minimum spanning tree, single source shortest path, knapsack problem
Difficulty: Abstract the problem into a greedy algorithm model
Idea: The basic idea of the greedy algorithm is to find the optimal solution for each small part of the whole, and combine all these local optimal solutions to form an optimal solution for the whole
Scope of application
The overall optimal solution can be found through local optimal solutions
The whole can be divided into multiple parts, and an optimal solution can be found for each part.
Steps
Start from an initial solution of the problem.
In a loop, while a step toward the goal can still be taken, apply the locally optimal strategy to obtain a partial solution and shrink the scope or scale of the remaining problem.
Combine all the partial solutions into the final solution of the problem (see the worked example below).
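A worked example of this pattern, assuming the classic interval-scheduling problem (choose the maximum number of non-overlapping intervals by always taking the one that ends earliest); the data in main is made up.

```java
import java.util.Arrays;
import java.util.Comparator;

// Greedy interval scheduling: the locally optimal choice (earliest-ending compatible
// interval) also yields the globally optimal count here.
public class IntervalScheduling {
    static int maxNonOverlapping(int[][] intervals) {
        Arrays.sort(intervals, Comparator.comparingInt(a -> a[1]));  // sort by end time
        int count = 0, lastEnd = Integer.MIN_VALUE;
        for (int[] it : intervals) {
            if (it[0] >= lastEnd) {       // compatible with everything chosen so far
                count++;
                lastEnd = it[1];
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] intervals = {{1, 3}, {2, 4}, {3, 5}, {6, 7}};
        System.out.println(maxNonOverlapping(intervals));  // prints 3: [1,3], [3,5], [6,7]
    }
}
```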
getting Started
Algorithm function
Write better performing code
Algorithms, expand your way of thinking
Can train the brain’s thinking ability
Conducive to framework reading and design thinking
Difficulty learning algorithms
Day-to-day projects rarely use them directly.
easy to forget
Unable to use flexibly
The relationship between algorithms and data structures
Data structures serve algorithms, and algorithms act on specific data structures.
how to learn
Think more, do more, practice while learning, write repeatedly, ask more questions, interact more
First, master algorithm complexity analysis, work hard, and balance time and space resources.
You need to be able to concentrate on learning, and knowledge needs to be continuously accumulated
Learn the way you level up in a game: keep building your interest and experience bit by bit.
Write study notes for each article and implement the code of the week once a week
The origin, characteristics, applicable scenarios of learning algorithms, and the problems it can solve
Learning purpose
Build an awareness of time and space complexity, and write high-quality code.
Able to design infrastructure, improve programming skills, and train logical thinking
Improve the depth of looking at problems and develop angles to solve problems
Recommended books
"Algorithm 4th Edition Java"
"Sword Finger Offer"
"Programming Jewels"
"The Beauty of Programming"
"Data Structure and Algorithm Analysis"
"Introduction to Algorithms"
Basics Part 1
Complexity analysis
How to analyze and count algorithm execution efficiency and resource consumption
What is complexity analysis
Data structures and algorithms are about "how to make computers solve problems faster and with less memory".
Therefore, the performance of data structures and algorithms needs to be evaluated from the two dimensions of execution time and space occupied.
Performance problems are described using the two concepts of time complexity and space complexity respectively, both of which are collectively referred to as complexity.
Complexity describes the growth relationship between the execution time (or space occupied) of an algorithm and the size of the data.
Necessity of complexity analysis
Compared with performance testing, complexity analysis has the characteristics of not relying on the execution environment, low cost, high efficiency, easy operation, and strong guidance.
Mastering complexity analysis will enable you to write code with better performance, which will help reduce system development and maintenance costs.
How to perform complexity analysis
Big O notation
The execution time of an algorithm is proportional to the total number of times its lines of code execute, written T(n) = O(f(n)), where T(n) is the total execution time, f(n) is the total number of line executions, and n usually denotes the size of the data.
Take time complexity as an example: since it describes how execution time grows with data size, constant terms, low-order terms, and coefficients do not affect the growth trend, so they are dropped during the analysis.
Common complexity levels
polynomial order
O(1) (constant order), O(logn) (logarithmic order), O(n) (linear order), O(nlogn) (linear logarithmic order), O(n^2) (square order), O(n^3) (cubic order)
non-polynomial order
O(2^n) (exponential order), O(n!) (factorial order)
Logarithmic order derivation formula
By the change-of-base formula, log₃n = log₃2 × log₂n, so all logarithmic orders differ only by a constant factor and are uniformly written O(log n).
2^x = n: a loop whose variable doubles each iteration exits after x = log₂n iterations.
Time complexity analysis
Best, worst, average, and amortized time complexity
average time complexity
If the complexity of the code differs in magnitude under different circumstances, it is represented by the weighted average of the number of times the code is executed under all possible circumstances.
Amortized time complexity
Applies when the code has low complexity in most cases and high complexity only in a few cases;
and the low- and high-complexity cases occur in a regular, predictable pattern. The amortized result generally equals the low-order complexity.
Why do arrays start numbering from 0?
How to implement random access
linear table
Continuous memory space and data of the same type
Array out of bounds problem
A classic C example: depending on the compiler and on gcc's stack-protector option (-fno-stack-protector), the loop variable i may be laid out before or after the array on the stack, so an out-of-bounds write can silently overwrite i; Java, by contrast, checks array bounds at runtime.
Insertion and deletion optimization
If the array does not need to stay ordered, insertion at position k can be done by moving the element currently at k to the end of the array and writing the new element at k, avoiding shifting.
To avoid moving data on every deletion, mark elements as deleted and physically remove them in one batch when space runs out, similar to a garbage collector's mark-and-sweep.
Array advantage
Java's collection framework cannot store primitive types without boxing, so arrays are preferable for primitives.
If you know the size of the data and the data operation is simple, you can use an array.
Why array indices start from 0
It saves one instruction per access.
a[k]_address = base_address + k * type_size; if indexing started from 1 it would be base_address + (k - 1) * type_size, an extra subtraction on every access.
How to implement LRU (Least Recently Used) cache elimination algorithm
There are two cases of deleting a node from a linked list, and singly and doubly linked lists differ in efficiency:
Delete a node whose value is equal to a certain value
Delete the node pointed to by the given pointer
Programming optimization ideas
Optimize by exchanging space for time
Arrays are simple, easy to use, and CPU-cache friendly because their storage is contiguous and the CPU caches whole blocks; linked-list nodes are scattered in memory, so they do not benefit.
Implement LRU
If the data is already in the cache, delete its node and re-insert it at the head of the list.
Not in cache
The cache is not full, insert into the head
When the cache is full, delete the tail node and insert a new node into the head.
Idea for checking a palindrome string with a singly linked list
Practice LeetCode 206, 141, 21, 19, 876
Better written linked list algorithm
Understand the meaning of pointers or references
Be wary of lost pointers and memory leaks
Simplifying implementation difficulty using Sentinels
Pay attention to the processing of boundary conditions
Linked list is empty
The linked list has only one node
The linked list has two nodes
The code logic is processing the head node and tail node
Drawing to aid thinking
Practice more, use your hands more
Common operations on linked lists
Reverse singly linked list
Speed pointer, forward or backward part reversal
Linked list ring detection
Determine the cycle in a singly linked list
Either traverse while recording visited nodes to detect a repeat, or use fast and slow pointers.
Finding the entry point of the cycle
The distance from the head of the list to the entry point equals the distance from the point where slow and fast meet to the entry point.
Find the number of nodes on the cycle
Start from the meeting point and traverse until you return to it, counting the nodes (a sketch of the detection step follows below).
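A minimal fast/slow-pointer sketch for cycle detection (Node is a bare illustrative class); finding the entry point would then restart one pointer from the head and advance both one step at a time until they meet again.

```java
// Fast/slow pointer cycle detection: fast advances two steps per move, slow one;
// they can only meet if the list contains a cycle.
public class CycleDetection {
    static class Node { int val; Node next; Node(int v) { val = v; } }

    static boolean hasCycle(Node head) {
        Node slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;   // pointers met inside the cycle
        }
        return false;                        // fast reached the end: no cycle
    }
}
```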
Merge two ordered linked lists
Advance two pointers, always appending the smaller of the two current nodes.
Delete the n-th node from the end of the list
Either keep two pointers n nodes apart and move them together (one pass), or traverse twice: first find the length, then walk to node length - n + 1.
Find the middle node of the linked list
Fast and slow pointers: when the fast pointer reaches the end, the slow pointer is at the middle.
Using stack to implement browser rollback function
Stack: A linear list with limited operations, only allowing insertion and deletion of data at one end
Implementation method: sequential stack, chain stack
the complexity
Time complexity: best case O(1), worst case O(n) (when the underlying array has to be expanded)
Amortized time complexity: O(1)
Function calls: the operating system gives each thread its own stack memory, and the local/temporary variables of each call are stored there as stack frames.
Application in evaluating arithmetic expressions
Use two stacks, one for operands and one for operators. Before pushing an operator, compare it with the operator on top of the stack: if its precedence is not higher, first pop the top operator and apply it to the top two operands.
Use a stack to match brackets (sketch below).
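A minimal bracket-matching sketch with a stack, limited to the three common bracket pairs as an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Push opening brackets; on a closing bracket, check it pairs with the stack top.
public class BracketMatcher {
    static boolean isValid(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false;          // closer with no opener
                char open = stack.pop();
                if ((c == ')' && open != '(') || (c == ']' && open != '[')
                        || (c == '}' && open != '{')) {
                    return false;                            // mismatched pair
                }
            }
        }
        return stack.isEmpty();   // any leftover opener means an unmatched bracket
    }
}
```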
memory space
code area
Code area: stores the binary code of method bodies; high-level (job) scheduling, intermediate (memory) scheduling, and low-level (process) scheduling control which code in the code area is being executed.
static data area
Store global variables, static variables, and constants. Constants include final-modified constants and String constants. The system automatically allocates and recycles.
dynamic data area
stack area, heap area
Stack: stores formal parameters, local variables, and return values of running methods. Automatically allocated and recycled by the system
Heap: The reference or address of a new object is stored in the stack area, pointing to the real data of the object stored in the heap area.
LeetCode programming exercises
20, 155, 232, 844, 224, 682, 496
The use of queues in limited resource pools such as thread pools
Queue: A linear table with limited operations, only allowing insertion at one end and deletion at the other end
There are also two ways to implement
Sequential queue implemented by array, chained queue implemented by linked list
An array-based queue looks full when tail == n even if there is free space in front of head; this needs to be handled:
Option 1: copy everything into a larger array, O(n) time.
Option 2: when tail reaches n, shift the existing elements to the front of the array in one batch during enqueue, keeping the amortized time at O(1).
A circular queue avoids the data movement entirely (empty/full test sketched below).
Empty and full judgment
Non-circular array queue: empty when head == tail, full when tail == n
Circular queue: empty when head == tail, full when (tail + 1) % n == head
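A minimal circular-queue sketch matching the empty/full conditions above; it deliberately wastes one slot so that full and empty can be distinguished (the int payload and capacity handling are illustrative).

```java
// Array-based circular queue: head == tail means empty, (tail + 1) % n == head means full.
public class CircularQueue {
    private final int[] items;
    private int head = 0, tail = 0;

    CircularQueue(int capacity) {
        items = new int[capacity + 1];     // one wasted slot distinguishes full from empty
    }

    boolean enqueue(int value) {
        if ((tail + 1) % items.length == head) return false;  // full
        items[tail] = value;
        tail = (tail + 1) % items.length;
        return true;
    }

    Integer dequeue() {
        if (head == tail) return null;                          // empty
        int value = items[head];
        head = (head + 1) % items.length;
        return value;
    }
}
```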
blocking queue
Applied in the producer-consumer model
concurrent queue
Regarding thread safety, using concurrent queues, you can directly lock enqueue() and dequeue(). However, the lock granularity is large and the concurrency is relatively low.
You can use array-based circular queues and use CAS atomic operations to achieve efficient concurrency.
Queues in limited resource pools: thread pools queue tasks waiting for threads; database connection pools queue requests waiting for connections.
Unbounded queue based on linked list
Too many tasks, too long waiting time, not suitable
Array-based bounded queue
Request exceeds queue size, request rejected
Mainly to find the most reasonable queue size
Recursion: How to find the "final recommender" with three lines of code?
Recursion needs to meet three conditions
The problem can be broken down into several sub-problems.
The sub-problems are solved in exactly the same way as the original problem, only at a smaller scale.
There is a recursive termination condition
Prevent stack overflow
Solutions: 1. convert the recursion to a non-recursive implementation; 2. use static instead of non-static variables; 3. increase the stack size.
Be wary of double counting
You can use a hash table to cache the result of each recursive call (memoization, sketched below).
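A minimal memoization sketch using Fibonacci as a stand-in recursive problem (the outline does not name a specific function).

```java
import java.util.HashMap;
import java.util.Map;

// Cache each computed result in a hash table, turning the exponential naive
// Fibonacci recursion into an O(n) computation.
public class MemoizedFib {
    private static final Map<Integer, Long> CACHE = new HashMap<>();

    static long fib(int n) {
        if (n <= 1) return n;                      // recursion termination condition
        Long cached = CACHE.get(n);
        if (cached != null) return cached;         // reuse a previously computed result
        long result = fib(n - 1) + fib(n - 2);
        CACHE.put(n, result);
        return result;
    }
}
```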
Disadvantages of recursion
Extra function-call time cost, high space cost for stack frames, risk of stack overflow, and repeated computation.
Rewrite recursive code and change recursive code to non-recursive code
Recursive debugging methods
Print logs to discover recursive values and debug using conditional breakpoints
Why insertion sort is more popular than bubble sort
Wide variety of sorting
The most commonly used: bubble sort, insertion sort, selection sort, merge sort, quick sort, counting sort, radix sort, bucket sort.
Such as monkey sorting, sleep sorting, noodle sorting, etc.
How to analyze a sorting algorithm
Algorithm execution efficiency
Best, worst, and average case time complexity
Coefficients, constants, low-order of time complexity
The number of comparisons and exchanges
Memory consumption of sorting algorithm
The stability of sorting algorithms
A stable sorting algorithm keeps elements with equal keys in their original relative order after sorting (e.g. sort orders by amount; orders with the same amount remain ordered by order time).
Bubble Sort
An in-place, stable sorting algorithm (adjacent equal elements are not swapped); best case O(n), worst case O(n²)
Analyzing the average time complexity
Analysis via the concepts of "order degree" and "inversion degree"
Order degree: the number of ordered pairs in the array; inversion degree = full order degree (n*(n-1)/2) - order degree
Worst case: order degree 0, so n*(n-1)/2 swaps are needed; on average about n*(n-1)/4 swaps, giving an average time complexity of O(n²)
insertion sort
An in-place, stable sorting algorithm; best case O(n), worst case O(n²), average case O(n²).
selection sort
An in-place but unstable sorting algorithm; best, worst, and average case are all O(n²).
It is precisely because selection sort is unstable that it loses out to bubble sort and insertion sort.
Insertion sort vs bubble sort comparison
Bubble sort requires three assignments to exchange data, while insertion sort only requires one assignment to move data.
Quick sort and merge sort
Merge sort, select the middle point
Whether the merge is stable depends on whether the position of the same elements changes during the merge.
Time complexity O(n log n), space complexity O(n)
Quick sort: pick a pivot, then partition the elements onto its two sides
Quick sort is unstable, with O(1) extra space and O(n log n) time; in extreme cases it degrades to O(n²). Because it sorts in place, it is the more commonly used of the two.
Both use the divide-and-conquer idea and are implemented with recursion: merge sort hinges on its recurrence plus the merge function, quick sort on its recurrence plus the partition function (sketched below).
The partition step of quick sort can also be used to find the k-th largest element.
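A minimal quick-sort sketch built around the partition function; it uses the last element as the pivot for simplicity, whereas the optimizations discussed later would pick a random or median-of-three pivot.

```java
// Quick sort: partition puts the pivot in its final position, then recurse on both sides.
public class QuickSort {
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;                 // i marks the boundary of the "< pivot" region
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++;
            }
        }
        int tmp = a[i]; a[i] = a[hi]; a[hi] = tmp; // place the pivot in its final position
        return i;
    }
}
```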
Bucket sort O(n), counting sort O(n + k), radix sort O(d·n)
bucket sort
Applicable scenarios: well suited to external sorting, e.g. sorting 10 GB of data with only 300 MB of memory by splitting the data into buckets.
counting sort
A special case of bucket sort with finer-grained buckets (one per value); suitable when the data range is small and the values are non-negative integers.
Radix sort
Suitable when the data can be split into independent high-to-low "digits", such as sorting phone numbers.
How to implement a general-purpose, high-performance sorting function
The time complexity of linear sorting is relatively low, but the applicable scenarios are quite special. If you need to write a general sorting, you cannot choose linear sorting.
Why is quick sort often used?
Its average time complexity is O(n log n) and it sorts in place; the O(n²) behaviour only appears in the worst case.
Optimizations for pivot selection: median-of-three, random pivot.
Merge sort's time complexity is also low, but its space complexity is O(n).
Application-level sorting: achieved through a mix of sorting methods
The qsort implementation in C
It first tries merge sort, trading space for time; when the amount of data is too large, it switches to quick sort.
The quick sort uses median-of-three pivot selection.
Within the quick sort, intervals shorter than 4 elements are sorted with insertion sort.
For small inputs an O(n²) sort does not necessarily run longer than an O(n log n) one, so insertion sort is chosen for small data, and a sentinel is used to simplify the inner loop and squeeze out performance.
binary search
How to use the most memory-saving way to achieve fast search function?
Binary search is highly efficient, with a time complexity of O(log n).
An ordered array data structure is required, suitable for situations where the scale is relatively large, queries are frequent, and there are no frequent data insertion or deletion operations.
Pay attention to the exit of the loop condition, the value of mid, and the update of low and high
Binary search can be implemented using loops and recursion
Hash tables and binary trees can solve the problem of quickly searching dynamic data structures, but they require relatively large additional memory space.
LeetCode 33
Variants of binary search on arrays with duplicate elements (the first one is sketched after this list)
Query the first occurrence of the value
Query the last occurrence of the value
Query the first number greater than a specific value
Query the last number smaller than a specific value
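A minimal sketch of the first variant above (first occurrence of a value in a sorted array with duplicates); the other variants differ only in how mid is compared and which bound moves.

```java
// Binary search variant: index of the first occurrence of target, or -1 if absent.
public class BinarySearchFirst {
    static int firstOccurrence(int[] a, int target) {
        int low = 0, high = a.length - 1, result = -1;
        while (low <= high) {
            int mid = low + (high - low) / 2;       // avoids overflow of (low + high)
            if (a[mid] < target) {
                low = mid + 1;
            } else {
                if (a[mid] == target) result = mid; // record a hit, keep searching to the left
                high = mid - 1;
            }
        }
        return result;
    }
}
```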