MindMap Gallery traditional neural network
Review some knowledge points of traditional neural networks for machine learning, including nonlinear activation functions, the concept of gradient, the concept of linear regression, linear regression application scenarios and limitations, the structure of neural networks, etc.
Edited at 2022-11-23 09:35:21
traditional neural network
nonlinear activation function
sigmoid
advantage
Compresses input feature values from a wide range into the interval (0, 1), so the signal amplitude stays bounded as data passes through a deep network
Closest to biological neurons in a physical sense
Because its output lies between 0 and 1, this function is suitable for models whose output is a predicted probability
shortcoming
When the input is very large or very small, the function saturates: the output barely changes, so the gradient is close to 0
Gradients may vanish early, which slows convergence
Exponential operations are relatively time-consuming
The output is not zero-mean, so the neurons in the next layer receive a non-zero-mean signal from the previous layer as input. As the network deepens, the distribution of the data drifts away from that of the original input.
tanh
advantage
Its output is zero-mean, which solves the problem that the output of the Sigmoid function is not
The derivative of the Tanh function ranges from 0 to 1, a wider range than the 0 to 0.25 of the sigmoid function, which alleviates the vanishing gradient problem to a certain extent
Near the origin the Tanh function is close to y = x, so when the input activation values are small the layer behaves almost like a plain matrix operation, and training is relatively easy
shortcoming
As with the Sigmoid function, the vanishing gradient problem still exists
Its two equivalent expressions, 2*sigmoid(2x)-1 and (exp(x)-exp(-x))/(exp(x)+exp(-x)), show that the cost of exponential operations also remains
ReLU
advantage
Compared with the sigmoid and Tanh functions, the ReLU function does not saturate when the input is positive, which largely avoids the vanishing gradient problem and makes deep networks trainable
Computation is very fast: it only needs to check whether the input is greater than 0
Convergence is much faster than with the sigmoid and Tanh functions
ReLU sets the output of some neurons to 0, which makes the network sparse and reduces the correlation between parameters, alleviating overfitting to a certain extent
shortcoming
The output of the ReLU function is not zero-mean
There is a Dead ReLU Problem: some neurons may never be activated, so the corresponding parameters are never updated. The main causes are poor parameter initialization and a learning rate that is set too large
When the input is positive the derivative is 1, so in the chain-rule product the gradient does not vanish. Its size then depends entirely on the product of the weights, which may lead to the gradient explosion problem
Leaky ReLU
advantage
In response to the Dead ReLU Problem of the ReLU function, the Leaky ReLU function gives the input a very small slope when the input is negative. This removes the zero gradient for negative inputs and so largely alleviates the Dead ReLU Problem
The output range of this function is from negative infinity to positive infinity, that is, the leak extends the range of the ReLU function. The slope α is generally set to a small value, such as 0.01
shortcoming
In theory this function should work better than the ReLU function, but extensive practice has shown its effect to be unstable, so it is not widely used in practice
Because different functions are applied on different intervals, the results are inconsistent: the function cannot provide a consistent relationship prediction for positive and negative input values
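The four activation functions above can be compared side by side. The following is a minimal sketch in plain Python; the function names are chosen here for illustration, and the default α = 0.01 follows the value mentioned for Leaky ReLU.

```python
import math

def sigmoid(x):
    """Logistic sigmoid; output lies in (0, 1)."""
    # Two branches so that exp() is never called on a large positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    """Derivative of sigmoid; its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    """Derivative of tanh; its maximum is 1 at x = 0."""
    return 1.0 - math.tanh(x) ** 2

def relu(x):
    """ReLU: passes positive inputs through, outputs 0 otherwise."""
    return x if x > 0 else 0.0

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small slope alpha for negative inputs instead of 0."""
    return x if x > 0 else alpha * x

# Saturation: for a large input the sigmoid gradient is almost 0.
saturated_grad = sigmoid_grad(20.0)

# The two tanh expressions agree: tanh(x) = 2 * sigmoid(2x) - 1.
identity_gap = abs(math.tanh(0.7) - (2.0 * sigmoid(1.4) - 1.0))
```

Evaluating `sigmoid_grad` and `tanh_grad` at 0 reproduces the 0.25 and 1 upper bounds quoted above, and `saturated_grad` shows how small the sigmoid gradient becomes once the input is large.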
The concept of gradient
A gradient is a vector. At a given point, its direction is the one along which the directional derivative of the function is largest, that is, the direction in which the function changes fastest at that point, and its magnitude (the modulus of the gradient) is that maximum rate of change.
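To make the definition concrete, the sketch below estimates a gradient numerically with central differences. The example function f(x, y) = x² + 3y² and the helper name `numerical_gradient` are assumptions made for illustration only.

```python
import math

def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` by central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad

def f(p):
    # Hypothetical example function: f(x, y) = x^2 + 3 * y^2
    x, y = p
    return x ** 2 + 3.0 * y ** 2

# The analytic gradient is (2x, 6y), so at (1, 2) it is (2, 12).
g = numerical_gradient(f, [1.0, 2.0])

# The modulus of the gradient is the largest rate of change at that point.
g_norm = math.sqrt(sum(c * c for c in g))
```

Here `g` points in the direction of fastest increase of `f` at (1, 2), and `g_norm` is the rate of change along that direction.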
The concept of linear regression
Uses a linear relationship to describe the mapping from input to output
Linear regression application scenarios
Network analysis, risk analysis, stock price prediction, weather forecast
Limitations of linear regression
Linear regression can clearly describe the segmentation of linearly distributed data, but is weak in describing nonlinearly distributed data.
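The limitation can be seen with a small experiment. This is a sketch using the closed-form least-squares fit for a single input variable; the data sets and the helper names `fit_line` and `mse` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

def mse(xs, ys, w, b):
    """Mean squared error of the fitted line on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Linearly distributed data (y = 2x + 1): the line fits it exactly.
xs_lin = [0.0, 1.0, 2.0, 3.0, 4.0]
ys_lin = [2.0 * x + 1.0 for x in xs_lin]
w_lin, b_lin = fit_line(xs_lin, ys_lin)
err_lin = mse(xs_lin, ys_lin, w_lin, b_lin)

# Nonlinearly distributed data (y = x^2): the best line is a poor fit.
xs_sq = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys_sq = [x ** 2 for x in xs_sq]
w_sq, b_sq = fit_line(xs_sq, ys_sq)
err_sq = mse(xs_sq, ys_sq, w_sq, b_sq)
```

On the linear data the error is essentially zero, while on the quadratic data the best line is flat and leaves a large error, which is the weakness described above.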
The structure of neural network
input layer
activation value
middle layer
output layer
Weight: indicates how closely a neuron is connected to a neuron in the input layer. The closer the connection, the greater the value.
Activation value: the activation value of the output layer is computed from the layer before it. In the simplest case, multiply each input-layer activation value by its weight and sum the results.
Offset (bias): Don't worry about this parameter for now
“Parallel” and “Series” Connection of Neurons
Here, m represents the width of the nth layer of neural network, and n is the depth of the current neural network.
From the first layer of neural network to the final output, the value of each neuron is determined by the neuron value of the previous layer, the neuron parameters W, b and the excitation function. The equation of the k-th neuron in the n-th layer can be expressed by the formula:
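The formula itself does not appear in this text. A standard form consistent with the description is given below as an assumption: the symbol a for a neuron's value, f for the excitation function, and the index convention (W indexed by target neuron k and source neuron i, with m_{n-1} the width of layer n-1) are notation chosen here, not taken from the source.

```latex
\[
a^{(n)}_{k} \;=\; f\!\left( \sum_{i=1}^{m_{n-1}} W^{(n)}_{k i}\, a^{(n-1)}_{i} \;+\; b^{(n)}_{k} \right)
\]
```

Read from the inside out: the values of the previous layer are weighted by W, summed across the width of that layer (the "parallel" connection), shifted by b, and passed through f; repeating this layer after layer gives the "series" connection.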
Loss function-Loss
One of the most important factors affecting deep learning performance. It is the direct guidance that the outside world gives to neural network model training
An appropriate loss function can ensure the convergence of the deep learning model
Designing an appropriate loss function is one of the main contents of research work
Softmax function definition and its benefits
normalized exponential function
Convert prediction results to non-negative numbers
The first step of softmax is to apply the exponential function to the model's prediction results, which guarantees that the probabilities are non-negative.
The sum of the probabilities of various predicted outcomes is equal to 1
The method is to divide the converted results by the sum of all converted results, which can be understood as the percentage of the converted results in the total. This gives approximate probabilities.
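The two steps described above can be sketched as follows. Subtracting the maximum score before exponentiating is an addition made here for numerical safety; it does not change the result, because the common factor cancels in the division.

```python
import math

def softmax(scores):
    """Turn raw prediction scores into non-negative values that sum to 1."""
    # Assumption for numerical safety: shift by the maximum score.
    shift = max(scores)
    # Step 1: exponentiate, which makes every value positive.
    exps = [math.exp(s - shift) for s in scores]
    # Step 2: divide by the total, so the results sum to 1.
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
large = softmax([1000.0, 1000.0])  # would overflow without the shift
```

Larger scores receive larger shares, and the shares can be read as approximate probabilities.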
Definition of Cross entropy function and its benefits
Why it can be used as a loss function
Cross entropy can be used as a loss function in neural networks (machine learning). Here p is the distribution of the true labels and q is the label distribution predicted by the trained model. The cross entropy loss measures how close q is to p.
Another benefit of cross entropy as a loss function: with a sigmoid output, its gradient is proportional to the output error, so learning does not slow down the way it does with the mean square error loss when the sigmoid saturates. The speed of learning is controlled by the output error.
Consider p(i) as the true probability distribution and q(i) as the predicted probability distribution. If we use cross entropy as the loss function, minimizing it makes q(i) gradually approach p(i), which achieves the purpose of fitting.
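Both points can be checked numerically. The sketch below assumes a one-hot true distribution p and a single sigmoid output unit; the small constant `eps` and all function names are introduced here for illustration, and the mean square error is taken as 0.5 * (a - y)^2.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p(i) * log q(i); eps guards against log(0)."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_ce_wrt_z(z, y):
    """Gradient of cross entropy w.r.t. the pre-activation z of a sigmoid unit."""
    return sigmoid(z) - y  # just the output error

def grad_mse_wrt_z(z, y):
    """Gradient of 0.5 * (a - y)^2 w.r.t. z, where a = sigmoid(z)."""
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)  # error times the sigmoid derivative

p = [0.0, 1.0, 0.0]                 # true label distribution
q_good = [0.05, 0.90, 0.05]         # prediction close to p
q_bad = [0.60, 0.20, 0.20]          # prediction far from p
loss_good = cross_entropy(p, q_good)
loss_bad = cross_entropy(p, q_bad)

# A badly wrong, saturated sigmoid output: z = -6 while the target is 1.
g_ce = grad_ce_wrt_z(-6.0, 1.0)
g_mse = grad_mse_wrt_z(-6.0, 1.0)
```

The loss is lower for the prediction closer to p. For the saturated unit, the cross entropy gradient stays large because it equals the output error, while the mean square error gradient is multiplied by the nearly zero sigmoid derivative.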
Regression problems with targets in the [0, 1] interval, and generation tasks
Custom loss
Emphasize a particular attribute
Single out certain predicted values, or assign them weights of different sizes
Merge multiple losses
Multi-objective training tasks, setting reasonable loss combination methods (various operations)
neural network fusion
Different neural network losses are combined, and the common loss is used to train and guide the network.
learning rate
The larger the value, the faster the convergence, at the risk of stepping over the optimal point
The smaller the value, the higher the convergence accuracy, at the cost of slower training
How to choose an appropriate learning rate
Fixed
Fixed means a constant learning rate. It is the simplest configuration and requires only one parameter.
The learning rate remains unchanged during the entire optimization process. This strategy is rarely used, because as training approaches the global optimal point, the learning rate should become smaller and smaller to avoid stepping over it.
step
Reduce the learning rate by a uniform factor at fixed intervals, for example to 0.1 times its previous value each time.
This is a very commonly used learning rate schedule. Each reduction multiplies the learning rate by a fixed factor, so the change is discontinuous. It is simple to use and usually works well.
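The fixed and step strategies can be sketched as two small schedule functions. The parameter names `step_size` (epochs between reductions) and `gamma` (reduction factor) are assumptions chosen here, as is the idea of indexing the schedule by epoch.

```python
def fixed_lr(base_lr, epoch):
    """Fixed strategy: the same learning rate at every epoch."""
    return base_lr

def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Step strategy: multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rate seen at a few epochs, starting from 0.1.
schedule = [step_lr(0.1, epoch) for epoch in (0, 9, 10, 25)]
```

With these defaults the rate stays at 0.1 for epochs 0 to 9, drops to about 0.01 at epoch 10 and to about 0.001 at epoch 20, which is the discontinuous change described above.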
Adagrad
adaptive learning rate
In the AdaGrad algorithm, the accumulated sum of squared gradients r grows larger and larger as the iterations continue, so the overall learning rate becomes smaller and smaller. Generally speaking, AdaGrad therefore encourages convergence at the start, then gradually penalizes it, and progress becomes slower and slower.
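The shrinking learning rate can be observed in a minimal one-parameter sketch. The objective f(w) = (w - 3)², the hyperparameter values, and the stabilizing constant `eps` are assumptions for illustration; r is the accumulated sum of squared gradients.

```python
import math

def adagrad_minimize(grad_fn, w, lr=0.5, steps=50, eps=1e-8):
    """Run AdaGrad on a single parameter and record the effective learning rate."""
    r = 0.0                      # accumulated squared gradients, only ever grows
    effective_lrs = []
    for _ in range(steps):
        g = grad_fn(w)
        r += g * g
        step_lr = lr / (math.sqrt(r) + eps)
        effective_lrs.append(step_lr)
        w -= step_lr * g
    return w, effective_lrs

# Hypothetical objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final, lrs = adagrad_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
```

Because r never decreases, the recorded values in `lrs` never increase, which is the slowdown noted above.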
RMSprop
Unlike the AdaGrad algorithm, the RMSProp algorithm does not bluntly accumulate all squared gradients; it adds a decay coefficient that controls how much historical information is kept.
Put simply, after the global learning rate is set, on each pass it is divided, parameter by parameter, by the square root of the decay-weighted accumulation of squared historical gradients, so each parameter gets its own learning rate.
The effect is greater progress in the flatter directions of the parameter space (because they are flatter, the accumulated squared gradients are smaller, so the learning rate is reduced less), and smoother movement in the steep directions, which speeds up training
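A single RMSProp update can be sketched as below. The decay value 0.9, the constant `eps`, and the two-parameter example with one steep and one flat direction are assumptions for illustration.

```python
import math

def rmsprop_step(w, grads, r, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update; r holds the decayed squared-gradient average per parameter."""
    new_w, new_r = [], []
    for wi, gi, ri in zip(w, grads, r):
        ri = decay * ri + (1.0 - decay) * gi * gi   # decay controls how much history is kept
        new_r.append(ri)
        new_w.append(wi - lr * gi / (math.sqrt(ri) + eps))
    return new_w, new_r

# Hypothetical objective 0.5 * (100 * w0^2 + w1^2): steep in w0, flat in w1.
w_start = [1.0, 1.0]
grads = [100.0 * w_start[0], 1.0 * w_start[1]]
w_next, r_next = rmsprop_step(w_start, grads, r=[0.0, 0.0])
step_sizes = [a - b for a, b in zip(w_start, w_next)]
```

Although the two gradients differ by a factor of 100, dividing by the square root of each parameter's own history gives steps of nearly the same size, so the flat direction is not left behind and the steep direction is not overshot.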
momentum
Keep moving along the optimization direction already obtained. There is no need to find the direction again at every step, only to fine-tune it.
What is the difference between using momentum and directly increasing the learning rate?
The direction is different and the search is more accurate.
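The idea of reusing the previous direction can be sketched with the classical momentum update, in which a velocity v carries over between steps. The objective f(w) = (w - 3)² and the values lr = 0.1 and beta = 0.9 are assumptions for illustration.

```python
def momentum_minimize(grad_fn, w, lr=0.1, beta=0.9, steps=200):
    """Gradient descent with momentum on a single parameter."""
    v = 0.0  # velocity: the direction accumulated from earlier steps
    for _ in range(steps):
        # Keep most of the old direction and adjust it with the new gradient.
        v = beta * v - lr * grad_fn(w)
        w += v
    return w

# Hypothetical objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_momentum = momentum_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
```

Each step is the old velocity scaled by `beta` plus a correction from the current gradient, so the update changes direction gradually instead of being recomputed from scratch, unlike simply raising the learning rate.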
overfitting
Over-fitting is also called over-learning. Its intuitive manifestation is that the algorithm performs well on the training set, but does not perform well on the test set, resulting in poor generalization performance.
Overfitting is caused by the fact that the training data contains sampling errors during the model parameter fitting process, and the complex model also fits the sampling errors during training. The so-called sampling error refers to the deviation between the sample set obtained by sampling and the overall data set.
The model itself is so complex that it fits the noise in the training sample set. In this case, choose a simpler model or prune the model
The training samples are too few or lack representativeness. At this time, it is necessary to increase the number of samples or increase the diversity of samples
The interference of training sample noise causes the model to fit these noises. In this case, it is necessary to eliminate the noisy data or switch to a model that is not sensitive to noise.
solution
Dropout
The difference between Dropout and Pooling
During forward propagation, the activation value of each neuron stops working (is set to 0) with a certain probability p. This makes the model generalize better, because it cannot rely too heavily on particular local features.
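The forward-pass behaviour can be sketched as follows. Rescaling the surviving activations by 1 / (1 - p), often called inverted dropout, is an assumption added here so that the expected activation is unchanged; the source only describes the zeroing step.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Zero each activation with probability p during training."""
    if not training or p == 0.0:
        return list(activations)          # no neurons are dropped at test time
    keep = 1.0 - p
    # Assumption: surviving values are scaled by 1 / keep (inverted dropout).
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                    # seeded so the sketch is repeatable
layer_output = [1.0] * 1000
dropped = dropout(layer_output, p=0.5, rng=rng)
zero_count = sum(1 for a in dropped if a == 0.0)
```

With p = 0.5 roughly half of the activations are zeroed on each pass, and a different random subset is dropped every time the function is called.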
Regularization
What effect does Regularization have on the parameter w?
What is weight decay, and how is it related to Regularization?
The purpose of L2 regularization is to decay the weights toward smaller values, which reduces model overfitting to a certain extent. This is why L2 regularization is also called weight decay.
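The link between the two names can be seen in a plain gradient descent step. The sketch assumes the penalty is written as (lam / 2) * sum(w²), so that its gradient is lam * w; the names `lam` and `sgd_step_l2` are chosen here for illustration.

```python
def sgd_step_l2(w, grads, lr=0.1, lam=0.01):
    """One gradient descent step on loss + (lam / 2) * sum(w^2)."""
    # The penalty adds lam * w to each gradient, which is the same as first
    # shrinking every weight by the factor (1 - lr * lam).
    return [wi - lr * (gi + lam * wi) for wi, gi in zip(w, grads)]

# With a zero data gradient, only the decay acts on the weights.
w = [1.0, -2.0]
w_after_one = sgd_step_l2(w, grads=[0.0, 0.0])

w_after_many = w
for _ in range(100):
    w_after_many = sgd_step_l2(w_after_many, grads=[0.0, 0.0])
```

Every step multiplies each weight by 1 - lr * lam (here 0.999), so in the absence of a data gradient the weights decay steadily toward 0.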
Fine-tuning
Most parameters do not need to be updated, so the number of parameters actually trained is greatly reduced.
Freeze some of the convolutional layers of the pre-trained model (usually the majority of the convolutional layers close to the input, since these layers retain a lot of low-level information), or even freeze no layers at all, and train the remaining convolutional layers (usually those close to the output) and the fully connected layers.
The principle of fine-tuning is to take a known network structure with known network parameters, replace the output layer with our own, and fine-tune the parameters of the few layers before the last one. This makes effective use of the strong generalization ability of deep neural networks and removes the need to design a complex model and train it at length, so fine-tuning is a more suitable choice when the amount of data is insufficient.
significance
Stand on the shoulders of giants: There is a high probability that the model trained by predecessors will be stronger than the model you build from scratch. There is no need to reinvent the wheel.
The training cost can be very low: If you use the method of deriving feature vectors for transfer learning, the later training cost is very low, there is no pressure on the CPU, and it can be done without a deep learning machine.
Suitable for small data sets: when the data set itself is small (a few thousand images), it is unrealistic to train a large neural network with tens of millions of parameters from scratch, because the larger the model, the more data it requires, and overfitting cannot be avoided. If you still want to use the powerful feature extraction capabilities of large neural networks, you can only rely on transfer learning.
Transfer model
Transfer learning, as the name suggests, transfers the parameters of a trained model (the pre-trained model) to a new model to help train it. Since most data and tasks are related, transfer learning lets us share the learned model parameters (which can also be understood as the knowledge the model has learned) with the new model in some way. This speeds up and improves the learning of the new model, which does not have to learn from scratch as most networks do.