On the computational demands of conventional neural networks
Fuzzy PID is troublesome to set up. Open the FIS editor in MATLAB; a two-input fuzzy PID is generally used. The membership functions of the inputs E and EC are usually Gaussian, and those of the fuzzy outputs Kp, Ki and Kd are usually triangular. You also need to define the fuzzy rules and load the FIS into Simulink, then tune the scaling factors Gu, Ge and Gec and set the parameters of the fuzzy PID controller.
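As a minimal sketch (in Python rather than MATLAB, with made-up centers and widths), the Gaussian and triangular membership functions mentioned above can be written as:

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function with center c and width sigma."""
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def tri_mf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Degree to which a scaled error e belongs to the fuzzy sets "Zero" and
# "Positive Small" (set shapes and parameters are illustrative assumptions)
e = 0.2
mu_zero = gauss_mf(e, c=0.0, sigma=0.5)
mu_pos_small = tri_mf(e, a=0.0, b=0.5, c=1.0)
```

The fuzzy rules then combine such membership degrees (e.g. by min/max inference) to produce the increments of Kp, Ki and Kd.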
In short, it is hard to explain this problem clearly in just a few words.
BP networks belong to neural networks; genetic algorithms belong to evolutionary algorithms.
A neural network simulates the neural computation of the human brain and can perform highly nonlinear prediction and computation; it is mainly used for nonlinear fitting and recognition. Its characteristic is that it needs "training": you give it some inputs and tell it the correct outputs, and after enough iterations, when a new input is given, the network can correctly predict the target output. Neural networks are widely used in pattern recognition and fault diagnosis. The BP algorithm and the BP neural network are improved versions of the basic neural network that correct some of its shortcomings.
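A toy illustration of this "training" idea, assuming nothing beyond NumPy: a one-hidden-layer network is fitted to the nonlinear function sin(x) by gradient descent (layer sizes, learning rate and iteration count are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: a nonlinear target, y = sin(x)
X = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
Y = np.sin(X)

# One hidden layer of 16 tanh units, trained by full-batch gradient descent
W1 = rng.normal(0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)

lr = 0.1
for _ in range(5000):
    H = np.tanh(X @ W1 + b1)          # forward pass
    Yhat = H @ W2 + b2
    G = 2 * (Yhat - Y) / len(X)       # gradient of mean squared error
    W2 -= lr * H.T @ G; b2 -= lr * G.sum(0)
    GH = (G @ W2.T) * (1 - H ** 2)    # back-propagate through tanh
    W1 -= lr * X.T @ GH; b1 -= lr * GH.sum(0)

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - Y) ** 2))
```

After training, the network predicts sin(x) for inputs it has never seen, which is exactly the nonlinear fitting described above.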
A genetic algorithm belongs to the evolutionary algorithms, which simulate the process of biological evolution in nature: survival of the fittest. As individuals evolve generation by generation, only high-quality individuals (those with the smallest, or largest, objective function value) enter the next generation of reproduction, until the global optimum is found. Genetic algorithms can solve highly nonlinear optimization problems that conventional optimization algorithms cannot, and are widely used in all walks of life. Differential evolution, ant colony algorithms and particle swarm optimization are also evolutionary algorithms, but they simulate different biological populations.
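A minimal genetic-algorithm sketch of this "survival of the fittest" loop (the objective function, population size, selection count and mutation scale below are all arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):
    """Objective to minimize: sum of squares (a toy example)."""
    return float(np.sum(x ** 2))

dim, pop_size, keep = 3, 40, 10
pop = rng.uniform(-5, 5, (pop_size, dim))       # initial population

for _ in range(200):
    fit = np.array([sphere(ind) for ind in pop])
    parents = pop[np.argsort(fit)[:keep]]       # selection: only the fittest survive
    moms = parents[rng.integers(0, keep, pop_size)]
    dads = parents[rng.integers(0, keep, pop_size)]
    pop = (moms + dads) / 2                     # crossover: average two parents
    pop += rng.normal(0, 0.05, (pop_size, dim)) # mutation: small random perturbation

best = min(pop, key=sphere)
```

Differential evolution and particle swarm optimization replace the crossover/mutation rules with different population dynamics, but the select-and-reproduce skeleton is the same.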
The weights of a neural network are obtained by training the network. If you use MATLAB, do not set them yourself: after calling newff, values are assigned automatically. You can also set them manually via net.IW{} = ... and net.b{} = ... . Generally speaking, if the inputs are normalized, W and b are random numbers in [0, 1]. The purpose of initializing the weights is to let the network learn useful information during training, which means the parameter gradients should not be 0.
There are two necessary conditions for parameter initialization: 1. No activation layer should be saturated. For example, with the sigmoid activation function, the initial values should not be so large or so small that the units fall into the saturation region.
2. No activation value should be 0. If the output of an activation layer is zero, then the input of the next convolution layer is zero, so the partial derivative of that layer with respect to its weights is zero, resulting in a gradient of 0.
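These two conditions can be checked numerically. The sketch below (assuming a sigmoid layer and Xavier-style 1/fan-in scaling, which is one common remedy) compares a too-large initialization with a properly scaled one:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # clip to avoid overflow warnings for extreme pre-activations
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50, 50)))

x = rng.normal(0, 1, (256, 100))   # normalized inputs, 100 features

# Too-large weights drive the sigmoid into its saturation region
W_big = rng.normal(0, 10.0, (100, 100))
a_big = sigmoid(x @ W_big)

# Xavier-style scaling keeps pre-activations in the sensitive range
W_ok = rng.normal(0, np.sqrt(1.0 / 100), (100, 100))
a_ok = sigmoid(x @ W_ok)

# Fraction of activations that are effectively saturated (near 0 or 1)
sat_big = float(np.mean((a_big < 0.01) | (a_big > 0.99)))
sat_ok = float(np.mean((a_ok < 0.01) | (a_ok > 0.99)))
```

With the large initialization almost every unit is saturated (gradient near 0), while the scaled initialization leaves the layer in its responsive region.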
Extended information:
The relationship between neural networks and weights
When training an agent to perform a task, one chooses a typical neural network architecture in the belief that it has the potential to encode a specific strategy for the task. Note that this is only potential: the weight parameters must be learned to turn that potential into ability.
Inspired by the early behaviors and innate abilities found in nature, in this work researchers build neural networks that can naturally perform a given task. In other words, they look for an innate neural network architecture that can perform the task with only randomly initialized weights. The researchers report that such architectures, with no learned parameters, perform well in both reinforcement learning and supervised learning.
In fact, if we imagine the space of solutions the architecture can express as a circle, then conventional weight learning searches that circle for the best (or optimal) parameter solution. For a network without learned weights, the equivalent is to introduce a very strong inductive bias, so that the architecture itself is biased toward directly solving one problem.
A network without learned weights amounts to continually specializing the architecture, i.e., reducing model variance. When the architecture becomes so small that it contains essentially only the optimal solution, randomized weights can also solve the practical problem. As the researchers show, searching from small architectures to large ones is also feasible, as long as the architecture can just enclose the optimal solution.
1. The basic idea of the BP algorithm is that the learning process consists of two phases: forward propagation of the signal and back propagation of the error.
1) forward propagation: input sample → input layer → each hidden layer (processing) → output layer
note 1: if the actual output of the output layer does not match the expected output (teacher signal), go to 2) (the error back-propagation process)
2) error back propagation: output error (in some form) → hidden layers (layer by layer) → input layer
The main purpose is to propagate the output error back to all units of each layer, so as to obtain the error signal of each layer's units and then correct each unit's weights (this is a process of weight adjustment)
note 2: the process of weight adjustment is the process of network learning and training (learning comes down to weight adjustment)
2. BP algorithm implementation steps (software):
1) initialization
2) input training sample pair, calculate the output of each layer
3) calculate the network output error
4) calculate the error signal of each layer
5) adjust the weight of each layer
6) check whether the total error of the network meets the accuracy requirement; if so, the training ends; if not, return to step 2)
3. Main capabilities of the multilayer perceptron (based on the BP algorithm):
1) nonlinear mapping: enough samples → learning and training
It can learn and store a large number of input-output pattern mappings. As long as enough sample patterns are provided for the BP network to learn, it can realize the nonlinear mapping from an n-dimensional input space to an m-dimensional output space.
2) generalization: input new samples (not seen during training) → complete the correct input-output mapping
3) fault tolerance: errors in individual samples do not derail the adjustment of the weight matrix
4. Defects of the standard BP algorithm:
1) it easily falls into a local minimum (greedy algorithm, local optimum) rather than the global optimum
2) many training iterations make learning inefficient and convergence slow (a lot of computation is needed)
3) the selection of hidden nodes lacks theoretical support
4) learning new samples tends to cause forgetting of old ones
note 3: improved algorithms add a momentum term, adaptively adjust the learning rate (this works well), and introduce a steepness factor
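The effect of the momentum term can be illustrated on a simple ill-conditioned quadratic error surface (a toy surface chosen for this sketch, not from the text):

```python
import numpy as np

def grad(w):
    """Gradient of a poorly conditioned quadratic error surface
    E(w) = 0.05*w0^2 + 2*w1^2 (illustrative)."""
    return np.array([0.1 * w[0], 4.0 * w[1]])

def descend(momentum, steps=200, lr=0.1):
    w = np.array([5.0, 5.0])
    v = np.zeros(2)
    for _ in range(steps):
        v = momentum * v - lr * grad(w)   # momentum term smooths and accumulates updates
        w = w + v
    return float(np.sum(w ** 2))          # squared distance from the minimum at 0

plain = descend(momentum=0.0)
with_momentum = descend(momentum=0.9)
```

With plain gradient descent the shallow direction converges very slowly, while the momentum term accelerates it dramatically for the same learning rate.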
Among the many models of artificial neural networks, the most widely used, most intuitive and easiest to understand is the multilayer feedforward neural network with the error back-propagation learning algorithm, referred to as the BP network.
In the book Parallel Distributed Processing, published in 1986 by the group of scientists headed by Rumelhart and McClelland, a complete error back-propagation learning algorithm was proposed and became widely accepted. A multilayer perceptron network is a hierarchical neural network with three or more layers. A typical multilayer perceptron network is a three-layer feedforward hierarchical network (Figure 4.1), consisting of an input layer, a hidden layer (also known as the middle layer) and an output layer, as follows:
Figure 4.1 Structure of a three-layer BP network
(1) input layer
The input layer is the interface between the network and the outside world. Generally it only stores the input vector and does not process it. The number of neurons in the input layer is determined by the problem to be solved and the data representation. For example, if the input is an image, the number of input neurons can be the number of pixels, or the number of features extracted from the image.
(2) hidden layer
In 1989, Robert Hecht-Nielsen proved that any continuous function on a closed interval can be approximated by a BP network with one hidden layer, so a three-layer BP network can realize any mapping from n dimensions to m dimensions. Although increasing the number of hidden layers can further reduce the error and improve the accuracy, it also complicates the network and increases the training time of the weights. The error can also be reduced by increasing the number of neurons in the hidden layer, and the training effect is easier to observe and adjust than when adding hidden layers. Therefore, in general, priority should be given to increasing the number of neurons in the hidden layer, and an appropriate number of hidden layers chosen according to the specific situation.
(3) output layer
The output layer outputs the result vector of the network, whose dimension is designed according to the specific application. The design should reduce the scale of the system as much as possible to reduce its complexity. If the network is used as a recognizer, the output of the neuron for the recognized class is close to 1, while the outputs of the other neurons are close to 0.
All neurons in adjacent layers of the above three layers are fully connected: each neuron in one layer is connected to every neuron in the previous layer, and there are no connections between neurons within a layer. The connection strengths constitute the weight matrix W of the network.
The BP network learns with a teacher. First, an expected output value is set for each input pattern. Then the actual learning pattern is input to the network and propagated from the input layer through the middle layer to the output layer (called "forward propagation of the pattern"). The difference between the actual output and the expected output is the error. According to the minimum-squared-error rule, the connection weights are modified layer by layer from the output layer back to the middle layer, which is called "back propagation of the error" (Chen Zhengchang, 2005). Hence the back-propagation neural network is also referred to as the BP (back propagation) network. As the "forward propagation" and "error back propagation" processes alternate and repeat, the actual output of the network gradually approaches the corresponding expected output, and the accuracy of the network's response to the input patterns rises, until the connection weights of each layer are determined through this learning process. The learning procedure of a typical three-layer BP neural network is as follows (Xiangyuan, 2006):
(1) First, the form and meaning of each symbol are explained:
network input vector P_k = (a_1, a_2, ..., a_n)
network target vector T_k = (y_1, y_2, ..., y_q)
middle-layer input vector S_k = (s_1, s_2, ..., s_p), output vector B_k = (b_1, b_2, ..., b_p)
output-layer input vector L_k = (L_1, L_2, ..., L_q), output vector C_k = (c_1, c_2, ..., c_q)
connection weights from the input layer to the middle layer w_ij, i = 1, 2, ..., n, j = 1, 2, ..., p
connection weights from the middle layer to the output layer v_jt, j = 1, 2, ..., p, t = 1, 2, ..., q
output threshold of each unit in the middle layer θ_j, j = 1, 2, ..., p
output threshold of each unit in the output layer γ_t, t = 1, 2, ..., q
sample index k = 1, 2, ..., m
(2) Initialization: assign the connection weights w_ij and v_jt and the thresholds θ_j and γ_t random values in the interval (-1, 1).
(3) Randomly select a group of input and target samples P_k and T_k and provide them to the network.
(4) Use the input sample P_k, the connection weights w_ij and the thresholds θ_j to calculate the input s_j of each unit in the middle layer, and then use s_j to calculate the output b_j of each middle-layer unit through the transfer function:
s_j = ∑_{i=1}^{n} w_ij a_i - θ_j  j = 1, 2, ..., p (4.4)
b_j = f(s_j)  j = 1, 2, ..., p (4.5)
(5) Use the middle-layer outputs b_j, the connection weights v_jt and the thresholds γ_t to calculate the input L_t of each unit in the output layer, and then calculate the response c_t of each output-layer unit through the transfer function:
L_t = ∑_{j=1}^{p} v_jt b_j - γ_t  t = 1, 2, ..., q (4.6)
c_t = f(L_t)  t = 1, 2, ..., q (4.7)
(6) Use the network target vector T_k and the actual output c_t to calculate the generalized error d_t of each unit in the output layer (for the sigmoid transfer function, d_t = (y_t - c_t) c_t (1 - c_t)).
(7) Use the connection weights v_jt, the output-layer generalized errors d_t and the middle-layer outputs b_j to calculate the generalized error e_j of each unit in the middle layer (e_j = (∑_t d_t v_jt) b_j (1 - b_j)).
(8) Use the generalized error d_t of each unit in the output layer and the output b_j of each unit in the middle layer to correct the connection weights v_jt and thresholds γ_t:
v_jt = v_jt + α d_t b_j,  γ_t = γ_t - α d_t
(9) Use the generalized error e_j of each unit in the middle layer and the input P_k = (a_1, a_2, ..., a_n) to correct the connection weights w_ij and thresholds θ_j:
w_ij = w_ij + β e_j a_i,  θ_j = θ_j - β e_j
where α and β are the learning rates.
(10) Randomly select the next learning sample vector and provide it to the network; return to step (3) until all m training samples have been trained.
(11) Again randomly select a group of input and target samples from the m learning samples and return to step (3), until the global error E of the network is less than a preset minimum, i.e., the network converges. If the number of learning iterations exceeds a preset value, the network has failed to converge.
(12) End of learning.
It can be seen that in the above learning steps, (8) and (9) form the "back-propagation process" of the network error, while (10) and (11) complete the training and convergence process.
Generally, the trained network should also be tested. The test method is to select test sample vectors and provide them to the network to check the correctness of its classification. The test sample vectors should include the main typical patterns that the network may encounter in future applications (Song Daqi, 2006). These samples can be measured directly or obtained by simulation. When sample data are scarce or hard to obtain, they can also be generated by adding appropriate noise to the learning samples or by interpolating according to certain rules. To properly verify the generalization ability of the network, a good test sample set should not contain the same patterns as the learning samples (Dong Jun, 2007).
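Putting steps (1)-(12) together, a per-sample sketch in NumPy (assuming a sigmoid transfer function and the standard generalized-error formulas; XOR is used as a toy training set, and the layer size and learning rates are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-x))

# n inputs, p middle-layer units, q outputs; m samples (XOR as a toy task)
n, p, q = 2, 8, 1
P = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
m = len(P)

# (2) initialization: weights and thresholds random in (-1, 1)
w = rng.uniform(-1, 1, (n, p)); theta = rng.uniform(-1, 1, p)
v = rng.uniform(-1, 1, (p, q)); gamma = rng.uniform(-1, 1, q)
alpha = beta = 0.5

for epoch in range(15000):
    for k in rng.permutation(m):            # (3)/(10) pick samples in random order
        a, y = P[k], T[k]
        b = f(a @ w - theta)                # (4) middle-layer output
        c = f(b @ v - gamma)                # (5) output-layer response
        d = (y - c) * c * (1 - c)           # (6) output-layer generalized error
        e = (d @ v.T) * b * (1 - b)         # (7) middle-layer generalized error
        v += alpha * np.outer(b, d); gamma -= alpha * d   # (8) correct v, gamma
        w += beta * np.outer(a, e); theta -= beta * e     # (9) correct w, theta

out = f(f(P @ w - theta) @ v - gamma)       # outputs approach the targets
```

The inner four update lines are exactly the "error back propagation" of steps (8) and (9); the epoch loop plays the role of steps (10) and (11).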
Artificial neural networks (ANNs), also referred to as neural networks (NNs) or connectionist models, are algorithmic mathematical models that simulate the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Such a network achieves its information-processing purpose by adjusting the interconnections among a large number of internal nodes, depending on the complexity of the system.
BP neural networks are the most commonly used, and SVMs are also very common in data mining.
[fuzzy]
Fuzzy logic imitates the human brain's way of judging and reasoning about uncertain concepts. For systems whose models are unknown or uncertain, and for control objects with strong nonlinearity and large delay, it applies fuzzy sets and fuzzy rules to reasoning, expresses transitional boundaries and qualitative knowledge and experience, simulates the human brain's thinking mode, and implements fuzzy comprehensive judgment, thereby solving rule-based fuzzy-information problems that are difficult to handle with conventional methods. Fuzzy logic is good at expressing qualitative knowledge and experience with unclear boundaries. With the help of the concept of membership functions, it distinguishes fuzzy sets, processes fuzzy relations, and simulates the human brain's rule-based reasoning to solve various uncertainty problems caused by the breakdown of the "law of the excluded middle".
Rough set theory was put forward by Z. Pawlak in 1982 and provides a new mathematical tool for handling incomplete information. Rough set theory is based on a classification mechanism: it regards a classification as an equivalence relation on a specific space, and the equivalence relation constitutes a partition of that space. In this theory, knowledge is understood as a partition of the data, and each cell of the partition is called a concept. The main idea of rough set theory is to use a known knowledge base to approximately characterize imprecise or uncertain knowledge with the existing knowledge in that base; on the premise of keeping the classification ability of the information system unchanged, attribute reduction is performed and the decision or classification rules of the problem are derived.
The most significant difference between rough set theory and other theories dealing with uncertainty and imprecision is that rough set theory does not require any prior information beyond the data set itself, so its description and treatment of uncertainty is relatively objective. Because the theory contains no mechanism for handling imprecise raw data, it is highly complementary to probability theory, fuzzy mathematics, evidence theory and other theories dealing with imprecise or uncertain problems. Rough set theory not only provides new research methods for information science and cognitive science, but also provides effective processing techniques for intelligent information processing. At present, rough set theory is a research hotspot in artificial intelligence and has become one of the main techniques in data mining applications, highly valued by scholars all over the world.
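The core rough-set construction, the lower and upper approximation of a concept, can be sketched in a few lines (the universe, partition and target concept below are made-up illustrative data):

```python
def approximations(partition, target):
    """Rough-set lower/upper approximation of a target set with respect to a
    partition (the equivalence classes of an indiscernibility relation)."""
    target = set(target)
    lower, upper = set(), set()
    for block in partition:
        block = set(block)
        if block <= target:     # wholly contained: certainly in the concept
            lower |= block
        if block & target:      # overlaps the concept: possibly in it
            upper |= block
    return lower, upper

# Universe {1..6} partitioned by some attribute; target concept X = {1, 2, 4}
blocks = [{1, 2}, {3, 4}, {5, 6}]
lo, up = approximations(blocks, {1, 2, 4})
# the boundary region up - lo contains the objects that cannot be classified
```

Objects in the lower approximation definitely belong to the concept, objects outside the upper approximation definitely do not, and the boundary region in between captures exactly the uncertainty the theory is designed to express.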
1. More input samples can improve generalization ability, but not too many: too many samples lead to overfitting and poor generalization. The samples should include at least the turning-point data.
2. The number of hidden-layer neurons should be as small as possible without hurting performance. Too many hidden nodes reduce generalization ability. Rockets are built with only dozens to hundreds of neurons; why would you need so many neurons to fit a few hundred data points?
3. A small error means good generalization; but if the error is pushed too small, the network overfits and generalization becomes poor.
4. The choice of learning rate, especially the weight learning rate, has a great impact on network performance. If it is too small, convergence is very slow and the network easily falls into a local minimum; if it is too large, convergence is fast but the error oscillates and is hard to reduce. The weight learning rate is generally set slightly larger than the required error. A variable learning rate can also be used: increase the learning rate when the error is large and reduce it when the error is small, so that convergence is faster, learning is better, and local minima are easier to avoid.
5. Training can be stopped as soon as the error meets the requirement, to avoid overfitting; local weights can be adjusted to accelerate local convergence.
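The variable-learning-rate heuristic from point 4 (raise the rate while the error keeps falling, cut it when the error rises) can be sketched as follows; the quadratic error surface and the factors 1.05 and 0.7 are illustrative assumptions:

```python
def adaptive_gd(grad, w, lr=0.1, steps=100, up=1.05, down=0.7):
    """Gradient descent whose learning rate grows while the error keeps
    falling and shrinks when it rises (a common variable-rate heuristic)."""
    def err(w):                 # simple quadratic error (illustrative)
        return w * w
    prev = err(w)
    for _ in range(steps):
        w_new = w - lr * grad(w)
        e = err(w_new)
        if e < prev:            # error fell: accept the step, raise the rate
            w, prev, lr = w_new, e, lr * up
        else:                   # error rose: reject the step, cut the rate
            lr *= down
    return w

# Minimize err(w) = w^2, whose gradient is 2w, starting from w = 5
w_final = adaptive_gd(lambda w: 2 * w, 5.0)
```

Rejecting uphill steps while shrinking the rate gives the stability of a small learning rate, while the growth factor recovers the speed of a large one whenever progress is steady.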
No matter how large the network is, it demands high computing power. Even with many samples it will eventually converge, but if that takes several years it is meaningless, unless the goal is to experiment with a new algorithm.
