
Decentralization of principal component analysis

Publish: 2021-03-30 04:43:19
1. After several recent interviews, I found that data dimensionality reduction is widely used, even indispensable, in industry, so it deserves focused attention. Today I will summarize data dimensionality reduction, drawing on other people's work; I would like to thank them for their content.

There are several perspectives on the role of data dimensionality reduction. Andrew Ng (Wu Enda) says in his video course that dimensionality reduction is used for data compression, which reduces noise and avoids slow training and excessive memory use; that reducing the data to 2 or 3 dimensions allows it to be visualized for analysis; and that dimensionality reduction should not be used to prevent overfitting, because it can easily discard important features that are related to the labels. But why does data need to be compressed, apart from saving memory? The other reason is the "curse of dimensionality": the higher the dimension, the sparser the data becomes along each feature dimension, which is disastrous for most machine learning algorithms. In the extreme, every sample ends up with its own unique characteristics, and no shared features emerge that separate positive cases from negative ones. In addition, when there are more features than samples, some classification algorithms (SVM is the example given) may fail or perform poorly, which is related to how those algorithms work.
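One common way to see the sparsity problem is to compare the nearest and farthest neighbour distances as the dimension grows. The following is a minimal sketch, assuming uniformly distributed random points; the function name `distance_contrast` and the sample sizes are made up for illustration. As the dimension increases, the ratio tends toward 1, meaning that "near" and "far" points become hard to tell apart.

```python
import numpy as np

def distance_contrast(n_points, dim, seed=0):
    """Ratio of the smallest to the largest distance from one point to all others."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, dim))  # assumption: uniform data in the unit cube
    # Euclidean distances from the first point to every other point
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return dists.min() / dists.max()

for dim in (2, 10, 100, 1000):
    # A ratio close to 1 means all points are almost equally far away
    print(dim, round(distance_contrast(500, dim), 3))
```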

Data dimensionality reduction methods:

Linear dimensionality reduction methods:
principal component analysis (PCA) and linear discriminant analysis (LDA)
Understanding PCA:
1. PCA can be understood as projecting high-dimensional data onto a lower-dimensional space so that the projection error is minimized. It is an unsupervised method.
2. It can also be understood as a translation and rotation of the coordinate system (corresponding to decentralization, i.e. subtracting the mean, and a change of basis), after which the n-dimensional data can be analyzed in fewer dimensions, for example n-1, by discarding the directions with small variance (small variance means little uncertainty and therefore little information).
3. Derivation of PCA
4. Connection between PCA and SVD
(understanding PCA from the perspective of matrix decomposition)
5. Applications of PCA dimensionality reduction
6. Disadvantages of PCA:
(1) PCA is a linear dimensionality reduction method. When the nonlinear relationships in the data are important, PCA can give very poor results; kernel PCA (PCA combined with the kernel method) is aimed at this case.
(2) PCA is most effective when the sample points follow a Gaussian distribution.
(3) For imbalanced data, cost-sensitive PCA (CSPCA) can be used for dimensionality reduction instead.
(4) The size of an eigenvalue determines how much of the information is treated as interesting. Small eigenvalues are usually taken to represent noise, but the projections onto the directions with small eigenvalues may also contain information we care about.
(5) The eigenvector directions are constrained to be orthogonal, and PCA is vulnerable to outliers.
(6) The results are hard to interpret, for example when the principal components are used to analyze the dependent variable in a linear regression model.
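The projection view, the decentralization step, and the link to SVD in the points above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: the function names `pca_eig` and `pca_svd` and the synthetic data are assumptions, not part of the original answer.

```python
import numpy as np

def pca_eig(X, k):
    """PCA via eigendecomposition of the sample covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # decentralization: subtract the mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # keep the k largest
    return eigvals[order], eigvecs[:, order], mean

def pca_svd(X, k):
    """The same result from the SVD of the centered data matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)        # singular values -> eigenvalues
    return variances[:k], Vt[:k].T, mean

# Synthetic 3-D data with one low-variance direction (assumed for illustration)
rng = np.random.default_rng(0)
mix = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.1]])
X = rng.normal(size=(200, 3)) @ mix + 5.0

vals_e, W_e, mu = pca_eig(X, 2)
vals_s, W_s, _ = pca_svd(X, 2)

Z = (X - mu) @ W_e        # low-dimensional coordinates (rotation after centering)
X_hat = Z @ W_e.T + mu    # projection back into the original space
print("eigenvalues:", vals_e)
print("squared reconstruction error:", ((X - X_hat) ** 2).sum())
```

The two routes agree up to the sign of each direction, and the squared reconstruction error equals (n-1) times the sum of the discarded eigenvalues, which is why dropping the small-variance directions gives the minimum projection error.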
2.

bus line: Metro Line 2 → Metro Line 1, the whole journey is about 10.8km

1. Walk about 670m from Changsha meixihu international culture and Art Center to meixihu east station

2. Take Metro Line 2, pass 7 stops, reach Wuyi Square Station

3. Take Metro Line 1, pass 1 stop, reach peiyuanqiao station

4. Walk about 1.2km to reach POFU International Plaza


bus line: Metro Line 2 → 358, the whole journey is about 11.4km

1. Walk about 670m from Changsha meixihu international culture and Art Center to meixihu east station

2. Take Metro Line 2, pass 7 stops, and reach Wuyi Square Station

3. Walk about 360m to Huatu Education (taipingjiekou) station

4. Take bus 358, pass 4 stops, and arrive at provincial women and children station

5. Walk about 200m to POFU International Plaza

3. It depends on what level you are. If you are a novice, you should start with the lowest level of digging, as everyone does. Go to the tool vendor, buy a stone mining hoe, and dig by yourself. Once the mining hoe earns a star, you can upgrade it; every upgradable tool can only be upgraded after it reaches the star level. Finished product: Excavator T800. Raw materials: stone 4, iron 2, hemp rope 1, stone hoe 1
4. It depends on your personal situation. It's fine to dig it yourself. The cost of cement depends on the situation, and whether to lay porcelain tiles is up to the individual. Landscaping aside, the basics of a single pool are digging and cement. Remember to work out how to drain and change the water. My family's is a single cement pool: there is a hole at the bottom for draining water (usually kept plugged) and a small hole near the top at the front to prevent overflow. Add an oxygen pump
5. The disadvantages of principal component analysis are as follows:
1. In principal component analysis, we must first ensure that the cumulative contribution rate of the first few extracted principal components reaches a high level (that is, the amount of information retained after dimensionality reduction must stay high). Second, the extracted principal components must admit an explanation that fits the actual background and meaning of the problem (otherwise the principal components carry information but no practical meaning).
2. The meaning of a principal component is generally somewhat fuzzy, not as clear and exact as the meaning of the original variables. This is the price paid for reducing the number of variables. Therefore, the number of extracted principal components m should be significantly smaller than the number of original variables p (unless p itself is small); otherwise the "advantage" of dimensionality reduction may not offset the "disadvantage" that the principal components are less interpretable than the original variables.
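The cumulative contribution rate mentioned above is the running sum of each component's share of the total variance. Below is a hedged sketch of choosing m from it; the 85% threshold, the function name `components_for_threshold`, and the synthetic data are assumptions for illustration, since the answer does not state a specific cutoff.

```python
import numpy as np

def components_for_threshold(X, threshold=0.85):
    """Smallest m whose cumulative contribution rate reaches `threshold`."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    contribution = s**2 / np.sum(s**2)         # share of variance per component
    cumulative = np.cumsum(contribution)       # cumulative contribution rate
    m = int(np.searchsorted(cumulative, threshold)) + 1
    return min(m, len(cumulative)), cumulative # guard against rounding at threshold=1.0

# Assumed example: p = 6 observed variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.05 * rng.normal(size=(300, 6))

m, cumulative = components_for_threshold(X)
print("m =", m, "cumulative contribution:", np.round(cumulative, 3))
```

Here m comes out well below p = 6, which is the situation in which the loss of interpretability is worth the reduction in dimension.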
6.

1、 Different approaches:

1, principal component analysis:

through an orthogonal transformation, a group of possibly correlated variables is converted into a group of linearly uncorrelated variables; the converted variables are called principal components

2, factor analysis:

by extracting common factors from a group of variables, factor analysis finds the hidden representative factors behind many variables

3, correspondence analysis:

reveals the relationships between variables by analyzing the cross-tabulation (contingency table) built from qualitative variables

2、 Different applications:

1, principal component analysis:

as a basic mathematical analysis method, principal component analysis has a wide range of practical applications in fields such as demography, quantitative geography, molecular dynamics simulation, mathematical modeling, and mathematical analysis

2, factor analysis:

factor analysis has a wide range of applications in market research, including consumer habits and attitudes, brand image and characteristics, service quality, personality test

3. Correspondence analysis:

correspondence analysis can plot many samples and many variables on the same diagram at the same time and show the categories and attributes of the samples intuitively and clearly. It also avoids complex mathematical operations and intermediate steps such as factor selection and factor axis rotation, and the samples can be classified visually from the factor loading plot. It is an intuitive, simple, and convenient multivariate statistical method

Principal component analysis (PCA): among all the variables originally proposed, redundant variables (those that are closely correlated) are removed, and as few new variables as possible are constructed, so that the new variables are uncorrelated with each other while preserving as much of the original information about the subject as possible

Correspondence analysis was proposed by the French statistician Jean-Paul Benzécri in 1970. It was first popular in France and Japan and was later introduced to the United States. Correspondence analysis is a multivariate statistical analysis method developed from R-type and Q-type factor analysis, so it is also called R-Q factor analysis

In factor analysis, if the objects of study are samples, Q-type factor analysis is needed; if the objects of study are variables, R-type factor analysis is needed. However, the two analyses are usually treated as opposed to each other, so samples and variables have to be handled separately

7. The basic idea of principal component analysis

Principal component analysis uses the idea of dimensionality reduction to transform many variables into a few composite variables (the principal components). Each principal component is a linear combination of the original variables, and the principal components are uncorrelated with each other, so together they can reflect most of the information in the original variables. This method overcomes the shortcoming that a single financial index cannot truly reflect a company's financial situation: it brings in multiple financial indicators while attributing the complex factors to a few principal components, so that a complex problem is simplified and more scientific and accurate financial information is obtained
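The two properties stated above, that each principal component is a linear combination of the original variables and that the components are mutually uncorrelated, can be checked numerically. The sketch below uses made-up correlated indicators (not real financial data) purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical indicators: the first two are strongly correlated, the third is independent
base = rng.normal(size=(500, 1))
X = np.hstack([
    base + 0.3 * rng.normal(size=(500, 1)),
    2 * base + 0.3 * rng.normal(size=(500, 1)),
    rng.normal(size=(500, 1)),
])

Xc = X - X.mean(axis=0)                        # center each indicator
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Each column of `scores` is a linear combination of the centered indicators,
# with the weights given by one row of Vt
scores = Xc @ Vt.T

corr_original = np.corrcoef(X, rowvar=False)
cov_scores = np.cov(scores, rowvar=False)
off_diag = cov_scores - np.diag(np.diag(cov_scores))

print("correlation between indicators 1 and 2:", round(corr_original[0, 1], 3))
print("largest off-diagonal covariance of the components:", np.abs(off_diag).max())
```

The original indicators are highly correlated, while the covariance matrix of the component scores is diagonal up to floating-point error.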

I'm also still learning the practical side, mainly for laboratory analysis, using Minitab

There is a lot of information about this on the Internet; you can read up on it in detail yourself

Hope it is useful to you
8. Sure! Drop the unimportant factors and plot the main factors in a pie chart, and that will do!