Principal component analysis and data centering in SPSS
Publish: 2021-04-18 00:36:13
1. First, transpose the layout so that x1-x12 are the index (variable) names in the header row and the numerical values fill the columns below. Then open the software, import the data, and click Analyze -> Data Reduction -> Factor Analysis to open the factor analysis window. Select all variables and move them to the box on the right, then click Descriptives -> Correlation Matrix and check Coefficients and KMO. Click Continue to return to the factor analysis window. Click Rotation, check None, and then press OK. The software generally standardizes the data itself, so you do not need to do it yourself.
These steps are only approximate: different versions of SPSS have different interfaces, and there are both Chinese and English editions, so you may need to translate the menu names. I only have the Chinese version at hand, sorry.
2. I don't use SPSS. Can I answer from a mathematical modeling point of view?
In fact, the principle is the same.
There is something wrong with the data you gave: many entries are messy, and data posted as an image is hard to work with.
Here is the process:
1. Standardize the data
Each column is standardized with its own mean and standard deviation: (xi - u) / d
(xi is the ith value in the column, u is the column mean, and d is the column standard deviation)
2. Assess the correlation by computing the covariance matrix (a symmetric matrix; for standardized data it equals the correlation matrix)
I use MATLAB to calculate R = cov(x)
Then use the software to calculate the eigenvalues and eigenvectors of the matrix
[V, D] = eig(R)
This gives the eigenvalues d1, d2, d3, ..., dn on the diagonal of D
3. Determine the principal components and calculate the weights: weight i = di / (d1 + d2 + d3 + ... + dn)
Calculate the target value = weight 1 * X1 + weight 2 * X2 + ... + weight n * Xn, where Xi is the score on the ith principal component
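The three steps above can be sketched in Python with NumPy, used here as a counterpart to the MATLAB calls `cov` and `eig`. This is a minimal sketch, not the answerer's own script: the function name `pca_composite_scores` and the sample data are hypothetical, and it assumes the composite is a weighted sum of principal component scores.

```python
import numpy as np

def pca_composite_scores(x):
    """Return eigenvalues, weights, and composite scores (rows of x = samples)."""
    x = np.asarray(x, dtype=float)
    # Step 1: standardize each column, (xi - u) / d, using the sample standard deviation.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Step 2: covariance of the standardized data (equal to the correlation matrix of x).
    r = np.cov(z, rowvar=False)
    # eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]  # reorder so the largest eigenvalue comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 3: weight i = di / (d1 + d2 + ... + dn).
    weights = eigvals / eigvals.sum()
    # Principal component scores, then the weighted composite score per sample.
    scores = z @ eigvecs
    composite = scores @ weights
    return eigvals, weights, composite

# Hypothetical data: 5 samples, 3 indicators.
data = [[2.0, 4.1, 1.0],
        [3.0, 6.2, 0.5],
        [4.0, 7.9, 2.0],
        [5.0, 10.1, 1.5],
        [6.0, 12.0, 3.0]]
eigvals, weights, composite = pca_composite_scores(data)
print(weights)
print(composite)
```

The sign of each eigenvector is arbitrary, so the direction of every component should be checked before the composite score is interpreted or used for ranking.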
3. Factor analysis
1 Input the data
2 Click the Analyze drop-down menu and select Factor under Data Reduction
3 After the Factor Analysis dialog box opens, move the data variables one by one into the Variables box
4 Click the Descriptives button in the main dialog box to open the Factor Analysis: Descriptives sub-dialog box. In the Statistics group, select Univariate descriptives to output the mean and standard deviation of each variable; in the Correlation Matrix group, select Coefficients to compute the correlation coefficient matrix. Click Continue to return to the main Factor Analysis dialog box
5 Click the Extraction button in the main dialog box to open the Factor Analysis: Extraction sub-dialog box. In the Method list, keep the default extraction method, Principal components. In the Analyze group, keep the default Correlation matrix item, so that the principal components are solved from the correlation coefficient matrix. In the Extract group, select Number of factors and enter 6, so that the scores of all principal components and the variance they explain are displayed. Click Continue to return to the main Factor Analysis dialog box
6 Click OK in the main dialog box to output the results
Original content from the Statistics Graduate Students' Studio; please do not copy and paste.
4. The purpose of data standardization is to unify the units of the variables: it is not appropriate to run statistical analysis directly across variables measured in different units, and standardization expresses every variable in units of its own standard deviation (SD). When SPSS performs principal component analysis, it uses the correlation matrix of the variables by default, and the correlation coefficient is itself a standardized statistic. In other words, the procedure already includes standardization, so there is no need to standardize the data separately.
5. SPSS is more convenient for clustering and principal component analysis, but SPSS Modeler is better suited to big data.
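The claim that the correlation coefficient already contains the standardization can be checked numerically. The sketch below uses only the Python standard library; the two indicators and their values are hypothetical. It compares the correlation of the raw data with the covariance of the standardized data.

```python
from statistics import mean, stdev

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def correlation(a, b):
    """Pearson correlation coefficient."""
    return covariance(a, b) / (stdev(a) * stdev(b))

def standardize(a):
    """z-scores: (xi - u) / d with the sample standard deviation."""
    m, s = mean(a), stdev(a)
    return [(v - m) / s for v in a]

# Hypothetical indicators measured in very different units.
height_cm = [150.0, 160.0, 165.0, 172.0, 181.0]
income_k = [3.2, 4.1, 3.9, 5.6, 6.0]

r_raw = correlation(height_cm, income_k)
cov_z = covariance(standardize(height_cm), standardize(income_k))
print(r_raw, cov_z)  # the two values agree up to floating-point rounding
```

Because the two quantities coincide, a principal component analysis based on the correlation matrix gives the same result whether or not the data were standardized beforehand.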
6. The KMO statistic is above 0.7, which indicates that the variables share enough common variance (their partial correlations are small relative to their simple correlations) and that the data are suitable for factor analysis. Bartlett's test of sphericity gives p < 0.001, which indicates that the variables are correlated. The second table shows the communalities, that is, how much of the original information in each variable is captured by the extracted common factors. According to your data, two common factors can be extracted. The third table gives the proportion of variance explained by the two extracted principal components. The fourth table is the component matrix, from which the principal component expressions are written, and the fifth table gives the coefficients for the factor score formulas.
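For readers who want to see where the KMO and sphericity figures come from, here is a minimal NumPy sketch using the textbook formulas. It is not SPSS's own implementation; the function name `kmo_and_bartlett` and the simulated data are hypothetical. The p-value is left out because it needs a chi-square distribution function that NumPy does not provide.

```python
import numpy as np

def kmo_and_bartlett(x):
    """Return (KMO, Bartlett chi-square, degrees of freedom) for data matrix x."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    r = np.corrcoef(x, rowvar=False)
    # Partial correlations obtained from the inverse of the correlation matrix.
    inv = np.linalg.inv(r)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)
    off = ~np.eye(p, dtype=bool)  # mask selecting the off-diagonal entries
    r2 = np.sum(r[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    # KMO approaches 1 when partial correlations are small next to simple ones.
    kmo = r2 / (r2 + q2)
    # Bartlett's test of sphericity: H0 says r is an identity matrix.
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    dof = p * (p - 1) // 2
    return kmo, chi2, dof

# Hypothetical data: 100 samples of 4 indicators driven by one common factor.
rng = np.random.default_rng(0)
common = rng.normal(size=(100, 1))
x = common + 0.5 * rng.normal(size=(100, 4))
kmo, chi2, dof = kmo_and_bartlett(x)
print(kmo, chi2, dof)
```

A large chi-square value relative to its degrees of freedom corresponds to the small p-value reported in the SPSS output.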
8. Principal component analysis (PCA) condenses a large number of indicators into a few representative principal components and calculates their scores; the regression method is one way of computing those scores.
Regression analysis, by contrast, builds a model of the relationship between independent variables and a dependent variable, so that the dependent variable can be predicted.
Regression analysis therefore needs clearly defined independent and dependent variables,
while principal component analysis has no such distinction between independent and dependent variables.
9. Tools / materials
SPSS 20.0
Methods / steps
First prepare the data in SPSS, then choose Analyze -- Dimension Reduction -- Factor Analysis on the menu bar to open the Factor Analysis dialog box.
In the Factor Analysis dialog box, move all the variables to be analyzed into the Variables box.
Click the Descriptives button to open the secondary dialog box, which controls the descriptive statistics to be output.
Principal component analysis relies on the correlations among the variables, so output the correlation matrix to understand how the variables relate: check Coefficients, then click Continue to return to the main dialog box.
Back in the main dialog box, click OK to run the analysis and output the results.
The first table is the correlation matrix. It shows the correlation coefficient between each pair of variables, from which you can see how the variables are related.
The second table summarizes the principal component extraction. The Total column under Eigenvalues gives the characteristic roots (eigenvalues), which indicate how much influence each principal component has. The usual threshold is 1: an eigenvalue below 1 means the component explains less than a single original (standardized) variable, so only components with eigenvalues greater than 1 are extracted. Here the first three principal components have eigenvalues greater than 1, so three principal components are retained. The first principal component accounts for 46.9% of the total variance, the second for 27.5%, and the third for 15.0%; together they account for 89.5%.
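The eigenvalue-greater-than-1 rule and the variance percentages can be reproduced from a list of eigenvalues. The sketch below uses only the Python standard library; the eigenvalues are hypothetical and are not the ones behind the 46.9%, 27.5%, and 15.0% figures, which the answer does not list.

```python
def kaiser_summary(eigenvalues):
    """Return rows of (component, eigenvalue, % variance, cumulative %, keep flag)."""
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for i, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        pct = 100.0 * ev / total
        cumulative += pct
        # A component is kept when its eigenvalue exceeds 1.
        rows.append((i, ev, pct, cumulative, ev > 1.0))
    return rows

# Hypothetical eigenvalues for 7 standardized variables (they sum to 7).
eigenvalues = [3.2, 1.9, 1.1, 0.4, 0.2, 0.1, 0.1]
rows = kaiser_summary(eigenvalues)
retained = [row[0] for row in rows if row[4]]
for row in rows:
    print("PC%d eigenvalue=%.2f variance=%.1f%% cumulative=%.1f%% keep=%s" % row)
print("retained components:", retained)
```

For a correlation-matrix analysis the eigenvalues sum to the number of variables, so each percentage is simply the eigenvalue divided by that count.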
10. Principal component analysis (PCA) and factor analysis (FA) are both information-condensing methods: they reduce many analysis items to a few general indicators. If you want to name the indicators, SPSSAU suggests using factor analysis, because factor analysis adds a rotation step on top of principal component analysis, and the purpose of rotation is to make the factors easier to name.
Principal component analysis is used to condense information (without paying much attention to the correspondence between principal components and analysis items), to calculate weights, and to calculate comprehensive scores.
SPSSAU can also save factor scores and comprehensive scores directly, without manual calculation.
