Calculation of the p value
1. How the p value is calculated
Left-tailed test: the p value is the probability, when μ = μ0, that the test statistic is less than or equal to the value calculated from the observed sample data.
Right-tailed test: the p value is the probability, when μ = μ0, that the test statistic is greater than or equal to the value calculated from the observed sample data.
Two-tailed test: the p value is the probability, when μ = μ0, that the absolute value of the test statistic is greater than or equal to the absolute value of the value calculated from the observed sample data.
2. Meaning of the p value: it is a probability, reflecting how likely an event is. By the conventions of statistical significance testing, P < 0.05 is generally regarded as significant and P < 0.01 as highly significant. This means that, if the null hypothesis were true, sampling error alone would produce a difference between samples at least this large with probability less than 0.05 or 0.01.
extended data:
Data analysis is the process of applying appropriate statistical methods to a large body of collected data in order to extract useful information, draw conclusions, and study and summarize the data in detail. It is also a supporting process of a quality management system. In practice, data analysis helps people make judgments so that they can take appropriate action.
The mathematical basis of data analysis was established in the early 20th century, but practical application only became possible, and data analysis only became widespread, with the arrival of computers. Data analysis is a combination of mathematics and computer science.
In statistics, some people divide data analysis into descriptive statistical analysis, exploratory data analysis and confirmatory data analysis. Exploratory data analysis focuses on discovering new features in the data, while confirmatory data analysis focuses on confirming or falsifying existing hypotheses.
The formula for the p value is:
P = 2[1 - Φ(|Z0|)] when the alternative hypothesis H1 is p ≠ p0
P = 1 - Φ(Z0) when H1 is p > p0
P = Φ(Z0) when H1 is p < p0
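The three cases above can be sketched in a few lines of Python. This is only an illustration: it assumes the test statistic Z0 has already been computed and follows the standard normal distribution under H0, and the function names and the value 1.96 are made up for the example.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Standard normal CDF, Φ(z), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def z_test_p_value(z0: float, alternative: str) -> float:
    """p value for an observed z statistic z0.

    alternative is an illustrative label:
      "two-sided" for H1: p != p0
      "greater"   for H1: p > p0
      "less"      for H1: p < p0
    """
    if alternative == "two-sided":
        return 2.0 * (1.0 - phi(abs(z0)))
    if alternative == "greater":
        return 1.0 - phi(z0)
    if alternative == "less":
        return phi(z0)
    raise ValueError(f"unknown alternative: {alternative!r}")


z0 = 1.96  # hypothetical observed statistic, not taken from any real data
for alt in ("two-sided", "greater", "less"):
    print(alt, round(z_test_p_value(z0, alt), 4))
```

Using `erf` replaces the table lookup for Φ; any standard normal CDF routine would serve equally well.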
In conclusion, the smaller the p value, the more significant the result. Whether a test result counts as "significant", "moderately significant" or "highly significant", however, has to be judged from the size of the p value together with the practical problem at hand.
extended data
The main tasks of regression analysis in statistics are as follows:
1. From a set of data, determine the quantitative relationship between certain variables, that is, build a mathematical model and estimate its unknown parameters. The usual method of estimating the parameters is least squares.
2. Among the many independent variables that may affect a dependent variable, judge which are significant and which are not. The significant variables are entered into the model and the insignificant ones are eliminated, using stepwise, forward or backward regression.
3. Use the fitted relationship to predict or control a production process.
Regression analysis is very widely applied, and statistical software packages make the various regression methods convenient to compute.
The p value is the area, that is, the probability, of the rejection region whose boundary is the observed value of the statistic.
The p value is the probability, under a given probability model, that a statistical summary (such as the difference between the means of two samples) is as extreme as, or more extreme than, the value actually observed. In other words, it measures how likely a result at least this extreme would be if the null hypothesis held.
If the p value is smaller than the chosen significance level (0.05 or 0.01), the null hypothesis is rejected. This does not, however, directly show that the alternative hypothesis is correct. The p value is itself a random variable (for a continuous test statistic it is uniformly distributed on [0, 1] when the null hypothesis is true), and in practice it carries uncertainty due to the sample and other factors, so results may be open to dispute.
Calculation of the p value:
Let X denote the test statistic. Assuming H0 is true, the value C of the statistic is calculated from the sample data, and the p value then follows from the distribution of X. Specifically, the p value of a left-tailed test is the probability that X is less than the sample value C: P = P{X < C}.
The p value of a right-tailed test is the probability that X is greater than the sample value C: P = P{X > C}.
The p value of a two-tailed test is twice the probability that X falls in the tail region that has C as its endpoint: P = 2P{X > C} when C lies in the right tail of the distribution curve, or P = 2P{X < C} when C lies in the left tail. If X follows a standard normal or t distribution, its distribution curve is symmetric about the vertical axis, so the p value can be written as P = P{|X| > |C|}.
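As a rough sketch of these three rules, the function below takes the cumulative distribution function of the test statistic as an argument. It assumes a continuous distribution, so that P{X < C} equals the CDF at C; the function name, the tail labels and the value C = -2.0 are illustrative only. The standard normal is used because it ships with Python's standard library.

```python
from statistics import NormalDist


def tail_p_value(cdf, c: float, tail: str) -> float:
    """p value from the CDF of a continuous test statistic X.

    cdf  : callable returning P{X <= x}
    c    : value of the statistic computed from the sample
    tail : "left", "right" or "two-sided" (illustrative labels)
    """
    left = cdf(c)        # P{X < C} for a continuous distribution
    right = 1.0 - left   # P{X > C}
    if tail == "left":
        return left
    if tail == "right":
        return right
    if tail == "two-sided":
        # Twice the tail that has C as its endpoint, i.e. the smaller tail.
        return 2.0 * min(left, right)
    raise ValueError(f"unknown tail: {tail!r}")


std_normal = NormalDist()  # mean 0, standard deviation 1
c = -2.0                   # hypothetical sample statistic

two_sided = tail_p_value(std_normal.cdf, c, "two-sided")
# For a symmetric distribution this equals P{|X| > |C|}.
symmetric_form = 2.0 * (1.0 - std_normal.cdf(abs(c)))
print(round(two_sided, 4), round(symmetric_form, 4))
```

Passing the CDF in keeps the rule independent of the distribution; a t or F distribution would need a CDF from a statistics library, which is not shown here.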
extended data:
Significance of hypothesis testing:
Hypothesis testing is an important part of sampling inference. It starts from a hypothesis made on the basis of prior information, for example that a population parameter equals a certain value or that a random variable follows a certain probability distribution.
Using the principles of probability, it then judges whether the estimated value differs significantly from the population value (or the assumed distribution from the actual distribution), and hence whether the null hypothesis should be accepted. Some conclusions reached in this way are completely reliable, while others are reliable only to a degree and need further testing and confirmation.
Through the test we can judge whether the sample statistic differs from the hypothesized population value and whether to accept the null hypothesis. It must be clear that the purpose of the test is not to question whether the sample statistic was calculated correctly, but to analyze whether it differs significantly from the population value. In this sense, a hypothesis test is also called a significance test.
The F test is the significance test of the regression equation as a whole. It indicates whether the linear relationship between the explained variable and all the explanatory variables in the model is significant overall. If F > Fα(k-1, n-k), the null hypothesis is rejected, that is, the explanatory variables in the model jointly have a significant effect on the explained variable; otherwise they do not.
The p value is a probability calculated from the sampling distribution, using the value of the test statistic. Comparing the p value directly with the given significance level α tells us whether to reject the hypothesis, and this replaces the method of comparing the test statistic with a critical value.
The method also shows how strong the evidence is when the p value is below α. For example, if p = 0.03 < α = 0.05, the hypothesis is rejected, and 0.03 is the smallest significance level at which it would be rejected, so it bounds the Type I error risk of that decision. Note that if p > α, the hypothesis is not rejected, and in that case a Type I error cannot occur.
In a t test, the p value is the risk of error in accepting that the two means differ. For example, suppose the null hypothesis is that the means of two populations are equal (μ1 = μ2), but the means calculated from the two samples are not equal, so there is some "difference".
If we obtain p < 0.01, this says that if the null hypothesis is correct, that is, if the two population means are equal, the probability of producing so large a difference between the sample means is less than 0.01.
In other words, the probability that so large a difference between the two sample means arises purely by chance, rather than because the population means differ, is less than 0.01.
extended data
The role of the p value:
The p value can be used to make hypothesis-testing decisions. If the p value is smaller than the significance level α, the value of the test statistic lies in the rejection region. Likewise, if the p value is greater than or equal to α, the value of the test statistic does not lie in the rejection region. In the coffee example, the p value is 0.0038, which is less than the significance level α = 0.01, so the null hypothesis should be rejected.
Pairwise comparison of several sample means is called multiple comparison. If multiple comparisons are made with the t test for two sample means, the probability of making a Type I error increases.
For example, with 4 samples the number of pairwise combinations is C(4, 2) = 6. If the test level of each comparison is α = 0.05, the probability of not making a Type I error in one comparison is (1 - 0.05), and the probability of not making a Type I error in any of the 6 comparisons is (1 - 0.05)^6. The overall test level therefore becomes 1 - (1 - 0.05)^6 = 0.26, much higher than 0.05.
Therefore, many statisticians conclude that the t test is not suitable for multiple comparisons. The key reason is not a defect of the t test itself: as the number of tests increases, the probability that all of them are correct falls, that is, the probability of making at least one Type I error rises.
If we analyze the data of a clinical trial of a new drug and carry out n tests over the whole analysis, then by this reasoning the probability that the whole analysis is correct may be very low. If, at this point, the probability of a Type I error is calculated not from the test level α but from the p value obtained in each test, the actual probability that the set of test results contains an error is obtained.
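The arithmetic of this example can be checked with a short sketch. It assumes, as the calculation in the text does, that the pairwise comparisons are independent; the function name is illustrative.

```python
from math import comb


def overall_test_level(n_samples: int, alpha: float = 0.05):
    """Number of pairwise comparisons and the overall Type I error rate.

    Assumes independent comparisons, each carried out at level alpha.
    """
    n_comparisons = comb(n_samples, 2)  # C(n, 2) pairwise combinations
    p_all_correct = (1.0 - alpha) ** n_comparisons
    return n_comparisons, 1.0 - p_all_correct


n_comparisons, overall = overall_test_level(4, 0.05)
print(n_comparisons, round(overall, 2))  # 6 0.26
```

With 10 samples the same function gives 45 comparisons, which shows how quickly the overall level grows as groups are added.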
Following Andersson's method step by step, I calculated the values and confidence intervals of RERI, AP and S using SPSS and his Excel sheet. How can I obtain the corresponding p values? Any advice would be appreciated.
References:
Andersson T, Alfredsson L, Källberg H, et al. Calculating measures of biological interaction[J]. European Journal of Epidemiology, 2005, 20(7): 575-579.
Formula for the p value:
P = 2[1 - Φ(|Z0|)] when the alternative hypothesis H1 is p ≠ p0
P = 1 - Φ(Z0) when H1 is p > p0
P = Φ(Z0) when H1 is p < p0
Here Φ(Z0) is obtained by looking it up in the standard normal distribution table.
Finally, when the p value is less than the significance level, the hypothesis can be rejected; otherwise, it cannot be rejected.
Differences due to the experimental conditions, that is, to the different treatments, are called between-group differences. They are measured by the sum of squared deviations between each group mean and the grand mean, denoted SSB, with between-group degrees of freedom DFB.
extended data:
Differences caused by measurement error or by differences between individuals are called within-group differences. They are measured by the sum of squared deviations between each observation and the mean of its own group, denoted SSW, with within-group degrees of freedom DFW.
Total sum of squared deviations: SST = SSB + SSW.
The mean squares MSW and MSB are obtained by dividing the within-group SSW and the between-group SSB by their respective degrees of freedom (within groups DFW = N - M, between groups DFB = M - 1, where N is the total number of observations and M is the number of groups). In one case the treatment has no effect, that is, the samples of all groups come from the same population, and MSB / MSW ≈ 1. In the other case the treatment does have an effect, so the between-group mean square reflects both error and the treatment differences, that is, the samples come from different populations, and MSB >> MSW (far greater).
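The quantities SSB, SSW, SST, MSB and MSW can be computed directly from their definitions. The sketch below is a minimal illustration with made-up data; it uses the usual convention of weighting each group's squared deviation by the group size when forming SSB, and it stops at the ratio MSB / MSW without looking up an F critical value.

```python
from statistics import fmean


def one_way_anova(groups):
    """Sums of squares, mean squares and F ratio for a one-way layout."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)   # N: total number of observations
    n_groups = len(groups)      # M: number of groups
    grand_mean = fmean(all_values)

    # Between groups: squared deviation of each group mean from the
    # grand mean, weighted by the group size.
    ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: squared deviation of each value from its group mean.
    ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    dfb = n_groups - 1          # DFB = M - 1
    dfw = n_total - n_groups    # DFW = N - M
    msb = ssb / dfb
    msw = ssw / dfw
    return {"SSB": ssb, "SSW": ssw, "SST": sst,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Hypothetical measurements for three treatment groups.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
result = one_way_anova(groups)
for name, value in result.items():
    print(name, round(value, 3))
```

A ratio MSB / MSW close to 1 is what the no-effect case predicts; a ratio far above 1, as with these made-up numbers, points to the second case.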
When the control variable is ordinal, a trend test can analyze how the observed variable changes overall as the level of the control variable changes: whether the change is linear, or follows a quadratic or cubic polynomial. The trend test thus gives another view of the overall effect that the different levels of the control variable have on the observed variable.
