
Bayesian regression and bitcoin

Publish: 2021-04-21 06:02:42
1. Bayesian linear regression starts from ordinary linear regression and places a prior P(w) on the model parameters, moving from maximum likelihood estimation to maximum a posteriori estimation; there is nothing especially exotic about it.
2. In econometrics and statistics, Bayesian linear regression often appears as a polynomial curve-fitting problem that uses the residual sum of squares as its statistic. These topics are fairly specialized and hard to explain clearly in a single sentence.
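A minimal sketch of point 1, using made-up data and illustrative variable names: with a zero-mean Gaussian prior P(w) and Gaussian noise, the MAP estimate of ordinary linear regression turns into ridge regression, i.e. least squares with an L2 penalty that shrinks the weights toward zero.

```python
# Minimal sketch: with a Gaussian prior P(w) ~ N(0, alpha^-1 I) and Gaussian
# noise of precision beta, the MAP estimate of linear regression is ridge
# regression. All data and variable names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))             # design matrix (50 samples, 3 features)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.3 * rng.normal(size=50)

alpha, beta = 1.0, 1.0 / 0.3**2          # prior precision, noise precision

# Maximum likelihood (ordinary least squares): argmax_w P(D | w)
w_ml = np.linalg.solve(X.T @ X, X.T @ y)

# Maximum a posteriori: argmax_w P(D | w) P(w)  ==  ridge regression
w_map = np.linalg.solve(X.T @ X + (alpha / beta) * np.eye(3), X.T @ y)

print("ML  estimate:", w_ml)
print("MAP estimate:", w_map)            # shrunk toward 0 by the prior
```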
3. Bayesian theory
1. Bayesian rule
the task of machine learning is to determine the best hypothesis in the hypothesis space H, given training data D
best hypothesis: one way to define it is as the most probable hypothesis given the data D and the prior probabilities of the hypotheses in H. Bayesian theory provides a way to compute the probability of a hypothesis from its prior probability, the probability of observing the data under the hypothesis, and the observed data themselves

2. A priori probability and a posteriori probability
P(h) denotes the initial probability that h holds before the training data are seen; P(h) is called the prior probability of h. The prior reflects background knowledge about the chance that h is a correct hypothesis; if there is no prior knowledge, every candidate hypothesis can be given the same prior probability. Similarly, P(D) denotes the prior probability of the training data D, and P(D|h) denotes the probability of D when hypothesis h holds. In machine learning we care about P(h|D), the probability of h given D, which is called the posterior probability of h

3. Bayes formula
Bayes' formula provides a way to compute the posterior probability P(h|D) from the prior probabilities P(h) and P(D) and the likelihood P(D|h):
P(h|D) = P(D|h) * P(h) / P(D)
P(h|D) increases as P(h) and P(D|h) increase, and decreases as P(D) increases: the more probable it is that D would be observed independently of h, the less support D provides for h.
4. Maximum a posteriori hypothesis (MAP)
When the learner searches the candidate hypothesis space H for the most probable hypothesis given the data D, that hypothesis is called the maximum a posteriori (MAP) hypothesis.
The MAP hypothesis is found by using Bayes' formula to compute the posterior probability of each candidate hypothesis:
h_MAP = argmax P(h|D) = argmax P(D|h) * P(h) / P(D) = argmax P(D|h) * P(h)   (h ranging over H)
In the last step P(D) is dropped because it is a constant that does not depend on h.

5. Maximum likelihood hypothesis
In some cases it can be assumed that every hypothesis in H has the same prior probability, so the formula simplifies further and only P(D|h) needs to be considered:
h_ML = argmax P(D|h)   (h ranging over H)
P(D|h) is often called the likelihood of the data D given h, and the hypothesis that maximizes P(D|h) is called the maximum likelihood (ML) hypothesis.
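As a toy illustration of the ML and MAP hypotheses (the hypothesis space, prior and data below are invented for this sketch):

```python
# Minimal sketch: ML and MAP hypotheses over a small discrete hypothesis space H.
# The hypotheses, prior and data are made up for illustration.
import numpy as np

# Candidate hypotheses: the coin's probability of heads
H = np.array([0.3, 0.5, 0.7])
prior = np.array([0.6, 0.3, 0.1])        # P(h); use np.full(3, 1/3) for a flat prior

data = [1, 1, 0, 1, 1]                   # observed flips, 1 = heads

def likelihood(h, data):
    """P(D | h) for independent Bernoulli flips."""
    heads = sum(data)
    return h**heads * (1 - h)**(len(data) - heads)

like = np.array([likelihood(h, data) for h in H])

h_ml = H[np.argmax(like)]                # h_ML  = argmax P(D | h)
h_map = H[np.argmax(like * prior)]       # h_MAP = argmax P(D | h) P(h)
print("h_ML =", h_ml, " h_MAP =", h_map)
# With a uniform prior the two coincide, as the text notes.
```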

6. An example
Consider a medical diagnosis problem.
There are two candidate hypotheses: the patient has cancer, or the patient does not have cancer.
The available data come from a lab test whose result is either positive (+) or negative (-).
Prior knowledge: across the whole population the prevalence of the disease is 0.008.
The test correctly returns positive for 98% of the people who do have the disease, and correctly returns negative for 97% of the people who do not.
Summarized as follows:
P(cancer) = 0.008, P(¬cancer) = 0.992
P(+|cancer) = 0.98, P(-|cancer) = 0.02
P(+|¬cancer) = 0.03, P(-|¬cancer) = 0.97
Question: a new patient's test result is positive. Should the patient be diagnosed with cancer? The posterior probabilities P(cancer|+) and P(¬cancer|+)
are computed as follows:
P(+|cancer) P(cancer) = 0.98 * 0.008 ≈ 0.0078
P(+|¬cancer) P(¬cancer) = 0.03 * 0.992 ≈ 0.0298
h_MAP = ¬cancer
The exact posterior probabilities are obtained by normalizing so that the two values above sum to 1:
P(cancer|+) = 0.0078 / (0.0078 + 0.0298) ≈ 0.21
P(¬cancer|+) ≈ 0.79
The result of Bayesian reasoning depends heavily on the prior probabilities. In addition, it does not completely accept or reject a hypothesis; it only raises or lowers the probability of a hypothesis as more data are observed.
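The numbers in this example can be reproduced in a few lines; the sketch below simply restates the computation above:

```python
# Reproducing the cancer-test example: posterior over the two hypotheses
# given a positive test result, via P(h | +) proportional to P(+ | h) P(h).
p_cancer, p_no_cancer = 0.008, 0.992
p_pos_given_cancer, p_pos_given_no_cancer = 0.98, 0.03

unnorm_cancer = p_pos_given_cancer * p_cancer           # 0.98 * 0.008 = 0.00784
unnorm_no_cancer = p_pos_given_no_cancer * p_no_cancer  # 0.03 * 0.992 = 0.02976

z = unnorm_cancer + unnorm_no_cancer                    # normalizing constant P(+)
print("P(cancer | +)  =", unnorm_cancer / z)            # ~0.21
print("P(~cancer | +) =", unnorm_no_cancer / z)         # ~0.79
# The MAP hypothesis is "no cancer", even though the test came back positive.
```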
4. SPSS is not recommended for Bayesian analysis; other, more specialized software is a better choice,
unless all you want to do is Bayesian discriminant analysis.
5. Introducing a prior into regression or classification, i.e. using a MAP estimator, cannot by itself be regarded as a Bayesian method. A fully Bayesian method does not stop at computing the mode or the mean of the posterior; it uses the whole posterior distribution to smooth the prediction. Specifically:

Assume the posterior is p(w | D, M), where D is the data set, M is the model and w are the model parameters,
and that, given the parameters, the predictive density for a new data point x is p(x | w, M).
In textbooks M is usually omitted, because we usually study only one model; but if we want to compare several different models, M cannot be omitted.

The so-called Bayesian regression is to compute a predictive distribution:

p(x | D, M) = ∫ p(x | w, M) p(w | D, M) dw

The predictive distribution can be understood as follows: the predictions corresponding to different values of w are combined into the final prediction, and each prediction is weighted by the posterior probability of its w; because w is a continuous random variable, this "combination" is an integral.

Looking at MAP again from this viewpoint: MAP can reduce overfitting, but it cannot avoid it, because MAP assumes that the parameters take a single fixed value rather than a distribution, which is a form of overconfidence. More precisely, MAP approximates the posterior above by a delta function and so ignores the uncertainty in the parameters. In addition, the marginal likelihood can be written as a product of the predictive distributions above:

p(D | M) = p(x1 | M) * p(x2 | x1, M) * p(x3 | x1, x2, M) * ...

This process can be understood as follows: first compute the probability that the model generates x1, then multiply by the predictive distribution of x2 when x1 is the training set, and so on. Clearly, if a model is too complex, the values of its predictive distribution will be small (because its predictions are poor), and the marginal likelihood obtained from the product will also be small. This is essentially the explanation given in MLAPP (see formula 5.14), so the marginal likelihood can be used for model selection.

Finally, why can the maximized likelihood not be used for model selection? Because an overly powerful model can fit too many data sets almost perfectly (its complexity is too high), so it easily overfits the training set; the marginal likelihood, on the other hand,

p(D | M) = ∫ p(D | w, M) p(w | M) dw

takes the distribution of the parameters into account and combines the probabilities of generating the data set under the different parameter values; as before, this combination is an integral. If there are many possible parameter settings (a complex model) but only a few of them give a large likelihood, the final integral will be small; the integral is large only when the number of possibilities is comparatively small (a simple model) and a fair share of them give a large likelihood. That is why the marginal likelihood can be used for model selection.

To sum up, the Bayesian method is essentially an averaging, a smoothing. Only the single-layer Bayesian model is considered here; with several layers of hyperparameters the Bayesian method is still very natural and elegant, just with a few more integrals. By averaging, different possibilities are fused together so that the prediction becomes more stable. In fact, linear regression is not the most common application of the Bayesian method; add-x smoothing ("plus-x" smoothing) in the language models of natural language processing is, and add-x smoothing is nothing but the prior predictive distribution of a multinomial distribution with a Dirichlet prior. All of the above is summarized from Chapter 5 of MLAPP.
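As a minimal illustration of the predictive distribution and the marginal likelihood discussed above, here is a sketch for the conjugate Gaussian linear model with known noise precision; the data, precisions and polynomial degrees are made up for the example:

```python
# Minimal sketch of the fully Bayesian linear model discussed above
# (Gaussian prior, Gaussian noise with known precision). It computes
# (i) the posterior over w, (ii) the predictive distribution at a new x,
# and (iii) the marginal likelihood p(D | M), used here to compare
# polynomial models of different degree. All values are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
alpha, beta = 2.0, 25.0                      # prior precision, noise precision
x = rng.uniform(-1, 1, size=30)
y = 1.0 - 2.0 * x + 1.5 * x**2 + rng.normal(scale=beta**-0.5, size=30)

def design(x, degree):
    return np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...

for degree in (1, 2, 6):
    X = design(x, degree)

    # Posterior over w: N(m, S) with S = (alpha*I + beta*X'X)^-1, m = beta*S*X'y
    S = np.linalg.inv(alpha * np.eye(X.shape[1]) + beta * X.T @ X)
    m = beta * S @ X.T @ y

    # Predictive distribution at a new point x* = 0.5:
    # mean = m'phi, variance = 1/beta + phi' S phi (the integral over the posterior)
    phi = design(np.array([0.5]), degree)[0]
    pred_mean = m @ phi
    pred_var = 1.0 / beta + phi @ S @ phi

    # Marginal likelihood: under the prior, y ~ N(0, X X'/alpha + I/beta)
    C = X @ X.T / alpha + np.eye(len(y)) / beta
    log_evidence = multivariate_normal.logpdf(y, mean=np.zeros(len(y)), cov=C)

    print(f"degree {degree}: predictive mean {pred_mean:.3f} "
          f"+/- {pred_var**0.5:.3f}, log p(D|M) = {log_evidence:.1f}")
# The quadratic model typically gets the highest log evidence: the marginal
# likelihood automatically penalizes the over-flexible degree-6 model.
```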
In addition, the above shows that the core ingredient of the Bayesian method is the posterior, which is difficult to compute for complex models; that is why machine learning resorts to the Laplace approximation, variational methods, MCMC sampling and so on.

Author: dontbeatmycat

6. The differences are as follows:
Logistic regression, also known as logistic regression analysis, is a generalized linear regression model widely used in data mining, automatic disease diagnosis, economic forecasting and other fields, for example to explore the risk factors of a disease and to predict the probability of the disease from those factors. Take the analysis of gastric cancer as an example: two groups of people are selected, a gastric cancer group and a non gastric cancer group, which necessarily differ in physical signs and lifestyle. The dependent variable is therefore whether the person has gastric cancer, taking the values "yes" or "no", while the independent variables can include many things such as age, sex, eating habits and Helicobacter pylori infection. The independent variables can be either continuous or categorical. Logistic regression analysis then yields a weight for each independent variable, from which we can roughly see which factors are risk factors for gastric cancer and, using these weights, predict how likely a person is to develop the disease from his or her risk factors.
The naive Bayes classifier (NBC) originates from classical mathematical theory; it has a solid mathematical foundation and stable classification performance. The NBC model also requires few parameters to estimate, is not very sensitive to missing data, and the algorithm is relatively simple. In theory the NBC model has the smallest error rate compared with other classification methods, but in practice this is not always the case, because the NBC model assumes that the attributes are independent of each other. This assumption often fails in real applications, which harms the classification accuracy of the NBC model.
One way to address this is to model the attributes and treat the non-independent attributes separately. For example, in Chinese text classification and recognition a dictionary can be built to handle certain phrases; if special pattern attributes are found in a particular problem, they are handled separately.
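A small illustrative comparison of the two methods on synthetic binary data (the data set and settings below are invented; scikit-learn is used only for convenience):

```python
# Illustrative comparison of logistic regression and naive Bayes on synthetic
# binary data (e.g. "has disease" vs "no disease"); data and settings are made up.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
nb = GaussianNB().fit(X_train, y_train)

# Logistic regression exposes one weight per "risk factor", as described above.
print("logistic regression weights:", logreg.coef_.round(2))
print("logistic regression accuracy:", logreg.score(X_test, y_test))
print("naive Bayes accuracy:        ", nb.score(X_test, y_test))
# Naive Bayes treats the features as conditionally independent given the class,
# which is exactly the assumption criticized in the answer above.
```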
7. One is classification and the other is regression, so they are not directly comparable.
Under a multinomial distribution assumption, naive Bayes can be written in the form of a linear classifier, but it is still not regression.
8. (Excerpt from a paper on Bayesian inference for time-series vector autoregressive (VAR) models, from the journal Statistics and Decision; author unit: Nanjing University of Science and Technology and the Hubei Provincial Bureau of Statistics. The paper derives the conditional posterior distributions of the VAR coefficient matrix B and the error covariance matrix Σ under a diffuse prior, for both unrestricted and restricted VAR models, and notes that imposing restrictions on the parameters does not affect the choice of prior distribution for the model parameters, but does make the posterior distribution of the parameters considerably more complicated.)