Wednesday, October 9, 2013

A classic post on the application of Empirical Bayes to MEG source reconstruction.


Dear Yury,

Yury Petrov wrote:
> Hi Will,
> 
> I attached the paper. 

Thanks, it's a top paper.

> My concern is that the EM algorithm cannot be
> used to estimate two parameters when one of them is used to define a
> prior for the other. 

It can.

One parameter defining a prior over another results in a hierarchical 
model. Bayesian estimation of linear Gaussian hierarchical models was 
solved in the 1970s by the stats community. More recently, the machine 
learning community has been using various approximate inference 
algorithms for hierarchical nonlinear/non-Gaussian models. See 
Jordan/Bishop/Ghahramani etc.
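
[Editor's sketch, not part of the original email.] To make the idea of a hierarchy concrete, the snippet below sets up a two-level linear Gaussian model in which a precision `alpha` defines the prior over the weights `w`, and `w` in turn generates the data `y`. The names `posterior`, `alpha` and `beta`, the isotropic prior, and the simulated data are all assumptions made for illustration; this is not the MSP or SPM code.

```python
import numpy as np

def posterior(X, y, alpha, beta):
    """Posterior over w for y = X w + e, with e ~ N(0, I/beta) and
    prior w ~ N(0, I/alpha). alpha is the parameter that defines the
    prior over the other parameter, w."""
    k = X.shape[1]
    precision = alpha * np.eye(k) + beta * X.T @ X   # posterior precision
    cov = np.linalg.inv(precision)                   # posterior covariance
    mean = beta * cov @ X.T @ y                      # posterior mean
    return mean, cov

# Simulated data (illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.5 * rng.standard_normal(50)

# A very weak prior leaves the least-squares answer almost untouched;
# a very strong prior shrinks the estimate towards zero.
mean_weak, _ = posterior(X, y, alpha=1e-6, beta=4.0)
mean_strong, _ = posterior(X, y, alpha=1e6, beta=4.0)
```

For fixed `alpha` and `beta` the posterior is available in closed form; the question debated in this thread is how `alpha` and `beta` themselves may be learned from the same data.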

> Irrespective of how the MSP algorithm has been
> derived, the ReML learning part explicitly described in the Appendix
> of the Phillips et al. 2002 paper violates Bayes' rule. It
> first calculates the source covariance matrix given the solution of
> the previous iteration, then uses its scale (trace) to rescale the
> original source covariance, etc. Yes, it uses the 'lost degrees of
> freedom' trick 

This isn't a trick. It falls naturally out of the mathematics.

> to prevent a nonsensically localized solution, but
> this trick does not address the main problem. The algorithm still
> changes the prior based on posterior, then posterior based on the new
> prior, etc. iteratively.
> 

All of what I've said corresponds to the framework of Empirical Bayes, 
where you estimate the parameters of priors from data.

Pure Bayesians do not allow this. They see it, as you say, as a 
violation of what a prior is.

But then pure Bayesians haven't solved many interesting problems. The 
Empirical Bayesian claims to know only the form of the prior densities, 
not their parameters.
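
[Editor's sketch, not part of the original email.] A small example of "knowing only the form of the prior": in the normal-means problem below the prior is assumed Gaussian with zero mean, but its variance `tau2` is unknown and is estimated from the marginal distribution of the data. The unit noise variance, the sample size and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
tau2_true = 4.0                                   # hidden from the analyst
theta = rng.normal(0.0, np.sqrt(tau2_true), size=n)
y = theta + rng.standard_normal(n)                # noise variance 1, assumed known

# Marginally y_i ~ N(0, tau2 + 1), so maximising the marginal likelihood
# over tau2 gives the data-driven estimate of the prior variance:
tau2_hat = max(np.mean(y**2) - 1.0, 0.0)

# Plug the estimated prior back in to obtain posterior means (shrinkage).
shrink = tau2_hat / (tau2_hat + 1.0)
theta_hat = shrink * y

mse_eb = np.mean((theta_hat - theta) ** 2)        # empirical Bayes estimate
mse_raw = np.mean((y - theta) ** 2)               # unshrunk estimate
```

The prior variance is fitted once, to the marginal likelihood of all the data, rather than by feeding a posterior back in as if it were fresh evidence.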

Best,

Will.

> 
> 
> ------------------------------------------------------------------------
> 
> 
> 
> 
> On Sep 22, 2010, at 1:14 PM, Will Penny wrote:
> 
>> Dear Yury,
>> 
>>>> Dear All,
>>>> 
>>>> I have a conceptual concern regarding the MSP algorithm used by
>>>> SPM8 to localize sources of EEG/MEG activity. The algorithm is
>>>> based, in part, on an EM iterative scheme used to estimate source
>>>> priors (source covariance matrix) from the measurements. The
>>>> way this scheme is described in the Phillips et al. 2002 paper,
>>>> it works as an iterative Bayesian estimator: first it estimates
>>>> the sources, then calculates the resulting source covariance
>>>> from the estimate, next it (effectively) uses it as the new
>>>> prior for the sources, estimates the sources again, etc.
>>>> However, applying Bayesian learning iteratively is a common
>>>> pitfall and should not be used, because each such iteration
>>>> amounts to introducing new fictitious data. I attached a nice
>>>> introductory paper illustrating the pitfall on page 1426.
>> 
>> I don't believe that this is a pitfall.
>> 
>> The parameters of the prior (specifically the variance components)
>> are estimated iteratively along with the variance components of the
>> likelihood.
>> 
>> Importantly, each is estimated using degrees of freedom which are 
>> effectively partitioned into those used to estimate prior variance
>> and those used to estimate noise variance. This is a standard
>> Empirical Bayesian approach and produces unbiased results.
>> 
>> See papers by David MacKay on this topic and e.g. pages 6-8 of the
>> chapter on 'Hierarchical Models' in the SPM book (this is available
>> under publications/book chapters on my web page 
>> http://www.fil.ion.ucl.ac.uk/~wpenny/ - note gamma and (k-gamma)
>> terms in denominator of eqs 32 and 35 denoting the partitioning of
>> the degrees of freedom).
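
[Editor's sketch, not part of the original email.] The partitioning of degrees of freedom can be illustrated with MacKay-style evidence updates for a linear Gaussian model. Here `gamma` is the effective number of parameters determined by the data; the prior precision is re-estimated using `gamma` and the noise precision using the remaining `n - gamma`. The symbols and the function name `evidence_updates` are my own and need not match the notation of eqs 32 and 35 in the chapter, and this is not the ReML code used in SPM.

```python
import numpy as np

def evidence_updates(X, y, n_iter=200):
    """Fixed-point updates for the prior precision alpha and the noise
    precision beta in y = X w + e, w ~ N(0, I/alpha), e ~ N(0, I/beta)."""
    n, k = X.shape
    alpha, beta = 1.0, 1.0
    eigvals = np.linalg.eigvalsh(X.T @ X)
    for _ in range(n_iter):
        A = alpha * np.eye(k) + beta * X.T @ X          # posterior precision
        m = beta * np.linalg.solve(A, X.T @ y)          # posterior mean
        # Effective number of well-determined parameters, 0 <= gamma <= k
        gamma = np.sum(beta * eigvals / (alpha + beta * eigvals))
        alpha = gamma / (m @ m)                         # uses gamma d.o.f.
        beta = (n - gamma) / np.sum((y - X @ m) ** 2)   # uses n - gamma d.o.f.
    return alpha, beta, gamma, m

# Simulated data (illustrative only): prior s.d. 2, noise s.d. 0.5
rng = np.random.default_rng(2)
n, k = 200, 10
X = rng.standard_normal((n, k))
w_true = 2.0 * rng.standard_normal(k)
y = X @ w_true + 0.5 * rng.standard_normal(n)

alpha, beta, gamma, m = evidence_updates(X, y)
noise_var_eb = 1.0 / beta                       # residual SS / (n - gamma)
noise_var_ml = np.sum((y - X @ m) ** 2) / n     # residual SS / n
```

Dividing the residual sum of squares by `n - gamma` rather than `n` is the correction for the degrees of freedom already spent on estimating the weights, which is why the noise-variance estimate is larger than the naive one.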
>> 
>> Nevertheless, I'd like to read page 1426 of your introductory
>> paper. Can you send it to me?
>> 
>> Best wishes,
>> 
>> Will.
>> 
>>>> In particular, the outcome of the
>>>> iterations may become biased toward the original source
>>>> covariance used. In my test application of the described EM
>>>> algorithm I found that scaling the original source covariance
>>>> matrix changes the resulting sources estimate, which, in
>>>> principle, should not happen. For comparison, this problem does
>>>> not occur, when the source covariance parameters are learned
>>>> using ordinary or general cross-validation (OCV or GCV).
>>>> 
>>>> Best, Yury
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> -- William D. Penny Wellcome Trust Centre for Neuroimaging 
>> University College London 12 Queen Square London WC1N 3BG
>> 
>> Tel: 020 7833 7475 FAX: 020 7813 1420 Email:
>> [log in to unmask] URL: http://www.fil.ion.ucl.ac.uk/~wpenny/
>> 
>> 
> 

-- 
William D. Penny
Wellcome Trust Centre for Neuroimaging
University College London
12 Queen Square
London WC1N 3BG

Tel: 020 7833 7475
FAX: 020 7813 1420
Email: [log in to unmask]
URL: http://www.fil.ion.ucl.ac.uk/~wpenny/