Thursday, October 20, 2011

Good course.

http://afni.nimh.nih.gov/pub/dist/HOWTO/howto/ht05_group/html/background_ANOVA.shtml

FW from FSL

Hi - easy - use the cluster *index* output option instead
--oindex

Cheers.


On 29 Mar 2011, at 19:51, Michael B wrote:

Hi there,

I have run TBSS and have found several clusters of significant voxels. My goal is to extract information from these clusters individually, such that for each cluster, I have an FA value (derived from all_FA.nii.gz), a t-statistic (derived from *_tstat*.nii.gz), and coordinates. To achieve this goal, I created an image that labelled the clusters according to size, using the cluster command:

cluster --in=*_tfce_corrp_tstat1 --thresh=0.95 --osize=*_tfce_corrp_tstat1_size

Subsequently, I thresholded this "size image," such that I created a mask for each cluster, using the fslmaths command:

fslmaths *_tfce_corrp_tstat1_size -thr (size of cluster 1) -uthr (size of cluster 1) -bin *_tfce_corrp_tstat1_size_mask1
fslmaths *_tfce_corrp_tstat1_size -thr (size of cluster 2) -uthr (size of cluster 2) -bin *_tfce_corrp_tstat1_size_mask2
fslmaths *_tfce_corrp_tstat1_size -thr (size of cluster 3) -uthr (size of cluster 3) -bin *_tfce_corrp_tstat1_size_mask3
etc.

In this way, it became possible to extract information from the clusters individually. However, this approach falls apart when 2 or more separate clusters have the same size. In this case, the mask includes 2 or more clusters and the information extracted is from several clusters collectively, not from the clusters individually. How might I solve this problem?

I would dearly appreciate some guidance on this issue!

Thank you,
Michael


---------------------------------------------------------------------------
Stephen M. Smith, Professor of Biomedical Engineering
Associate Director,  Oxford University FMRIB Centre

FMRIB, JR Hospital, Headington, Oxford  OX3 9DU, UK
+44 (0) 1865 222726  (fax 222717)
steve@fmrib.ox.ac.uk    http://www.fmrib.ox.ac.uk/~steve
---------------------------------------------------------------------------
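
For my own notes, here is a minimal sketch of the index-based workflow Steve suggests (the threshold follows the thread above; the image names and the cluster number N are placeholders):

cluster --in=tbss_tfce_corrp_tstat1 --thresh=0.95 --oindex=tbss_tfce_corrp_tstat1_index
# every surviving cluster now gets its own integer label, so equal-sized clusters no longer collide
fslmaths tbss_tfce_corrp_tstat1_index -thr N -uthr N -bin cluster_N_mask
fslstats -t all_FA -k cluster_N_mask -M        # per-subject mean FA within cluster N
fslstats tbss_tstat1 -k cluster_N_mask -M -x   # mean t-statistic and voxel coordinates of its maximum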

Wednesday, October 19, 2011

Like Gang Chen's posts. Clear!

General Linear Testing in Group Analysis

The assumption of data normality in the analysis of variance (ANOVA) is frequently violated, but usually with minor consequences. There are a few other assumptions in group analysis, among which the following two need more attention:
(1) Homogeneity of variance for the between-subjects factor: all levels of the between-subjects factor have the same variance. This is essentially the same assumption as in a two-sample t test (3dttest), where the two samples are assumed to have the same variance.
According to David C. Howell, if the population distributions tend to be symmetric, or at least similarly shaped or unidirectionally skewed, and if the maximum variance among the populations is less than 4 times the minimum, the consequence of variance heterogeneity, when sample sizes are equal, is not very serious. However, unequal sample sizes raise a serious concern if the homogeneity of variance assumption is violated. If a violation of this assumption is possible, try to avoid unequal sample sizes.
(2) Sphericity (circularity) for the within-subject factor: it requires homogeneity of variance among the differences between all possible level pairs of the within-subject factor. A more useful and more convenient condition is a stronger restriction (sufficient but not necessary): it requires both homogeneity of variance (all levels of the within-subject factor have equal variance) and homogeneity of covariance (the correlations are equal among all possible pairs of the within-subject factor levels). This stronger restriction is called compound symmetry because the covariance matrix for the within-subject factor levels then has equal variances on the diagonal and equal covariances off the diagonal.
When all assumptions are met, all the AFNI ANOVA programs can be safely employed to obtain F statistics for main effects and interactions. Nevertheless, when a contrast or simple effect involves only a portion of the whole dataset, we recommend the following testing principle to reduce the potential risk of sphericity violation:
Any effect involving a portion of the data is tested exclusively based on that portion.
Even if a within-subject factor passes a sphericity test, the precautionary practice of the above principle generates robust results. When the sphericity assumption is violated, especially if the correlation differs across factor levels, the above practice is immune to the violation. Therefore simple effects and contrasts should, whenever possible, be tested with one-sample, two-sample, or paired t tests (3dttest). In other words, when appropriate, contrasts from the individual-subject analysis (output of 3dDeconvolve) can be brought directly into the group analysis, as sketched below.
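For instance, a pairwise contrast between two levels of a within-subject factor can be tested directly as a paired t test on the subject-level regression coefficients; a minimal sketch (the dataset names are hypothetical):

3dttest -paired -prefix A1_vs_A2 \
        -set1 subj01_A1_beta+tlrc subj02_A1_beta+tlrc subj03_A1_beta+tlrc \
        -set2 subj01_A2_beta+tlrc subj02_A2_beta+tlrc subj03_A2_beta+tlrc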
In the same spirit we have modified 3dANOVA2 and 3dANOVA3 so that the options -amean, -bmean, -adiff, -bdiff, -acontr and -bcontr are aligned with the above principle for one-way within-subject ANOVA (3dANOVA2 -type 3), two-way within-subject ANOVA (3dANOVA3 -type 4), and two-way mixed-effects ANOVA (3dANOVA3 -type 5). A justification for the modification of simple effect testing (-amean and -bmean) is discussed here. These modifications guard 3dANOVA2 and 3dANOVA3 against any potential risk of sphericity violation as much as possible, and also lift the old constraint (all coefficients should add up to 0) on the options -acontr and -bcontr in the two programs. Keep in mind that all these modifications of -amean, -bmean, -adiff, -bdiff, -acontr and -bcontr are essentially equivalent to running the corresponding t tests with 3dttest. In other words, you could run separate t tests with 3dttest instead of 3dANOVA2 or 3dANOVA3, but the latter is much neater, with all results bundled into one single output file.
Another fact to notice here is that the tests done through -amean, -bmean, -adiff, and -bdiff are simply special cases of -acontr and -bcontr. Essentially, -acontr and -bcontr are sufficient for all general linear tests. Strictly speaking, -acontr and -bcontr might more suitably be called -glt, in the sense that the coefficients don't necessarily sum to 0 (comparable to the option in 3dDeconvolve), but we just decided to carry on the notation in 3dANOVA2 and 3dANOVA3.
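As an illustration only (the dataset names are placeholders and the subject count is kept unrealistically small; see 3dANOVA2 -help for the full syntax), a general linear test whose coefficients do not sum to 0 can now be requested directly:

3dANOVA2 -type 3 -alevels 3 -blevels 2 \
         -dset 1 1 s01_A1+tlrc -dset 2 1 s01_A2+tlrc -dset 3 1 s01_A3+tlrc \
         -dset 1 2 s02_A1+tlrc -dset 2 2 s02_A2+tlrc -dset 3 2 s02_A3+tlrc \
         -acontr 1 -1 0 A1_vs_A2 \
         -acontr 0.5 0.5 0 mean_of_A1_and_A2 \
         -bucket anova_glt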
In fact, in the old versions of the three ANOVA programs, -mean, -diff, and -contr shared the same underlying formula: -mean and -diff were special cases of -contr. This was troublesome, as demonstrated in Simple Effect Testing at Group Level. Moreover, when general linear test coefficients do not add up to 0, the same problem occurs. Let's start with the one-way within-subject (repeated-measures) ANOVA model (3dANOVA2 -type 3):
    Yij = μ + αi + βj + εij
 
where,

Yij independent variable – regression coefficient (% signal change) from individual subject analysis;
μ constant – grand mean;
αi constants subject to Σαi = 0 – simple effect of factor A at level i, i = 1, 2, ..., a
βj independent N(0, σp²) – random effect of subject j, j = 1, 2, ..., b;
εij independent N(0, σ²) – random error, or within-subject variability, or the interaction between the factor of interest and subject.
 
with following assumptions:

    E(Yij) = μ + αi, Var(Yij) = σp² + σ², Cov(Yij, Yi'j) = σp² (i ≠ i'), Cov(Yij, Yi'j') = 0 (j ≠ j');
Correlation between any two levels i and i' of factor A: σp²/(σp² + σ²).
Now consider a general linear null hypothesis H0: Σciαi = 0. The test in 3dANOVA2 -type 3 was
    t = ΣciYi· / sqrt(Σci² MSAS)
where MSAS is the mean square for the interaction between the factor of interest and subject. Since
    E(ΣciYi·) = μΣci + Σciαi = μΣci under H0,
    Var(ΣciYi·) = (Σci)²σp² + (Σci²)σ², and E(MSAS) = σ²,
the denominator of the t test accounts for Var(ΣciYi·) if and only if Σci = 0. That is, setting aside the sphericity issue, the above t test is legitimate if and only if Σci = 0; otherwise the subject variability -- the (Σci)²σp² term in Var(ΣciYi·) -- is not accounted for in the test, leading to an inflated t value.
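To see where the two variance terms come from, here is the expansion of the contrast variance using the covariances listed above (the common 1/b factor from averaging over the b subjects scales everything equally and is omitted, to match the formulas above):

$$\mathrm{Var}\Big(\sum_i c_i Y_{i\cdot}\Big) = \sum_i c_i^2\,(\sigma_p^2+\sigma^2) + \sum_{i\neq i'} c_i c_{i'}\,\sigma_p^2 = \Big(\sum_i c_i\Big)^{2}\sigma_p^2 + \Big(\sum_i c_i^2\Big)\sigma^2 .$$

The first term is the subject-variability contribution; it vanishes exactly when Σci = 0, which is why the old t formula was valid only for contrasts whose coefficients sum to 0.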
In the modified version of the programs, the above inflation is automatically resolved by the new strategy. Furthermore, two new options, -aBcontr and -Abcontr, were added in 3dANOVA3 for types 4 and 5. Their usage is:

-aBcontr c1 c2...ca : j contrast_label  --  2nd order contrast in factor A, at fixed B level j

-Abcontr i : c1 c2...cb contrast_label --  2nd order contrast in factor B, at fixed A level i
When there are only two levels across all between- and within-subject factors, sphericity violation is not a concern, and contrast testing in all the other ANOVA programs in AFNI (3dRegAna and GroupAna of the Matlab package) is consistent with the above principle. Bear in mind that the coefficients of a contrast should always add up to 0 in GroupAna.

1. David C. Howell, Statistical Methods for Psychology, Wadsworth Publishing; 5th edition, 2001

How do I find out whether my Linux server's CPU can run a 64-bit kernel (and 64-bit apps) or not?

$ uname -a

or

 $ less /proc/cpuinfo

(use q to get out)
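
Note that uname reports the kernel that is currently running, while /proc/cpuinfo tells you what the CPU is capable of. A slightly more direct check (assuming an x86 machine with GNU grep):

uname -m                               # x86_64 means a 64-bit kernel is already running
grep -o -w lm /proc/cpuinfo | sort -u  # the "lm" (long mode) flag means the CPU supports 64-bit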

FW:: be aware of the -e option when you use it for your TBSS analysis

Stunning differences between 2 different TBSS analyses

Check your randomise version: the -e option does not work correctly in versions before 4.1.8.

==========================================
Hello,
           The bug in the -e option will have no effect on randomise output unless the -e option was used. For analyses where the -e option was not used, there is no need to re-run.

Many Regards

Matthew
Dear FSL,

Regarding the bug in randomise: is this bug only in the -e option, or does it affect other TBSS statistical analyses as well? I am using 4.1.7 for some TBSS analyses and used 4.1.6 for others, with

"randomise -i all_FA_skeletonised -o tbss -m mean_FA_skeleton_mask -d design.mat -t design.con -n 5000 --T2 -V" as suggested in the TBSS help.

Do I need to reanalyse my data with the new randomise? I have run randomise 2-3 times, and the results are the same every time.

Please advise.

Thank you

On Fri, May 13, 2011 at 7:03 PM, Matthew Webster <mwebster@fmrib.ox.ac.uk> wrote:
Hello Michael,
                         I've copied the text from the 4.1.8 randomise documentation below:

The pre-FSL4.1.8 version of randomise had a bug in the -e option that could generate incorrect permutation of scans over subject blocks. Incorrectly permuting scans over subject blocks does two things:  Under permutation, it will randomly induce big positive or negative effects (inflating the variability in the numerator of the t statistics over permutations), but it will ALSO increase the residual standard deviation for each fit (inflating the denominator of the t statistic on each permutation).  Hence it isn't clear which will dominate, whether the permutation distribution of T values will be artifactually expanded (wrongly decreasing significance), or artifactually contracted (wrongly inflating significance).

With some simulations, and with some re-analysis of real data, it appears that the effect of the bug is always to wrongly deflate significance. Thus, it is anticipated that results with the corrected code will have improved significance.

Hope this helps,

Matthew
> Hi Matthew,
> Can you please elaborate as to what the issue with the -e option in
> randomise is?  In particular, are results using the -e option
> potentially wrong, and if so, under what conditions?
>
> thanks,
> -MH
>
> On Thu, 2011-05-12 at 09:52 +0100, Matthew Webster wrote:
>> Hello Raphael,
>> This may be due to a current issue with the -e option in randomise - we will hopefully be releasing FSL 4.1.8 with a patched version of randomise in the next few days; I would strongly recommend rerunning the analysis with this new version.
>>
>> Many Regards
>>
>> Matthew
>>
>>> Dear FSL-experts,
>>>
>>> I'm very surprised by the difference between two TBSS analyses I ran. In fact, I forgot to include the DTI images of one subject in a longitudinal analysis with 3 time points and 2 groups. Thus I reran the analysis with altogether 13 subjects in each group and, surprisingly, the results differed extremely. Testing for a main effect of time and a time-by-group interaction with all 26 subjects included in the model, the whole skeleton shows up when looking at TFCE-corrected fstat images at p < .05 (> .95 in FSL terminology). However, the results look much more plausible (two clusters at the expected locations) when I only consider 25 subjects, that is to say, without the subject that I initially forgot to incorporate in the analysis. I visually inspected the results of the preprocessing steps before subjecting the data to the TBSS analysis, as well as the results of the TBSS analysis itself (e.g. the all_FA_skeletonized image, etc.), and they look fine.
>>>
>>> I attached pictures of the design matrices and contrasts as well as the exchangeability block (EB), or to be precise, ".grp" file(s). As can be seen from the design files, I only added one subject and of course adjusted the EB file accordingly. The first 26 columns of my design matrix code the subject factors, and the last four columns code the experimental and control groups at T1 and T2. The columns are labeled accordingly in the design matrix. The last four columns look somewhat scrambled, which is due to the ordering of subjects in the all_FA file; the membership of subjects to groups is therefore probably obvious only to me.
>>>
>>> Randomise was used with the following commands:
>>>
>>> randomise -i all_FA_skeletonized.nii.gz -m -o ... -m mean_FA_skeleton_mask.nii.gz -d ... .mat -t ... .con -f ... .fts -e ... .grp --fonly -n 5000 --T2 -V.1
>>>
>>> I would very much appreciate any help on this, as I'm a little bit at a loss about how to explain these differences and how to proceed.
>>>
>>> Thank you very much in advance.
>>>
>>> Raphael
>>>
>>>
>>> <13vs12.zip><13vs13.zip>
>



--
Dr. BHAVANI SHANKARA BAGEPALLY
MBBS, (PhD in Clinical Neuroscience) NIMHANS,
Bangalore,
INDIA-560029
email:  bshankara@gmail.com
       

FW:: Can You Handle The Power?


Can You Handle The Power?

The awk command combines the functions of grep and sed, making it one of the most powerful Unix commands. Using awk, you can substitute words from an input file's lines for words in a template or perform calculations on numbers within a file. (In case you're wondering how awk got such an offbeat name, it's derived from the surnames of the three programmers who invented it.)
To use awk, you write a miniature program in a C-like language that transforms each line of the input file. We'll concentrate only on the print function of awk, since that's the most useful and the least confusing of all the things awk can do. The general form of the awk command is
awk <pattern> '{print <stuff>}' <file>
In this case, stuff is going to be some combination of text, special variables that represent each word in the input line, and perhaps a mathematical operator or two. As awk processes each line of the input file, each word on the line is assigned to variables named $1 (the first word), $2 (the second word), and so on. (The variable $0 contains the entire line.)
Let's start with a file, words.data, that contains these lines:
nail hammer wood
pedal foot car
clown pie circus

Now we'll use the print function in awk to plug the words from each input line into a template, like this:
awk '{print "Hit the",$1,"with your",$2}' words.data
Hit the nail with your hammer
Hit the pedal with your foot
Hit the clown with your pie

Say some of the data in your input file is numeric, as in the grades.data file shown here:
Rogers 87 100 95
Lambchop 66 89 76
Barney 12 36 27

You can perform calculations like this:
awk '{print "Avg for",$1,"is",($2+$3+$4)/3}' grades.data
Avg for Rogers is 94
Avg for Lambchop is 77
Avg for Barney is 25

So far, we haven't specified any value for pattern in these examples, but if you want to exclude lines from being processed, you can enter something like this:
awk /^clown/'{print "See the",$1,"at the",$3}' words.data
See the clown at the circus

Here, we told awk to consider only the input lines that start with clown. Note also that there is no space between the pattern and the print specifier. If you put a space there, awk will think the input file is '{print and will not work. But all this is just the tip of the awk iceberg--entire books have been written on this command. If you are a programmer, try the man awk command.
For more information on the awk command, see the awk manual.

FW:: Does Linux Have a Search & Replace Feature?


Does Linux Have a Search & Replace Feature?

You can use the sed command to change all occurrences of one string to another within a file, just like the search-and-replace feature of your word processor. The sed command can also delete a range of lines from a file. Since sed is a stream editor, it takes the file given as input, and sends the output to the screen, unless you redirect output to a file. In other words, sed does not change the input file.
The general forms of the sed command are as follows:
Substitution: sed 's/<oldstring>/<newstring>/g' <file>
Deletion: sed '<start>,<end>d' <file>
Let's start with a substitution example. If you want to change all occurrences of lamb to ham in the poem.txt file in the grep example, enter this:
sed 's/lamb/ham/g' poem.txt
Mary had a little ham
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a ham spamwich

In the quoted string, the "s" means substitute, and the "g" means make a global change. You can also leave off the "g" (to change only the first occurrence on each line) or specify a number n instead (to change only the nth occurrence on each line).
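For example, to change only the second occurrence of "a" on each line of poem.txt:
sed 's/a/A/2' poem.txt
Mary hAd a little lamb
Mary fried A lot of spam
Jack Ate a Spam sandwich
Jill had A lamb spamwich
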
Now let's try an example involving deletion of lines. The values for start and end can be either a line number or a pattern to match. All lines from the start line to the end line are removed from the output. This example will delete starting at line 2, up to and including line 3:
sed '2,3d' poem.txt
Mary had a little lamb
Jill had a lamb spamwich

This example will delete starting at line 1, up to and including the next line containing Jack:
sed '1,/Jack/d' poem.txt
Jill had a lamb spamwich

The most common use of sed is to change one string of text to another string of text. But I should mention that the strings that sed uses for search and delete are actually regular expressions. This means you can use pattern matching, just as with grep. Although you'll probably never need to do anything like this, here's an example anyway. To change any occurrences of lamb at the end of a line to ham, and save the results in a new file, enter this:
sed 's/lamb$/ham/g' poem.txt > new.file
Since we directed output to a file, sed didn't print anything on the screen. If you look at the contents of new.file it will show these lines:
Mary had a little ham
Mary fried a lot of spam
Jack ate a Spam sandwich
Jill had a lamb spamwich

Use the man sed command for more information on using sed.
For more information on the sed command, see the sed manual.

Monday, October 17, 2011

FA > 1 problem.

FW

13. Correction for multiple comparisons: Monte Carlo simulation

One major consideration when conducting voxelwise statistical tests is the sheer number of tests being performed. In such cases, Type I error inflation is a problem. A common approach to correct for this is to use a clustering method that tries to determine how big a contiguous cluster of voxels, each one significant at an uncorrected threshold Pu, has to be in order to be significant at a threshold Pc that is corrected for multiple comparisons. To determine how big such a cluster needs to be, there are 3 common methods:
1.      Gaussian Random Field Theory: Uses the mathematical properties of random fields to derive a corrected cluster size.
2.      Permutation tests: Uses the data itself to create a null distribution of cluster sizes, from which the cluster size corresponding to a desired corrected P can be read off.
3.      Monte Carlo simulation: Creates multiple simulated null datasets, and from them creates a distribution of cluster sizes, from which the cluster size corresponding to a desired corrected P can be read off.
We could go into a lot of detail about the relative merits of each method here, but that would be somewhat irrelevant for AFNI analysis, since the Monte Carlo method is the only one currently implemented in AFNI (other techniques might become available in the near future).
In order to carry out the Monte Carlo simulation, we need some basic information about the data itself, as well as the uncorrected Pu that we plan on using for creating clusters, and a few other parameters that control the clustering operation.
First we need an estimate of how spatially smooth the noise in the dataset is. To estimate this, we use the program 3dFWHM:
3dFWHM -dset err_ts+orig[0] -mask '3dcalc( -a glm_out+orig[0] -expr step(a-500) )'
Here we have used the residual time series from an individual subject GLM to estimate the spatial smoothness. We have masked the time series with the thresholded baseline estimate so as to exclude voxels from outside the brain. This command will return output like the following:
Gaussian filter widths:
sigmax = 1.72 FWHMx = 4.05
sigmay = 1.56 FWHMy = 3.67
sigmaz = 1.70 FWHMz = 4.00
So for this dataset, our estimate would be that the full-width-at-half-maximum (FWHM) estimate of spatial smoothness is approximately 4mm. We could then do this for all our subjects and average the results. However, if we applied spatial blurring of more than 4mm to our percent signal change estimates, we should instead use the FWHM of the spatial blur that we applied. In the example above, we used a spatial blur of 5mm, so the estimate of spatial smoothness that we use in the Monte Carlo simulations should be 5mm.
The next step is to decide upon the uncorrected Pu that we plan on using for creating clusters. This choice is fairly arbitrary. If we choose a fairly liberal voxelwise Pu, say Pu = 0.01, we will be able to detect large clusters of voxels that are activated at a fairly liberal threshold, but we will miss small clusters that are made up of highly significantly activated voxels. If, on the other hand, we use a conservative Pu, say Pu = 0.0001, we will be able to detect small clusters of highly activated voxels, but not larger clusters of less activated voxels. The final choice on a Pu depends on the size of clusters that we are looking for, as well as considerations of what our overall statistical power is likely to be.
Let’s say for the current example that we choose a Pu = 0.01.
The other parameter that we need to determine is what we want our cluster connection radius to be. This specifies how close two voxels need to be to one another in order to be considered contiguous. Usually we would use the voxel size of the datasets being analysed, in this case 2mm.
So now we can run the simulation:
AlphaSim -mask glm_out_perc+tlrc[0] -fwhm 5 -rmm 2 -pthr 0.01 -iter 1000
The -iter 1000 option specifies that we want to run 1000 simulations, which is usually enough. The program will take quite a few minutes to run, and then produce output like the following:
Cl Size Frequency Cum Prop p/Voxel Max Freq Alpha
 1 229840 0.328718 0.01012002 0 1.000000
 2 108491 0.483882 0.00947785 0 1.000000
 3 66742 0.579336 0.00887161 0 1.000000
 4 51187 0.652544 0.00831218 0 1.000000
 5 39186 0.708588 0.00774011 0 1.000000
and so on. What you now need to do is read down the right-most column, which gives the corrected Pc for each minimum cluster size (left hand column).
84 4 0.999920 0.00001630 4 0.058000
 85 5 0.999927 0.00001536 5 0.054000
 86 1 0.999929 0.00001417 1 0.049000
 87 5 0.999936 0.00001393 4 0.048000
So from this we can see that according to Monte Carlo simulations, thresholding the statistical images with an uncorrected Pu = 0.01, and clustering with a cluster connection radius of 2mm, the resulting clusters need to be at least 86 voxels in size in order to achieve a corrected Pc < 0.05. We can now use this information to threshold and cluster our group analysis statistical maps, which is explained in the next section.


14. Correction for multiple comparisons: Clustering

We can now use the information from Monte Carlo simulations to threshold and cluster our group analysis statistical maps, so as to get thresholded statistical images corrected for multiple comparisons. This we must do separately for each contrast or F-test that we are interested in. For the preceding example, imagine that we are interested in the F-test for the interaction of group by face. First we need to find out where this information is stored in the anova+tlrc bucket file:
3dinfo anova+tlrc
This produces output such as:
-- At sub-brick #0 'group:Inten' datum type is short: 0 to 32767 [internal]
 [* 0.000644032] 0 to 21.103 [scaled]
 -- At sub-brick #1 'group:F-stat' datum type is short: 0 to 2562 [internal]
 [* 0.01] 0 to 25.62 [scaled]
 statcode = fift; statpar = 1 14
 -- At sub-brick #2 'face:Inten' datum type is short: 0 to 32767 [internal]
 [* 0.000571107] 0 to 18.7135 [scaled]
 -- At sub-brick #3 'face:F-stat' datum type is short: 0 to 3689 [internal]
 statcode = fift; statpar = 2 28
 -- At sub-brick #4 'groupByFace:Inten' datum type is short: 0 to 32767 [internal]
 [* 0.000571007] 0 to 19.7125 [scaled]
 -- At sub-brick #5 'groupByFace:F-stat' datum type is short: 0 to 3489 [internal]
 statcode = fift; statpar = 2 28
This tells us that the sub-brick we want to threshold with is sub-brick 5. We can also keep the intensity sub-brick for the interaction, although this would make more sense for a t-test of a specific contrast, since the intensity sub-brick in that case contains the value of the contrast itself (e.g. percent signal change contrast between two conditions).
Before we run the cluster command, we need to determine the F-statistic that corresponds to a Pu of 0.01, for the degrees of freedom that we have for the interaction F-test. This can be read off from tables, calculated using Excel or an online calculator, or read off from the threshold slider in AFNI if viewing the interaction statistical map. In this case, the correct F-statistic is 5.45.
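If you'd rather get this number from the command line, AFNI's cdf utility converts a p-value into the corresponding statistic threshold; for the fift statistic code with parameters 2 and 28 reported by 3dinfo above, something like:
cdf -p2t fift 0.01 2 28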
Now we can use the 3dmerge command to perform the clustering:
3dmerge -session . -prefix group_by_face_clust -1thresh 5.45 -1clust 2 -86 anova+tlrc[4,5]
3dmerge -session . -prefix group_by_face_clust_order -1thresh 5.45 -1clust_order 2 -86 anova+tlrc[4,5]
The first command will identify clusters of voxels that exceed an F-value of 5.45, using a cluster connectivity radius of 2mm and a minimum cluster size of 86 voxels, using sub-brick 5 as the thresholding brick, and will save the thresholded, clustered data in a new bucket dataset called group_by_face_clust+tlrc. The second command will do the same thing, except that all voxel intensities within a cluster will be replaced by the cluster size index (largest cluster=1, next largest=2, ...). This will be useful for extracting data from specific clusters later on. The reason we specify the minimum cluster size as a negative number is that a negative value denotes cluster size in voxels; a positive 86 would instead denote a minimum cluster volume of 86mm3.
Some of the other options of 3dmerge might be useful. In particular, you can play around with the -1erode and -1dilate options, which are useful for separating clusters that are joined by a narrow "neck".
We would typically want to carry out the same basic operation on all the other F-tests and contrasts of interest. When this is done, it is possible to combine all the resulting clustered datasets into one big buckets file using the command 3dbucket.
This basically completes the voxelwise group analysis of data.


15. Extracting cluster means from individual subjects

One remaining step that you might want to take is to extract each subject’s individual percent signal change parameter estimates from the cluster regions identified in the preceding step, for entry into a statistics package or for graphing results. To do this, you can use the 3dmaskave command. Say, for example, that when looking at the face by group clusters identified in the preceding step, we were interested in a cluster in prefrontal cortex, which happens to be the 5th largest cluster in the group by face interaction dataset. Thus voxels in this cluster would have a value of 5 in the group_by_face_clust_order+tlrc dataset. We can thus use this to extract mean voxel percent signal change for individual subjects as follows:
3dmaskave -mask group_by_face_clust_order+tlrc -mrange 5 5 control001/glm_out_5mm+tlrc[4]
This command would print on the screen the mean percent signal change for this cluster for control subject 1 for the happy face condition. We could repeat this command for the other face conditions, as well as for all the other subjects. The mean percent signal change values could then be used to create graphs with error bars etc.
Alternatively, if we want to output the mean percent signal change for each cluster, we could use the following command:
3dROIstats -mask group_by_face_clust_order+tlrc control001/glm_out_5mm+tlrc[4]
This would cycle through each cluster in group_by_face_clust_order+tlrc and output the mean for each one for control subject 1, face condition happy.
Because these commands simply print their output to the screen, you’ll probably want to use them with the Linux redirection operators > and >> to save the output to a text file. Obviously these commands are probably best used in a script that cycles through subjects and conditions. For example, the following simple Python script will cycle through 6 subjects and extract the cluster means from 4 designated clusters and two conditions.

#!/usr/bin/python

# filename: extract_clusters.py
# simple script to cycle through 6 subjects and extract the cluster means from 4 designated clusters and two conditions
# means are first appended to a temporary text file, with subject number, cluster number, and condition brick
# appended to a separate temporary text file. The two files are then combined using 1dcat to give a properly indexed
# file which could be imported into a stats package or spreadsheet program

import os,sys

# set up study-specific directories and file names
top_dir = '/study/my_study/analysis/'
cluster_mask_file = top_dir+'group_by_face_clust_order+tlrc'
subject_stats_filename = 'glm_out_5mm+tlrc'
output_file = top_dir+'clust_means.txt'


# specify the subjects, clusters and bricks that you want to extract
subjects = ['001','002','004','008','009','010'] # these are the subjects (should be same as subject directory names)
clusters = [2,5,6,8] # these are the cluster numbers that we want to extract
condition_bricks = [4,6] # these are the bricks in the individual subject data files that contain the conditions of interest

# check to see for existence of temporary files
if os.path.isfile('tmp.txt') or os.path.isfile('tmp2.txt'):
        print 'You first need to delete or rename the file(s) tmp.txt and tmp2.txt'
        sys.exit()

# check to see for existence of output file
if os.path.isfile(output_file):
        print 'You first need to delete or rename the output file '+output_file
        sys.exit()


# now loop through subjects, clusters and bricks to get data
for subject in subjects:
        for cluster in clusters:
                for brick in condition_bricks:
                        subject_dir = top_dir+subject+'/'
                        data_file = subject_dir+subject_stats_filename+'['+str(brick)+']'
                        # quote the data file so the shell does not glob the [brick] sub-brick selector
                        command = '3dmaskave -mask '+cluster_mask_file+' -mrange '+str(cluster)+' '+str(cluster)+' "'+data_file+'" >> tmp.txt'
                        print command
                        os.system(command)
                        command = 'echo '+subject+' '+str(cluster)+' '+str(brick)+' >> tmp2.txt'
                        print command
                        os.system(command)

# create the output file with appropriate column headings
command  = 'echo subject cluster brick contrast > '+output_file
print command
os.system(command)

# now combine the two temporary file and then delete them
command  = '1dcat tmp2.txt tmp.txt[0] >> '+output_file
print command
os.system(command)

os.remove('tmp.txt')
os.remove('tmp2.txt')

Assuming the script is saved as extract_clusters.py, this script can easily be run by typing:
python extract_clusters.py
You can learn more about Python scripting here.

Shell programming

http://steve-parker.org/sh/wildcards.shtml

Good resources: online t-test

http://www.graphpad.com/quickcalcs/ttest1.cfm

Sunday, October 16, 2011

Cluster commands

http://www.ualberta.ca/CNS/RESEARCH/LSF/doc/users/07-track.htm

http://wikis.uit.tufts.edu/confluence/display/TuftsUITResearchComputing/Basic+linux+and+LSF

http://www.cisl.ucar.edu/docs/LSF/7.0.3/command_reference/bmod.cmdref.html

http://bmi.cchmc.org/resources/software/lsf-examples

Examples:

bjobs -l 1531

Job Id <1531>, User , Project , Status , Queue , Command
Fri Dec 27 13:04:14:  Submitted from host , CWD <$HOME>, Specified Hosts ;
Fri Dec 27 13:04:19:  Started on , Execution Home , Execution CWD ;
Fri Dec 27 13:05:00:  Resource usage collected.
                      The CPU time used is 2 seconds.
                      MEM: 147 Kbytes; SWAP: 201 Kbytes; PGID: 8920; PIDs: 8920 8921 8922

SCHEDULING PARAMETERS:
          r15s   r1m   r15m   ut    pg    io    ls    it    tmp   swp   mem
loadSched -      -     -      -     -     -     -     -     -     -     -
loadStop  -      -     -      -     -     -     -     -     -     -     -
 
 
bhist -l 1531

Job Id <1531>, User , Project , Command <example200>
Fri Dec 27 13:04:14:  Submitted from host to Queue , CWD <$HOME>, Specified Hosts ;
Fri Dec 27 13:04:19:  Dispatched to ;
Fri Dec 27 13:04:19:  Starting (Pid 8920);
Fri Dec 27 13:04:20:  Running with execution home , Execution CWD , Execution Pid <8920>;
Fri Dec 27 13:05:49:  Suspended by the user or administrator;
Fri Dec 27 13:05:56:  Suspended: Waiting for re-scheduling after being resumed by user;
Fri Dec 27 13:05:57:  Running;
Fri Dec 27 13:07:52:  Done successfully. The CPU time used is 28.3 seconds.

Summary of time in seconds spent in various states by Sat Dec 27 13:07:52 1997

PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
5     0      205  7      1      0      218

Saturday, October 15, 2011

Good point from Gang Chen.

FW: http://afni.nimh.nih.gov/afni/community/board/read.php?f=1&i=33945&t=33930#reply_33945

Almost everybody was (and still is) taught in school that the analysis of a linear system with categorical variables (factors) is typically handled through analysis of variance (ANOVA). If an experiment falls nicely into a balanced design and all the relevant assumptions are reasonably met, a factorial ANOVA seems nice because everything is wrapped into one setting and you get all the post-hoc tests in the end. However, thanks to the popularity of such an approach, people tend to forget about a couple of issues involved in ANOVA:

(1) Although sophisticated ANOVA programs do have a lot of flexibility in dealing with covariance structure, the straightforward ANOVA methods such as 3dANOVA* assume compound symmetry (e.g., equal correlation across the levels of a factor) or homoscedasticity (equal variance across the levels of a between-subjects factor such as group) for F-tests (and for t-tests as well in the older versions of 3dANOVA*), and some popular covariance structures such as serial correlation are not allowed.

(2) It might be possible to deal with missing data or unequal numbers of subjects across groups under the ANOVA framework, but the approaches are usually pretty awkward, leading to compromised assumptions.

(3) It's quite awkward to consider covariates (continuous variables such as age, behavioral data, etc.) under the ANOVA framework.

Nowadays ANOVA can be thought of as a special case of the linear mixed-effects model. However, we don't even have to go that far into linear mixed-effects modeling territory, since most of the time we just need to change our perspective. Instead of focusing on the experiment design from the perspective of the ANOVA model with factors, levels, etc., we could simply list all the tests we want to get out of the FMRI group analysis and then run each test separately. This new perspective usually does not require us to consider factors, levels, balancedness, fancy assumptions (homoscedasticity, compound symmetry), etc., and it gives us more flexibility in terms of analysis strategy.

Suppose we have a categorical variable (factor) with three levels, A1, A2, and A3, and one subject only performed A1 and A2, so we don't have any A3 data from this subject. This scenario is very difficult to handle in the ANOVA framework. While a linear mixed-effects model sounds very appealing, there are some subtle issues regarding the test statistics, especially a thorny point about the degrees of freedom involved. However, if pair-wise comparisons or linear combinations among the three levels are of interest, the method is simple and straightforward: we simply run one paired or one-sample t-test at a time (dropping the subject whenever A3 is involved, since that subject does not provide any information for that specific test), easily avoiding the sticky issue of missing data. For example, for the contrast of A2 - A3, intuitively the subject with no data on A3 would not provide any useful information, and adding this subject to a model would introduce assumptions that are hard to validate most of the time, unnecessarily complicating the modeling strategy.

Using your case as another example: there are two categorical variables, one between-subject factor (group with 2 levels), and one within-subject variable (time point with 2 levels). First of all, think about all the tests you want to run, and then for each t-test (one degree of freedom test), I would go with 3dMEMA or 3dttest.

I do realize that people would like to have F-tests for main effects and interactions as well, and such an obsession comes from the traditional textbooks. I admit that an F-test provides a neat summary of the overall effect of a factor or of an interaction among factors. However, such a summarized result comes with a cost:

(1) Sophisticated assumptions are usually involved as discussed above, and sometimes we're not really sure to what extent they are reasonable.

(2) The information from an F-test is pretty limited and actually more or less redundant. For instance, a significant main effect does not tell us where the difference is across the levels of a factor with more than two levels. Similarly, a significant interaction between two factors does not reveal where and in what direction exactly such an interaction occurs. In the end we will still have to run individual t-tests to sort out the specifics and the directionality. So honestly I don't really care about F-tests: why would I want those F-tests if I'll have to resort to individual t-tests in the end anyway? More subtly, there are some discrepancies in terms of significance between an F-test and the corresponding t-tests. In other words, it's not rare to see people puzzled by a situation where an F-test for an interaction or main effect is significant (or not significant) while the corresponding individual t-tests tell the opposite story.

Back to your situation with one between-subject variable (group with 2 levels) and one within-subject variable (time point with 2 levels): the F-tests for the two main effects can essentially be replaced with two t-tests, and they are not as informative as the t-tests since the F-tests lose the directionality information. The F-test for the interaction tells the same story: it basically tests the same hypothesis as the t-test for (A1B1-A1B2)-(A2B1-A2B2) or (A1B1-A2B1)-(A1B2-A2B2), which can be obtained with 3dMEMA or 3dttest (sketched below), but we can say more with the t-test than with the F-test: for example, a positive t-value shows A1B1-A1B2 > A2B1-A2B2 and A1B1-A2B1 > A1B2-A2B2.
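
A minimal sketch of that replacement for the 2x2 case (the dataset names are hypothetical; 3dcalc forms each subject's time-point difference, and 3dttest then compares those differences between the groups):

# per-subject difference between the two time points
3dcalc -a grpA_s01_T1+tlrc -b grpA_s01_T2+tlrc -expr 'a-b' -prefix grpA_s01_T1minusT2
# ...repeat for every subject in both groups...
# two-sample t test on the per-subject differences = the group-by-time interaction, with directionality
3dttest -prefix interaction \
        -set1 grpA_s01_T1minusT2+tlrc grpA_s02_T1minusT2+tlrc \
        -set2 grpB_s01_T1minusT2+tlrc grpB_s02_T1minusT2+tlrc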

Having said this, if you haven't been totally convinced by my arguments above, I'd like to point out that there are two other programs in AFNI that handle your case in a batch-mode fashion: 3dLME (http://afni.nimh.nih.gov/sscc/gangc/lme.html) and GroupAna (http://afni.nimh.nih.gov/sscc/gangc/Unbalanced.html/).

Gang

Wednesday, October 5, 2011

My morning thoughts about FSL vs. AFNI

About the source code,

FSL has several drawbacks. It often does not give you an error message and silently overwrites images, which can be problematic when you run a batch script: you might have done something wrong but you will not know until you visualize your data. AFNI does a much better job with output messages. On the other hand, AFNI's code is long and sprawling, while FSL's code is simple to read and digest.

FSL has some special tools like TBSS, fieldmap unwrapping, dual regression, etc. AFNI developers are slow to catch up in this respect. They are too proud sometimes. :-)

Thus, I use both. *_* I do like Linux-based software: fast and easy to run on a cluster machine. When you have tons of subjects, you have to use batch scripts.

Tuesday, October 4, 2011

Fieldmap issue.

https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1008&L=FSL&P=R52486&1=FSL&9=A&J=on&X=571CB81652E87C1BC9&Y=yingying.wang%40cchmc.org&d=No+Match%3BMatch%3BMatches&z=4

Ben and I did not find Dr. Holland at his office, so we went to talk to the Philips technician (Dennis). He mentioned that if the "CLEAR" option is on, the intensity of the images will change. He is right. That might be the reason the output phase image is scaled by 1000 from -pi to pi.

Try to make this work so it might benefit future studies. USE the fieldmap to reduce frontal and temporal distortion.

https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1109&L=FSL&P=R42533&1=FSL&9=A&J=on&X=571CB81652E87C1BC9&Y=yingying.wang%40cchmc.org&d=No+Match%3BMatch%3BMatches&z=4

Sunday, October 2, 2011

AFNI vs. FSL

http://neurohub.ecs.soton.ac.uk/index.php/Software:AFNI

DTI programs: dtifit (FSL) vs. 3dDWItoDT (AFNI)

3dDWItoDT: over 2000 lines of C code - nonlinear fitting - takes care of the low-SNR problem, so FA values stay in the [0, 1] range.

dtifit: only 551 lines of C code (easy to understand) - linear fitting (straightforward) - but the program itself does not take care of the low-SNR problem in the dataset and can generate FA values greater than 1, which is physically impossible.
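
A quick way to spot (and crudely mask out) the problem in dtifit output, assuming the FA map is named dti_FA:

fslstats dti_FA -R                      # print the minimum and maximum FA; a maximum above 1 flags the problem
fslmaths dti_FA -uthr 1 dti_FA_capped   # zero out the physically impossible voxels (a crude workaround)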

The following message is from dti-tk website: http://dti-tk.sourceforge.net/pmwiki/pmwiki.php?n=Documentation.Diffusivity

How to figure out the unit of diffusivity in a DTI volume?

If, for diffusion weighting, a b-value of 800 s.mm-2 is used during the acquisition, and you input the number 800 as the b-value to your tensor reconstruction program, then the output DTI volume will have units of mm2.s-1.
Alternatively, you can compute the trace map of your DTI volumes. If you find that the trace values for CSF are close to 10^-2, then most likely the volume is in units of mm2.s-1.
  • For AFNI users: 3dDWItoDT in AFNI does NOT factor the b-value into the tensor estimation (read more here). As a result, the program essentially always assumes the b-value to be equal to 1. If the actual b-value is 1,000 s.mm-2, then the equivalent unit of diffusivity used by 3dDWItoDT is the same as DTI-TK's. If the actual b-value is 800 s.mm-2, then the equivalent unit of diffusivity is 1/800 mm2.s-1 instead. The multiplication factor should then be 1.25 (see the sketch after this list).
  • For Camino users: In general, it depends on how the scheme file is created. If the scheme file conforms to the default unit of Camino, the diffusivity unit should be m2.s-1.
  • For FSL users: If the correct b-value is used for the corresponding entries in the bval file, then the diffusivity unit should be mm2.s-1.
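
For the AFNI case above (actual b-value 800, multiplication factor 1.25), one generic way to apply the scaling is to multiply the tensor volume with 3dcalc; this is only a sketch with a hypothetical dataset name, and the convention DTI-TK expects should be confirmed first:

3dcalc -a DT+orig -expr 'a*1.25' -prefix DT_scaled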

http://www.dailymotion.com/video/xldrc5_a-video-tutorial-on-crossing-fibres_tech#rel-page-under-1

http://www.dailymotion.com/video/xk0hwc_non-invasive-histology-and-connectivity-mapping-of-the-brain_tech#rel-page-under-2

http://www.dailymotion.com/video/xjzxat_ismrm-2011-tutorial-group-analysis-atlases_tech#rel-page-under-3

DTI brainvoyage

http://support.brainvoyager.com/documents/DWI/GSG/DTI_inBV_webse2.html

FSL source code compiling under Debian.

FSL FEEDS: not all tests passed.

I ended up installing the Debian package from the NeuroDebian team:

http://neuro.debian.net/pkgs.html


http://neuro.debian.net/pkgs/fsl.html

YEAH!

yingying@yingying-debian:~/opt/fsl_feeds$ time ./RUN all

FSL Evaluation and Example Data Suite v4.1.8

start time = Sun Oct  2 00:55:22 EDT 2011
hostname = yingying-debian
os = Linux yingying-debian 2.6.32-5-amd64 #1 SMP Fri Sep 9 20:23:16 UTC 2011 x86_64 GNU/Linux


/bin/rm -rf /home/yingying/opt/fsl_feeds/results ; mkdir /home/yingying/opt/fsl_feeds/results


Starting PRELUDE & FUGUE at Sun Oct  2 00:55:22 EDT 2011
% error = 0.0
% error = 0.0

Starting SUSAN at Sun Oct  2 00:55:25 EDT 2011
% error = 0.03

Starting SIENAX (including testing BET and FLIRT and FAST) at Sun Oct  2 00:58:37 EDT 2011
checking error on BET:
% error = 0.0
checking error on FLIRT:
% error = 0.0
checking error on FAST:
checking error on single-image binary segmentation:
% error = 0.37
checking error on partial volume images:
% error = 0.28
% error = 0.41
% error = 0.29
checking error on SIENAX volume outputs:
% error = 0.18
% error = 0.11
% error = 0.49
% error = 0.02
% error = 0.51

Starting BET2 at Sun Oct  2 01:19:05 EDT 2011
checking error on T1 brain extraction:
% error = 0.0
checking error on skull and scalp surfaces:
% error = 0.02
% error = 0.03
% error = 0.14

Starting FEAT at Sun Oct  2 01:33:20 EDT 2011
checking error on filtered functional data:
% error = 0.08
checking error on raw Z stat images:
% error = 0.28
% error = 0.17
% error = 0.23
checking error on thresholded Z stat images:
% error = 0.37
% error = 0.3
% error = 0.32
checking error on registration images:
% error = 0.0
% error = 0.0
checking error on position of largest cluster of Talairached zfstat1:
% error = 0.0
% error = 0.0
% error = 0.0
% error = 0.05
% error = 0.1
% error = 0.08

Starting MELODIC at Sun Oct  2 01:45:01 EDT 2011
% error = 0.66

Starting FIRST at Sun Oct  2 01:55:47 EDT 2011
% error = 0.67

Starting FDT (bedpost) at Sun Oct  2 02:07:29 EDT 2011
checking error on bedpost output:
% error = 0.25
% error = 0.01
% error = 0.31
% error = 0.21
% error = 0.07
% error = 0.3
% error = 0.21

Starting FNIRT at Sun Oct  2 02:16:58 EDT 2011
% error = 0.0

All tests passed

end time = Sun Oct  2 02:16:59 EDT 2011


real    81m37.291s
user    43m16.954s
sys    0m57.920s

Worked!