Tuesday, February 18, 2014

Axon guidance and SNPs

This is a warning to anyone doing pathway-type analysis on SNPs extracted from Illumina 660-type chips, especially when looking at brain phenotypes.

Recently, I took a quick approach of grabbing a bunch of genes from some top (but non-significant) SNPs from an analysis of fMRI data. I didn't expect any GO groups to be significant, so I was shocked to see axon guidance on top with a super significant p-value (about 10^-17 after correction). A bunch of other GO groups come out on top too (see below). We were doing brain stuff so we liked the GO groups, but we were suspicious for a number of reasons.

Here's a quote that sums up the problem:
"Such an analysis of gene set enrichment is based on the assumptions that all genes are sampled independently from each other with the same probability. These assumptions are violated with data from GWA studies as (i) longer genes usually have more SNPs resulting in a higher probability of being sampled and (ii) overlapping genes are sampled in clusters (Holmans et al., 2009)."

That text is from:
Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies
Robert Kofler and Christian Schlötterer
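
To make point (i) of that quote concrete, here is a minimal sketch of how much more likely a SNP-rich gene is to be "sampled" than a SNP-poor one when you pick a fixed number of top SNPs from a chip. The function name and all the numbers are made up for illustration, and it assumes SNPs are drawn uniformly without replacement, which a real top-SNP list only approximates.

```python
from math import comb

def prob_gene_sampled(snps_in_gene, total_snps, n_drawn):
    """Probability that at least one of n_drawn SNPs, drawn uniformly
    without replacement from total_snps, lies in a gene that carries
    snps_in_gene of them (hypergeometric complement)."""
    miss = comb(total_snps - snps_in_gene, n_drawn) / comb(total_snps, n_drawn)
    return 1 - miss

# Hypothetical chip: 10,000 SNPs, of which we keep the top 100.
total_snps = 10_000
n_drawn = 100
for name, n in [("short_gene", 2), ("long_gene", 200)]:
    print(name, round(prob_gene_sampled(n, total_snps, n_drawn), 3))
```

A standard enrichment test assumes both genes are equally likely to land in the list, which is clearly not the case here.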

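One quick test I did was to run another gene ontology analysis where the genes are sorted according to how many SNPs they have on the genotyping chip. Since this ranking uses no phenotype data at all, any enrichment it shows comes purely from the chip. That confirms the problem. Here are some of the top groups from a quick GOrilla analysis: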


GO Group                                  FDR q-value
cell adhesion                             3.43E-017
biological adhesion                       2.04E-017
neuron projection guidance                1.39E-017
axon guidance                             1.04E-017
single-organism process                   1.84E-015
synaptic transmission                     3.16E-014
single-organism cellular process          1.20E-013
cellular component movement               1.40E-013
single-multicellular organism process     2.30E-011
ion transport                             2.51E-011

This shows a pretty clear bias and I'm guessing it's in part biological. I'd reckon that lots of variation in neuronal guidance genes allows a diversity of brain wiring.
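
For anyone who wants to repeat the SNP-count check, the ranked gene list can be built roughly like this. It's only a sketch: the input is assumed to be (SNP id, gene symbol) pairs taken from whatever annotation you have for your chip, and the function name is mine. The output is a gene list sorted from most to fewest SNPs, which can be pasted into GOrilla's single ranked list mode.

```python
from collections import Counter

def rank_genes_by_snp_count(snp_gene_pairs):
    """Return gene symbols sorted by how many distinct chip SNPs map to
    them, most SNPs first. Ties are broken alphabetically so the output
    is reproducible."""
    # set() drops repeated (SNP, gene) rows so each SNP counts once per gene
    counts = Counter(gene for _snp, gene in set(snp_gene_pairs))
    return sorted(counts, key=lambda gene: (-counts[gene], gene))

# Hypothetical annotation rows: (SNP id, gene symbol)
pairs = [
    ("rs1", "GENE_A"), ("rs2", "GENE_A"),
    ("rs3", "GENE_B"),
    ("rs4", "GENE_C"), ("rs5", "GENE_C"), ("rs6", "GENE_C"),
]
for gene in rank_genes_by_snp_count(pairs):
    print(gene)
```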

The next step was to hook up Gowinda for the human data and redo the analysis. The result was no significant GO groups.

Several files are needed to run Gowinda; I put them online in case anyone wants to do a similar analysis with human data.
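
The general idea behind this kind of correction is to permute at the SNP level instead of the gene level, so genes with lots of SNPs show up often in the null distribution too. Below is a minimal sketch of that idea. It is not Gowinda's actual code or interface, and the function and argument names are my own. It assumes you have a list of all SNPs on the chip, a dict from SNP to the genes it falls in, and one GO group given as a set of genes.

```python
import random

def empirical_pvalue(candidate_snps, all_snps, snp_to_genes, gene_set,
                     n_perm=1000, seed=0):
    """Empirical p-value for one gene set: how often does a random draw
    of SNPs (same size as the candidate list) hit at least as many genes
    of the set as the candidate SNPs do?"""
    rng = random.Random(seed)

    def genes_hit(snps):
        # Count each gene once, however many of its SNPs were drawn
        hit = set()
        for snp in snps:
            hit.update(snp_to_genes.get(snp, ()))
        return len(hit & gene_set)

    observed = genes_hit(candidate_snps)
    k = len(candidate_snps)
    # all_snps must be a list (or other sequence) for rng.sample
    as_extreme = sum(
        genes_hit(rng.sample(all_snps, k)) >= observed for _ in range(n_perm)
    )
    # +1 in numerator and denominator so the p-value is never exactly 0
    return (as_extreme + 1) / (n_perm + 1)
```

A real analysis would repeat this for every GO group and then correct for multiple testing, which this sketch leaves out.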

We noticed this and fixed it, but I wonder how many other papers might have been tricked by this. One that comes to mind is this paper:
A genomic pathway approach to a complex disease: axon guidance and Parkinson disease
Timothy G Lesnick, Spiridon Papapetropoulos, Deborah C Mash, Jarlath Ffrench-Mullen, Lina Shehadeh, Mariza de Andrade, John R Henley, Walter A Rocca, J. Eric Ahlskog, Demetrius M Maraganore

with this follow-up (among others):
Neither Replication nor Simulation Supports a Role for the Axon Guidance Pathway in the Genetics of Parkinson's Disease
Yonghong Li, Charles Rowland, Georgia Xiromerisiou, Robert J. Lagier, Steven J. Schrodi, Efthimios Dradiotis, David Ross, Nam Bui, Joseph Catanese, Konstantinos Aggelakis, Andrew Grupe, Georgios Hadjigeorgiou

I can't say exactly what might explain the differences between those two, as they did a more complex analysis than just looking for pathway enrichment for certain SNPs. The first paper does report very significant p-values (10^-51), which can be a red flag. It could be that a small enrichment for brain genes for the phenotype is exaggerated by the huge bias in the chips. I didn't spend much time on this though; I just wanted to post it online in case anyone else runs into it.

Update: I noticed a similar problem when running GO analysis on Illumina 450k methylation data. In that case the number of CpG sites per gene is not even across GO groups. I ended up doing an empirical-type analysis, which seemed to work: the super low p-values disappeared.
