Association tests for whole-genome sequencing data

Integrative sequence-based association tests

Description

Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Directly incorporating a single functional annotation as a weight in existing dispersion and burden tests can cause a substantial loss of power when that annotation is not predictive of a variant's risk status. Here, we have developed unified tests, with efficient computational techniques, that can use multiple functional annotations simultaneously for integrative association analysis. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only a minimal loss of power relative to existing dispersion and burden tests. Under certain circumstances they can even gain power by learning, in a data-adaptive manner, a weight that better approximates the underlying disease model. The tests can be constructed from summary statistics of existing dispersion and burden tests for sequencing data, thereby allowing meta-analysis of multiple studies without sharing individual-level data.
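To illustrate how annotation-weighted burden and dispersion statistics can be built from summary statistics alone, here is a minimal sketch. It assumes per-variant score statistics U and their covariance matrix V are already available for a region, and that each functional annotation supplies one weight vector w. The function names are hypothetical. The p-value combination step uses the Cauchy combination rule, which is swapped in here for illustration and is not necessarily the combination used by the published tests. The p-value of the dispersion statistic, which requires a mixture-of-chi-squares approximation, is omitted.

```python
import math

def weighted_burden(U, V, w):
    """Burden statistic (w'U)^2 / (w'Vw); chi-square with 1 df under the null.

    U: per-variant score statistics (summary statistics).
    V: covariance matrix of U (list of lists).
    w: per-variant weights derived from one functional annotation.
    """
    k = len(U)
    num = sum(w[i] * U[i] for i in range(k))
    var = sum(w[i] * V[i][j] * w[j] for i in range(k) for j in range(k))
    q = num * num / var
    # P(chi2_1 > q) = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2))
    return q, math.erfc(math.sqrt(q / 2.0))

def weighted_dispersion(U, w):
    """SKAT-type dispersion statistic sum_j w_j^2 U_j^2 (p-value not computed here)."""
    return sum((wi * ui) ** 2 for wi, ui in zip(w, U))

def cauchy_combine(pvals, weights=None):
    """Combine p-values (e.g. one per annotation) with the Cauchy combination rule."""
    k = len(pvals)
    if weights is None:
        weights = [1.0 / k] * k
    total = sum(weights)
    t = sum((wt / total) * math.tan((0.5 - p) * math.pi) for wt, p in zip(weights, pvals))
    return 0.5 - math.atan(t) / math.pi
```

In use, one would call `weighted_burden` once per annotation-derived weight vector (for example, flat weights plus one weight vector per annotation) and pass the resulting p-values to `cauchy_combine`. Because only U and V are needed, the same code works on meta-analysed summary statistics.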

Reference
Download

Association tests for whole-genome sequencing data

Description

Continuous advances in massively parallel sequencing technologies make large whole-genome sequencing studies increasingly feasible. The analysis of such data is challenging due to the large number of rare variants in noncoding regions of the genome, our limited understanding of their functional effects, and the lack of natural units for testing. We propose a scan statistic framework, GenoScan, to simultaneously detect the existence of association signals and estimate their locations at genome-wide scale. Additionally, GenoScan can:
- analytically estimate the significance threshold for a whole-genome scan while accounting for the correlation structure among the test statistics;
- use summary statistics for meta-analysis;
- incorporate functional annotations for enhanced discovery in noncoding regions;
- enable enrichment analyses using genome-wide summary statistics.
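The core idea of a scan statistic is to slide windows of several sizes along the genome, compute an association statistic in each window, and report the window with the strongest signal. The sketch below shows this idea in a deliberately simplified form. It assumes per-variant score statistics and variances are given and treats variants as independent, so no linkage disequilibrium is modelled. It uses a burden-type window statistic. The function name and parameters are hypothetical. The sketch does not reproduce GenoScan's analytic genome-wide threshold; that threshold would have to be obtained separately.

```python
def scan_windows(positions, scores, variances, window_sizes, step_fraction=0.5):
    """Return (best_stat, window_start, window_end) over all scanned windows.

    positions: variant positions (bp); scores/variances: per-variant score
    statistics and their variances. Assumes independent variants (diagonal
    covariance), which is a simplification for illustration only.
    """
    best = (0.0, None, None)
    lo, hi = min(positions), max(positions)
    for size in window_sizes:
        step = max(1, int(size * step_fraction))  # overlapping windows
        start = lo
        while start <= hi:
            end = start + size
            idx = [i for i, p in enumerate(positions) if start <= p < end]
            if idx:
                u = sum(scores[i] for i in idx)
                v = sum(variances[i] for i in idx)
                stat = u * u / v  # burden-type chi-square statistic for the window
                if stat > best[0]:
                    best = (stat, start, end)
            start += step
    return best
```

The window sizes and the overlap fraction are tuning choices in this sketch. Scanning several sizes lets the statistic adapt to signals of unknown length, which is the motivation for a scan approach when there are no natural testing units.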

Reference
Download

Functional annotation of genetic variants

Eigen

Description

Eigen is a spectral approach to the functional annotation of genetic variants in coding and noncoding regions. Eigen makes use of a variety of functional annotations in both coding and noncoding regions (such as protein function scores, evolutionary conservation scores, and epigenetic annotations from the ENCODE and Roadmap Epigenomics projects) and combines them into a single measure of functional importance. Eigen is an unsupervised approach and, unlike many existing methods, is not based on any labelled training data. Eigen produces estimates of predictive accuracy for each functional annotation score and then uses these accuracy estimates to derive the aggregate functional score for variants of interest as a weighted linear combination of individual annotations.
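Below is a minimal sketch of unsupervised spectral weighting in the spirit of the description above. The annotations are standardized and their correlation matrix is computed. The leading eigenvector of that matrix, found by power iteration, is used as the per-annotation weights. The aggregate score is then the weighted linear combination of annotations. This is a simplified stand-in, similar to a first-principal-component score. It is not Eigen's exact estimator of annotation accuracy, and all function names are hypothetical.

```python
import math

def standardize(col):
    """Center and scale one annotation column (assumes non-zero variance)."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
    return [(x - m) / sd for x in col]

def correlation_matrix(cols):
    n, k = len(cols[0]), len(cols)
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
             for j in range(k)] for i in range(k)]

def leading_eigenvector(M, iters=200):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    k = len(M)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # fix the sign so that weights are mostly positive
        v = [-x for x in v]
    return v

def spectral_scores(annotations):
    """annotations: one list per annotation, each with one value per variant."""
    cols = [standardize(c) for c in annotations]
    w = leading_eigenvector(correlation_matrix(cols))
    n = len(cols[0])
    scores = [sum(w[a] * cols[a][i] for a in range(len(cols))) for i in range(n)]
    return w, scores
```

Annotations that agree strongly with the others receive larger weights, which is the intuition behind using the shared (spectral) structure as a proxy for predictive accuracy without labelled data.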

Reference
Download

FUN-LDA

Description

FUN-LDA is based on a Latent Dirichlet Allocation (LDA) model for predicting the functional effects of non-coding genetic variants in a cell-type- and tissue-specific way. It integrates diverse epigenetic annotations for specific cell types and tissues from large-scale genomics projects such as ENCODE and Roadmap Epigenomics. Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome for 127 tissues and cell types in ENCODE and Roadmap Epigenomics.
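For readers unfamiliar with LDA, here is a generic collapsed Gibbs sampler. It is not FUN-LDA's implementation. As a hypothetical encoding, each "document" stands for a genomic position and each "word" is an integer token representing a discretized epigenetic signal (for example, a mark/tissue pair above some threshold). The inferred topic proportions per position play the role of functional classes. The encoding, the hyperparameter values, and the function name are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # document-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []
    for d, doc in enumerate(docs):  # random initial topic assignments
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the current assignment from the counts
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs)]
```

Because LDA is unsupervised, the resulting topics carry no functional labels by themselves. Some downstream step is needed to interpret which topic corresponds to, for example, active regulatory regions.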

Reference
Download and web server

GenoNet

Description

Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. We propose here a semi-supervised approach, GenoNet, that jointly uses experimentally confirmed regulatory variants (labeled variants), millions of unlabeled variants genome-wide, and more than a thousand cell- and tissue-type-specific epigenetic annotations to predict the functional consequences of non-coding variants.
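To show what "semi-supervised" means here, the sketch below uses a generic technique, graph-based label propagation. It is not GenoNet's model. Labeled variants (1 for a confirmed regulatory variant, 0 for a negative) stay fixed. Scores for unlabeled variants are repeatedly set to a similarity-weighted average of their neighbours' scores. Similarity is computed from annotation vectors with a Gaussian kernel. The kernel, its bandwidth `sigma`, and the function name are assumptions made for illustration.

```python
import math

def label_propagation(X, labels, sigma=1.0, n_iter=100):
    """Propagate labels (1, 0, or None for unlabeled) over a Gaussian-kernel graph.

    X: one annotation vector per variant. Returns a score in [0, 1] per variant.
    """
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma * sigma))
    f = [float(l) if l is not None else 0.5 for l in labels]
    for _ in range(n_iter):
        new = []
        for i in range(n):
            if labels[i] is not None:
                new.append(float(labels[i]))  # labeled variants stay clamped
            else:
                s = sum(W[i])
                new.append(sum(W[i][j] * f[j] for j in range(n)) / s)
        f = new
    return f
```

The unlabeled variants contribute through the graph structure: a variant whose annotation profile resembles that of confirmed regulatory variants inherits a high score. This is the basic benefit of combining a small labeled set with many unlabeled variants.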

Reference
Download and web server

Contact

Please e-mail ii2135@cumc.columbia.edu with any questions or comments about this website.