Unlocking the intricacies of human disease at the molecular level stands as one of the paramount scientific challenges of our era, holding promise for therapeutic development through precision and personalized medicine.
For a long time, prevailing theories posited that genetic variants linked to disease would be found primarily within protein-coding genes. It turns out that most lie in the parts of our DNA that do not encode proteins. This discovery redirected scientific focus towards unraveling the complex regulatory networks of the genome, in search of deeper insights into the elusive mechanisms steering disease and health.
The impact of genetic variants on gene expression has been the field's primary focus for the last decade. The Genotype-Tissue Expression project (GTEx) is one of the major efforts in this area: it studies the relationship between genetic variation and gene expression in a cohort of 1,000 donors across up to 54 tissues, and has yielded a treasure trove of insights through a series of data releases and publications.
Beyond gene expression
Gene expression is only one of many intermediate molecular layers between genetic variation and ultimately disease phenotypes. To capture the impact of genetic variants on these additional molecular processes beyond gene expression, the “Enhancing GTEx” project (eGTEx) was launched, seeking to capture DNA accessibility, epigenomic modifications, DNA methylation, RNA modifications, protein variation, somatic variation, and telomere length.
Prof. Manolis Kellis and his team at the MIT Computer Science and Artificial Intelligence Lab and at the Broad Institute of MIT and Harvard are leading two of these eGTEx projects, seeking to study the impact of genetic variants both on RNA modifications and on epigenomic modifications of histone marks.
The first eGTEx study by Prof. Kellis and his team was published in 2021 in Nature Genetics (Xiong, Hou, et al.) and uncovered the significant impact that disease-associated variants have on RNA modifications, particularly on the most prevalent RNA modification, N6-methyladenosine (m6A). That study revealed the tissue-specific nature of RNA modifications, identified the RNA regulators whose binding was altered by genetic variants, and helped elucidate the molecular implications of genetic variants associated with diverse diseases.
Epigenomic variation
This year, the Kellis Lab reports its second eGTEx study, also in Nature Genetics (Hou, Xiong, et al.), revealing the impact of genetic variants on histone modifications, and specifically on lysine 27 acetylation of histone H3 (H3K27ac), the premier mark used for capturing active enhancers and promoters. H3K27ac is also one of the most variable marks across tissues and one of the epigenomic modifications most strongly enriched for genetic variants associated with human disease, indicating important roles both in tissue-specific gene regulation and in disease susceptibility.
Prof. Kellis and his team used chromatin immunoprecipitation followed by sequencing (ChIP-seq) for H3K27ac to capture the genomic locations carrying this mark, in 387 samples from 256 individuals across four human tissues: brain, heart, muscle, and lung. The sheer size of the data nearly tripled the number of enhancer datasets generated by ENCODE, Roadmap Epigenomics, and the Genomics of Gene Regulation project, providing an important resource for the study of gene-regulatory variation of enhancers across tissues and individuals.
These experimental datasets were computationally analyzed to unearth over 282,000 active regulatory elements, enabling the team to study how regulatory regions vary in their activity between individuals, between sexes, across tissues, and in the context of genetic variants associated with disease.
Missing regulation
The team found that many genetic variants associated with disease that had previously shown no impact on gene expression were in fact leading to changes in histone acetylation, providing new hints about how they might ultimately affect cellular function.
“Our study expands the impact of non-coding genetic variants beyond gene expression, showing that many disease-associated variants can still impact gene-regulatory programs, even when they’re not observed to directly impact steady-state gene expression levels,” says Professor Kellis. “These variants might be impacting gene expression dynamics, or lead to condition-specific gene expression changes, or alter expression at other time points that are not captured here.”
These results help address part of the recently recognized “missing regulation” challenge, a successor to the “missing heritability” problem first recognized in genetic association studies. “Missing heritability” refers to the fact that genome-wide-significant genetic variants explain only a small fraction of total heritability; it was resolved in part by recognizing that very large numbers of weak-effect variants account for a large fraction of that “missing” heritability. “Missing regulation” instead highlights that a large fraction of the non-coding variants associated with disease have not yet been found to impact gene expression. The current study suggests that these variants might directly alter the epigenome, and ultimately affect expression or disease indirectly, only in specific contexts.
Tissue-specific and sex-specific activity
The researchers also studied how common genetic variants, which are present in all cell types (as nearly all cells in our body inherit the same DNA code), lead to tissue-specific gene-regulatory effects. They found that some variants affect the activity of elements that are active only in some tissues, and thus show no effect in other tissues. However, they also found genetic variants in regulatory elements that are active in multiple tissues but show genetic effects only in some of them, revealing an additional route to tissue-specific disease mechanisms in otherwise tissue-shared regulatory elements.
The researchers also identified thousands of regulatory elements that showed robust differential activity between male and female donors, indicating sex-specific effects. Some of these elements lie in disease-associated regions, which could help explain male-female differences in multiple diseases, including heart disease and hypertension.
“One of the most striking signals was the highly sex-specific nature of these regulatory elements in multiple tissues, and often within disease-associated regions”, says Lei Hou, first author of the study and former postdoc at CSAIL, who recently started a faculty position at Boston University School of Medicine. “This has important implications for therapeutic development and clinical trials in male vs. female subjects, which may need to become increasingly sex-specific, an important step towards personalized and precision diagnostics and treatments”.
Disease-associated regulatory circuits
The team then sought to use their results to study how genetic variants impact human disease, which first required linking regulatory elements to their target genes, in effect tracing the wiring of gene regulation. To achieve this goal, the team developed a new framework for linking regulatory elements to genes, called gLink (for genetic linking), which uses the genetic regulation of both genes and regulatory regions to guide the linking. They found that this approach greatly improved the ability to recognize the true target genes of genetic variants.
Using these links, the researchers connected disease-associated genetic variants to the regulatory regions they overlap, the genes these regions are linked to, and the tissues and cell types where they act. They found that genetic variants associated with psychiatric and neurologic disorders preferentially localized in regulatory elements active in the brain, while those associated with cardiometabolic disorders localized in elements active in heart and muscle.
The researchers further grouped regulatory elements by their more precise activity patterns across hundreds of tissues, and found specific subgroups active in both brain and immune cell types. These subgroups were particularly enriched for schizophrenia and bipolar disorder associations, contributing to the emerging concept that immune dysfunction is involved in psychiatric diseases.
Focusing on the 54 schizophrenia-associated genetic loci that contain variants with gene-regulatory effects, the team found that most were indeed acting in brain, but a subset showed activity in muscle or heart, potentially implicating vascular functions.
“This tantalizing evidence suggests perhaps a more wide-spread role for immune and vascular functions in psychiatric illnesses, in addition to the well-recognized neuronal roles”, says co-first author Dr. Xushen Xiong, former postdoc at CSAIL and now a faculty member at Zhejiang University. “These results are humbling and point out how much remains to be discovered in the molecular basis of complex traits”.
The researchers hope that the datasets they generated, their analysis results, and the scientific insights gained will serve as one more stepping stone towards a systematic understanding of genetic variants in gene regulation and in human disease.
“We hope that our epigenomic and epitranscriptomic eGTEx datasets, and the broader GTEx and eGTEx resources, will form an important stepping stone towards a systematic understanding of genetic variants and their impact on gene regulation and disease”, concludes Kellis. “Perhaps even more importantly than the small number of compelling circuits that we currently understand, these systematic views of human disease can help point us to new biological and biomedical directions, by focusing on the many surprising findings, which often on the surface, appear to challenge our current understanding of human disease”.
Authors of the paper from the Kellis laboratory are Lei Hou, Xushen Xiong, Yongjin Park, Carles Boix, Benjamin James, Na Sun, Liang He, Aman Patel, and Zhizhuo Zhang. The work was done in collaboration with a team of scientists from the Broad Institute. The group’s work was supported, in part, by grants from the National Institutes of Health (NIH). Their paper was published in Nature Genetics in September.