When you’re trying to understand which diseases or physical traits you’re predisposed to, the answers are sprinkled across your DNA. One powerful tool for decoding this genetic forecast is the polygenic score, which gives patients an estimate of their risk for a condition and of their likelihood of having particular physical characteristics (phenotypes, like being tall). Researchers seek to improve the accuracy of these cumulative predictions so that they account for most of the known genetic contributions.
To that end, MIT CSAIL and University of Tokyo researchers have developed GenoBoost, a flexible framework that efficiently calculates your polygenic scores using individual-level genetic data. For the first time, their model accounts for two types of genetic effects simultaneously: additive, where the combined effect of genetic variants from both parents adds up to influence a specific trait, and non-additive genetic dominance, a variation that results from the interaction between the two alleles a person carries at the same position in the genome, such as having blood group O when both of your parents have blood group A. Recently published in a Nature Communications paper, GenoBoost presents a more integrated approach for calculating polygenic scores and is particularly accurate with immune-related disorders.
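The distinction between additive and non-additive effects can be illustrated with a small sketch. It assumes the common convention of coding a genotype as the number of copies (0, 1, or 2) of one allele; the function names and the effect sizes are hypothetical and chosen only for illustration, not taken from the GenoBoost paper.

```python
# Illustrative only: genotypes are coded as the number of copies (0, 1, 2)
# of a given allele. All names and numbers here are hypothetical.

def additive_score(copies, beta):
    """Additive model: every extra copy shifts the score by the same amount."""
    return beta * copies

def genotype_score(copies, table):
    """Non-additive model: each of the three genotypes gets its own score."""
    return table[copies]

def dominance_deviation(s0, s1, s2):
    """Distance of the one-copy score from the midpoint of the 0- and 2-copy scores.

    Zero means the variant behaves additively; nonzero indicates dominance.
    """
    return s1 - (s0 + s2) / 2.0

# Purely additive variant: scores rise in equal steps.
additive = [additive_score(g, 0.5) for g in (0, 1, 2)]

# Recessive-like variant: only two copies change the score,
# loosely analogous to the blood group O example above.
recessive = [genotype_score(g, (0.0, 0.0, 1.0)) for g in (0, 1, 2)]

print(additive, dominance_deviation(*additive))
print(recessive, dominance_deviation(*recessive))
```

An additive model has one free parameter per variant, so it cannot represent the second pattern, where carrying one copy looks the same as carrying none. Giving each genotype its own score removes that restriction.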
“Genetics is not destiny,” says co-senior author Manolis Kellis, MIT professor of computer science, CSAIL principal investigator, and member of the Broad Institute of MIT and Harvard. “However, it gives us some sense of the probability of different events in your lifetime. The question is, how much can I predict about my risk of disease or chances of having any particular phenotype? Our paper presents a more rigorous polygenic score prediction, going beyond simply adding up alleles by also capturing non-additive effects.”
Using genetic and disease-related information from 338,138 de-identified participants in the UK Biobank resource, the researchers evaluated GenoBoost’s ability to make genetic predictions from more than one million genetic variants across hundreds of thousands of individuals. Compared against seven state-of-the-art polygenic score methods across twelve disease outcomes, the new approach ranked best in accuracy for multiple traits, including rheumatoid arthritis, psoriasis, gout, and inflammatory bowel disease. It placed second for Alzheimer’s, dementia, and asthma.
In particular, GenoBoost measured up well against comparable polygenic score modeling frameworks like “snpnet” and “snpboost,” which also use individual-level data to assess participants’ genetic predispositions. All three approaches apply machine-learning techniques directly to individual-level data without relying on summary-level information. The difference, though, is that only GenoBoost incorporates non-additive genetic dominance effects, suggesting that the approach can capture patterns that a purely additive analysis would miss.
A computationally efficient approach
Loading all genetic and medical information for every individual can be computationally expensive, so researchers previously used summary-level data to make predictions. Still, that abridged approach leaves out potentially valuable genetic information. To efficiently navigate the large-scale data within individual medical genetic records, GenoBoost finds the most informative predictors through statistical boosting, a machine-learning method that combines multiple simple models to create a stronger, more accurate one. Each weak predictor within the GenoBoost algorithm considers only one genetic variant, and the weak predictors are then aggregated to form a more accurate and complete prediction.
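The general idea of boosting with single-variant weak predictors can be sketched as follows. This is a simplified least-squares boosting loop on made-up toy data, not GenoBoost’s actual loss function, update rule, or code; the function names, the learning rate, and the data are all assumptions made for illustration.

```python
# Toy sketch of boosting with one-variant weak learners.
# Assumption: simple least-squares boosting on residuals, which is NOT
# necessarily what GenoBoost does. All names and data are hypothetical.

def fit_weak_learner(genos, residuals):
    """Score table for one variant: mean residual for each genotype 0, 1, 2."""
    sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
    for g, r in zip(genos, residuals):
        sums[g] += r
        counts[g] += 1
    return [sums[k] / counts[k] if counts[k] else 0.0 for k in range(3)]

def squared_error(genos, residuals, table):
    """Sum of squared residuals left after applying one variant's table."""
    return sum((r - table[g]) ** 2 for g, r in zip(genos, residuals))

def boost(X, y, n_rounds=20, learning_rate=0.5):
    """X[j][i] is the genotype (0/1/2) of individual i at variant j."""
    intercept = sum(y) / len(y)
    pred = [intercept] * len(y)
    model = []  # (variant index, damped score table) chosen in each round
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best = None
        for j, genos in enumerate(X):
            table = fit_weak_learner(genos, residuals)
            loss = squared_error(genos, residuals, table)
            if best is None or loss < best[0]:
                best = (loss, j, table)
        _, j, table = best
        step = [learning_rate * s for s in table]
        model.append((j, step))
        pred = [p + step[g] for p, g in zip(pred, X[j])]
    return intercept, model

def predict(intercept, model, genotype_row):
    """Score for one individual: the intercept plus every selected table entry."""
    return intercept + sum(table[genotype_row[j]] for j, table in model)

def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

# Eight made-up individuals and three variants.
X = [
    [0, 1, 2, 0, 1, 2, 0, 1],  # variant 0: recessive-like effect (only 2 copies matter)
    [0, 0, 0, 1, 1, 1, 2, 2],  # variant 1: additive effect
    [1, 0, 1, 0, 1, 0, 1, 0],  # variant 2: no effect
]
y = [(1.0 if g0 == 2 else 0.0) + 0.5 * g1 for g0, g1 in zip(X[0], X[1])]

intercept, model = boost(X, y)
rows = [[X[j][i] for j in range(len(X))] for i in range(len(y))]
baseline_mse = mse([intercept] * len(y), y)
final_mse = mse([predict(intercept, model, row) for row in rows], y)
print(baseline_mse, final_mse)
```

Because each weak learner stores a separate score for each of the three genotypes, the same loop can pick up both the additive and the recessive-like variant. Each round only has to scan the variants one at a time, which is part of what keeps this style of model tractable on large individual-level datasets.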
“It’s like playing with building blocks,” says Yosuke Tanigawa, a postdoctoral researcher at MIT CSAIL and co-lead and co-corresponding author of the study. “Thanks to the modular nature of the boosting technique, our approach can combine additive and non-additive genetic dominance effects for the first time in polygenic score research.”
Last year, Tanigawa and Kellis of MIT’s Computational Biology Lab developed an inclusive polygenic score model that can analyze the individual-level data of diverse populations. To provide nuanced analysis for everyone, they have begun to enhance that work with insights learned from GenoBoost. The team’s upcoming paper will present a method that considers the additive and non-additive genetic effects of people across the continuum of genetic ancestry.
Tanigawa also notes that he and his colleagues intend to apply their work to more disease outcomes, the onset of disease symptoms, and medication treatment responses. They may eventually integrate individual-level data and summary-level information to build an even more accurate and efficient polygenic score model.
For now, GenoBoost represents a step forward in predicting the genetic risk for different disease outcomes. The flexible polygenic score model adds another layer of prediction tailored to each individual’s genetics, applying state-of-the-art machine-learning approaches to assess that person’s health.
Three additional researchers are credited on the paper, each from the University of Tokyo: co-lead contributor Rikifumi Ohta, Yuta Suzuki, and professor and co-senior author Shinichi Morishita. This research was conducted using the UK Biobank Resource under Application Number 48405. This collaborative project was funded by a Grant-in-Aid for JSPS Fellows and the Japan Agency for Medical Research and Development. It was supported, in part, by National Institutes of Health grants.