September 21, 2021 // Upgrade 2021: MEI LAB Speakers
Using Digital Twin Technology to Find the Needles in the Alzheimer’s Research Haystack
Cory Funk, Senior Research Scientist | Institute for Systems Biology
Transcript of the presentation Finding the Needles in the Haystack: Utilizing a Causal Framework to Reduce the Hypothesis Space and Relate the Genome to the Phenome, given at the NTT Upgrade 2021 Research Summit, September 21, 2021.
Cory Funk: You can read the title here. It’s also related to digital twins, but it’s for kind of a different purpose. So in my case, I’m an Alzheimer’s researcher. And what we’re trying to do with digital twins is to, again, sort of reduce the hypothesis space by means of causal reasoning. And in so doing, we want to connect the phenome to the genome. So really understand individuals and how, you know, across the population, somebody would react, or understand their version of the disease, their version of Alzheimer’s. Because there’s a lot of variability.
But to start, I want to be a little bit of a provocateur, especially saying this here at this conference. So there’s a book by Judea Pearl, and it’s called “The Book of Why” and it’s really about causal reasoning. And I’m just going to read you a short quote from that. It says: “We live in an era that presumes big data to be the solution to all our problems. Courses in data science are proliferating in our universities and jobs for data scientists are lucrative in the companies that participate in the data economy. But I hope with this book to convince you that the data are profoundly dumb.”
So an example of how the data is kind of dumb comes, I think, from the sequencing of the human genome. So we now live in the post-genomic era, and this is a quote from a New York Times article over 10 years ago. And it says “a decade later gene map yields very few cures.” So this is sort of the failed promise of the genome. We thought that we’d have all the answers once we sequenced the genome. And clearly, even a decade after that statement, that is also not the case. There seem to be very few cures that have resulted from sequencing of the human genome.
So there may be some statisticians here, people with statistical training. There’s a phenomenon called Simpson’s paradox, and it’s not after, it’s not termed after Homer Simpson, but I think it is appropriate here to invoke Homer. So in this case, you can see, you have sort of X appears to be a function of Y, there appears to be a relationship between the data. And this is sort of a toy example, but when you actually have all of the, you know all of the different confounders that are within the data, you get very different patterns that are emerging from that data. And this is Simpson’s paradox, and this is a well-known sort of thing that happens. And it really is, you need to understand your data. You need to understand the confounders, otherwise you run the risk of confounding yourself.
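[Editor’s illustration, not part of the talk.] A minimal sketch of the Simpson’s paradox pattern described above, using made-up numbers rather than the data on the slide. The group labels and values are assumptions chosen so that each subgroup has a negative trend while the pooled data show a positive one.

```python
# Toy demonstration of Simpson's paradox with invented data.
# Each subgroup has a perfectly negative x-y trend, but the subgroups are
# offset from one another, so pooling them produces a positive trend.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical confounder: group membership (A, B, C).
groups = {
    "A": ([1, 2, 3, 4], [6, 5, 4, 3]),
    "B": ([5, 6, 7, 8], [10, 9, 8, 7]),
    "C": ([9, 10, 11, 12], [14, 13, 12, 11]),
}

# Correlation inside each group (the confounder is held fixed).
within_r = {name: pearson(xs, ys) for name, (xs, ys) in groups.items()}

# Correlation when the confounder is ignored and all points are pooled.
pooled_x = [x for xs, _ in groups.values() for x in xs]
pooled_y = [y for _, ys in groups.values() for y in ys]
pooled_r = pearson(pooled_x, pooled_y)

for name, r in within_r.items():
    print(f"group {name}: r = {r:+.2f}")
print(f"pooled:  r = {pooled_r:+.2f}")
```

The sign of the relationship flips depending on whether the grouping variable is taken into account, which is the sense in which ignoring confounders can mislead.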
Okay, so I mentioned I’m an Alzheimer’s researcher. So one of the biggest debates around Alzheimer’s right now is the amyloid hypothesis. And so there’s sort of a question of whether we even have the right causal framework to understand Alzheimer’s. So historically Alzheimer’s was, is named after Alois Alzheimer, who was looking at a 50 year old woman, the brain of a 50 year old woman, and found these plaques in her. And so 50 years is pretty young. And so arguably she was probably an early onset case of Alzheimer’s. So early onset typically happens before the age of 60. And in the case of early onset, it typically is a Mendelian inheritance caused by a mutation in the amyloid precursor protein, APP, or other genes, presenilin one or presenilin two.
So nearly all of the mouse models that are implemented to understand Alzheimer’s have these same mutations. So it’s more or less a sort of a Mendelian type of mouse model. And of course these mouse models are really the primary models in which drugs or hypotheses are tested. And so, we can do things where we can sort of, you know, reduce amyloid burden within a mouse. But the question is, how relevant is this for late onset? So that’s the second form of Alzheimer’s and this is the one that’s far more prevalent. And it typically happens in individuals over 60 years old. And the two largest risk factors for Alzheimer’s are age, and then APOE genotype. So there’s three different genotypes, APOE two, three, and four. I don’t think APOE one ever existed, but those are the main ones. APOE four is the one that is most commonly associated with Alzheimer’s. APOE two is protective.
In addition to APOE there are over 40 genetic loci that are implicated by genome-wide association studies. So the genetics are very complex and I’m going to get into that a little bit later.
The other thing to know about Alzheimer’s is that with this complex genetics and with this sort of causal framework that is potentially incorrect, what we do know is that we have many, many failed clinical drug trials. So there’s currently only two classes of drugs that are given to Alzheimer’s patients, and they’re really intended to help synaptic plasticity. And they don’t really change the trajectory of the disease. In fact, typically 18 months after you’ve taken them, you return to what you would have been, had you never taken them. And as I mentioned, there’s over 400 failed clinical drug trials. And there’s really not any drugs, or treatments, that are available right now for Alzheimer’s.
This has not been through a lack of effort. So there have been all sorts of efforts to try and, you know, find treatments for Alzheimer’s. As I mentioned, there’s two classes of drugs, cholinesterase inhibitors and NMDA antagonists, that are utilized. But they only really improve quality of life for a very short period of time. Now, typically drug trials are only done for two years. Drug trials are very expensive. They can cost upwards of $200 million per drug trial. And so you’re talking about a disease that lasts many years, and this sort of way in which we evaluate drug efficacy is only over two years. And so that’s a real problem.
And also in the news, there’s been a drug that was recently approved, it’s called Aduhelm, and it is a monoclonal antibody directed against amyloid fibrils. And it’s been in the news for a lot of bad reasons. So when it was first approved by the FDA, there were three external advisers to the FDA that resigned in protest. So one of the main side effects of Aduhelm are brain bleeds. They conveniently call them ARIA, so much nicer word than a brain bleed. But this occurs in over 40% of patients. And on top of that, the efficacy of Aduhelm is really sort of questionable. So they, they showed a very marginal effect. But as the FDA sort of said that, you know, they approved this drug, not because they thought it was really effective, but because there really is nothing else out there.
So another thing about AD is that there are no clinically relevant subtypes. There’s no stratification, but there are a lot of comorbidities and these are with cardiac disease, and with diabetes.
So the only thing that we do know that is somewhat effective in terms of a treatment for Alzheimer’s, is more preventative, it’s lifestyle interventions. So diet, exercise, regular sleep. Those are the types of things that have been shown to, to matter in terms of the trajectory of the disease.
Okay, so as I mentioned, it’s a disease that begins upwards of 25 years before you have any type of symptoms. And there’s a lot of clinical variability within diagnosis. So it used to be that you couldn’t actually diagnose somebody with Alzheimer’s. It had to be called probable Alzheimer’s. It had to be post-mortem autopsy and looking at the brain. And the diagnosis even today is somewhat subjective. So you can get different diagnoses from different neurologists. But it’s typically done based on, you know, increased amyloid plaques, in the CSF and by imaging, as well as phospho-tau. Phospho-tau happens in sort of all traumatic brain injury and stress, that kind of thing. And then of course, cognitive testing.
So the other curious thing, going back to the amyloid hypothesis, is that there are also individuals out there who have a head full of amyloid, but no cognitive decline. And they do not appear to have really lost any neurons at that point. So again, the question of why do you have individuals that have a head full of amyloid, if amyloid is the cause of Alzheimer’s?
So, another interesting thing about Alzheimer’s is the heritability. So twin studies put the heritability at about 60%, so it seems to be highly heritable. Interestingly, common variant heritability, is really only a fraction of that, that 60%. So it’s only two and a half to 10%. So what this is sort of implying is that, there are a lot of genetics involved, but they’re not necessarily common variants. They’re either a combination of rare and common variants, or rare variants. And I’ll get into that a little more.
The sort of the picture here. This is a, an artist’s self-portrait over time, and you can sort of see that the change in sort of his cognitive ability and how that manifests in his artwork, which I think is really interesting and also really challenging if you’ve ever had a family member who’s had Alzheimer’s. There is a, there’s quite an emotional burden and an economic burden in terms of caring for individuals with Alzheimer’s. It can be very difficult.
So getting to the sort of economic challenges of Alzheimer’s, so again, I mentioned that the risk factors being age and APOE, and the long prodromal period, upwards of 25 years. But there is a huge variability in those individuals, even within a particular APOE genotype.
[9:37 mark] So as you can see here, you have the different genotypes. So you have two copies of every gene. So APOE two, again, being protective or being these up here, and then three is sort of these in the middle, and then you’ve got four way down here, four is the really bad one. And you can see the sort of age of onset and the percentage of individuals with that. That’s quite a change, right? So you can see that curve. So again, trying to sort of anticipate who’s going to develop Alzheimer’s, even if you know their genotype, there’s a lot of variability.
And so speaking to this again, sort of the, the cost of care. So, Bill Gates’ father who passed away, I think last year, or maybe this year, he was diagnosed with Alzheimer’s and, and Bill Gates has sort of made a foray into Alzheimer’s research. Many people thought it was primarily based on his father, and I’m sure that contributed. But he actually has a blog post where he said that what really motivated him was he was looking at the cost of Alzheimer’s, the lack of any type of treatment. And the fact that it was probably going to consume Medicare and Medicaid by, I think it was like 2040. So there really is a huge economic burden with Alzheimer’s.
Similarly, the World Health Organization is anticipating a 40% increase in dementia just by 2030. And this is again, huge. So this is going to be a major problem. I think John had a slide the other day that had, you know, the biggest causes of death. And I think Alzheimer’s was second, right behind cardiac issues.
The other thing that I think is sort of, maybe not well-connected by people outside the field, is COVID. So COVID has been shown to have an impact on a lot of individuals in terms of dementia. And, you know, we’re only a couple of years into the pandemic and we really have no idea what that is going to mean for people with Alzheimer’s or people with dementia down the road. This could drastically increase the amount of dementia that we see.
Okay, so sort of the concept of digital twins and how we’re planning to use this to look at Alzheimer’s. So most everything we have out there are sort of correlative studies. So like genome-wide association studies. And the challenge with that is we have phenotypes, we have Alzheimer’s, we can measure amyloid burden and phospho-tau and such. And we can compare that to the omics that we gather. So genomics primarily, but other rich omic sets where we have genomics and proteomics or transcriptomics. The challenge is that, being correlative, it really fails on an individual level. So we can make generalizations about them, but for a single person, it’s really hard to understand mechanistically what’s going on. And that’s true because we don’t really have a causal framework. And without that causal framework, with all these correlations, we can sort of learn, but we’re learning to predict and not really understand.
And the challenge, again, go back to Simpson’s paradox, is there are so many confounders. And so being able to sort out those confounders is really difficult. But there is a way to potentially bridge this gap and that is through functional processes. So instead of trying to do correlations between the phenotype and the omics, what you can do is you can create functional processes that model specific pathways within the brain and those can then be sort of an intermediate. And with that, you can use a digital twin. So this is then connecting those phenotypic outputs that we care about in Alzheimer’s with these functional processes. So you’re breaking down the problem.
Once you have that, you then are in a position where you can take those functional processes and presumably you’ve removed some of those confounders, and you can then do either correlation and or mechanistic studies between those functional processes and the genomics. And this is sort of the basis of how we could potentially personalize medicine, if we can bridge this gap and we can understand those genomics and how they relate to those functional processes.
Okay, getting into the genetics again. So this is an example of an output of sort of a GWAS, genome-wide association study. This one is sort of the largest to date. It was done with over a million people. There’s a new GWAS paper probably coming out at least every year these days. And they’re always identifying new genes. And this is sort of a map of all the chromosomes. So you can see on the X axis, which chromosome number. And then at various positions, you see these loci that are implicated within the disease. This one over here is APOE. You can see it is by far the strongest one. And that is always the case with almost any other GWAS. That’s sort of almost like your validation that you did a GWAS correctly, if you get a signal for APOE. [13:55 mark.]
All of these loci, the vast majority of the loci, are in non-coding regions of the genome. So they typically are not within the genes themselves, they’re within the regulatory regions that fall outside of the coding region of the genes. So these are the regulatory regions or the areas that sort of decide whether a gene gets turned off or on by transcription factors or by the chromatin state. And I think this is really important because what this is telling us, the genetics are telling us is that the disease, the signal that we’re seeing, are in these regulatory regions, which are akin to sort of feedback loops. How do you modulate a system? By turning genes on and off. And so the regulation is really critical, I think, for understanding the disease.
So another key thing about the genetic hits that we’re seeing in this is they’re falling within gene regions that are typically expressed in microglial cells. Microglial cells are sort of the immune cells of the brain. They’re similar to monocytes and macrophages, but they have a lot of additional responsibilities in the brain related to metabolism. And I’ll sort of get into that a little later.
As I mentioned, APOE is the strongest signal. Sometimes they even exclude it because it’s so strong that it sort of masks other signals. There’s also instances where variants, like TREM2, have actually a stronger effect than APOE, but they’re very rare.
Okay, so I mentioned APOE. So another thing that’s important about understanding Alzheimer’s is context and the next several slides I get into are all about context. So in the case of APOE, what’s interesting about this is again, the context with respect to ethnicity. So as you can see here in Caucasians, you have sort of an odds ratio, an odds ratio is more or less the association of APOE4 allele, in this case with the higher bar, with the prevalence of the disease.
So you see here that the scales are different across these ones. This is, you know, between 10 and 15. So it’s about 12. In African-Americans, it’s lower. In Hispanics having an E4 allele, you’re really not at any higher risk than any other allele. If you look at Japanese population, I believe this extends to the Asian population in general, that’s almost twice what it is, or even more, for either two alleles of E4, or even just one. So context really matters. Your ethnic background matters in terms of how APOE is functioning within your brain.
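[Editor’s illustration, not part of the talk.] For readers unfamiliar with the term, an odds ratio compares the odds of carrying an allele among cases with the odds among controls. The sketch below uses invented counts, not the values from the slide, and the function name `odds_ratio` is just a label chosen here.

```python
# Odds ratio from a 2x2 case/control table of allele carriers.
# All counts below are hypothetical and chosen only to make the
# arithmetic easy to follow.

def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_in_cases = case_carriers / case_noncarriers
    odds_in_controls = control_carriers / control_noncarriers
    return odds_in_cases / odds_in_controls

# Hypothetical cohort: 60 of 100 cases carry the allele, 20 of 100 controls do.
example_or = odds_ratio(case_carriers=60, case_noncarriers=40,
                        control_carriers=20, control_noncarriers=80)
print(f"odds ratio = {example_or:.1f}")  # (60/40) / (20/80) = 6.0

# An allele equally common in cases and controls gives an odds ratio of 1,
# i.e. no association with the disease.
null_or = odds_ratio(30, 70, 30, 70)
print(f"odds ratio with no association = {null_or:.1f}")
```

An odds ratio near 1 corresponds to the "no higher risk" situation described above, while values well above 1 correspond to the taller bars.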
So the basis for that, in that case, that’s a coding variant. That is in the gene, the coding region of APOE4. I mentioned to you the regulatory regions of the genome, where most of the loci are. Now what that really comes down to are individual base differences across individuals. So on average, about 3 million bases differ between any two of us. And these are, again, in those regulatory regions. So what I’m showing here is a paper that was looking at monocytes, very similar to microglial cells. And we’re looking at individual differences at a particular base. So these are called eQTLs, or expression quantitative trait loci. Basically the idea is if you have one particular eQTL, you have a particular variant there, then it’s going to either increase or decrease the expression of a gene. But the context really matters here. So what I’m showing is that depending on the condition – so in this case, if you treat with a different compound, a lipopolysaccharide or interferon – the trend for how that gene is regulated differs. And even with the same treatment, two hours versus 24 hours, you can see a very big difference in terms of how the gene is responding. So the context really matters and this is important.
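[Editor’s illustration, not part of the talk.] One common way to summarize an eQTL is the slope of gene expression against allele dosage (0, 1 or 2 copies of the variant). The sketch below assumes that framing and uses invented expression values under two hypothetical conditions to show how the same variant can have opposite apparent effects depending on context. It does not reproduce the paper shown on the slide.

```python
# Context-dependent eQTL effect, sketched as a least-squares slope of
# expression on allele dosage. Data and condition names are made up.

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Copies of the variant allele carried by six hypothetical individuals.
dosage = [0, 0, 1, 1, 2, 2]

# Hypothetical expression of one gene in the same individuals' cells
# under two conditions.
expression = {
    "untreated": [5.0, 5.2, 6.1, 5.9, 7.0, 7.2],
    "stimulated_24h": [7.1, 6.9, 6.0, 6.1, 5.0, 4.9],
}

slopes = {cond: ols_slope(dosage, values) for cond, values in expression.items()}

for cond, slope in slopes.items():
    direction = "increases" if slope > 0 else "decreases"
    print(f"{cond}: slope = {slope:+.2f} (variant {direction} expression)")
```

In this toy data the variant raises expression in one condition and lowers it in the other, which is the kind of context dependence the paragraph above describes.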
So we’re trying to understand the genome. We’re trying to understand how these variants matter. We’re trying to understand, you know, what genes are implicated. Well, another key point is that with a GWAS how you stratify your GWAS or how you select your individuals also really matters. So in this case this is a colleague of mine, Liz Blue, who did a GWAS of 11,000 post-menopausal women, only women, who had undergone hormone replacement therapy and had a history of heart disease. And she just asked them if they had a family history of dementia. And that was the way that she organized the GWAS. APOE was the top hit. But of the eight other genes that she found, only one had ever been implicated in dementia before. So very different stratification on the front end gives you very different results of what genes come out of the GWAS.
This is also true in mice, so Catherine Kaczorowski, a colleague of mine, works at Jackson Laboratory. She showed the same thing where again, taking in this case, the APP mutation in mice and putting it in very different mouse backgrounds. What she was able to show is that when you do that, there’s high variability in terms of the amount of amyloid and other sort of downstream pathologies that we care about. So again, even in mice, which typically are inbred, if you do it in an outbred strain like this, then you get the same kind of variability that you see in a human population.
[19:02 mark] So here’s a summary of why context matters. You know, we’re trying to understand Alzheimer’s and the GWAS will identify different loci that are implicated, but of course there’s confounders. So how you organize the GWAS matters. And then of course the eQTLs matter in terms of what type of background. All of these matter, the context of all of these things matter and it matters in humans and in mice.
And so I sort of pose the question, now that you have some, a little bit more of understanding, of the complexity of this: Is it any wonder that we haven’t really solved or come up with any treatments for Alzheimer’s, which is really a disease of the most complex structure in the known universe. It’s incredibly complex. And so it’s no surprise that we’ve fallen short, when we’re not able to really appreciate and model this complexity.
So how would we model this complexity? So there’s a song by The National, called Fake Empire, and it has a line in there that I really love. And I think it’s applicable not just for Alzheimer’s, but in life in general. “Let’s not try to figure out everything at once.” And that’s really what we want to do with a model. We don’t want to try and figure everything out at once. We want to break it down into components. So for brain health model, what we would want to do is we really want to start with healthy. We want to understand homeostasis. If we can model homeostasis, then we can understand what goes wrong.
The next thing that I think is really important is the right level of abstraction. So we need to understand, similar to like those functional processes that I was talking about, you’ve got to break down the system into components and understand those components. And then you’ve got to be able to connect those physiological end points that you care about with a disease or output things you can measure with those functional processes. And then you have to really be able to emulate the biology with causal reasoning. And you have to understand those positive and negative feedback loops. You’ve got to get those nonlinearities and model them appropriately. And of course you have to get them all to play well with each other. You have to do cross coupling across all of those different subsystems.
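[Editor’s illustration, not part of the talk.] To make the idea of modeling homeostasis with a nonlinear negative feedback loop concrete, here is a deliberately tiny sketch. It is not the EmbodyBio model or any part of it. The equation, the parameter names (`production`, `k`, `hill`, `decay`) and their values are all assumptions picked so the set point is easy to check by hand.

```python
# One-variable homeostasis sketch: a quantity x inhibits its own production
# (negative feedback, Hill-type nonlinearity) and is cleared at a constant
# rate. Integrated with a simple forward-Euler loop.
#
#   dx/dt = production / (1 + (x / k) ** hill) - decay * x
#
# With production=2, k=1, hill=2, decay=1 the steady state solves
# 2 / (1 + x^2) = x, which gives x = 1.

def simulate(x0, production=2.0, k=1.0, hill=2, decay=1.0,
             dt=0.01, steps=2000):
    """Return the value of x after `steps` Euler steps starting from x0."""
    x = x0
    for _ in range(steps):
        dxdt = production / (1.0 + (x / k) ** hill) - decay * x
        x += dt * dxdt
    return x

# Perturb the system above and below the set point and let it relax.
from_high = simulate(x0=3.0)
from_low = simulate(x0=0.1)

print(f"started at 3.0, settled near {from_high:.3f}")
print(f"started at 0.1, settled near {from_low:.3f}")
```

Both starting points relax to the same level, which is what homeostasis means in this toy setting. A full model would couple many such loops across subsystems, as the paragraph above describes.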
So, and what I would like to add here is that in the field of Alzheimer’s, there’s a tremendous amount of research going on, but the challenge with that is that it’s all very fragmented. And there’s really very few, if any, that are trying to integrate all of this causal reasoning or these nuggets of causal reasoning that we do understand into a full model. And that’s really what EmbodyBio is trying to do. And EmbodyBio has done that and created a model. And what they’ve done is they’ve broken it up into 13 different subsystems. They based it on over 500 references. So that’s really scouring papers for the biology and trying to capture that biology within their model.
And a lot of these are very quantitative. And so, as you can imagine, you want quantitative data, so you can match those up with those phenotypic end points that you care about. And then from that, you can create a digital population.
So now that we have the ability to sort of model this, we have a digital population. What can we learn about the biology from the digital population? And this is sort of an intermediate step before you can do sort of the personalized medicine and treat individuals.
So what are the key homeostatic roles within the model and what are risk factors that might contribute to a disease that, again, happens over 25 years. You’ve got to be able to model basically the entire lifetime of an individual if you want to understand Alzheimer’s.
So an example of this I’m going to offer you is again, centered on APOE in cholesterol trafficking, APOE’s function is actually a cholesterol transporter. So, and I mentioned microglial cells before. Microglial cells have a metabolic responsibility in the brain where they take in myelin, sort of normal myelin turnover, as well as when cells die they handle the debris from those cells.
And so there is an influx of cholesterol between astrocytes, which are sort of some metabolic support cells for neurons, and microglia. And there’s really an interface between APOE and the LDLR receptor. And so a question that we can now ask with the digital twins, now that we have these subsystems and we just want to understand cholesterol transport, is: is there a relationship between APOE and LDLR? And could this also happen to be related to what we observe in terms of differences in ethnicity with respect to APOE?
Okay, so this is sort of a plot where on the X axis is actually the genome position. So we’re looking at the LDLR receptor and these two lines here represent where the coding region is. And then this is the non-coding region or the regulatory region. [23:18 mark.] These values here are eQTLs, and this is a significance threshold. So you can see there’s several eQTLs that regulate expression level of LDLR. We then correlated these with CERAD score, which is sort of a combined metric of amyloid, phospho-tau, and cognition. And we found a relationship between the severity of the disease, be it positive or negative, with the presence of certain eQTLs within the LDLR receptor. So there is a relationship here.
And what’s interesting is that then if you take those same eQTLs and ask the question of like, what’s their frequency within a given ethnicity, you can see differences. And so we see differences here with East Asians relative to Europeans, African-Americans and Latinos. And this somewhat looks like that story in terms of APOE association. This could potentially account for those differences that we observe.
So let’s flip it on its head, why wasn’t LDLR ever found within a GWAS. And this is again, I think due to the limitations of GWAS where primarily the largest GWAS studies out there are only done on Europeans. And then the other thing too, is the complexity of the LDLR receptor. So you saw, I showed that there’s both positive and negative regulators in terms of eQTLs. So the story is more complex and your ability to tease that out with a GWAS I think is somewhat limited.
Okay, so we’re now in a position with a model, the EmbodyBio model, that has all of these features and these functional processes, the 13 functional processes. So what do we do next? So I showed you the example of the LDLR, but really we would want to do that for all genes. And so we want to be able to map all genes into any one or more of those functional processes. And then we want to relate those to those phenotypic endpoints so that we can understand how some of the genetics might contribute to the differences that we see in those functional processes. And of course, we would want to do this with digital twins.
At that point, you’d be able to do experiments that are akin to what are called Mendelian randomization experiments, where you’ve removed the confounders, because Mendelian randomization requires that there are no confounders. In theory, you could then do this, and then you could also take, and we’ve done some of this already, there’s already some wonderfully rich data out there that is longitudinal and multi-omic. And so if you can put that data into your models, you can then understand disease trajectory and patient variability, especially if you have their genome, you’re able to relate those differences you see to their genome.
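[Editor’s illustration, not part of the talk.] For readers new to Mendelian randomization, the sketch below simulates the basic idea with one standard estimator, the Wald ratio. A genetic variant G influences an exposure X, an unmeasured confounder U influences both X and the outcome Y, and the variant is assumed to affect Y only through X and to be independent of U. Every number here (effect sizes, allele frequency, sample size, random seed) is an assumption made up for the simulation. It is not an analysis from the talk.

```python
# Simulated Mendelian randomization with a single variant as instrument.
import random

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_EFFECT = 0.5        # assumed causal effect of X on Y
N = 20000

G, X, Y = [], [], []
for _ in range(N):
    g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0, 1 or 2
    u = rng.gauss(0, 1)                              # unmeasured confounder
    x = 1.0 * g + u + rng.gauss(0, 1)                # exposure
    y = TRUE_EFFECT * x + 2.0 * u + rng.gauss(0, 1)  # outcome
    G.append(g)
    X.append(x)
    Y.append(y)

# Naive regression of Y on X is biased because U drives both.
naive_estimate = cov(X, Y) / cov(X, X)

# Wald ratio: (effect of G on Y) / (effect of G on X).
# The var(G) terms cancel, leaving a ratio of covariances.
wald_estimate = cov(G, Y) / cov(G, X)

print(f"true effect:    {TRUE_EFFECT:.2f}")
print(f"naive estimate: {naive_estimate:.2f}")
print(f"Wald ratio:     {wald_estimate:.2f}")
```

In this simulation the naive estimate is inflated by the confounder while the genotype-based estimate lands close to the assumed true effect, because the genotype is independent of the confounder by construction.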
So we’re now in a position where we can with this capability, with this EmbodyBio brain health model, using digital twins, we can actually sort of iterate on our understanding of the biology in a way that can help us personalize treatments or potential treatments and therapeutics for Alzheimer’s. We’re now with these functional processes, we have functions that are looking for genes. So that’s a way we can sort of envision how we would personalize this for people. We, now we can, we can sequence your genome. We can understand how that gene functions within the context of your genome specifically. And we can model that within digital twins. And this would be an iterative process where as we learn that in individuals, we can incorporate that understanding into the model and improve the model.
So with that, I’d just like to thank my collaborators, there’s a number of them. I work at the Institute for Systems Biology. Many people are wonderful to work with there as well as, as Tom Paterson and Jennifer Rohrs who are here with us at this conference. And I thank you for your attention.
Cory Funk
Senior Research Scientist | Institute for Systems Biology
Dr. Funk is a Senior Research Scientist at The Institute for Systems Biology in Seattle, Washington. Dr. Funk received his PhD from the University of Illinois Urbana-Champaign in Cell and Developmental Biology where he trained as an experimentalist and studied the role of the estrogen receptor in gene transcription in breast cancer. He joined Nathan Price’s group as a postdoc, moving to Seattle, turning his focus to transcriptional regulation in glioblastoma and other brain tumors. Dr. Funk transitioned to computational biology, working within the NIH Accelerating Medicines Partnership in Alzheimer’s Disease (AMP-AD) beginning in 2014, with a focus on the role of innate immunity and metabolism in AD. Building on his interest in transcriptional regulation of the estrogen receptor, Dr. Funk has utilized genome-scale transcriptional models from post-mortem brain samples to help identify key drivers of transcription in AD. Dr. Funk also is investigating the putative role of herpes virus as a contributor to AD pathophysiology. Dr. Funk currently works with Dr. Leroy Hood, along with many other collaborators, in efforts to integrate longitudinal -omics data sets for the purpose of understanding AD etiology.