Computers Now Know Gene Networks: Transfer Learning And Massive Single Cell Transcriptomes Predict Bio | Christina Theodoris
What if we had a ChatGPT for transcriptomics? What if we could predict how gene networks change in diseases for which molecular data are scarce?
Today I have the immense pleasure of discussing transformers in single-cell transcriptomics, and the striking predictions they enable about gene modulation, with Christina Theodoris, first author of a paper published today in Nature entitled “Transfer learning enables predictions in network biology”. Christina has recently started her own lab at Gladstone Institutes to study gene networks, with a focus on the heart and cardiovascular disease, after working at Harvard, the Broad, and Boston Children’s.
In an entertaining and technical conversation, Christina and I discuss her impressive model, Geneformer. We introduce the foundational machine-learning concepts that enabled her work: pre-training on large-scale datasets followed by task-specific fine-tuning (transfer learning), which democratizes fundamental knowledge; rank value encodings; self-attention; masked learning objectives and self-supervision; and embeddings. Christina pooled the enormous amount of single-cell RNA-seq transcriptomic data generated worldwide in recent years to build a model that has learned gene–gene expression correlations across the cell types of the human body, and that understands their gene networks well enough to predict how networks rewire and gene expression shifts when single genes, or combinations of genes, are perturbed.

We discuss current applications, such as in silico reprogramming and differentiation, and in silico deletion of genes or modeling of dosage reduction or increase for the study of haploinsufficiencies. Christina beautifully explains how such a fine-grained network view of single-cell biology can help us identify key genes to normalize through medical intervention in order to restore health in disease settings. Thanks to transfer learning, her model can uncover the genome-wide, all-tissue expression consequences of diseases for which only limited data are available; it can model the effect of treatments that modulate the expression of specific genes; and it holds great promise for understanding rare diseases in depth, including the cardiovascular diseases examined in this first report. We end by speculating on the broad applicability of her model and its room for growth, for instance by integrating multiple omics data types, towards the dream of generating entire virtual patients starting simply from their DNA sequence.
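For listeners curious about the rank value encodings mentioned in the episode, here is a minimal sketch of the idea: each cell’s transcriptome is turned into an ordered list of genes, ranked by expression normalized to each gene’s typical level across the whole corpus, so cell-distinguishing genes rise to the front. The function name and toy data below are illustrative assumptions, not the paper’s actual code.

```python
def rank_value_encode(cell_counts, corpus_medians):
    """Order a cell's genes by expression normalized to each gene's
    corpus-wide median, so ubiquitously high housekeeping genes are
    deprioritized and genes that distinguish this cell come first.
    The ordered gene list is what a transformer would take as input."""
    normalized = {
        gene: count / corpus_medians[gene]
        for gene, count in cell_counts.items()
        if count > 0  # skip undetected genes
    }
    # Highest normalized expression first
    return sorted(normalized, key=normalized.get, reverse=True)

# Toy example: a cardiomyocyte marker (TNNT2) vs. housekeeping genes.
corpus_medians = {"ACTB": 500.0, "TNNT2": 2.0, "GAPDH": 400.0}
cell = {"ACTB": 600, "TNNT2": 8, "GAPDH": 350}
print(rank_value_encode(cell, corpus_medians))
# TNNT2 (8/2 = 4.0) outranks ACTB (600/500 = 1.2) and GAPDH (0.875)
```

In the toy cell above, TNNT2 is expressed at far lower absolute counts than ACTB, yet it leads the encoding because it is unusually high relative to its corpus-wide norm, which is exactly the signal a network model cares about.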
I honestly can’t predict what Christina and her colleagues will do in the next few years, because Geneformer already goes far beyond what I thought was coming!
If you liked this episode, please consider subscribing to The Biotech Futurist on Spotify, Apple Podcasts, Stitcher, Google Podcasts, or your favorite platform, and leaving a positive review. The growth of this podcast depends critically on word of mouth; thank you for your help. Follow The Biotech Futurist on Instagram and YouTube, and DM me if you have any questions. You can always download the transcript of this episode and find the links to the papers we mention on my website, lucafusarbassini.com. The jingle is by Gabriele Fusar Bassini.
[ACADEMIC] “Attention is all you need”: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Christina’s lab: https://gladstone.org/people/christina-theodoris
[ACADEMIC] Transfer learning enables predictions in network biology: https://www.nature.com/articles/s41586-023-06139-9