Jorge Botas

PhD Student

Jorge Botas is a PhD student in computational biology whose research spans comparative genomics, functional annotation, and machine learning for large-scale biological data, with a recent focus on single-cell foundation models and neurodegenerative disease. His work combines algorithmic development, large-scale data integration, and open scientific tooling to extract biological insight from complex and heterogeneous genomic datasets.


His early research made substantial contributions to comparative and evolutionary genomics, particularly through the development and extension of widely used community resources. He has contributed to eggNOG, one of the most comprehensive orthology and functional annotation databases, supporting comparative genomics across tens of thousands of organisms. In parallel, his work on the functional and evolutionary characterization of unknown genes from uncultivated taxa helped illuminate the biological relevance of the microbial “dark matter,” linking evolutionary conservation to functional potential in previously uncharacterized genes.


Jorge has also developed interactive platforms and visualization tools for genomic and phylogenomic analysis, including GeCoViz and PhyloCloud, which enable intuitive exploration of genomic context, gene neighborhoods, and large phylogenetic datasets. These tools emphasize accessibility and interpretability, lowering the barrier for researchers to interrogate complex evolutionary relationships and functional patterns at scale.


More recently, his research has shifted toward machine learning–driven genomics, with applications ranging from Mendelian disease discovery to single-cell biology. This trajectory culminates in his current work on single-cell foundation models, including Lemur, a model tailored to Drosophila melanogaster that enables fine-tuning-free cell-type annotation, batch correction, and systematic in silico perturbation. Through applications to fly models of Alzheimer’s and Parkinson’s disease, his work integrates representation learning with disease modeling and cross-species validation to identify conserved cellular mechanisms of neurodegeneration.


Across domains, Jorge’s research is unified by a focus on scalable, reusable computational frameworks that connect evolutionary context, genomic variation, and cellular state. His work reflects a progression from global comparative genomics to cell-resolved disease modeling, positioning machine learning as a central tool for biological discovery rather than a purely predictive endpoint.