GeneCompass


 Mammals comprise billions of cells whose activity is orchestrated by a complex, multi-layered gene expression regulatory system. Understanding these regulatory mechanisms is important both for understanding and treating human disease and for promoting the conservation of endangered species. However, the conventional research paradigm focuses predominantly on a limited number of model organisms, such as humans and mice, making it challenging to comprehensively decipher the intricate regulatory mechanisms across species and cell types. The advent of transformer-based pre-trained models presents an unprecedented opportunity to advance this field. Recent breakthroughs in single-cell transcriptomics have generated vast datasets, encompassing billions of cells and offering rich insights into gene-gene interactions. Leveraging this resource, several research groups have developed foundational pre-trained models. However, these models have been constrained by limited sample sizes and have primarily focused on a single species, which restricts their performance and applicability in diverse downstream tasks. In this study, we present GeneCompass, the first cross-species pre-trained foundation model, trained on over 100 million single-cell transcriptomes from humans and mice. Unlike previous approaches, our model integrates diverse biological prior knowledge, including promoter sequences and gene co-expression networks, into the training framework. Evaluations on various downstream tasks suggest that integrating this prior knowledge significantly enhances the performance of the model. Notably, we have successfully used the model, trained on data from two species, to perform downstream tasks on a third species.

 In conclusion, GeneCompass is a pioneering large-scale, cross-species pre-trained model. By incorporating cross-species information, it not only improves performance on downstream tasks in well-studied species such as humans and mice but also opens new avenues of investigation for non-model organisms, setting the stage for future advances in understanding their gene regulation.