
We train protein language models with up to 15B parameters and find that, as models scale, information emerges in their internal representations that enables accurate atomic-resolution structure prediction from a single input sequence.
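
For context, the open-source fair-esm package exposes pretrained ESM-2 checkpoints; the sketch below extracts per-residue representations and an attention-derived contact map. A smaller 650M-parameter checkpoint is used here only to keep the example light, and the sequence is hypothetical.

```python
# pip install fair-esm
import torch
import esm

# Load a small ESM-2 checkpoint (the paper scales to 15B parameters;
# this 650M model keeps the example lightweight).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# A hypothetical protein sequence, purely for illustration.
data = [("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRG")]
labels, strs, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33], return_contacts=True)

# Per-residue representations from the final layer; positions 1..L are
# the residues (position 0 is the BOS token).
reps = out["representations"][33]
per_residue = reps[0, 1 : len(data[0][1]) + 1]

# The attention-derived contact map is one signal of the emergent
# structural information described in the paper.
contacts = out["contacts"][0]
print(per_residue.shape, contacts.shape)
```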

We present scGPT, a generative pretrained transformer model for single-cell biology that enables cell type annotation, multi-batch integration, and perturbation response prediction.
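
As a rough illustration of the input scheme such a model uses, the sketch below sums a gene-identity embedding with a binned expression-value embedding per position and feeds the sequence to a transformer encoder. This is a conceptual sketch, not the scgpt package API; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

# Conceptual sketch of an scGPT-style input scheme (not the scgpt
# package API): each cell is a sequence of gene tokens paired with
# binned expression values. All sizes below are illustrative.
VOCAB_GENES, N_BINS, D_MODEL = 60_000, 51, 512

class CellEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.gene_emb = nn.Embedding(VOCAB_GENES, D_MODEL)
        self.value_emb = nn.Embedding(N_BINS, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=8, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)

    def forward(self, gene_ids, value_bins):
        # Sum gene-identity and expression-value embeddings per position.
        x = self.gene_emb(gene_ids) + self.value_emb(value_bins)
        return self.encoder(x)  # contextual embedding per gene

# One hypothetical cell: 1,200 expressed genes with binned counts.
genes = torch.randint(0, VOCAB_GENES, (1, 1200))
bins = torch.randint(0, N_BINS, (1, 1200))
print(CellEncoder()(genes, bins).shape)  # torch.Size([1, 1200, 512])
```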

We generate gene embeddings by encoding NCBI gene summaries with GPT-3.5 text embeddings, demonstrating competitive performance on gene classification and functional prediction tasks.
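
The embedding step can be sketched with the openai Python client, assuming an API key is configured in the environment. The summary text below is abbreviated for illustration, and text-embedding-ada-002 is assumed here as the GPT-3.5-era embedding model; whether it matches the paper's exact choice is an assumption.

```python
# pip install openai   (requires OPENAI_API_KEY in the environment)
from openai import OpenAI

client = OpenAI()

# Abbreviated NCBI-style gene summary, used purely for illustration.
summaries = {
    "TP53": "This gene encodes a tumor suppressor protein containing "
            "transcriptional activation, DNA binding, and oligomerization "
            "domains...",
}

# Embed each gene summary; text-embedding-ada-002 returns a
# 1536-dimensional vector per input string.
resp = client.embeddings.create(
    model="text-embedding-ada-002",
    input=list(summaries.values()),
)
gene_vectors = {gene: d.embedding for gene, d in zip(summaries, resp.data)}
print(len(gene_vectors["TP53"]))  # 1536
```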

We present the AlphaFold Protein Structure Database (AlphaFold DB), which provides open access to over 200 million protein structure predictions covering nearly all catalogued proteins known to science.
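
Individual predictions can be fetched from the database's public file URLs by UniProt accession. A minimal sketch follows, assuming the current file-naming pattern and the "v4" model version suffix, both of which may change.

```python
import requests

# Fetch one predicted structure from AlphaFold DB by UniProt accession.
# The file-naming pattern and "v4" suffix reflect the database at the
# time of writing.
accession = "P69905"  # human hemoglobin subunit alpha, for illustration
url = f"https://alphafold.ebi.ac.uk/files/AF-{accession}-F1-model_v4.pdb"

resp = requests.get(url, timeout=30)
resp.raise_for_status()

# Save the PDB-format prediction locally.
with open(f"AF-{accession}.pdb", "w") as f:
    f.write(resp.text)
print(f"Saved {len(resp.text.splitlines())} PDB lines for {accession}")
```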