Protein folding has relied on increasingly complex, domain-specific architectures since AlphaFold2’s breakthrough. But researchers at Apple (!) have asked a fundamental question: are these architectural designs actually necessary for building high-performance models?

SimpleFold represents a radical departure from conventional protein folding approaches. Instead of using computationally expensive modules like triangular updates and explicit pair representations, the model treats protein folding like text-to-image generation. The amino acid sequence acts as a “text prompt” to a generative model that outputs complete 3D atomic coordinates.

This shift from deterministic reconstruction to generative modeling opens new possibilities. Rather than predicting a single protein structure, SimpleFold naturally generates ensembles of conformations that capture the uncertainty inherent in protein folding – much like how AlphaFlow approaches ensemble generation.

Example predictions of SimpleFold on protein targets with ground truth in light aqua and predictions in deep teal, plus performance scaling from 100M to 3B parameters and inference times on consumer hardware.

The SimpleFold Method: Flow-Matching And Protein Structure

Flow-matching provides the foundation for SimpleFold’s approach. This generative technique creates a time-dependent process that transforms noise into data by integrating an ordinary differential equation over time. The method defines probability distributions that continuously transform tractable Gaussian noise into complex protein structures.

SimpleFold casts protein folding as a flow-matching generative model that produces protein structures from noise, conditioned on amino acid sequences.
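
To make the "integrate an ODE from noise to data" idea concrete, here is a minimal sketch of sampling with a fixed-step Euler integrator. The choice of Euler, the step count, and the names `euler_sample` and `toy_velocity` are assumptions for illustration only. In particular, `toy_velocity` is a hand-written stand-in for the trained, sequence-conditioned network: it simply points at a known target so the behaviour is easy to check.

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 (data)
    with fixed-step Euler updates. Illustrative only."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                      # t stays strictly below 1
        x = x + dt * velocity_fn(x, t)  # one Euler step along the flow
    return x

rng = np.random.default_rng(0)
num_atoms = 8                                  # hypothetical tiny "protein"
target = rng.normal(size=(num_atoms, 3))       # stand-in for true coordinates
noise = rng.normal(size=(num_atoms, 3))        # Gaussian starting point

def toy_velocity(x, t):
    # Velocity of the straight-line path that reaches `target` at t=1.
    # A real model would predict this from x, t and the amino acid sequence.
    return (target - x) / (1.0 - t)

sample = euler_sample(toy_velocity, noise)
print(np.abs(sample - target).max())
```

Drawing a different `noise` and rerunning the integrator is what would yield a different conformation in a learned model, which is where the ensemble behaviour described above comes from.
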
Given a protein with N_a heavy atoms, the model builds a linear interpolant between noise and all-atom positions, where both exist in ℝ^(N_a×3), conditioned on the amino acid sequence.

Unlike earlier work that modeled only backbone atoms, SimpleFold generates full-atom conformations including both backbones and side chains. This comprehensive approach mirrors advances in sequence-augmented flow matching for proteins.

The training combines two objectives: a standard flow-matching loss that measures velocity field prediction accuracy, and an additional Local Distance Difference Test (LDDT) loss that ensures structural quality. The LDDT loss measures atomic pairwise distance errors between generated and ground truth structures, helping the model learn refined atomic positions.

A key innovation is the timestep resampling strategy. Instead of uniform sampling, SimpleFold uses a logistic-normal distribution that samples more densely near clean data (t=1). This focuses training on capturing fine structural details, particularly important for side chain positioning.

A General-Purpose Transformer Architecture for Proteins

SimpleFold’s architecture represents a complete departure from domain-specific designs. The model uses only standard transformer blocks with adaptive layers, eliminating the expensive pair representations and triangular updates that define AlphaFold2.

Overview of SimpleFold’s architecture built on general-purpose transformer blocks with adaptive layers, eliminating the need for pair representations or triangular updates.

The architecture contains three main components: lightweight atom encoder and decoder modules (symmetric in design) and a heavy residue trunk. All modules use standard transformer blocks conditioned on timestep through adaptive layers.
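
Returning to the training recipe described earlier (linear interpolant, velocity target, logistic-normal timesteps), the pieces can be sketched in a few lines of NumPy. This is a sketch under stated assumptions, not the authors' implementation: the function names, the `mean` and `std` values of the logistic-normal, and the plain mean-squared-error form of the loss are all hypothetical, and the LDDT term is omitted.

```python
import numpy as np

def sample_timesteps(rng, batch_size, mean=1.0, std=1.0):
    """Logistic-normal draw: sigmoid of a Gaussian sample.
    mean > 0 shifts mass toward t=1 (clean data); values are assumptions."""
    z = rng.normal(loc=mean, scale=std, size=batch_size)
    return 1.0 / (1.0 + np.exp(-z))

def interpolate(noise, coords, t):
    """Linear interpolant x_t = (1 - t) * noise + t * coords,
    with t=0 pure noise and t=1 clean all-atom coordinates."""
    t = t[:, None, None]  # broadcast over atoms and xyz
    return (1.0 - t) * noise + t * coords

def flow_matching_loss(pred_velocity, noise, coords):
    """Mean squared error against the interpolant's velocity, coords - noise."""
    target_velocity = coords - noise
    return float(np.mean((pred_velocity - target_velocity) ** 2))

rng = np.random.default_rng(0)
batch, num_atoms = 4, 16                        # hypothetical sizes
coords = rng.normal(size=(batch, num_atoms, 3)) # stand-in for true structures
noise = rng.normal(size=(batch, num_atoms, 3))
t = sample_timesteps(rng, batch)
x_t = interpolate(noise, coords, t)             # what the network would see

# A perfect velocity prediction gives zero loss; a zero prediction does not.
perfect = flow_matching_loss(coords - noise, noise, coords)
naive = flow_matching_loss(np.zeros_like(coords), noise, coords)
print(perfect, naive)
```

In a real training loop, `x_t`, `t`, and the sequence would be fed to the network, and its output would replace the hand-written predictions passed to `flow_matching_loss`.
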