Robust Conformational Sampling via Combinatorial Optimization

To better understand the chemistry that drives the fundamental processes of life, it is critical that we know the molecular structures of the many proteins involved in these processes. However, the current laboratory techniques for determining protein structure (X-ray crystallography and NMR spectroscopy) are relatively slow, expensive, and often difficult to perform. Automated tools for rapid protein structure determination would therefore be nothing short of revolutionary. To this end, researchers have spent several decades designing efficient algorithms that predict a protein's tertiary structure from its primary amino acid sequence. A simple approach is to run a molecular dynamics simulation starting from an unfolded amino acid chain, but the computational resources it requires (e.g., distributed computing or supercomputers) make this method intractable for all but the smallest proteins.

Instead, the modern approach is to sample the protein's vast conformational space in search of likely structures. An approximate Hamiltonian (energy function) is evaluated for each candidate structure, and the lowest-energy candidate is taken as the predicted structure. As an alternative to ab initio structure prediction (starting directly from an unfolded amino acid chain), it is often possible to start from an approximate homology model. In principle, one can then refine this model (homology model remediation) until it is close to the structure of the real protein.
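To make the sample-and-score idea concrete, here is a minimal Python sketch. It assumes a conformation can be represented as a short vector of backbone torsion angles and replaces the real approximate Hamiltonian with a hypothetical `toy_energy` function; `toy_energy` and `sample_lowest_energy` are illustrative names, not part of any real modeling package. Uniform random sampling is used only for clarity; practical protocols typically rely on more sophisticated sampling strategies.

```python
import math
import random


def toy_energy(torsions):
    """Hypothetical stand-in for an approximate energy function (Hamiltonian):
    a threefold torsional term per angle plus a coupling that penalizes
    neighbouring angles that differ. Every term is >= 0; all-zero angles give 0."""
    e = sum(1.0 - math.cos(3.0 * phi) for phi in torsions)
    e += sum(0.5 * (1.0 - math.cos(a - b)) for a, b in zip(torsions, torsions[1:]))
    return e


def sample_lowest_energy(n_torsions, n_samples, seed=0):
    """Draw random conformations (vectors of torsion angles in [-pi, pi])
    and return the one with the lowest toy_energy, together with that energy."""
    rng = random.Random(seed)
    best_conf, best_e = None, math.inf
    for _ in range(n_samples):
        conf = [rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)]
        e = toy_energy(conf)
        if e < best_e:  # keep the lowest-energy structure seen so far
            best_conf, best_e = conf, e
    return best_conf, best_e


if __name__ == "__main__":
    conf, e = sample_lowest_energy(n_torsions=6, n_samples=20000)
    print(f"lowest sampled energy: {e:.3f}")
```

The cost of this loop grows linearly with the number of samples, but the number of distinct conformations worth testing grows very rapidly with the number of torsion angles, which is why sampling, rather than energy evaluation, becomes the bottleneck.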

Given the computing power currently available, the limiting factor for homology model remediation is conformational sampling: even starting from a model that deviates from the true structure by roughly 3 Å, the sheer number of similar structures that must be tested makes most search strategies untenable. In contrast, a conformational search space built by recombining protein fragments at discrete positions in Cartesian space can be (i) extraordinarily large, because of combinatorial explosion, and (ii) searched efficiently with powerful combinatorial optimization algorithms. To develop these new methods, we use SHARPEN, a protein modeling software platform designed for rapid prototyping of new algorithms.
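The toy sketch below illustrates points (i) and (ii). It assumes a hypothetical setting in which each of `n` positions chooses one of `k` discrete fragment placements, the total energy is a sum of per-placement terms plus interactions between consecutive positions only, and all energies are made-up random numbers. Under that chain assumption, dynamic programming finds the exact minimum over all k^n combinations in O(n * k^2) time; the brute-force enumeration is included only to check the answer. This is not SHARPEN's algorithm, and fragment interactions in real proteins are not limited to neighbours, so general interaction patterns call for more general combinatorial optimization methods.

```python
import itertools
import random


def total_energy(self_e, pair_e, choice):
    """Energy of one combination: per-position terms plus consecutive-pair terms."""
    e = sum(self_e[i][c] for i, c in enumerate(choice))
    e += sum(pair_e[i][choice[i]][choice[i + 1]] for i in range(len(choice) - 1))
    return e


def chain_min_energy(self_e, pair_e):
    """Exact minimum over all choices by dynamic programming (Viterbi-style).
    self_e[i][a]    : energy of placement a at position i (hypothetical data)
    pair_e[i][a][b] : interaction of placement a at i with placement b at i + 1
    Runs in O(n * k^2) time instead of enumerating k**n combinations."""
    best = list(self_e[0])  # best[a]: lowest energy of positions 0..i ending in a
    back = []               # back[i - 1][b]: best predecessor of b at position i
    for i in range(1, len(self_e)):
        new_best, ptr = [], []
        for b, e_b in enumerate(self_e[i]):
            a = min(range(len(best)), key=lambda x: best[x] + pair_e[i - 1][x][b])
            new_best.append(best[a] + pair_e[i - 1][a][b] + e_b)
            ptr.append(a)
        best = new_best
        back.append(ptr)
    last = min(range(len(best)), key=best.__getitem__)
    choice = [last]
    for ptr in reversed(back):  # walk back from the last position to the first
        choice.append(ptr[choice[-1]])
    choice.reverse()
    return best[last], choice


def brute_force_min(self_e, pair_e):
    """Enumerate every combination; only feasible for tiny n and k."""
    combos = itertools.product(*(range(len(s)) for s in self_e))
    best = min(combos, key=lambda c: total_energy(self_e, pair_e, c))
    return total_energy(self_e, pair_e, best), list(best)


if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 6, 4  # 6 positions x 4 placements -> 4**6 = 4096 combinations
    self_e = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
    pair_e = [[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
              for _ in range(n - 1)]
    print("search space size:", k ** n)
    print("dynamic programming:", chain_min_energy(self_e, pair_e))
    print("brute force:        ", brute_force_min(self_e, pair_e))
```

With 100 positions and 10 placements each, the same chain model would already contain 10^100 combinations, yet the dynamic program would need only about 100 * 10^2 pairwise evaluations. That contrast is the intuition behind pairing a deliberately huge discrete search space with an efficient optimizer.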