New deep-learning approach predicts

Nearly every fundamental biological process necessary for life is carried out by proteins. They create and maintain the shapes of cells and tissues; constitute the enzymes that catalyze life-sustaining chemical reactions; act as molecular factories, transporters and motors; serve as both signal and receiver for cellular communications; and much more. Composed of long chains of amino acids, proteins perform these myriad tasks by folding themselves into precise 3D structures that govern how they interact with other molecules.

Because a protein’s shape determines its function and the extent of its dysfunction in disease, efforts to illuminate protein structures are central to all of molecular biology and, in particular, to therapeutic science and the development of lifesaving and life-altering medicines. In recent years, computational methods have made significant strides in predicting how proteins fold based on knowledge of their amino acid sequence. If fully realized, these methods have the potential to transform virtually all facets of biomedical research. Current approaches, however, are limited in the scale and scope of the proteins that can be determined.

A whole new vista from which to explore protein folding

Now, a Harvard Medical School scientist has used a form of artificial intelligence known as deep learning to predict the 3D structure of effectively any protein based on its amino acid sequence. Reporting online in Cell Systems, systems biologist Mohammed AlQuraishi details a new approach for computationally determining protein structure, achieving accuracy comparable to current state-of-the-art methods but at speeds upward of a million times faster.

“Protein folding has been one of the most important problems for biochemists over the last half century, and this approach represents a fundamentally new way of tackling that challenge,” said AlQuraishi, instructor in systems biology in the Blavatnik Institute at HMS and a fellow in the Laboratory of Systems Pharmacology. “We now have a whole new vista from which to explore protein folding, and I think we’ve just begun to scratch the surface.”

Easy to state

While highly successful, processes that use physical tools to identify protein structures are expensive and time-consuming, even with modern techniques such as cryo-electron microscopy. As such, the vast majority of protein structures, and the effects of disease-causing mutations on these structures, are still largely unknown. Computational methods that calculate how proteins fold have the potential to dramatically reduce the cost and time needed to determine structure. But the problem is difficult and remains unsolved after nearly four decades of intense effort.

Proteins are built from a library of 20 different amino acids. These act like letters in an alphabet, combining into words, sentences and paragraphs to produce an astronomical number of possible texts. Unlike alphabet letters, however, amino acids are physical objects positioned in 3D space. Often, sections of a protein will be in close physical proximity but be separated by large distances in terms of sequence, as its amino acid chains form loops, spirals, sheets and twists.
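
To give a rough sense of how quickly that number grows, here is a minimal sketch in Python. It assumes, purely for illustration, that each position in a chain can independently be any of the 20 amino acids; the function name `count_sequences` is hypothetical and is not part of the reported method.

```python
# Illustrative arithmetic only: counts possible amino acid sequences,
# assuming each position can independently hold any of 20 amino acids.
ALPHABET_SIZE = 20


def count_sequences(length: int) -> int:
    """Return the number of distinct sequences with `length` residues."""
    return ALPHABET_SIZE ** length


if __name__ == "__main__":
    for length in (2, 10, 100):
        total = count_sequences(length)
        # Number of decimal digits minus one gives the order of magnitude.
        magnitude = len(str(total)) - 1
        print(f"length {length}: roughly 10^{magnitude} possible sequences")
```

Even a short chain of 100 amino acids allows on the order of 10^130 distinct sequences under this assumption, and that counts only sequences, not the shapes each one could fold into.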

“What’s compelling about the problem is that it’s fairly easy to state: take a sequence and figure out the shape,” AlQuraishi said. “A protein starts off as an unstructured string that has to take on a 3D shape, and the possible sets of shapes that a string can fold into is huge. Many proteins are thousands of amino acids long, and the complexity quickly exceeds the capacity of human intuition or even the most powerful computers.”