Scientists have traditionally thought that DNA binding proteins use patterns in the genome's code of As, Cs, Ts, and Gs to guide them to the right location, with a given protein only binding to a specific sequence of letters. In a new study published in Cell Systems, scientists discovered that proteins must rely on another clue to know where to bind: the DNA's three-dimensional shape.

For cells to take on their differing roles, they must be able to turn specific genes on and off with precise control. The genes active in a brain cell, for instance, are different from those active in a skin cell. This is achieved in part by the action of "DNA binding proteins" that latch onto the human genome at particular places to turn genes on or off.

"For decades, we've had difficulty explaining how proteins find the correct places to bind in the DNA, and how they do that in a specific way and without binding to the wrong places," said Pollard. "We hypothesized this could be explained by the structural aspect of the genome."

A type of keyhole that select proteins slot into

That's because DNA's string of letters is also a physical, three-dimensional structure, twisted into the famous double-helix shape and wrapped up into a microscopic package. Within its ladder-like structure, a variety of twists, grooves, and gaps can be found between the rungs and sides. Pollard and her team realized these variations create a type of keyhole that select proteins slot into.

"There's a rich scientific literature on how proteins interact with each other or bind to chemicals, and it's always through a kind of lock and key mechanism; why would proteins binding to DNA be any different?" said Md. Abul Hassan Samee. "We think the proteins dock onto DNA as a 3D structure, just like when they interact with other proteins or with chemicals."

DNA shape provides additional information

Earlier work had raised the possibility that DNA shape provides additional information to proteins on where to bind, but it was unclear how influential these shapes were. To find out, the researchers adapted a common machine learning algorithm, one typically used to identify the letter sequences proteins bind to, so that it looked for DNA shapes instead.

The role of DNA shape helps explain the two biggest mysteries in protein binding to DNA. First, proteins that bind to multiple different letter sequences turn out to be homing in on the same spatial pattern. Second, proteins that appear to share letter sequences are in fact attaching to very different shapes.

"It was accepted that a pattern of As, Cs, Ts, and Gs where a protein bound to DNA had a particular shape," said Pollard.

"But nobody had looked to see whether other binding locations that couldn't be explained with that pattern of letters might have the same shape. If we can show in a dish that proteins can recognize a DNA location because of its shape, even when it doesn't contain the established letter sequence, I think it would be game changing."

"There's a huge effort right now to understand how mutations in this dark DNA cause disease, and that's important because for most complex diseases, the majority of the genetic mutations are outside of genes," explains Samee.