George Rebane
Rejoice! We have all lived to see the practical solution to one of the most complex and important processes that define and determine how all critters live and die. It’s called protein folding, and it describes how the bio-molecules called amino acids first hook up with each other into a very long string, which then snaps into a (minimum energy) shape that enables it to function with other bio-molecules. (more here) These other bio-molecules may be on the surface of or inside various kinds of living cells. And here is how MIT Technology Review describes it –
“A protein is made from a ribbon of amino acids that folds itself up with many complex twists and turns and tangles. This structure determines what it does. And figuring out what proteins do is key to understanding the basic mechanisms of life, when it works and when it doesn’t. Efforts to develop vaccines for covid-19 have focused on the virus’s spike protein, for example. The way the coronavirus snags onto human cells depends on the shape of this protein and the shapes of the proteins on the outsides of those cells. The spike is just one protein among billions across all living things; there are tens of thousands of different types of protein inside the human body alone.”
The basic idea to grasp here is that proteins do their work physically through the way they are shaped, with all kinds of complex sticky-out parts that enable them to latch onto other molecules or even ‘destroy’ them. It turns out that all the complex stuff that goes on inside the deep recesses of living things depends on how giant ‘LEGO games’ are assembled and played in very tiny but complex universes.
There exist myriads of different proteins numbering in the, who knows, hundreds of thousands (millions?), with the possibility to assemble gazillions more differently shaped proteins that don’t yet exist in nature. And here has been the rub. We can write down and/or derive the chemical structure of a protein in the form that all high school chemistry students were taught, with all the N, C, H, O, … atoms hooking up to each other through various combinations. But that only tells us the ‘stretched out’ sequence of the protein's constituents, and that’s not how proteins exist and do their work. When the necessary ingredients for a given protein are put into a ‘soup’, they tend to hook up according to one or more of the possible stretched out versions, and then instantly this long and complex string of atoms folds or bunches up into a very special shape that gives it the ability to do its work.
The very special folded shape (see graphic) is brought about by the folding molecule, like all conformable structures in our universe, seeking its minimum energy state. The energy level of a bunch of connected atoms is determined by their resultant electric field which in turn is determined by the physical configuration of the atoms with respect to each other. And now you can see that for a bio-molecule, with thousands of strung together atoms, there are a lot of possible shapes each with its own very complex electric field and corresponding energy content. Now which of these gazillions of shapes is at the lowest energy level? Or said differently –
“Identifying a protein’s structure is very hard. For most proteins, researchers have the sequence of amino acids in the ribbon but not the contorted shape they fold into. And there are typically an astronomical number of possible shapes for each sequence. Researchers have been wrestling with the problem at least since the 1970s, when Christian Anfinsen won the Nobel prize for showing that sequences determined structure.”
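To get a feel for why that "astronomical number of possible shapes" is such a problem, here is a minimal toy sketch in Python. It is not how real folding software or AlphaFold works. It assumes a made-up chain of eight charged beads (the `CHARGES` list is hypothetical), lets every joint turn left, go straight, or turn right on a flat grid, scores each shape with a simple Coulomb-like sum, and then tries every shape to find the one with the lowest energy.

```python
import itertools
import math

# Hypothetical toy chain: each bead carries a charge of +1 or -1 (not real chemistry).
CHARGES = [+1, -1, +1, +1, -1, -1, +1, -1]
# At each joint the chain may turn left, go straight, or turn right (assumed 2-D grid).
TURNS = (-math.pi / 2, 0.0, math.pi / 2)


def build_shape(turns):
    """Place beads one unit apart, bending by the given angle at each joint."""
    points = [(0.0, 0.0), (1.0, 0.0)]
    heading = 0.0
    for turn in turns:
        heading += turn
        x, y = points[-1]
        points.append((x + math.cos(heading), y + math.sin(heading)))
    return points


def energy(points, charges):
    """Sum q_i * q_j / r over all bead pairs; overlapping beads are forbidden."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = math.dist(points[i], points[j])
            if r < 1e-6:
                return math.inf  # two beads cannot occupy the same spot
            total += charges[i] * charges[j] / r
    return total


def fold_by_brute_force(charges):
    """Try every possible shape and keep the one with the lowest energy."""
    n_joints = len(charges) - 2
    best = None
    count = 0
    for turns in itertools.product(TURNS, repeat=n_joints):
        count += 1
        e = energy(build_shape(turns), charges)
        if best is None or e < best[0]:
            best = (e, turns)
    return best, count


if __name__ == "__main__":
    (best_energy, best_turns), count = fold_by_brute_force(CHARGES)
    print("shapes tried:", count)
    print("lowest energy found:", best_energy)
    # The same three-way choice per joint for a 100-bead chain:
    print("shapes for 100 beads:", 3 ** 98)
```

Eight beads give only 3^6 = 729 shapes, which a laptop checks instantly. A chain of 100 beads under the same rules has 3^98 shapes, more than 10^46, and real proteins have far more freedom per joint than three choices on a grid. That is the wall that brute-force searching runs into.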
Google’s DeepMind AI outfit has come up with a humongous deep-learning neural net called AlphaFold that was trained on well over a hundred thousand known protein structures. The bottom line here is that when it is given the stretched out structure of the thousands of hooked up atoms, it is able to ‘very quickly’ figure out how the stretched out version folds into a very specific minimum energy form with various sticky-out parts, dents, and deep holes that make it in/compatible with certain other bio-molecular structures in a critter, plant, or organic broth. (more here)
To give you an idea of the breakthrough, in the old days (i.e. yesterday) our fastest computers would wrestle with a given stretchy structure for months or years looking for the absolute minimum energy configuration. (The search requires computing an unimaginably complex electric field from the physical positions of the atoms in each of millions of feasible configurations allowed by the physics of our universe.) Today AlphaFold has reduced that time to somewhere between a few hours and a few days. This opens up whole new worlds of bio-molecular design for all kinds of new medicines, energy conversion bio-molecules, foods, materials, … .
And to show how technology is accelerating, there is an even faster algorithm working on a recurrent geometric network (RGN) that promises to be “a million times faster” than AlphaFold, able to solve the folding problem in seconds. And as all this work is being published, a hundred entrepreneurial efforts will launch, not only to implement RGN-based folding tools, but also to use them to solve important problems and provide humankind with better healthcare, cleaner environments, cheaper energy, and new foods – all affordable like never before.
There is a lot more to be said about this breakthrough – e.g. development of an entirely new type of bio-computer that is faster and more energy efficient than today’s von Neumann silicon-based computers. And perhaps the intelligent machine that achieves Singularity peerage with humans will be an auto-configuring bio-computer. Now ain’t minimally regulated and taxed capitalism wonderful?
Dr. Rebane:
Is this the same as the breathtaking article I read on “protein mapping” a day or two ago? Mind blowing and what great good ole Yankee know-how. If it is the same thing under a different name (protein mapping or protein folding) then it is the most important discovery since the thermos. It also dovetails with my belief (micro-biology, once called chemistry) that each of us is indeed wonderfully made. And “a development of an entirely new type of bio-computer that is faster and more energy efficient than today’s von Neumann silicon-based computers” to boot? It is hard for a mortal coil such as I to fathom such wonders. Can the finite comprehend the infinite? Is protein folding built on recent protein mapping? Things are moving fast...in leaps and bounds.
Posted by: Bill Tozer | 01 December 2020 at 09:00 PM
BillT 900pm - I believe protein mapping is going the other way around. You start with a (folded) protein which you subject to various kinds of micro measuring devices like crystallography to map out its stretched out atomic structure to ascertain what the protein is made of in terms of what atoms hook onto what other atoms.
In training AlphaFold, this is what they did with proteins in known (measured) folded configurations. They measured/confirmed the stretched out structure and then had AlphaFold attempt to find the proper low energy correct configuration (for which the AI was 'rewarded'). This was done with over 100,000 known proteins during the learning process.
Posted by: George Rebane | 01 December 2020 at 09:26 PM
"reduced that time from a few hours to a few days..."
Say What? O/w, great read.
Posted by: L | 01 December 2020 at 09:29 PM
Thanks Dr. Rebane. I can't find the recent article on new stunning breakthroughs in protein mapping, but an exhaustive search tonight of maybe 50 scientific sites and references to numerous publications all point to protein folding in the past 24 hours. Methinks I read an article that had similar language to protein mapping...such as this paragraph from Bloomberg, the closest thing I could find:
"Google’s artificial intelligence unit took a giant step to predict the structure of proteins, potentially decoding a problem that has been described as akin to mapping the genome."
Then the next paragraph jumps into Deep Mind and protein folding.
"Never mind", he says as he sheepishly backs away. :)
Posted by: Bill Tozer | 01 December 2020 at 10:04 PM
Tozer 1004pm - I agree that the advance in protein folding is akin in importance to genome mapping.
Posted by: George Rebane | 01 December 2020 at 10:27 PM
Accuracy of protein folding is within an atom's width. "Needs improvement" he says from the peanut gallery. Still, you took it to another level with the AI-bio combo, a topic you have brought up before. Having data is one thing, knowing what it means is another. Kudos.
Posted by: Bill Tozer | 01 December 2020 at 11:00 PM
Nature is amazing. “Can the finite comprehend the infinite?” If we could comprehend the infinite, I fear what mankind would do with it. Have you all seen the movie Prometheus? Lol. How do you think these mind blowing discoveries will affect our future, George? I don’t have much faith in my fellow man right now so I fear the worst.
Posted by: Barry Pruett | 02 December 2020 at 03:23 AM