Microsoft’s AI for Good Lab has created Seq2Symm, an open-source AI tool that helps scientists determine the 3D shapes of certain proteins, including those found in viruses.
Seq2Symm uses AI to predict a protein’s 3D shape and structure from a one-dimensional sequence. The tool could help researchers better understand diseases, develop drugs and vaccines, and create more sustainable materials.
Juan Lavista Ferres, corporate vice president and chief data scientist, and Meghana Kshirsagar, senior research scientist and lead researcher on the project, sat down with MobiHealthNews to discuss Seq2Symm and how it could impact healthcare and more.
MobiHealthNews: Can you tell me about Seq2Symm?
Juan Lavista Ferres: In general, we know that proteins, and particularly the symmetry of proteins, are very important. They matter in areas ranging from drug discovery to energy. A lot of what living organisms do depends on proteins. So, understanding proteins and their design helps a lot of researchers, and a fundamental aspect of that is understanding their symmetries.
Until this discovery, there were ways to try to predict the symmetry, but you could not do it very fast. So, the main contribution of these models is that we can now do that much faster. If we can do that much faster, we will help researchers do their work much faster, so we can expedite research discovery.
Meghana Kshirsagar: Juan is right that the main contribution of this work is in understanding the structures of a certain type of protein, one that contains a lot of repeating units. These are called homo-oligomers. They are very important because they appear in a lot of living organisms. For example, they appear in viruses.
So, these are [see picture above], for example, virus capsids. They are spherical structures that are present in almost all viruses. The virus puts its DNA inside this capsule-like structure, which is then delivered into our cells when the virus enters our body. The capsule then breaks apart, and the virus DNA comes out and multiplies. Now, the capsid is made of these repeating units, and this is what is called a homo-oligomer.
So, it has 180 copies of the same thing, repeating and forming this nice sphere, and this is integral for a virus to function well. It is a very important part of how it works.
If you look at a pandemic like COVID, the first thing that researchers had from this virus was what is called the sequence of the virus, which means you only have one-dimensional information. That is kind of like having just somebody’s name or a description of a person, without the 3D information about them.
What our method does is take this one-dimensional information and predict the 3D information. It can say that the protein is going to form something of this shape, with this many copies in it.
And so, you can imagine so many situations where you do not have this 3D information of the molecule or protein you are interested in. You only know this one-dimensional information.
But going from that to the 3D information is critical, and what we do here is predict how many copies there will be and what the shape will look like. This is one concrete application where the method can be used.
MHN: So, it is a prediction model.
Kshirsagar: Yeah, it is a prediction model.
Ferres: We are predicting, and the virus is just one example. Again, everything that is a living organism depends on proteins. So, this has applications not just for viruses but for a huge range of problems, from understanding Alzheimer’s to creating new drugs. The impact is tremendous because so much depends on better understanding proteins.
MHN: Do you see a specific area where it has the most promise? Maybe cancer or Alzheimer’s, like you mentioned.
Kshirsagar: So, certainly, it has applications in Alzheimer’s and in studying viruses. These are the biggest applications from a health perspective. And then, of course, there are a whole host of applications in sustainability and so on.
MHN: So, it is not just in healthcare. This is something that can be used, like you said, with all living organisms.
Kshirsagar: Yes.
Ferres: Exactly, and this ranges from materials to…this is, again, one of the reasons why we decided to invest in a better understanding of protein folding. We have been working in collaboration with the Baker Lab and Gregory Bowman and the team for at least three to four years, if not more, and we have dedicated a lot of effort to this area, particularly because of the tremendous impact it can have.
These are very hard and very important problems, and sometimes they are not the easiest projects for us to explain.
A lot of people do not understand why we care so much about proteins. Clearly, proteins are fundamental to life and materials, and they touch everything, basically.
MHN: And you have made it an open-source model as well.
Ferres: This is open research and also completely open source. Anybody can use it to further research. Our contribution is providing these tools so other researchers can leverage them. We expect other people to have an impact, so we are enabling impact through this.
This will have an impact on evolving diseases, how to target drugs, and how to help us design vaccines or new treatments. So, it has a broad impact.
Kshirsagar: Just like Juan said, since proteins form the fundamental building blocks of not just all life on Earth but also a lot of materials, making an impact in that space leads to really broad and useful tools.