Dr. Baker's journal archive 2009

Message boards : Rosetta@home Science : Dr. Baker's journal archive 2009



David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 62409 - Posted: 24 Jul 2009, 6:17:49 UTC

Many of you know that David Kim is the architect of Rosetta@home and spends much of his time maintaining and improving it. He has also found time to carry out some very interesting research. Some time ago (before the papers I described in my posts below were submitted), David wrote a paper describing his results on the search problem in protein structure prediction and submitted it for publication. In contrast to much of our work, which is on biomedical problems such as aging as described below, the work in this paper is on the fundamental question of what limits our ability to predict structures accurately for larger proteins. We recently received the reviews from the anonymous referees, and to give you some insight into what the process of scientific publishing is like, I'm pasting first the abstract and then the reviews here:

abstract of David Kim's paper:
The primary obstacle to de novo protein structure prediction is conformational sampling: the native state generally has lower free energy than non-native structures but is exceedingly difficult to locate. Structure predictions with atomic level accuracy have been made for small proteins using the Rosetta structure prediction methodology, but for larger and more complex proteins, the native state is virtually never sampled and it has been unclear how much of an increase in computing power would be required to successfully predict the structures of such proteins. In this paper we develop an approach to determining how much computer power is required to accurately predict the structure of a protein, based on a reformulation of the conformational search problem as a combinatorial sampling problem in a discrete feature space. We find that conformational sampling for many proteins is limited by critical “linchpin” features, often the backbone torsion angles of individual residues, which are sampled very rarely in unbiased trajectories and when constrained dramatically increase the sampling of the native state. These critical features frequently occur in less regular and likely strained regions of proteins that contribute to protein function. In a number of proteins, the linchpin features are in regions found experimentally to form late in folding, suggesting a correspondence between folding in silico and in reality.
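For the technically curious, here is a back-of-the-envelope sketch of why a single rare "linchpin" feature dominates the required computing power. This is my own toy illustration, not the paper's actual Bayesian analysis: it simply assumes the native values of a handful of discrete features must all be sampled in one trajectory, and that the features are independent.

```python
# Toy illustration (not the paper's method): if the native values of k discrete
# features must co-occur in a single trajectory, and feature i is sampled
# independently with probability p_i, then one unbiased trajectory hits all of
# them with probability prod(p_i). A single rare linchpin feature with tiny p_i
# blows up the number of trajectories required.
import math

def trajectories_needed(feature_probs, target_confidence=0.99):
    """Trajectories needed so that, with the given confidence, at least one
    trajectory samples every native feature value (independence assumed)."""
    p_hit = math.prod(feature_probs)  # one trajectory gets everything right
    # P(>=1 success in n trials) >= target  =>  n >= log(1-target)/log(1-p_hit)
    return math.ceil(math.log(1 - target_confidence) / math.log(1 - p_hit))

# Five "easy" features (each sampled half the time) versus the same five plus
# one rare linchpin torsion sampled in only 1 in 1000 trajectories.
easy = [0.5] * 5
print(trajectories_needed(easy))            # a few hundred trajectories
print(trajectories_needed(easy + [0.001]))  # roughly a thousand times more
```

This is why constraining the rare linchpin feature, as described in the abstract, dramatically increases sampling of the native state: it removes the smallest factor from the product.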

Anonymous reviews of David's paper (we got these about three months after the paper was submitted to the journal).

Reviewer #1: It's an excellent and interesting paper. I certainly recommend publication. The novelty here is the idea of feature strings from native proteins and the identification by Bayesian analysis of "linchpin features" that are hard to find computationally and may be slow to develop physically.

I recommend publication, as is, but I would encourage the authors to comment on how to understand their linchpin features. For example, why should they reside near functional sites? One might have expected biological function not to have a role in determining the folding difficulty and kinetics. It would be interesting to hear the authors' thoughts about why these particular linchpins?

Reviewer #2: Ref: JMB-D-09-00307
Title: Sampling bottlenecks in de novo protein structure prediction
Authors: David E Kim; Ben Blum; Philip Bradley; David Baker

This is a provocative and important paper, and overall is handled very nicely with helpfully detailed examples. Quantifying an estimate of required sampling times and demonstrating that a feature list is adequate are both valuable, and the identification of unfavorable "linchpin" features that stall the predictions seems likely to be a very pivotal observation. It's also a nice name for them. I definitely recommend publication.

************* end of reviews

The paper has now been accepted and will appear in the Journal of Molecular Biology, which I believe can be freely accessed by anyone. Now you know what David was doing in his spare time last year; he has since switched to working on another project that I will tell you about some other time.


David Baker
Message 62898 - Posted: 12 Aug 2009, 4:18:55 UTC

We got very good news today--the paper I described to you several months ago was finally accepted for publication in Nature, one of the most widely read science journals.

The manuscript describes the design of new enzymes for use ultimately in gene repair. Suppose you have a mutation in your genome that causes you to have a disease. If you know where the mutation is, in principle it can be repaired by "cutting out" the bad information (the mutation) and replacing it with the correct DNA sequence in this region. To do the cutting out is tricky because you only want to cut right near the mutation, and not anywhere else within your 3 billion base genome.

We are developing computational methods using Rosetta to design new enzymes to do this very highly specific "cutting". In this manuscript, we show that we can now design new specific cutting enzymes and that we can control independently how fast they cut and how tightly they bind to the target sequences we design them to cut.

Hopefully you will be able to read about this soon at your local newsstand!
David Baker
Message 62924 - Posted: 14 Aug 2009, 6:21:42 UTC

We just learned that a model we had generated using the "fold and dock" method on your computers allowed the solution of the phase problem for structural biologists who had grown crystals of the (dimeric) protein but had not been able to solve its structure. This result led to a huge flurry of email today, as the approach could be very useful for a broad class of problems.

As I mentioned earlier, the paper on the "fold and dock" protocol (which you can recognize by the symmetric dancing of molecules on your screensaver) was enthusiastically accepted for publication in the Proceedings of the National Academy of Sciences and will be freely available online soon. Here is one of the anonymous reviews of the manuscript:

"The authors introduce a simultaneous "fold and dock" routine for the prediction of symmetric systems with impressive results. The method combines the Rosetta folding algorithm and the lab's more recent symmetric docking protocol. Here, both the backbone moves and relative positioning of the components are accomplished in the same Monte Carlo routine. Undoubtedly, this feature is necessary for most of the systems, particularly those which are intertwined. The authors both benchmark their results and even include a few predictions (using structures that came out after simulations). In addition they test the utility of the method for ab initio phasing and structure prediction incorporating some NMR-based constraints. This is an extremely nice piece of work which addresses an outstanding challenge in structural biology and protein structure prediction."

David Baker
Message 62939 - Posted: 15 Aug 2009, 22:50:59 UTC

As Mod Sense very nicely described in the discussion thread, with Rosetta@home we are tackling basic research problems in addition to disease-related research such as the gene therapy efforts I described in my last post, the amyloid fiber blocking work, and the other projects described in the "disease related research" section of our home page. One of the basic research problems we are tackling with Rosetta@home is the development of more accurate energy functions. This is a critical problem for the design of new drugs to cure diseases. Here is why:

Almost all drugs work by binding tightly to a protein structure and blocking or modulating the function of the protein. To be produced cheaply, and to be able to reach the inside and outside of your cells, a drug usually needs to be a relatively small molecule. Chemists have put together large collections of potential drug molecules, both in computer databases and in physical compound libraries. Big pharmaceutical companies have on the order of millions of such compounds they can test.

So you would think that to find a new drug, given the structure of a protein target, one could simply screen on the computer for the compound that binds most tightly to the target. For each potential drug compound, we can determine its lowest energy binding mode to the target using Rosetta docking--you have probably seen small molecules docking into proteins on your screensavers. There are other programs for small molecule docking as well, which are used in other distributed computing projects.

We and others have been pretty successful at predicting how a specific small molecule binds to a protein. We can test this by comparing the lowest energy structure found on Rosetta@home with the crystal structure of the small molecule bound to the protein, if there is one. The big advantage of Rosetta compared to other approaches is that the protein can flex during the docking process; this is important because many experimentally determined structures have shown that considerable movements take place in proteins when they bind small molecules.

However, for drug discovery, the problem is not just to dock one small molecule into a protein, but to dock millions of small molecules into proteins and find the one that binds the tightest. The amount of computer time required to accurately dock millions of small molecules into a protein target with a detailed, physically realistic model like Rosetta is currently too large for Rosetta@home. However, this is not the only problem. Different small molecules have different numbers and different types of atoms, and determining which of them binds the protein most tightly is very challenging. We and others have had considerable difficulty ranking sets of compounds known to bind a target, because errors in energy computation have big effects. We can predict protein structures accurately because the correct structure has very much lower energy than any other structure, but relatively small errors in energy computation can drastically change the ranking of different small molecules bound to a protein.
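You can see how sensitive ranking is to small energy errors with a quick numerical experiment. The numbers below are entirely synthetic (made-up energy ranges and error sizes, not real compound data), but they illustrate the point: when thousands of compounds are packed into a narrow energy range, even a modest random error in the computed score almost never lets the true best binder come out on top.

```python
# Synthetic illustration of the ranking problem: 10,000 candidate compounds
# with true binding energies spread over a few kcal/mol, scored by a function
# with modest Gaussian error. How often does the true best compound rank first?
import random

def ranking_experiment(n_compounds=10000, energy_spread=3.0,
                       score_error=0.5, trials=100, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        true = [rng.uniform(0, energy_spread) for _ in range(n_compounds)]
        best = min(range(n_compounds), key=lambda i: true[i])
        # Computed score = true energy plus random error in the calculation.
        scored = min(range(n_compounds),
                     key=lambda i: true[i] + rng.gauss(0, score_error))
        hits += (scored == best)
    return hits / trials

# Fraction of virtual screens in which the true tightest binder wins:
print(ranking_experiment())
```

With these numbers, hundreds of compounds sit within the error bar of the true minimum, so the winner of the computational screen is essentially a lottery among them. This is exactly why a more accurate energy function matters so much.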

Because of these problems, large pharmaceutical companies, surprisingly enough, rely as much or more on brute-force experimental screening to see which of their millions of compounds bind most tightly to a target as they do on computational screening. Of course, screening millions of compounds experimentally for the tightest binder is very slow and hugely expensive; this is one of the reasons why drug discovery is so expensive.

So if we can improve our ability to calculate energies accurately, it would have a huge effect on many important applications, including drug discovery and design. In my next post, I'll explain how we are using rosetta@home to increase the accuracy of energy calculations.
David Baker
Message 63383 - Posted: 17 Sep 2009, 4:54:00 UTC
Last modified: 18 Sep 2009, 4:24:39 UTC

As I described in my last post, increasing the accuracy of the energy function (Rosetta@home's description of reality) is critical to all of our efforts, including designing new drug molecules that bind tightly and specifically to target proteins.

What information can serve as a guide to improve the energy function? In some cases, we actually know the energy difference between two conformations, or more often, between two very closely related sequences which have pretty much the same structure. Graduate student Liz Kellogg is working to use this type of information to test the energy function.

A particularly useful source of information is the large set of protein structures determined experimentally by X-ray crystallography. There are many thousands of such structures, and we know each one must be very near the lowest energy structure for the corresponding amino acid sequence. We can test the Rosetta energy function by seeing if it properly assigns lower energy to these structures than to very different conformations that can be generated. Rosetta almost always passes this test, as I've described elsewhere--this is why the structure prediction problem is primarily a sampling problem.

For a more sensitive test of energy function accuracy, we compare the distributions of distances between atoms, torsion angles, and hydrogen bonding geometries in experimentally determined structures to those in low energy Rosetta models. The discrepancies highlight internal inconsistencies in the Rosetta energy function, which are straightforward to correct once they are found. Postdoctoral fellow Yifan Song is making great progress in ironing out these last wrinkles in the Rosetta energy function. His approach is to identify discrepancies in the geometries of Rosetta-generated structures, tune the energy function to try to eliminate each discrepancy, and then generate new structures using the new energy function to see if we have come closer to the truth. Very encouragingly, as Yifan improves the energy function in this way we are getting better results in other tests; for example, the close-to-native structures are even lower in energy relative to non-native structures than they were originally.

The origins of these remaining inaccuracies that Yifan is fixing will interest some of you. The Rosetta energy function describes hydrogen bonding and interactions between spatially adjacent atoms very precisely. It also models the energies associated with rotations around backbone and sidechain torsion angles, using data taken from the large number of high resolution structures. It turns out that when these two types of terms are combined, there are subtle double-counting effects--some geometries are favored, for example, both because a very low energy hydrogen bond is formed and because the backbone geometry is very frequently observed in protein structures--and together these lead to an oversampling of these local geometries in low energy Rosetta structures. Yifan's solution is to modify the torsional potential so that these geometries are less highly rewarded, since their favorability is already captured by the hydrogen bonding potential. As you can imagine, getting these corrections right, so that the geometries in Rosetta models precisely match those in native structures, requires some iteration, which is why Yifan has been running extensively on Rosetta@home in the past week!
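The general iterate-and-correct idea can be sketched in a few lines. To be clear, this is a schematic of the standard inverse-Boltzmann style of correction, not Yifan's actual implementation, and the "simulation" here is just a Boltzmann weighting of three made-up torsion bins. The loop is the point: penalize bins that models visit more often than natives do, regenerate, and repeat.

```python
# Schematic inverse-Boltzmann correction loop: if a torsion bin shows up more
# often in model structures than in native structures, add kT*log(p_model /
# p_native) to that bin's torsional energy, regenerate models, and repeat
# until the model and native distributions agree.
import math

def correct_torsion_potential(energy, p_native, generate_models,
                              kT=1.0, rounds=5):
    """energy: dict bin -> torsional energy; p_native: dict bin -> frequency
    observed in native structures; generate_models: callable mapping an
    energy dict to the bin frequencies its model ensemble produces."""
    for _ in range(rounds):
        p_model = generate_models(energy)
        for b in energy:
            # Overrepresented bins (double counting) get penalized;
            # underrepresented bins get rewarded.
            energy[b] += kT * math.log(p_model[b] / p_native[b])
    return energy

# Stand-in "simulation": Boltzmann-weight the bins by the current energies.
def boltzmann_ensemble(energy, kT=1.0):
    w = {b: math.exp(-e / kT) for b, e in energy.items()}
    z = sum(w.values())
    return {b: wi / z for b, wi in w.items()}

native = {"alpha": 0.6, "beta": 0.3, "other": 0.1}
start = {"alpha": 0.0, "beta": 0.0, "other": 0.0}  # flat initial potential
final = correct_torsion_potential(start, native, boltzmann_ensemble)
print(boltzmann_ensemble(final))  # converges toward the native frequencies
```

In the real problem each round of "generate_models" is a large batch of Rosetta@home trajectories rather than a one-line Boltzmann weighting, which is why the iteration takes serious computing time.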
David Baker
Message 63932 - Posted: 3 Nov 2009, 4:59:49 UTC

Today I was asked this question:

"Hi David, I am a user of the BOINC application and running Rosetta. I have searched the website and can't find any sort of overall status on how the entire mapping project is going.

I, and I think many others, would be very interested to seeing some sort of progress indicator on where the project is and some predictions on when the mapping process will be complete at Rosetta's current research/growth rate. (when every possible protein fold has been completely mapped and cross-checked)

Is this possible? Are we at 5%? 10%? Will the project be complete in 5 years? 10 years? This would be great info for the layman that doesn't know much about this subject but is happy to donate computer time for this research."

I thought I would answer this question here for other participants who might be interested, as it highlights the difference between Rosetta@home and most other distributed computing projects.

The answer is that the problems we are tackling with rosetta@home--computing the structures of biological macromolecules and designing new molecules to try to cure diseases and improve human health generally--are long term problems that will not be completely "solved" any time soon. Much of our work is also aimed at improving our methods and algorithms so we can design new molecules and ultimately drugs more accurately.

In most distributed computing projects, a computer program that has been developed is run on large sets of data. The computer program doesn't change over the course of the project, and progress toward completion can be assessed by determining what fraction of the data set the calculation has been run on and what fraction is left.

This estimate can't be done for rosetta@home because the scope of problems we are trying to solve is much larger, and because we are continually extending rosetta@home to try to solve new problems.

So we can't quantify our progress with a percentage complete. Instead, the project's contributions and progress can be evaluated through the many scientific publications it has produced, some of which I've tried to summarize in these posts. (The current issue of Nature, for example, has the article I described below on designing new enzymes to ultimately repair disease-causing mutations.)
David Baker
Message 64060 - Posted: 17 Nov 2009, 4:16:32 UTC

I often get asked about the progress we are making with the invaluable contributions all of you are making to our efforts. While we (unfortunately) have not yet succeeded in developing real world therapies for treating diseases, your contributions have been critical for our advances on the basic science side which should ultimately lead to the development of such therapies. These are documented in the scientific publications that have come out of the project; see https://boinc.bakerlab.org/rosetta/rah_publications.php for a recently updated list. Publication lists are one way you can assess the impact of your contributions to distributed computing projects--hopefully in not too long you will be able to see the impact in new disease treatments!
David Baker
Message 64416 - Posted: 9 Dec 2009, 5:19:04 UTC

Many of you I'm sure remember the protein folding calculations with the zinc atoms. Graduate student Chu Wang wrote a scientific paper describing the method he developed for predicting the structures of zinc containing proteins and the testing of the method with all of your help. The paper was just accepted for publication in the scientific journal Protein Science, and now scientists everywhere will be able to learn about Chu's method so they can predict structures of this important class of proteins also.




©2024 University of Washington
https://www.bakerlab.org