Lowest energy structure and distance to true structure

darkpella · Joined: 27 Sep 05 · Posts: 13 · Credit: 66,840 · RAC: 0
Message 27555 - Posted: 19 Sep 2006, 13:16:23 UTC - Last modified: 19 Sep 2006, 13:16:36 UTC

Hi,

I had a look at the top predictions page (https://boinc.bakerlab.org/rosetta/rah_top_predictions.php) and a couple of questions arose:

1. How is the distance of a predicted structure to the true structure calculated (i.e. what does RMSD stand for)?
2. I noticed that the lowest-energy structure and the structure with the lowest distance to the true structure are never the same. Does this mean that natural protein folding does not pursue the lowest energy? What does R@H pursue, then? How can one possibly choose which prediction is the most accurate among the many that come in, if the lowest-energy criterion is not the right one?

Bye you all!
darkpella

ID: 27555 · Rating: 0

Christoph Jansen · Joined: 6 Jun 06 · Posts: 248 · Credit: 267,153 · RAC: 0
Message 27563 - Posted: 19 Sep 2006, 15:24:07 UTC - in response to Message 27555. Last modified: 19 Sep 2006, 15:26:01 UTC

Hi darkpella,

> 1. How is the distance of a predicted structure to the true structure calculated (i.e. what does RMSD stand for)?

RMSD means "root mean square deviation". It goes like this:

- Overlay both structures and measure the distance between the calculated position and the true position of each atom
- Square all distances
- Divide the sum of all these squares by the number of atoms
- Take the square root of the result

The formula is explained on Wikipedia. (In fact, you find the closest match of both structures by first overlaying them roughly and calculating the RMSD. Then you optimize the overlay until the lowest RMSD is reached, which is your final value.)

> 2. I noticed that the lowest-energy structure and the structure with the lowest distance to the true structure are never the same. Does this mean that natural protein folding does not pursue the lowest energy? What does R@H pursue, then? How can one possibly choose which prediction is the most accurate among the many that come in, if the lowest-energy criterion is not the right one?

As far as we know, natural protein folding does pursue the lowest-energy structure. All structures of known proteins that were tested with Rosetta are at least somewhere near the energetic minimum and not somewhere else entirely. If Rosetta worked perfectly, the lowest-energy structure and the lowest-RMSD structure would match. Or almost match; I guess there will always be small deviations, as you cannot measure all the factors involved with absolute precision. As far as I understand it, there is still a lot of empiricism in it, such as temperature effects or the effect of mineral concentrations in our cells' plasma.

In addition, it is quite possible that protein structures as found in crystals, for example, are a little distorted due to packing effects and energetic differences between the surroundings in the crystal and in a living cell.

And finally, the accuracy of the measured protein structures is limited: they are mostly known to something like 1 or 1.5 Angstroms if they are really good structures. In other words, you may find the almost true structure by simulation but still have an RMSD of 1 Angstrom compared with the experimentally determined structure, because that structure is itself off by about that amount on average.

But above all that: Rosetta is currently under development and is still collecting information as to which current algorithms are best suited and what future algorithms might look like. There has never been as much computing power for it as there is today, so one might say that this field of research is still in its infancy.

To sum it up: if Rosetta did the job perfectly, then CASP would be unnecessary. We would then have THE tool already, and everybody would use that instead of trying to design their own.

ID: 27563 · Rating: 1
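The four steps listed in the post above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Rosetta code: the function name `rmsd` and the toy coordinates are made up for the example, and the two structures are assumed to be already overlaid, with atoms listed in the same order.

```python
import math

def rmsd(predicted, reference):
    """RMSD between two already-overlaid structures.

    Both arguments are sequences of (x, y, z) atom coordinates in the same
    atom order. Hypothetical helper for illustration, not part of Rosetta.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty structures with the same number of atoms")
    # Steps 1 and 2: squared distance between calculated and true position per atom.
    squared = [
        sum((p - r) ** 2 for p, r in zip(atom_p, atom_r))
        for atom_p, atom_r in zip(predicted, reference)
    ]
    # Step 3: divide the sum of the squares by the number of atoms.
    mean_square = sum(squared) / len(squared)
    # Step 4: take the square root.
    return math.sqrt(mean_square)

# Toy example: three atoms, each displaced by exactly 1 Angstrom.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (1.5, 0.0, 1.0), (4.0, 0.0, 0.0)]
print(rmsd(predicted, reference))  # prints 1.0
```

Because every atom in the toy example is off by the same 1 Angstrom, the RMSD is also 1 Angstrom. With unequal displacements, the squaring step weights the large errors more heavily than a plain average would.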

darkpella · Joined: 27 Sep 05 · Posts: 13 · Credit: 66,840 · RAC: 0
Message 27572 - Posted: 19 Sep 2006, 16:59:43 UTC - in response to Message 27563. Last modified: 19 Sep 2006, 17:00:15 UTC

> Hi darkpella,
>
> RMSD means "root mean square deviation". It goes like this:
>
> - Overlay both structures and measure the distance between the calculated position and the true position of each atom
> - Square all distances
> - Divide the sum of all these squares by the number of atoms
> - Take the square root of the result
>
> The formula is explained on Wikipedia. (In fact, you find the closest match of both structures by first overlaying them roughly and calculating the RMSD. Then you optimize the overlay until the lowest RMSD is reached, which is your final value.)

I guess the optimization is done over the 6 DOF of a rigid body in 3D space, isn't it?

> As far as we know, natural protein folding does pursue the lowest-energy structure. All structures of known proteins that were tested with Rosetta are at least somewhere near the energetic minimum and not somewhere else entirely. If Rosetta worked perfectly, the lowest-energy structure and the lowest-RMSD structure would match. Or almost match; I guess there will always be small deviations, as you cannot measure all the factors involved with absolute precision. As far as I understand it, there is still a lot of empiricism in it, such as temperature effects or the effect of mineral concentrations in our cells' plasma.
>
> In addition, it is quite possible that protein structures as found in crystals, for example, are a little distorted due to packing effects and energetic differences between the surroundings in the crystal and in a living cell.
>
> And finally, the accuracy of the measured protein structures is limited: they are mostly known to something like 1 or 1.5 Angstroms if they are really good structures. In other words, you may find the almost true structure by simulation but still have an RMSD of 1 Angstrom compared with the experimentally determined structure, because that structure is itself off by about that amount on average.

My aerodynamics teacher at college always said that, when experiments don't agree with predictions, it might be that the predictions are wrong or that the experiments are wrong (or both). I guess that's the case here...

> But above all that: Rosetta is currently under development and is still collecting information as to which current algorithms are best suited and what future algorithms might look like. There has never been as much computing power for it as there is today, so one might say that this field of research is still in its infancy.
>
> To sum it up: if Rosetta did the job perfectly, then CASP would be unnecessary. We would then have THE tool already, and everybody would use that instead of trying to design their own.

I'll let my PC go on crunching for R@H then (and for Predictor@Home too, as soon as they get back with their new prediction scheme).

Thanks for the explanation!

Bye
darkpella

ID: 27572 · Rating: 0
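The question about the 6 degrees of freedom can be made concrete with the usual textbook definition of a superposition-minimized RMSD. This is a general formulation, not a statement about what Rosetta@home runs internally; the symbols are introduced here for illustration: x_i are the predicted atom positions, y_i the corresponding true positions, N the number of atoms, R a rotation and t a translation.

```latex
\mathrm{RMSD}_{\min}
  = \min_{R \in SO(3),\; t \in \mathbb{R}^{3}}
    \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,x_i + t - y_i \right\rVert^{2}}
```

The rotation R contributes three degrees of freedom and the translation t the other three, which gives the six rigid-body DOF mentioned above. For any fixed R the best t is the one that makes the two centroids coincide, so in practice both structures are centred first and only the rotation remains to be optimized; that part has a closed-form solution (the Kabsch algorithm), so no iterative search is strictly required.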

Christoph Jansen · Joined: 6 Jun 06 · Posts: 248 · Credit: 267,153 · RAC: 0
Message 27578 - Posted: 19 Sep 2006, 18:06:53 UTC - in response to Message 27572. Last modified: 19 Sep 2006, 18:12:02 UTC

> My aerodynamics teacher at college always said that, when experiments don't agree with predictions, it might be that the predictions are wrong or that the experiments are wrong (or both). I guess that's the case here...

It's not that simple, but neither is aerodynamics. To take an analogy that is apt for someone who knows aerodynamics: imagine you wanted to know the aerodynamic behaviour of a large number of coupled control surfaces. Say you have 10 identical planes flying one behind the other (not sensible, but just for the sake of argument). Each wing flap, rudder or whatever that is moved will influence the last plane via all the others. You would only be able to calculate that behaviour to a certain degree by solving the Navier-Stokes equations numerically for the whole system. Wind tunnel tests will get you there quicker, but they are limited by the accuracy with which you build the model, and each alteration needs to be tested and parametrized again. Beyond that, every degree of temperature difference will influence the result. So solving the equations is the better way, although the harder one if you are the first to go there.

That is about the situation when determining the structure of a protein. The accuracy with which we can currently do that is limited, and it dwindles a lot with protein size. X-ray crystallography or NMR-based methods can only give you a resolution that is close to atomic for relatively small molecules. With growing size they get pretty inaccurate. Purple acid phosphatase, with 459 amino acids, was done by a colleague of mine in the mid-90s with a resolution of around 2.5 Angstroms IIRC, and that was brilliant then.

Methods are getting better and PCs are getting faster, but you cannot get beyond a certain resolution, as nature simply keeps you from doing so with a lot of effects. The Debye-Waller factor, for example, which describes how an atom scatters X-rays, varies with temperature but also with the chemical surroundings of the atom (or rather with the true electron density around the nucleus). How much it varies depends on the neighbors of the neighbors, which in turn... you get the point: it is highly coupled behaviour, and each new atom contributes another set of parameters.

Breaking it down to quantum mechanics is the one method that ultimately offers the best way of determining structures once you get the hang of it, but it depends heavily on computing power. As with solving the Navier-Stokes equations, you need a clever approach if you want to do it from scratch. The X-ray and NMR structures are, in a way, the wind tunnel experiments that show you whether your parameters are sensible for known test cases. But in the end you want to do it on the PC and without the wind tunnel, as that is fastest, cheapest and most accurate once you find the way.

ID: 27578 · Rating: -1

Feet1st · Joined: 30 Dec 05 · Posts: 1755 · Credit: 4,690,520 · RAC: 0
Message 27618 - Posted: 19 Sep 2006, 23:31:04 UTC

Oh WELL DONE! I just have this vivid picture of planes tethered together and how it would whip around in a wind tunnel... and trying to PREDICT those whips! Whew! Makes proteins seem easy!

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

ID: 27618 · Rating: 0

uioped1 · Joined: 9 Feb 06 · Posts: 15 · Credit: 1,058,481 · RAC: 0
Message 28987 - Posted: 6 Oct 2006, 20:35:32 UTC - in response to Message 27563. Last modified: 6 Oct 2006, 20:36:19 UTC

> > 2. I noticed that the lowest-energy structure and the structure with the lowest distance to the true structure are never the same. Does this mean that natural protein folding does not pursue the lowest energy? What does R@H pursue, then? How can one possibly choose which prediction is the most accurate among the many that come in, if the lowest-energy criterion is not the right one?
>
> If Rosetta worked perfectly, the lowest-energy structure and the lowest-RMSD structure would match.

One thing to add, based on some discussions of the R@H algorithm from quite some time ago: the RMSD of a predicted structure can only be calculated when the actual structure is already known. The Rosetta application does not use RMSD in any part of its heuristics.

However, one result of the R@H effort has been to develop a means of estimating the RMSD of a particular set of models that R@H has returned for a protein, based on their similarity and on how they relate to the other models that have been returned. So, because the project knows how many results were returned, and how similar, say, the best 1/10th of 1% of the results are, they can say "Here's the lowest-energy structure we have predicted AND it's within X RMSD of the actual structure!" That was a pretty major advance by itself.

ID: 28987 · Rating: 0
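The thread does not describe how that estimate is actually computed, so the following is only a guess at its first ingredient: measuring how similar the lowest-energy models are to one another, without ever looking at the true structure. Everything here is hypothetical (the function names, the `(energy, coordinates)` data layout and the choice of mean pairwise RMSD as the similarity measure), and the models are assumed to be already overlaid.

```python
import math
from itertools import combinations

def rmsd(a, b):
    """RMSD between two already-overlaid models (lists of (x, y, z) tuples)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def top_fraction_spread(models, fraction):
    """Mean pairwise RMSD among the lowest-energy `fraction` of the models.

    `models` is a list of (energy, coordinates) pairs. A small spread means
    the best-scoring models agree with each other. Hypothetical illustration,
    not the project's actual estimator.
    """
    if len(models) < 2:
        raise ValueError("need at least two models to compare")
    ranked = sorted(models, key=lambda m: m[0])          # lowest energy first
    keep = max(2, math.ceil(len(ranked) * fraction))     # always compare >= 2 models
    best = [coords for _, coords in ranked[:keep]]
    pairs = list(combinations(best, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)

# Toy data: three two-atom models with made-up energies.
models = [
    (-10.0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (-9.0,  [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]),
    (-2.0,  [(5.0, 5.0, 5.0), (9.0, 9.0, 9.0)]),
]
spread = top_fraction_spread(models, 0.5)  # compares the two lowest-energy models
print(round(spread, 3))
```

Turning such a spread into a statement like "within X RMSD of the actual structure" would additionally need a calibration against proteins whose structures are known; that step is not shown because the thread gives no details about it.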