Basic Questions about R@H progress, not in FAQ

Author	Message
hhiji Joined: 15 Mar 09 Posts: 5 Credit: 40,591 RAC: 0	Message 61150 - Posted: 13 May 2009, 1:49:48 UTC

We crunch a lot -- a quarter million jobs (successes) each day.

* How many distinct proteins are we folding?
* What are they?
* Are most of them proteins whose native states are known... (since most of my WUs show the native state)?
* ...are we folding any proteins whose native states are not known, whose structures may be found for the first time through R@H? (some of my WUs do not show the native state or RMSD)

Harder, optional:

* What are the assumptions made by the R@H algorithm that reduce computational time/sample space for crunching?
* Is the algorithm continually revised as results of crunching come in?

Thanks a bunch to all who take the time to write back.

Sean Kiely Joined: 31 Jan 06 Posts: 65 Credit: 43,992 RAC: 0	Message 61151 - Posted: 13 May 2009, 3:47:59 UTC - in response to Message 61150.

I might be able to answer one or two of these.

Most of the proteins being crunched have known native states. The Bakerlab folks are basically continually tweaking their algorithms, then running them to see how quickly (and efficiently) they can arrive at the known solution.

One semi-exception to this is when Bakerlab competes in the biennial CASP contest, where proteins are crunched that have known, but unpublished, native states. The goal is to arrive at (close to) the correct (unpublished) state. The native structures are not disclosed until the end of the competition.

Hope you get answers on the other stuff!

Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0	Message 61154 - Posted: 13 May 2009, 6:39:47 UTC - in response to Message 61150.

* ...are we folding any proteins whose native states are not known, whose structures may be found for the first time through R@H? (some of my WUs do not show the native state or RMSD)
* Is the algorithm continually revised as results of crunching come in?

To answer your questions, it may be useful to clarify the purposes of Rosetta and Rosetta@home.

Rosetta is a piece of software produced by the Bakerlab team at the University of Washington to help speed up the identification and analysis of proteins. It is licensed out to various public research groups (for free) and drug companies (for a fee) and is used in some cases to look for proteins that have potential uses as new drugs. The Rosetta software was used for at least one project by the World Community Grid, but other than that you will mostly find Rosetta running on private supercomputers with little or no public involvement.

Rosetta@home is the development arm of the project. The Bakerlab team use the processing power of our computers to test out new theories about analysing proteins and refinements to the Rosetta code. The results of our crunching are analysed, and changes that are judged successful are incorporated into the next version of the Rosetta software available to researchers.

Other than the CASP blind tests held every two years, R@H is not generally used to find unknown protein structures. The work units you see with no native structure are most likely test runs for what the team is going to try during the next CASP.

So, in summary, Rosetta software is used on private networks/supercomputers to analyse new proteins. R@H is used to improve the Rosetta software to make the results of the private crunching quicker and more accurate.

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 61160 - Posted: 13 May 2009, 15:32:08 UTC

You can get a feel for the number of proteins, and the number of models for each sent out for study, by looking at the link on the homepage, at the left, under the returning participants section, where it says view your results. That would show the ones your machines have worked on; there are probably others that you didn't happen to get any tasks from. Replace the user ID in the URL with that of someone who has a high RAC and you will see there are lots of studies being done all the time.

Much of the work done is against proteins of known structure. If you think about it, this is the only way to know if your algorithm is working. That, and work on proteins of unknown structure, but which will be determined soon (such as CASP).

What are the assumptions made by the R@H algorithm that reduce computational time/sample space for crunching?

I've never seen a clear (and understandable to a non-scientist) explanation. But, basically, the Rosetta program has several different modes of searching. And so when tasks are created, the modes that would be most appropriate for the target protein are determined beforehand. The search space is basically explored with a Monte Carlo algorithm.

Is the algorithm continually revised as results of crunching come in?

I believe you are asking if there is a feedback loop used as results come in to plan creation of new tasks. No, the entire batch is created at once, and studied as a whole. If anything, if the results are not as desired, another batch with a different search mode (as described above) would be created to further study the protein.

Rosetta Moderator: Mod.Sense
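
For anyone wondering what "exploring the search space with a Monte Carlo algorithm" can look like, here is a minimal Python sketch. It is not Rosetta's code. The energy function `toy_energy`, the function `metropolis_search`, and every parameter value are hypothetical names and numbers chosen for illustration, and the acceptance rule shown is the Metropolis criterion, one common Monte Carlo variant; the thread does not say which variant or scoring function Rosetta actually uses.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical stand-in for a scoring function.

    Returns 0.0 when every angle is 0 and grows as angles move away from it.
    """
    return sum(1.0 - math.cos(a) for a in angles)


def metropolis_search(energy, start, steps=5000, step_size=0.3,
                      temperature=0.5, seed=0):
    """Random search that always accepts downhill moves and sometimes uphill ones."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    current = list(start)
    current_e = energy(current)
    best, best_e = list(current), current_e

    for _ in range(steps):
        # Propose a small random change to one randomly chosen coordinate.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step_size, step_size)
        candidate_e = energy(candidate)

        # Metropolis criterion: accept improvements outright; accept a worse
        # candidate with probability exp(-delta / temperature).
        delta = candidate_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = list(current), current_e

    return best, best_e


if __name__ == "__main__":
    start = [2.0, -2.5, 1.0]
    best, best_e = metropolis_search(toy_energy, start)
    print("start energy:", toy_energy(start))
    print("best energy found:", best_e)
```

Occasionally accepting a worse candidate is what lets this kind of search climb out of a local minimum instead of stalling in the first dip it finds. Running many independent searches from different random seeds is also easy to split into separate tasks, since no run depends on another.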