Message boards : Rosetta@home Science : DISCUSSION of Rosetta@home Journal (2)
Author | Message |
---|---|
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Aglarond Joined: 29 Jan 06 Posts: 26 Credit: 446,212 RAC: 0 |
[quote]Why Rosetta, currently, does not use any optimization?[/quote] Hmm... I was thinking about the same thing a few weeks ago. But I've read some posts from Akos F., and he explained that just compiling with 3DNow! or SSE2 may increase the speed by only 3%. If you want a bigger increase, you have to do some low-level programming in assembler. That is the "magic" that can give you a 600% increase in speed. Of course, 3DNow! and SSE2 can be strong tools, but not by themselves. They have to be used the right way. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 42 |
I think suggestions such as "...if you have 287 work units still remaining on your computer you can delete them. Please keep all others running!" should be echoed in this thread, so crunchers can subscribe to it, kill obsolete WUs promptly, and get on with the newer targets. As an aside, the "BakerBlog" now contains 3 months of posts; time for a "son of" BakerBlog? I believe the blog should be kept online, though, perhaps on another page. More than anything, it demonstrates the Rosetta team's commitment to communication, which I am sure is a part of the project's success. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Chris Joined: 5 Jun 06 Posts: 1 Credit: 94,712 RAC: 0 |
[quote]The AP article on rosetta@home is out! See Ethan's post on the boards today. I think it turned out very well--what do you think? Let's hope lots of people see it.[/quote] This is how I found out about it; my computer has been crunching since I turned all the stuff on. |
Robert Everly Joined: 8 Oct 05 Posts: 27 Credit: 665,094 RAC: 0 |
[quote]Are there other suggestions for feedback we could give? Certificates, etc. we could think about if people would like this, but we would certainly need this to be at least in part handled by a volunteer group as we are swamped with CASP.[/quote] Printable certificates would be cool. I'm sure someone who knows some PHP (not me) could modify the SETI certificates for use here. It would be neat to have them available for milestones and top predictions. The SETI certificates are located in the repository here: Cert1, Cert2 & Cert3. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
I believe I know the answer, but I thought this thread might be a good place for one of you more bio-techie folks to explain the item in today's release notes that says: "We can efficiently assemble predefined domains of the protein chain into a whole structure." Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Keck_Komputers Joined: 17 Sep 05 Posts: 211 Credit: 4,246,150 RAC: 0 |
[quote]Are there other suggestions for feedback we could give? Certificates, etc. we could think about if people would like this, but we would certainly need this to be at least in part handled by a volunteer group as we are swamped with CASP.[/quote] I have always liked this idea and have long hoped that someone would program a standard BOINC certificate into the baseline code. It would still require the projects to produce image(s), but it would make things much simpler for smaller/new projects. BOINC WIKI BOINCing since 2002/12/8 |
Christoph Jansen Joined: 6 Jun 06 Posts: 248 Credit: 267,153 RAC: 0 |
I think it refers to this part of the research overview. The first parts of a protein to form a defined structure will be amino acids very close to each other. If you take naturally occurring structures that match short parts of the sequence of the protein you want to model, this may give you a clue to how the protein looks piecewise. The problem, however, may be finding an efficient way of connecting those short sequences to each other, or to the sequences in between to which no particular structure has been assigned. Several constraints have to be respected: for example, you only want chemically valid angles and conformations to occur in your starting model, and you do not want parts to overlap in space. The overall assembly is like a 3D jigsaw puzzle that needs to be solved. So what I think it basically says is that they have found a way to assemble those pieces and the unstructured parts in a way that is fast as well as accurate and complies with physical and chemical constraints. "I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant." R.M. Nixon |
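As a toy illustration of the "do not want parts to overlap in space" constraint, here is a minimal Python sketch of a steric clash check between two placed pieces. The 3.0 angstrom cutoff, the function name `has_clash`, and the plain coordinate-list representation are all assumptions for illustration; this is not Rosetta's code, and a real modeling program would typically penalize overlaps through its energy function rather than with a single hard cutoff.

```python
import math
from itertools import product

# Hypothetical cutoff in angstroms; real force fields typically use
# per-atom radii and a smooth repulsive energy term instead.
MIN_DISTANCE = 3.0


def has_clash(piece_a, piece_b, min_distance=MIN_DISTANCE):
    """Return True if any atom of piece_a lies closer than min_distance
    to any atom of piece_b. Each piece is a list of (x, y, z) tuples."""
    return any(
        math.dist(a, b) < min_distance for a, b in product(piece_a, piece_b)
    )


# Two toy "pieces": one well away from the first, one overlapping it.
helix = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
far_piece = [(10.0, 0.0, 0.0)]
near_piece = [(2.5, 1.0, 0.0)]
print(has_clash(helix, far_piece))   # False
print(has_clash(helix, near_piece))  # True
```

Checking every pair is quadratic in the number of atoms, which is fine for a sketch; larger structures would normally use a spatial grid or neighbor list to skip distant pairs.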
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
[quote]...many proteins consist of multiple independently folded "domains". In many cases, it is possible to recognize from the amino acid sequence roughly where the boundaries between the domains are, and in these cases we carry out folding calculations separately on each domain. This in the end produces models for different parts of an amino acid sequence, and we then need to assemble these into one coherent structure. For this we use a protocol again very similar to what you have been running, except that the only variation allowed is in the linker between the domains, typically around 10 residues, while the intradomain structure is kept fixed (this is quite analogous to the docking problem I mentioned above).[/quote] Attempted translation into laymen's English: It is sometimes useful to study portions of the protein rather than the entire chain. Some specific sequences will fold in a consistent manner whenever they appear in the chain. By recognizing these, you essentially break the problem into several pieces: some pieces are the known shapes, and other portions you still don't know. You then work with the amino acids that lie between the known portions to fill in the complete chain. You may have seen this in the graphics of some of your work units: you could see gaps throughout the protein chain, and then blinks where they were filled in with various possible shapes and tested for the resulting energy levels. It's like putting together a puzzle where some of the pieces are made of stiff cardboard and other pieces are moldable clay, in various sizes. If you can form the clay so that it fills the entire gap to the next piece, then you have created a possible solution to the puzzle. And since the pieces are moldable, other solutions are possible. The energy levels tell Rosetta which of the possible solutions is "best", or most likely to be the actual form the protein takes in nature (the "native state").
Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
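The cardboard-and-clay analogy can be made concrete with a deliberately tiny 2D toy in Python: the rigid pieces stay fixed, represented only by the point where one piece ends (`start`) and the point where the next begins (`target`), and only the linker is varied until it closes the gap. Everything here is an assumption for illustration (2D instead of 3D, a fixed bond length, plain random sampling, and an "energy" that is just the remaining gap); Rosetta's real protocol is far more sophisticated and scores candidates with its own energy function.

```python
import math
import random


def linker_end(start, angles, bond_length=1.0):
    """Walk from `start`, taking one fixed-length bond per angle (radians)."""
    x, y = start
    for theta in angles:
        x += bond_length * math.cos(theta)
        y += bond_length * math.sin(theta)
    return (x, y)


def gap_energy(start, target, angles):
    """Toy 'energy': how far the linker's end misses the next rigid piece."""
    return math.dist(linker_end(start, angles), target)


def search_linker(start, target, n_residues, n_trials=5000, seed=0):
    """Randomly sample linker shapes ('clay') and keep the lowest-energy one."""
    rng = random.Random(seed)
    best_angles, best_energy = None, math.inf
    for _ in range(n_trials):
        angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_residues)]
        energy = gap_energy(start, target, angles)
        if energy < best_energy:
            best_angles, best_energy = angles, energy
    return best_angles, best_energy


angles, energy = search_linker(start=(0.0, 0.0), target=(4.0, 3.0), n_residues=10)
print(f"best remaining gap: {energy:.2f}")
```

Because many different linker shapes can close the same gap, running the search with different seeds gives different "solutions to the puzzle"; in the real problem, the energy function is what ranks them.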
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Curious: when crunching for CASP, where the protein's native structure is unknown, how do you determine which of these "domains" to isolate and assume to take a given shape, for use in the JUMPING approach? Are there specific sequences that ALWAYS take the same shape? Or are the sequences chosen based on preliminary results from the traditional ab initio and full-atom relax approach? Or, to use my earlier puzzle analogy, how do you determine where to place a stiff cardboard piece and where to place the moldable clay? Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
[quote]Today I met with the people who design the science curriculum for Seattle Public School middle and high schools to discuss incorporating rosetta@home into middle and high school science classes. I think that participating in a real research project could be more inspiring than just learning a set of facts; I certainly never found science classes very fun or interesting--the exciting part is discovering new things more than learning about discoveries made long ago.[/quote] This is GREAT news! And I hope you will find a way to incorporate "the exciting part" for the students. I mean, if they presented chemistry by saying "we know the pH levels of each chemical... now let's DISCOVER what happens when they are combined"... and took the students THROUGH the steps that the original scientists followed to learn these things in the first place... then to the students it IS a new discovery, and they aren't fully AWARE that it was a discovery made long ago. I hope you can find a way to present the basic structure of how the atoms make up the protein and try to leave it up to them to guess the rest; in essence, to reinvent the science you've been working on for years. It is just possible that in doing this you give them the information required... without biasing them toward YOUR approach to solving the problem, and they discover a new method of solving the problem that works better (just like that theoretical little girl in Korea who grows up with these things and sees the problem from a whole new angle). It's like introducing the idea of a perpetual motion machine... but not telling them it's impossible to make one, and asking them to devise one. They may not be successful... but YOU may learn a lot from watching what they DO come up with :) Maybe give them a few weeks to think about it and then present information about your approach to the problem. This may give them the incentive to learn how to compute angles and cosines, and statistics and biology and chemistry...
Show them how all of these branches of science are involved in the problem. This will help them see that what we're trying to teach them is important in the real world. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Mats Petersson Joined: 29 Sep 05 Posts: 225 Credit: 951,788 RAC: 0 |
[quote]Why Rosetta, currently, does not use any optimization?[/quote] This may apply to SOME types of calculations, although in my experience SSE or 3DNow! generally gives an improvement of around 2-3x on a particular calculation, unless it's something like a massive vector calculation (which is typically what SETI does; FFTs lend themselves very well to this kind of optimization). I've looked a little at the calculations in Rosetta, and there are no really obvious places where SSE can simply be slotted in to give a BIG performance boost. There are certainly places where it can be used to improve performance by some amount, but there are no huge vector operations, to give one example. And I very much doubt that 6x would be the result, unless: 1. I've missed something (Rosetta is quite large, and I've been looking at the disassembly of the code, not the source code), or 2. someone spends a HUGE amount of time hand-optimizing large portions of code. Given that the algorithms in Rosetta are still being changed, I don't think it makes sense to spend large amounts of time optimizing small portions of it, only to redo the same optimization a little while later because the entire calculation changed. When it comes to optimizing for 3DNow! or SSE, it's hard to get good results purely by adding a compiler switch; it takes rewriting the code in assembler to get much improvement. Compilers are often poor at choosing the right things to put into registers for auto-vectorization (assuming the compiler supports it AT ALL). Just using SSE instead of x87 instructions doesn't actually give much improvement in general.
Here are some (simple) benchmarks of the following: adding a sequence of floating-point values from two large arrays, 256 elements at a time:
fpu: 750 kcycles
sse_scalar: 775 kcycles
sse_vector: 460 kcycles
sse_v_unroll: 430 kcycles
3dnow: 470 kcycles
The total array size is (1 << 17) elements, so (4 << 17) bytes, or 512KB, so the two arrays fit well inside the L2 cache of my Opteron processor. The exact number of clock cycles varies slightly from run to run; the numbers are averaged over 15 runs in total, with the best and worst runs removed. There are cases where 3DNow! is better, but in this case I think it suffers from having to do twice as many operations as the vectorized SSE calculation.
SiSoft Sandra's integer and floating-point benchmarks are optimized to produce the highest possible results. That's all well and good, but it's not REALLY important exactly what numbers BOINC comes up with in its results, as long as the score scales reasonably linearly with the speed of the machine. If you took the time to optimize the benchmark so that it gave a better score, would it actually give a fairer result? Probably not... -- Mats |
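Mats's measurement method (15 runs, best and worst removed, the rest averaged) and what his reported averages imply can be sketched in a few lines of Python. The kcycle figures below are the ones from his post; the function names are hypothetical, and this only does the arithmetic on his results rather than re-running the x87/SSE/3DNow! loops, which would need C or assembler.

```python
import statistics


def trimmed_mean(samples):
    """Drop the single best (lowest) and worst (highest) run, average the rest."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples to drop best and worst")
    ordered = sorted(samples)
    return statistics.mean(ordered[1:-1])


# Averages reported in the post above, in kcycles (lower is faster).
REPORTED_KCYCLES = {
    "fpu": 750,
    "sse_scalar": 775,
    "sse_vector": 460,
    "sse_v_unroll": 430,
    "3dnow": 470,
}


def speedups_vs_fpu(results):
    """Speedup of each variant relative to the plain x87 loop (>1 means faster)."""
    baseline = results["fpu"]
    return {name: baseline / kcycles for name, kcycles in results.items()}


for name, speedup in speedups_vs_fpu(REPORTED_KCYCLES).items():
    print(f"{name:>13}: {speedup:.2f}x")
```

On these numbers, scalar SSE comes out slightly slower than x87, and the vectorized versions land at roughly 1.6 to 1.7x, which fits the point that simply switching instruction sets is far from a 6x gain.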
Leonard Kevin Mcguire Jr. Joined: 13 Jun 06 Posts: 29 Credit: 14,903 RAC: 0 |
If this is correct, then, taking into consideration Mats Petersson's post, I agree with Akos, and I agree with Mats on this point: it's a daunting task to hand-optimize assembler to use SSEx instructions, and it is compounded when these routines change often. However, the only people who know how much the routines change, and which routines change a lot, are the developers. So what would completely put the issue to rest is their input, if there is even a defined way for them to communicate this; that could provide useful information for making the final determination. |
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
I think by "internal benchmark", DB was referring to a set of calculations (a mini-WU, if you prefer: e.g. the time to perform 10 steps of full-atom relax on the "1tul" protein, compared against a reference PC) that will be compiled into the base Rosetta.exe and used instead of BOINCclient.exe's own benchmark, so that an fpops-based credit system can be used. This way, credit claims will be more "objective", as they will measure "real" work for the project (although one can always crack Rosetta.exe and change it, since we're still talking about initial-replication=1). Some BOINC project science apps might fit entirely in L2 cache, whereas others might depend more on memory (FSB) speed. In the latter case, a 3GHz and a 2GHz P4 might do about the same amount of "real work" per CPU-hour. Using fpops will probably cause some differences in points per CPU-hour among BOINC projects. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
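The "internal benchmark" idea can be sketched in Python: time a fixed workload on the host, compare it with the time the same workload takes on a reference PC, and scale the host's CPU time by that ratio. The workload, the reference time, the credit rate, and all function names are hypothetical stand-ins; this is not Rosetta's or BOINC's actual credit code.

```python
import time


def reference_workload(n=100_000):
    """Hypothetical stand-in for a fixed "mini-WU"; a real one would run a
    few relax steps on a fixed protein inside the science application."""
    total = 0.0
    for i in range(1, n):
        total += 1.0 / (i * i)
    return total


def best_time(workload, repeats=5):
    """Fastest of several runs, in seconds, to reduce timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def claimed_credit(cpu_seconds, host_ref_seconds, refpc_ref_seconds,
                   credit_per_refpc_hour):
    """Convert CPU time on this host into "reference PC hours" using the
    measured speed ratio, then into credit at a hypothetical fixed rate."""
    speed_ratio = refpc_ref_seconds / host_ref_seconds  # >1: faster than reference
    refpc_hours = cpu_seconds / 3600 * speed_ratio
    return refpc_hours * credit_per_refpc_hour


host_seconds = best_time(reference_workload)
print(f"host ran the reference workload in {host_seconds:.4f} s")
# Made-up numbers: 2 CPU-hours on a host twice as fast as the reference PC.
print(claimed_credit(7200, 15.0, 30.0, credit_per_refpc_hour=10.0))  # 40.0
```

Because the ratio is measured with the project's own workload, a host whose extra clock speed does not help a memory-bound calculation also gets no extra credit for it, which is the point about cache-bound versus FSB-bound applications.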
David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
[quote]I think by "internal benchmark", DB was referring to a set of calculations (a mini-WU if you prefer, doing e.g. time to perform 10 steps of full-atom-relax of the "1tul" protein vs a reference PC) which will be compiled into the base Rosetta.exe and will be used instead of BOINCclient.exe's own benchmark, so that an fpops-based credits system can be used.[/quote] YES--we have this implemented, but it is not yet in use. We are instead now thinking of using average WU times from the RALPH tests to assign credits for each computed structure on ROSETTA, as suggested by participants earlier. For the duration of CASP, the credit system will remain as it is now to avoid disruptions. |
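The RALPH-based alternative can be sketched just as briefly: each WU type gets a fixed credit per computed structure, derived from the average time per structure observed during RALPH testing, and a host is granted that amount for every structure it returns. The conversion rate and function names are hypothetical; only the "average RALPH time per structure" idea comes from the post.

```python
import statistics


def credit_per_structure(ralph_cpu_seconds, credit_per_cpu_hour):
    """Fixed credit for one structure of a given WU type, derived from the
    average CPU time per structure observed during RALPH testing."""
    avg_hours = statistics.mean(ralph_cpu_seconds) / 3600
    return avg_hours * credit_per_cpu_hour


def granted_credit(structures_returned, per_structure):
    """Every host earns the same amount per structure, however fast it is."""
    return structures_returned * per_structure


# Made-up RALPH timings (seconds per structure) and a made-up rate.
per_structure = credit_per_structure([1500, 1800, 2100], credit_per_cpu_hour=10.0)
print(per_structure)                      # 5.0
print(granted_credit(12, per_structure))  # 60.0
```

Under this scheme no client-side benchmark is needed: a faster host earns more simply because it returns more structures per hour.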
R/B Joined: 8 Dec 05 Posts: 195 Credit: 28,095 RAC: 0 |
[quote]Today I met with the people who design the science curriculum for Seattle Public School middle and high schools to discuss incorporating rosetta@home into middle and high school science classes. I think that participating in a real research project could be more inspiring than just learning a set of facts; I certainly never found science classes very fun or interesting--the exciting part is discovering new things more than learning about discoveries made long ago. Anyway, they were very interested and we should have some pilot projects in schools this fall.[/quote] Outstanding... I forget who kicked off that idea, but that is the kind of step that will pay off in the long run... A wonderful step, Dr. Baker... Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers. |
catalin Joined: 28 Jun 06 Posts: 4 Credit: 17,134 RAC: 0 |
[quote]We desperately need as much CPU power as possible for the next two weeks...[/quote] Here's my problem: I already have a project from climateprediction running just fine on my computer, but every time I try to get new work from Rosetta, BOINC Manager doesn't seem to react. I hit the Update button, see the "scheduler request pending" message for three minutes and... that's it: no result, and no new work downloaded since last month... The other project is working just fine, as I said, so I can't blame the manager for it... What might be the issue? Thanks in advance. |
Daral Joined: 13 Jan 06 Posts: 13 Credit: 870,334 RAC: 0 |
Your computer is overcommitted, so it went into earliest-deadline-first (EDF) mode. It stops getting new work until it thinks it can finish everything it already has before the deadlines hit. If you want to work on Rosetta, pause the climate model for a couple of seconds; it will download Rosetta WUs. Then resume the climate model, and it'll do them both. |
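The "overcommitted" check Daral describes can be modeled in a few lines of Python: run the queued tasks one after another in earliest-deadline-first order and see whether any would finish late; if so, stop fetching new work. This is a simplified sketch under stated assumptions (one CPU, known remaining times, hypothetical names), not the BOINC client's actual scheduler, which uses a more elaborate simulation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_hours: float  # estimated CPU time still needed
    deadline_hours: float   # hours from now until the deadline


def edf_misses(tasks):
    """Run tasks back to back in earliest-deadline-first order on one CPU;
    return the names of tasks that would finish after their deadline."""
    clock = 0.0
    missed = []
    for task in sorted(tasks, key=lambda t: t.deadline_hours):
        clock += task.remaining_hours
        if clock > task.deadline_hours:
            missed.append(task.name)
    return missed


def should_fetch_work(tasks):
    """Toy rule: only ask for new work when every queued task can still
    meet its deadline."""
    return not edf_misses(tasks)


# A long climate model whose estimate exceeds its deadline blocks work fetch.
queue = [Task("climate_model", remaining_hours=900.0, deadline_hours=800.0)]
print(edf_misses(queue))         # ['climate_model']
print(should_fetch_work(queue))  # False
```

In this toy model, one long task with a pessimistic time estimate is enough to block all new downloads, which matches the symptom of the scheduler request never bringing in Rosetta work.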
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
If you are looking for information to show your friends to help get them to crunch Rosetta too... the newsletter that was sent in May might be a good resource. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Might I suggest copying Dr. Baker's post into the news area of the project homepage? These all get copied and shown on BOINCstats, where other folks might see them and be interested. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
©2024 University of Washington
https://www.bakerlab.org