Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
Slightly side-tracking. I've got one running right now. 1 day, 5 hours, 40 minutes of CPU time: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1095559368 My wingman completed it in 13 hours, but so far I've taken 1 day, 5 hours, 40 minutes. The wingman's computer has an i5-6402P, which I've never heard of, but if it's a similar speed to an i5-6400, then it's a similar speed to my Xeon per core, so I'm not sure how he did it so quickly. How does winging work with Rosetta? Can't you end up with one guy doing more modules than another because his computer is faster? |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
“How does winging work with Rosetta?” It doesn’t. Tasks are typically not sent to more than one machine. Yours did probably only because its deadline has passed. If your machine does ever finish it, you will get the same credit as the other user. (Looking at the FLOPS: his machine is 30% faster than yours.) And yes: this is where BOINC’s credit model (designed for fixed work / variable time) breaks down on Rosetta (fixed time / variable work). (Explanation from Mod.Sense.) |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
“‘How does winging work with Rosetta?’ It doesn’t. Tasks are typically not sent to more than one machine. Yours did probably only because its deadline has passed. If your machine does ever finish it, you will get the same credit as the other user. (Explanation from Mod.Sense.)” Ok that answers one of my two questions, but.... how did he finish it so quickly? I can only assume his CPU, although similar in a benchmark, is faster at Rosetta. Back to the question you answered - I take it Rosetta is programmed such that it cannot send back a wrong result? Most projects have to check with at least one other person to make sure you got the answer right. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
I edited while you were replying… Looking at the stats: his machine is 30% faster at floating point ops, and 80% faster at integer ops, than yours. Using those numbers, yours should take somewhere between 17 and 25 hours. But that the task is still not finished after 30 hours suggests it’s not that simple… From what Mod.Sense wrote, Rosetta would rather have two machines doing two different tasks than both doing the same and comparing results to ensure they’re ‘right’. I’m not sure there’s really such a thing as a ‘wrong’ answer with Rosetta anyway, if the tasks are simply asking: “What if…?” Any results that look promising will be investigated further, and can be discarded if they turn out to be somehow erroneous. |
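The estimate in the post above can be sketched as simple scaling. This is only an illustration: the function name is made up here, the 13-hour figure is the rounded run time quoted earlier in the thread, and linear scaling by benchmark ratio is an assumption that the post itself says does not hold for this task.

```python
def estimate_runtime_hours(reference_hours, speed_ratio):
    """Scale a reference run time by how much faster the reference machine is.

    speed_ratio = 1.30 means the reference machine is 30% faster,
    so the slower machine is assumed to need 1.30x as long.
    """
    return reference_hours * speed_ratio


wingman_hours = 13.0   # rounded run time quoted in the thread
float_ratio = 1.30     # wingman 30% faster at floating point ops
int_ratio = 1.80       # wingman 80% faster at integer ops

low = estimate_runtime_hours(wingman_hours, float_ratio)
high = estimate_runtime_hours(wingman_hours, int_ratio)
print(f"Expected run time: roughly {low:.1f} to {high:.1f} hours")
```

With the rounded inputs this lands close to the quoted range; the exact bounds depend on the wingman's unrounded run time.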
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
“Looking at the stats: his machine is 30% faster at floating point ops, and 80% faster at integer ops, than yours. Using those numbers, yours should take somewhere between 17 and 25 hours. But that the task is still not finished after 30 hours suggests it’s not that simple…” Where did you get the data from? I usually compare using http://cpuboss.com/compare-cpus but that has not heard of his CPU. I tried searching for a few more comparison sites, but the ones that list his don't have benchmarks, they just list all the specs side by side. “From what Mod.Sense wrote, Rosetta would rather have two machines doing two different tasks than both doing the same and comparing results to ensure they’re ‘right’. I’m not sure there’s really such a thing as a ‘wrong’ answer with Rosetta anyway, if the tasks are simply asking: ‘What if…?’ Any results that look promising will be investigated further, and can be discarded if they turn out to be somehow erroneous.” But if a computer makes a mistake it will miss what could be an interesting combination. There must be some kinda CRC check in the programming. Astrophysics projects use at least two machines, as the answer can be incorrect. “And yes: this is where BOINC’s credit model (designed for fixed work / variable time) breaks down on Rosetta (fixed time / variable work). (Explanation from Mod.Sense.)” It only breaks down when someone returns it too late. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
“Where did you get the data from?” I was just looking at the Measured floating point speed and Measured integer speed values on each Computer Details page, which come from the Whetstone and Dhrystone benchmarks that BOINC runs. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
“[snip] There must be some kinda CRC check in the programming. Astrophysics projects use at least two machines, as the answer can be incorrect.” It depends. If the project is searching a very large set of starting points that should all give answers converging to the best possible answer, and the server can quickly evaluate the quality of what was returned, then a few wrong answers aren't important enough to reduce the number of starting points that are evaluated. On the other hand, I've seen a BOINC project where nearly all of the tasks returned answers saying nothing was found. Someone noticed this, and wrote a fake application program that always returned nothing was found, without even checking if there was anything that should have been found. The project had so few users that each workunit went to only one computer, except after timeouts and obvious errors. This means that the fake results were only noticed after someone noticed that the fake application used less than 1% of the CPU time used by the real one, and by then so many of the fake results had been declared valid and the run time deleted that a large number of workunits had to be recreated and run again. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
“On the other hand, I've seen a BOINC project where nearly all of the tasks returned answers saying nothing was found. Someone noticed this, and wrote a fake application program that always returned nothing was found, without even checking if there was anything that should have been found. The project had so few users that each workunit went to only one computer, except after timeouts and obvious errors. This means that the fake results were only noticed after someone noticed that the fake application used less than 1% of the CPU time used by the real one, and by then so many of the fake results had been declared valid and the run time deleted that a large number of workunits had to be recreated and run again.” I shake my head in disgust at who would do such a thing, it's not even as if you can make money out of getting more credits. |
Tomcat雄猫 Send message Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
“‘Where did you get the data from?’ I was just looking at the Measured floating point speed and Measured integer speed values on each Computer Details page, which come from the Whetstone and Dhrystone benchmarks that BOINC runs.” Those numbers are anything but accurate. My hilariously thermally constrained Macbook from 2015 has a measured floating point speed of 5.65 GFLOPS (it can go above 6.10 GFLOPS sometimes, which is way higher than a well-cooled i9-9900K). That is faster than my Ryzen 3600 and many current gen high-end desktop-grade CPUs from Intel. There is no way that can be true. Integer performance seems to match up to expectations, though. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
“Ok that answers one of my two questions, but.... how did he finish it so quickly?” My understanding is that for a given Work Unit, each Task actually starts with a different random seed. So while the data for 2 (or more) Tasks from a given Work Unit is the same, the starting seed value(s) are different, and so the entire calculation work done can be significantly different, even though the data being processed is the same. Hence why there is no comparison of results involved in Validation of work done. I could be wrong of course. Grant Darwin NT |
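The seed idea described above can be illustrated with a toy search. This is not Rosetta's algorithm, only a sketch under stated assumptions: `run_task` and its scoring function are invented for illustration, and the point is just that identical input data with different seeds follows a different path to a different result.

```python
import random


def run_task(data, seed, steps=200):
    """Toy randomized search: same data, but the seed decides the path taken.

    Hypothetical stand-in for a task; returns (final_position, best_score).
    """
    rng = random.Random(seed)          # independent generator per task
    position = 0.0
    best = float("inf")
    for _ in range(steps):
        candidate = position + rng.uniform(-1.0, 1.0)
        # Score: squared distance to every data point (lower is better).
        score = sum((candidate - x) ** 2 for x in data)
        if score < best:               # keep only improving moves
            best = score
            position = candidate
    return position, best


data = [2.0, 3.0, 7.0]                 # identical input for both tasks
task_a = run_task(data, seed=1)
task_b = run_task(data, seed=2)
print(task_a)
print(task_b)
```

Because the two tasks are not expected to return the same answer, comparing them bit for bit would say nothing about whether either one is valid.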
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
“‘On the other hand, I've seen a BOINC project where nearly all of the tasks returned answers saying nothing was found. Someone noticed this, and wrote a fake application program that always returned nothing was found, without even checking if there was anything that should have been found. The project had so few users that each workunit went to only one computer, except after timeouts and obvious errors. This means that the fake results were only noticed after someone noticed that the fake application used less than 1% of the CPU time used by the real one, and by then so many of the fake results had been declared valid and the run time deleted that a large number of workunits had to be recreated and run again.’ I shake my head in disgust at who would do such a thing, it's not even as if you can make money out of getting more credits.” Cheating by some people in the original Seti project was the reason BOINC was developed: Credits instead of just counting the number of Work Units processed, and a method for comparing results to see if a returned result is actually Valid or not. Grant Darwin NT |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
“Where did you get the data from? I usually compare using http://cpuboss.com/compare-cpus but that has not heard of his CPU. I tried searching for a few more comparison sites, but the ones that list his don't have benchmarks, they just list all the specs side by side.” Try: https://www.cpubenchmark.net/cpu_list.php |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
So far as validating results, lost results etc. Each protein study fires off thousands of tasks. Some 5% or less of those results will look to be the best. If a task ran astray out in the wild, and mistakenly reports a terrible result, that's not ideal, but there should still be a similar result in those top 5%. If the task ran astray and mistakenly reports a fantastic result, that single model is rerun in the lab and confirmed. If the lab system has the same flaw, it should get the same fantastic result. But there is also human review of the results. Sometimes you can tell, just by the shape of the result, that it doesn't look like a protein found in nature. If a protein-protein interaction were being studied, it might be more difficult to tell that something is off just by the shape. Eventually results may be sent to the "wet lab" where they produce the two proteins and see if they actually interact as predicted by the model. If the protein structure has already been determined, the models are compared to the known structure and the degree of their similarity is measured in RMSD. Sometimes the human review of the top 5% of the results concludes that we still have not found the best model. Perhaps there is a high variability in appearance across the top scoring models. In such cases, variations of those top 5% of the results are sent out as a new round of work. It is for the same protein, and again will do thousands of models, but these will start with some assumptions or rules that cause you to begin with something much closer to one of those previous best results, and search around that same area for a better (lower energy) result. I made up the 5% number. 1% or less is probably more realistic. Maybe I should have said something like "...the top 10 or 20 models". Anyway, I hope that makes it more clear why R@h does not require a wingman to rerun the same models to confirm results. When you get down to those top 10 results, they should all look pretty similar.
Each arrived at that model from a different start, but, in the end, the top results should all be similar to the actual protein's structure in nature. So, they should all be very similar. So if the 11th top result looks radically different due to some error, it will stand out like a sore thumb. Rosetta Moderator: Mod.Sense |
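The review process described above (keep the lowest-energy models, then check how similar they are) can be sketched roughly as follows. Everything here is an illustrative assumption rather than the project's actual pipeline: the function names, the dictionary layout of a model, and the threshold are made up, and the RMSD below skips the structural superposition step that a real comparison would perform first.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Simplification: assumes the structures are already aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(total / len(coords_a))


def top_models(models, fraction=0.05):
    """Return the lowest-energy fraction of the models (at least one)."""
    ranked = sorted(models, key=lambda m: m["energy"])
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def flag_outliers(models, reference_coords, threshold):
    """Names of models whose RMSD to the reference exceeds the threshold."""
    return [
        m["name"]
        for m in models
        if rmsd(m["coords"], reference_coords) > threshold
    ]
```

A model that scores well but sits far from the other top models in RMSD would be the "sore thumb" the post mentions, and a candidate for rerunning in the lab.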
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
“So far as validating results, lost results etc. Each protein study fires off thousands of tasks” Excellent description, thanks, it's nice to know how the system operates that we're running. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,753 RAC: 19,563 |
“‘if it says "36000s + 14400s" that indicates the watchdog has now been set back to 4hrs rather than 10hrs’ The 4 hours I took to be the run time preference (per this post); the 10 hours the watchdog (per this post).” Got it. It's been so long since I needed to look at task overruns I must've completely forgotten the syntax. Made worse by the task runtime being 4+10 rather than 10+4. If it was 8+watchdog I wouldn't have confused myself so easily (I hope) |
Jord Send message Joined: 16 Sep 05 Posts: 41 Credit: 204,120 RAC: 0 |
When you made the 4.20 app for Windows, did you add the code (via the BOINC API) that checks every 10 seconds if the client has died and will then auto-exit the app? While testing something with BOINC/BOINC Manager I find that when I kill BOINC Manager about 15 seconds after it starts up, while Rosetta tasks are still loading into memory, both BOINC and BOINC Manager exit normally but the Rosetta tasks that started stay in memory. Even after a handful of minutes these apps still run. I have to manually kill them. Restarting BOINC Manager will only cause the tasks that started already to stay in memory and in BOINC Manager these show as "waiting to acquire slot directory lock. Another instance may be running." |
Corgi Send message Joined: 19 Jun 19 Posts: 5 Credit: 2,320,781 RAC: 5,033 |
Perhaps you can help me adjust my settings - I've been getting Rosetta tasks with deadlines that would require me to walk away from my computer and not use it for anything else to ensure completion - for example, I just recontacted the project to clear two sadly-unfinished tasks with more than a day yet to run that were due two days ago. A lot of what else I do is resource-intensive, so I have to pause BOINC and F@H while they're running. I hate seeing these tasks I can't complete! Suggestions, please? |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
“Perhaps you can help me adjust my settings - I've been getting Rosetta tasks with deadlines that would require me to walk away from my computer and not use it for anything else to ensure completion - for example, I just recontacted the project to clear two sadly-unfinished tasks with more than a day yet to run that were due two days ago. A lot of what else I do is resource-intensive, so I have to pause BOINC and F@H while they're running.” How many cores do you have? Can you run your intensive tasks and a smaller number of Rosettas at once, by limiting Boinc to use fewer cores? Or leave the computer on more when you're not using it? |
Ray Murray Send message Joined: 22 Apr 20 Posts: 17 Credit: 270,864 RAC: 0 |
Hi Corgi, Running Boinc and Folding together can cause resource conflicts. Boinc can't see that Folding is using 1 (light), 3 (medium) or all 4 (full) cores so Boinc will, itself, try to use those cores as well, causing Folding, Boinc and anything else you're trying to do to slow down. You could set Folding to light, to use only 1 core and Boinc to use 3 of 4, 75%, or medium, 3 core and limit Boinc to 1 of 4, 25%. Or maybe set Folding to light, Boinc to 50% or 25%, leaving 1, or 2 cores free. I've noticed with Folding, if set to medium, 3 cores, before a task starts, you can turn it down to light, 1 core, and back up to medium later, but if a task starts in light, 1 core, turning it up to medium has no effect and it will run as 1 core to the end of that task. Hope that helps. |
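The core-splitting arithmetic in the post above can be written out as a small helper. This is only a sketch: the function name is hypothetical, and the 4-core machine is the example used in the post, not a known fact about Corgi's computer. The result is the percentage you would enter in Boinc's CPU usage limit.

```python
def boinc_cpu_percent(total_cores, folding_cores, free_cores=0):
    """Percentage of CPUs to allow Boinc so it does not overlap with Folding.

    free_cores is the number of cores left idle for everything else.
    """
    boinc_cores = total_cores - folding_cores - free_cores
    if boinc_cores < 0:
        raise ValueError("Folding plus free cores exceeds the total core count")
    return 100 * boinc_cores / total_cores


# The combinations mentioned in the post, for a 4-core machine:
print(boinc_cpu_percent(4, folding_cores=1))                 # Folding light
print(boinc_cpu_percent(4, folding_cores=3))                 # Folding medium
print(boinc_cpu_percent(4, folding_cores=1, free_cores=1))   # one core left free
```

On a machine with a different core count the same subtraction applies; only the percentages change.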
©2024 University of Washington
https://www.bakerlab.org