Message boards : Rosetta@home Science : TOO MANY ERROR MESSAGES WHY?
Author | Message |
---|---|
John Joined: 24 Oct 06 Posts: 2 Credit: 1,863,033 RAC: 0 |
I HAVE RUN THE ROSETTA PROJECT CONTINUOUSLY FOR A MONTH AND I HAVE TOO MANY ERROR MESSAGES 'Client error Compute error'. BEFORE I JOINED ROSETTA I WAS RUNNING SETI. I ALSO HAD SOME 'Client error Compute error' MESSAGES THERE, BUT NOT THAT MANY... I DIDN'T CHANGE ANYTHING ON MY PC. WHY DO I GET SO MANY 'Client error Compute error' MESSAGES? CAN ANYONE HELP ME? |
Mats Petersson Joined: 29 Sep 05 Posts: 225 Credit: 951,788 RAC: 0 |
First, can you stop shouting? (Capitals, in internet culture, are used to "shout".) Text is much easier to read if it's correctly capitalized. Now to your problem: Rosetta is probably a bit more sensitive to computer problems than SETI; simply put, it's doing a more complex set of math. However, there is also the problem that Rosetta can have "bad workunits", i.e. work units that themselves cause Rosetta to crash, so you could just be "unlucky". In at least one case, the other computer attempting to calculate the same work unit also got an error, which indicates that the work unit itself is "broken". Here are some examples where two computers failed to do the same task: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=42022337 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=41972301 Here are some where the second attempt succeeds: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=41705331 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=41704823 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=41703070 -- Mats |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
first, can i agree with mats about not shouting. in my opinion if for any reason you have difficulty with using the shift key (disability etc) it is more friendly to others to use all lower case instead of all capitals. that is what i am doing here, deliberately to make the point. of course, we can see it is your first post here, and accept that it takes time for newcomers to get to know 'how to eat pie with a fork' online. Second, again I agree with Mats about this project having more errors than (say) SETI or Einstein. It is important to know that this project is doing two things at once a lot of the time. One thing it is doing is making new predictions that (hopefully!) will be useful to other biochem researchers (both pure research and medical). The other thing it is doing is testing out new ways of crunching these predictions, sometimes going into areas where programs do not exist at all, and sometimes trying to improve the speed or accuracy of existing programs. These aims mean that this project is always running reasonably newly written code. We do not have the luxury (as SETI and Einstein do) of running the same app for half a year at a time, changing only the input data. Those projects are each part of an experiment that involves collecting gigabytes of similar data and then sifting it (for aliens or for gravy waves). As far as SETI or Einstein are concerned, the IT is a tool towards their main aim (aliens or wavy gravy). Here, developing the IT is an end in itself: to develop better tools for other groups to use. Until recently, for example, an older version of the Rosetta code was in use on the World Community Grid, where it ran bug free for months, being an old tested version that was doing some production work on some interesting science. For that reason, on this project you almost always get credit for results that error out, because when you develop new code catching the bugs is part of the science.
Now that you understand better what we are about, you may choose to stay and enjoy being part of pushing forward the IT boundaries (accepting the higher error rate). Or you may choose to go to other kinds of project, all of which have their own legitimate aims, so that you can enjoy a lower error rate. The choice is yours. The only mistake would be to stay and hope the errors will go away. They won't; this is not that kind of project. Errors come in clusters (and we seem to be in the middle of a cluster now) but they will never go away for good. If you choose to go elsewhere, then thank you for the 3.5k credits you have donated to this project. Not only have you given Rosetta a fair test, but the work you have crunched, including the results that went wrong, has genuinely helped the project. If you choose to stay, we thank you for your willingness to understand the different needs of this project now that you have experienced those needs in action. So, welcome, or go well; as the case may be. River~~ |
Don Joined: 28 Oct 06 Posts: 2 Credit: 294,270 RAC: 0 |
I am experiencing a lot of errors as well, but only on two of my four machines. 7 of 13 results were errors on a P4 3.0 Prescott on an Intel board, not overclocked. 4 of 16 results were errors on a P4 3.0 Northwood @3.3 GHz on an Abit board. No errors on 16 results on an Athlon 64 3700+ @2.8 GHz on a DFI board. No errors on 16 results on a PIII 1 GHz on an Intel board. Funny that the hyperthreaded processors are the ones that are throwing errors. |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"I am experiencing a lot of errors as well, but only on two of my four machines." Are you getting the same kind of work on all the machines? In the short term this kind of pattern can arise on this project simply because the work tends to go out in batches, so that one machine can be hit by a load of errors while another gets none. If you are seeing a lot of errors on the HT boxes and none on the others on the same types of work unit (i.e. the words in the wu name are the same, differing only by the numbers), then that would be a valuable clue and I'd ask you to post details in the Problem with Rosetta Version blah thread. I won't post a link, as the right thread changes from time to time, but just put problem rosetta version 5.40 into the search box on this page and it will take you there (alter the number, of course, if you are on a different version of Rosetta). R~~ |
sslickerson Joined: 14 Oct 05 Posts: 101 Credit: 578,497 RAC: 0 |
"If you are seeing a lot of errors on the HT boxes and none on the others on the same types of work unit (ie the words in the wu name are the same, only differing by the numbers), then that would be a valuable clue and I'd ask you to post details in the Problem with Rosetta Version blah thread." Hi River, I started the Problems with Rosetta Version 5.40 thread the other day. For those of you who are reading this now and have questions or concerns about the current application errors, please click here. Thanks! Tim Edit: Spelling and other stuff that will drive me crazy if I don't fix it. |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"I won't post a link, as the right thread changes from time to time, ..." I'd emphasise that Tim's links refer specifically to v5.40, which is current at the time of writing but may not be by the time you read this. So please check the version number, and if it is not 5.40 then please use the search box to find the right thread. Thanks everyone. River~~ |
©2024 University of Washington
https://www.bakerlab.org