Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,238 RAC: 19,699 |
In your account, Rosetta@home preferences, Resource share. What's easiest is to set Rosetta at, say, 50%, WCG at 25% and some other project at 25% and let Boinc figure it out, which it will do over time. Just be sure to keep your cache sizes small so you don't run into deadline problems. With Rosetta's 3 day deadline, if you have 3 days of work NO other projects will crunch because their deadline will be further out than 3 days. Where are these resource share settings hidden? Yes, it isn't a %age. I've seen someone's now pointed out where the setting is at WCG, but I could never find it before, so I just increased Rosetta to 2900. Amounts to the same thing. |
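To illustrate why raising Rosetta to 2900 "amounts to the same thing" as setting percentages: resource share is a relative weight, so each project's slice is its share divided by the sum of all shares. A minimal sketch follows; the project names and the value of 100 for the other two projects are assumptions for illustration, not anyone's actual settings.

```python
def share_fractions(shares):
    """Return each project's fraction of the total resource share."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Assumed example: Rosetta raised to 2900, two other projects left at 100 each.
raised = share_fractions({"Rosetta": 2900, "WCG": 100, "Other": 100})

# Assumed example: weights 200/100/100 give the 50% / 25% / 25% split.
even_split = share_fractions({"Rosetta": 200, "WCG": 100, "Other": 100})

for label, result in (("raised", raised), ("even split", even_split)):
    for name, fraction in result.items():
        print(f"{label:10s} {name:8s} {fraction:.1%}")
```

Only the ratios matter, so 200/100/100 and 50/25/25 describe the same long-term split.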
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,238 RAC: 19,699 |
I've noticed the same: Tasks are arriving with an 8 hour estimated completion time. Definitely, yes. During the last outage I increased my runtimes to 12hrs to eke my last few out, and they ran for 12hrs, but when new tasks came through, the unstarted ones still showed 8hrs. I've reduced my runtimes back to 8hrs. Boinc has enough trouble with scheduling without me or Rosetta making it worse. |
Stevie G Send message Joined: 15 Dec 18 Posts: 107 Credit: 822,669 RAC: 1,625 |
[quote] 8Gb RAM ought to be plenty for a 2-core processor. Have you looked at the previous advice in this thread and compared to your own settings (even though the advice was for a different machine)? There should be plenty for you to consider. Boinc <ought> to be able to give your other projects enough time to complete before their deadlines without you having to suspend them. The longer you can run without interfering, the better Boinc will be able to decide for you.[/quote] For some reason, the computer shut down and was unresponsive for 48 hours. No action from the power button, hard drive, etc. Nada, nichts, zip. Power cable was OK, I don't think there's an inline fuse, so I dunno. There's a reset button on the power supply, but I didn't mess with that and the button is not popped out. Overheat? Usually, that just results in a restart. To be safe, I just vacuumed out all the accumulated cat hair and dust. We had some thunderstorms here last night, so maybe there was a power interruption. But nothing else in the house was affected and this machine is on a UPS backup, which did not register any action. A deep mystery. But when I just now turned it on, et voilà! It awoke from its coma, which is how I'm writing to you at this moment. {:>) No explanation for that, but I'll take it. However, I've been out of business for more than two days, with deadlines rapidly approaching. So I will take your suggestion under advisement and scrutinize my settings and preferences. Thanks again. SGaber |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
[snip] For some reason, the computer shut down and was unresponsive for 48 hours. No action from the power button, hard drive, etc. Nada, nichts, zip. The shutdown is typical after a momentary loss of power. The UPS may have let its battery or batteries run too low, for example if its rating was too low for your computer. If so, it would eventually recharge them after long enough with the computer using no power. You may have needed to unplug it to keep it from being confused about whether it was still running. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,238 RAC: 19,699 |
8Gb RAM ought to be plenty for a 2-core processor. That's not a great sign. It's quite an old PC and must have done a lot of work in its time. The best thing you've done is vacuum it out, because it sounds heat-related to me and you'll have helped it run cooler by getting rid of the junk, which will extend its remaining life. I'm actually in much the same situation myself and considering what my next PC should be within my budget. |
justsomeguy Send message Joined: 24 May 17 Posts: 1 Credit: 375,643 RAC: 0 |
Recently, I started seeing a lot of jobs completing with a status of "aborted by project". They were completed prior to the deadline, but it doesn't appear that I get any credit for them either. Any ideas/thoughts on this? |
Falconet Send message Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,804 |
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1092259599 Both tasks errored out after just a few seconds. Slightly different error codes but the same "upload failure": </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>TFSCAFFOLD0001_6_SAVE_ALL_OUT_IGNORE_THE_REST_0ub6wd0j_953357_1_1_r1180454695_0</file_name> <error_code>-240(stat() failed)</error_code> </file_xfer_error> </message> ]]> </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>TFSCAFFOLD0001_6_SAVE_ALL_OUT_IGNORE_THE_REST_0ub6wd0j_953357_1_0_r1298488601_0</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> </message> ]]> EDIT: Got another TFSCAFFOLD0001 WU, which also errored after just a few seconds. Bad batch? https://boinc.bakerlab.org/rosetta/result.php?resultid=1217236062 </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>TFSCAFFOLD0001_2_SAVE_ALL_OUT_IGNORE_THE_REST_1xl5lk3f_953353_2_0_r1523244009_0</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> ]]> |
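For anyone comparing several failed tasks, the `<file_xfer_error>` blocks quoted above can be pulled apart with a short script. This is only a sketch: it assumes the stderr text has been copied from the task page into a string, and the function name `parse_upload_failures` is made up for this example. The sample text is taken from the two errors in the post.

```python
import re

# Matches one <file_xfer_error> block; the space before "(" is optional
# because the post shows both "-240(stat() failed)" and "-161 (not found)".
ERROR_BLOCK = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>.*?)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>.*?)\)</error_code>\s*"
    r"</file_xfer_error>",
    re.DOTALL,
)


def parse_upload_failures(stderr_text):
    """Return a list of (file_name, error_code, reason) tuples."""
    return [
        (m["name"], int(m["code"]), m["reason"])
        for m in ERROR_BLOCK.finditer(stderr_text)
    ]


sample = """
<message> upload failure: <file_xfer_error>
<file_name>TFSCAFFOLD0001_6_SAVE_ALL_OUT_IGNORE_THE_REST_0ub6wd0j_953357_1_1_r1180454695_0</file_name>
<error_code>-240(stat() failed)</error_code> </file_xfer_error> </message>
<message> upload failure: <file_xfer_error>
<file_name>TFSCAFFOLD0001_6_SAVE_ALL_OUT_IGNORE_THE_REST_0ub6wd0j_953357_1_0_r1298488601_0</file_name>
<error_code>-161 (not found)</error_code> </file_xfer_error> </message>
"""

failures = parse_upload_failures(sample)
for name, code, reason in failures:
    print(code, reason, name)
```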
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
Recently, I started seeing a lot of jobs completing with a status of "aborted by project". They were completed prior to the deadline, but it doesn't appear that I get any credit for them either. Usually done only if your computer has downloaded them but not started on them yet, but can be done even if started or completed but not returned. You may need to try harder to return completed tasks. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1092259599 Very fast failures usually mean that not all of the expected output files were produced, and therefore those files were not available to upload. |
Falconet Send message Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,804 |
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1092259599 Well, that's the 3rd TFSCAFFOLD task that errors out. Good thing they fail quickly, whatever is causing it. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1671 Credit: 17,526,840 RAC: 23,319 |
Recently, I started seeing a lot of jobs completing with a status of "aborted by project". They were completed prior to the deadline, but it doesn't appear that I get any credit for them either. The only similar errors I could find were "Cancelled by server", and none of them were cancelled before your system started to process them. No work done, no Credit. Grant Darwin NT |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
Very fast failures usually mean that not all of the expected output files were produced, and therefore those files were not available to upload. How can it have got to the uploading stage if it's only just started? Well, 3rd TFSCAFFOLD task that errors out. Good thing they fail quickly, whatever is causing it. Hopefully the server gives up and only tries sending them to several people before putting them in a "fix this" box for the programmers. I have also noticed my Boinc client backing off and not trying to get Rosetta tasks if it's just had a few failures. Universe and LHC tasks are coming in more often just now. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
Very fast failures usually mean that not all of the expected output files were produced, and therefore those files were not available to upload. I'd expect that only if an error occurred at some point where the error output wasn't going to either of the output log files the users are able to see, which seems to be what happened to most of the TFSCAFFOLD tasks my computer tried to run. Well, 3rd TFSCAFFOLD task that errors out. Good thing they fail quickly, whatever is causing it. I've found a thread for a moderator's attention, and asked the moderator to check this thread. Those of this type that I've looked at were set to have the server give up on the workunit after two failed tasks. |
Falconet Send message Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,804 |
Very fast failures usually mean that not all of the expected output files were produced, and therefore those files were not available to upload. I saw, thanks. Had 1 or 2 more fail but am now running 1 just fine, as others have reported. |
Stevie G Send message Joined: 15 Dec 18 Posts: 107 Credit: 822,669 RAC: 1,625 |
After a week of working interspersed with total shut-downs I finally solved the problem. It was a faulty power supply. I installed a new heftier (600W) power supply and a more powerful fan. The machine has been crunching non-stop for three days. YAAAYY! The exhaust air is much cooler. According to the CoreTemp utility, the CPU is running between 42 and 54 degrees C. It's also quieter and apparently happier. Now if Rosetta would send me some WUs, that would complete my week. Thanks for your support and patience. Cheers, Steven Gaber Oldsmar, FL |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,238 RAC: 19,699 |
Good news - it could've easily been something much more expensive |
Stevie G Send message Joined: 15 Dec 18 Posts: 107 Credit: 822,669 RAC: 1,625 |
Yes, but I think at that point, say a defective motherboard or CPU, I would have just gotten another computer. It's like trying to keep an old car running for another year. This one was a barebones box that I filled with parts for around $550. Now that it's working again, I think I will put another 8 GB of RAM in it. There are some really inexpensive refurbished Dell and HP computers out there, starting at $200. Anybody ever try one of those? Steven Gaber Oldsmar, FL |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
rgmjp tasks running way longer than 8 hours: 1220528042 · 1220528132 · 1220528339 I’ve got another couple still running after nearly 16 hours, and a few more in the pipeline… |
Ray Murray Send message Joined: 22 Apr 20 Posts: 17 Credit: 270,864 RAC: 0 |
This one was just a smidgeon over 23hrs. Not a problem for me as my hosts run 24/7 and I have "Switch between" set beyond 2 days (to allow an occasional long LHC virtual task to run to completion without interruption) but I don't know how it would have fared on a machine that only runs 8hrs a day or if it was switched out too many times. |
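As a rough back-of-the-envelope check on the 8-hours-a-day question, here is a small sketch. It assumes the task starts as soon as it arrives, loses no progress when it is switched out, and gets a fixed fraction of the machine's uptime; the 3-day deadline is the figure mentioned earlier in the thread, and the function names are made up for illustration.

```python
def days_to_finish(cpu_hours, hours_on_per_day, share=1.0):
    """Calendar days needed when the task gets `share` of daily uptime."""
    return cpu_hours / (hours_on_per_day * share)


def meets_deadline(cpu_hours, hours_on_per_day, share=1.0, deadline_days=3):
    """True if the task would finish within the deadline under these assumptions."""
    return days_to_finish(cpu_hours, hours_on_per_day, share) <= deadline_days


# A 23-hour task on a machine that is on 8 hours a day:
print(days_to_finish(23, 8))             # task has the core whenever the PC is on
print(meets_deadline(23, 8))
print(days_to_finish(23, 8, share=0.5))  # task only gets half of the uptime
print(meets_deadline(23, 8, share=0.5))
```

Under these assumptions the task only just fits inside 3 days when it has the core to itself, and misses by a wide margin once it has to share, so any lost progress from being switched out would likely push it past the deadline.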
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
The rgmjp tasks appear to complete only one decoy. The first decoy is usually only a quick check to make sure that your computer is running properly, so does this mean that the usual first decoy is skipped for these, or does it mean that more decoys are done but without adding them to the decoy count? |
©2024 University of Washington
https://www.bakerlab.org