Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,523,845 RAC: 23,480 |
rb_04_12_21176_20979__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_912410_4
And then I got hit with this one:
hgfp_split2_562_fold_SAVE_ALL_OUT_916305_1
Sent: 15 Apr 2020, 12:54:58 UTC; Time reported/deadline: 16 Apr 2020, 3:41:43 UTC; Status: Cancelled by server
errors: Too many errors (may have bug), Too many total results, WU cancelled
The server cancelling a bad batch, sure, but killing off the Task because of an error on the other system? The problem could be with that system, not the Task (which appears to be the case this time around). Especially when the server sent out the new Task after the errored Task had been returned. I've done quite a few resends previously without issue. Grant Darwin NT |
Ian Joined: 12 Oct 07 Posts: 3 Credit: 2,611,432 RAC: 0 |
I'll check, but as far as I am aware, the connection goes straight onto the internet via a firewall. What I don't understand is that Rosetta is the only BOINC project I have this issue with, and I am connected to several. Do they not all share the same upload infrastructure (and acknowledgement message)? I could understand the behaviour if the ACKs were not getting through for any BOINC project. Is there a particular port, peculiar to Rosetta, that the response comes through? Looking at the host average page, my contributions for this host do not seem to be getting credited: I see a flat line. Interestingly, the contributions from my home machine do get through, which would seem to point to something blocking the upload response somewhere. |
Raistmer Joined: 7 Apr 20 Posts: 49 Credit: 797,293 RAC: 0 |
"Too many restarts with no progress. Keep application in memory while preempted." https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13811 |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,523,845 RAC: 23,480 |
> I'll check, but as far as I am aware, the connection goes straight onto the internet via a firewall. What I don't understand is that Rosetta is the only BOINC project that I have this issue with - I am connected to several. Do they not all share the same upload infrastructure (and acknowledgement message)? I could understand the behaviour if the ACKs were not getting through for any BOINC projects.
Each project has its own servers, and all use TCP/IP for connections.
> Is there a particular port that the response comes through that is peculiar to Rosetta?
No idea on that one. My guess is no.
> It looks like my contributions for this host are not getting credited looking at the host average page - I see a flat line.
A Result has to be returned before it can be reported, and it has to be reported before the project can Validate it and then allocate Credit.
> Interestingly the contributions for my home machine do get through though, which would seem to point to something blocking the upload response somewhere.
Yep. It's an issue that only you, with that system, appear to be experiencing, so it's something to do with that particular system or its connection to the Rosetta servers. Can you get a cheap USB mobile modem (or do you know someone with one)? It's a lot of stuffing around to set up, but if you can get one for $20 and use it to connect to the internet instead of your existing connection (still a lot easier than taking the whole computer somewhere else), you can see whether it's your system or the internet connection you are using that's causing the issue. Grant Darwin NT |
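One cheap check to run before borrowing a modem is whether a plain TCP connection to the Rosetta server opens at all from the affected host. The Python sketch below is a minimal example under stated assumptions: the host name boinc.bakerlab.org comes from the forum URLs, while port 443 (HTTPS) is an assumption about what the client uses, not something confirmed in this thread. A successful connect only shows that the path is open; it does not prove that an upload's reply makes it back intact.

```python
import socket

def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections and timeouts are all OSError subclasses.
        return False
```

Running `can_reach("boinc.bakerlab.org", 443)` on both the problem host and the home machine and comparing the answers is one way to narrow down whether the firewall path is the difference.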
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
> The server cancelling a bad batch, sure, but to kill off the Task due to an error on the other system- as the problem could be with the system, not the Task (which appears to be the case this time around). Especially when it sent out the new Task after the Error Task had been returned.
I think you are taking it a bit too personally. A batch of WUs is not cancelled by the project when they have good reason to believe your attempt to crunch it will go better. They set up most WUs to do one additional try after a failure for the reason you mention: maybe the second attempt will go better. But looking across the whole batch is the only way to decide whether to withdraw the batch, and that has nothing to do with your current state on the WU. Rosetta Moderator: Mod.Sense |
crystalsys Joined: 11 Aug 09 Posts: 8 Credit: 1,624,737 RAC: 685 |
Version 3.8 (? I'm not sure, and the log does not show the version) error and tie up
I keep getting jobs running on a 3.X (?) application that hang at some point, and then the log shows something like this:
4/16/2020 11:05:45 AM | Rosetta@home | Task hgfp_het2_576_fold_SAVE_ALL_OUT_911081_124_1 exited with zero status but no 'finished' file
4/16/2020 11:05:45 AM | Rosetta@home | If this happens repeatedly you may need to reset the project.
Resetting the project is not necessary, but aborting that task seems to be, unless you want to waste more CPU time on it. Is there a way to restrict which app versions you get? I've looked, but can't seem to find it. I've not seen this with version 4.15. |
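Since the tell-tale sign is the same task repeatedly exiting with zero status but no 'finished' file, a small script can pick out which tasks to abort from a saved copy of the event log. This is a hypothetical helper, not part of BOINC. It assumes you have copied the log lines into a text file, and it keys only on the message text quoted above, so the date and project prefix in front of it do not matter. The threshold of two occurrences is an arbitrary choice.

```python
import re
from collections import Counter

# Keys only on the message text, e.g.
# "Task hgfp_het2_576_fold_SAVE_ALL_OUT_911081_124_1 exited with zero status but no 'finished' file",
# so the timestamp/project prefix in front of it can be in any format.
NO_FINISHED_RE = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")

def tasks_without_finished_file(log_lines, min_count=2):
    """Return {task_name: count} for tasks that hit this error at least min_count times."""
    counts = Counter()
    for line in log_lines:
        match = NO_FINISHED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    # min_count=2 is an arbitrary threshold for "happens repeatedly"
    return {task: n for task, n in counts.items() if n >= min_count}
```

For example, `tasks_without_finished_file(open("eventlog.txt", encoding="utf-8"))` (the file name is hypothetical) returns the names of the tasks worth aborting from the Manager's Tasks tab.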
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
> Version 3.8 (? I'm not sure and log does not show the version) error and tie up
Upgrading to BOINC 7.16.5 makes that problem much less likely. |
crystalsys Joined: 11 Aug 09 Posts: 8 Credit: 1,624,737 RAC: 685 |
OK, so I decided to take your advice, though I don't have any recent notices of a new version. In BOINC Manager (currently 7.14.2 x64) I clicked 'check for new version'. It came back and told me there wasn't one. I normally don't have the log window open, but I did because I was monitoring the stalled tasks. In THAT window, I got a message in RED saying there was a new version. Did the check again, NOW it says there is a new version. Hopefully they also fixed the bogus 'there is no newer version' message. Thanks! |
Keith Myers Joined: 29 Mar 20 Posts: 97 Credit: 332,141 RAC: 1,223 |
That feature in the Manager does not work. You can check for the latest BOINC version on the BOINC download page. The latest is 7.16.5. https://boinc.berkeley.edu/download_all.php You can also restrict the applications you run by configuring a cc_config.xml file. https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,523,845 RAC: 23,480 |
> A batch of WUs is not cancelled by the project when they have good reason to believe your attempt to crunch it will go better. They set up most WUs to do one additional try after a failure for the reason you mention, maybe the second attempt will go better. But, looking across the whole batch is the only way to make a decision about whether to withdraw the batch, and that has nothing to do with your current state on the WU.
That's just it. As near as I can tell, that batch of WUs wasn't cancelled by the servers (I've actually processed 3 others that were resends, and one initial issue, with no problems); that Task was sent out to see if it was dodgy or not. But then the server cancelled it anyway without giving me a chance to even process it.
minimum quorum: 1; initial replication: 1; max # of error/total/success tasks: 1, 2, 1; errors: Too many errors (may have bug), Too many total results, WU cancelled
Why would the server cancel that Task? Given the time and effort it takes to produce work, I'd have thought Tasks being cancelled before they have been checked out would be worth looking into. Grant Darwin NT |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
[snip]
The previous failed attempt WAS enough checking it out. Tasks already downloaded are normally cancelled only if they haven't started. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,523,845 RAC: 23,480 |
[snip]
If that was the case, why resend it? The whole point of resending something is to check it out. If it doesn't need to be checked out, it doesn't need to be resent. And as I posted, I've done 3 others of that type that had errored on other systems, without them being cancelled. Grant Darwin NT |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Human intervention is required to make the decision about whether there is a specific problem with one machine, or a more general problem with the WU batch. By the time the human had enough information to make that decision, some WUs of the batch were already out to a second host. Rosetta Moderator: Mod.Sense |
Sid Celery Joined: 11 Feb 08 Posts: 2115 Credit: 41,112,600 RAC: 19,835 |
Not sure if I should report this as a problem, but... On an Android phone I'm running 4 tasks and have another (varying) 3 or 4 waiting to follow. I've been reporting and receiving more tasks regularly. All sounds good. Trouble is, the Server Status page has been reporting no tasks available to download for at least a day, and the number of in-progress tasks has been falling steadily and, since a few hours ago, reads nil. Right now I have 7. I've certainly received and reported tasks since both read nil. Not complaining, obviously. Just reporting. |
GoldenHat Joined: 14 Apr 20 Posts: 3 Credit: 122,663 RAC: 0 |
Hello, I'm a newbie to Rosetta and got things set up and running OK. In the last two days I've noticed my laptop running this app in an odd manner. Instead of running at 100% CPU, it fluctuates between 33% and 100%, and my fan turns on and off each time, yet I have the settings at the default of 100% CPU time. I'm concerned because 1) it's slower to process the data, and 2) it's wearing out my PC, and I'm inclined to delete the app from the computer if this continues. I have a Toshiba Qosmio i7 quad-core with 8 logical processors. It runs the CPU, GPU 0 and GPU 1 at full capacity, with the CPU at 3 GHz (base speed 2.4 GHz). Any ideas how I can get it running flat at 100% rather than this fluctuation? When I started it was fine; it's only in the last two days that it's gone funky. Thanks, Richard. |
Bryn Mawr Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
> Hello, I'm a newbie to Rosetta and got things set up and running ok. In the last two days I've noticed my laptop running this app in an odd manner. Instead of running at 100% CPU, it fluctuates between 33% and 100% [snip]
What OS are you running? Have you tried running the system monitor to see which processes are taking CPU time, and maybe which processes are cutting in and out to cause the fluctuations? |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,523,845 RAC: 23,480 |
> Any ideas how I can get it running flat at 100% rather than this fluctuation? When I started it was fine, just in the last two days it's gone funky.
In the last couple of days you have picked up more work from Seti; some was run on the iGPU, the rest is on the Nvidia GPU. And the problem with how it was before is that the system was producing nothing but errors here at Rosetta.
If you set your Computing preferences to the following, things should settle down: fewer errors, and fewer fans starting up and slowing down continually.
Computing, Usage limits: Use at most 100% of the CPUs; Use at most 100% of CPU time.
Computing, When to suspend: Suspend when computer is on battery (selected); Suspend when computer is in use (not selected); Suspend GPU computing when computer is in use (not selected); 'In use' means mouse/keyboard input in last 3 minutes; Suspend when no mouse/keyboard input in last --- minutes; Suspend when non-BOINC CPU usage is above --- %; Compute only between ---.
Computing, Other: Store at least 1 day of work; Store up to an additional 0.02 days of work; Switch between tasks every 60 minutes; Request tasks to checkpoint at most every 60 seconds.
Disk: Use no more than 20 GB; Leave at least 2 GB free; Use no more than 60 % of total.
Memory: When computer is in use, use at most 95 %; When computer is not in use, use at most 95 %; Leave non-GPU tasks in memory while suspended (not selected); Page/swap file: use at most 75 %.
Grant Darwin NT |
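For anyone who would rather see those settings as a file: the BOINC client keeps local computing preferences in global_prefs_override.xml in its data directory, and the Manager writes that file when you change Computing preferences there. The Python sketch below writes Grant's values into that shape. The element names are my understanding of BOINC's preference tags and are an assumption to check against your client's documentation before use. Several are phrased as "run if" rather than "suspend if", so the selected/not-selected sense flips, and options left blank ("---") are simply omitted.

```python
import xml.etree.ElementTree as ET

# Grant's suggested settings. Element names are ASSUMED BOINC preference tags;
# booleans are 0/1, and "run_*" tags are the inverse of the "Suspend when ..." checkboxes.
GRANT_PREFS = {
    "max_ncpus_pct": 100,                 # Use at most 100% of the CPUs
    "cpu_usage_limit": 100,               # Use at most 100% of CPU time
    "run_on_batteries": 0,                # Suspend when computer is on battery (selected)
    "run_if_user_active": 1,              # Suspend when computer is in use (not selected)
    "run_gpu_if_user_active": 1,          # Suspend GPU computing when in use (not selected)
    "idle_time_to_run": 3,                # 'In use' = input in last 3 minutes
    "work_buf_min_days": 1,               # Store at least 1 day of work
    "work_buf_additional_days": 0.02,     # Store up to an additional 0.02 days
    "cpu_scheduling_period_minutes": 60,  # Switch between tasks every 60 minutes
    "disk_interval": 60,                  # Checkpoint at most every 60 seconds
    "disk_max_used_gb": 20,               # Use no more than 20 GB
    "disk_min_free_gb": 2,                # Leave at least 2 GB free
    "disk_max_used_pct": 60,              # Use no more than 60 % of total
    "ram_max_used_busy_pct": 95,          # Memory when computer is in use
    "ram_max_used_idle_pct": 95,          # Memory when computer is not in use
    "leave_apps_in_memory": 0,            # Leave non-GPU tasks in memory (not selected)
    "vm_max_used_pct": 75,                # Page/swap file: use at most 75 %
}

def build_override_xml(prefs: dict) -> str:
    """Serialise a flat {element: value} mapping into a <global_preferences> document."""
    root = ET.Element("global_preferences")
    for name, value in prefs.items():
        ET.SubElement(root, name).text = str(value)
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

After saving the output as global_prefs_override.xml, running `boinccmd --read_global_prefs_override` should make the client pick it up; setting the same values in the Manager achieves the same thing without hand-editing.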
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
What is your setting for "Suspend when non-BOINC CPU usage is above --- %"? Perhaps you have other tasks popping in and consuming CPU, which is causing BOINC to snooze. Especially if the value is at the default (25%?), it can be easy for the various other tasks to exceed that (briefly). I would set it to 75% or higher. Don't worry, the BOINC tasks still have low priority. If you are also running work on your GPU, keep in mind that CPU is still required to service the active work on the GPU. I believe many set things to use at most a % of the CPUs that leaves one core free to service the GPU work. I don't run GPU work; perhaps someone could reply with details on how to set up that arrangement. Rosetta Moderator: Mod.Sense |
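On the percentage approach: with the 8 logical processors GoldenHat mentions, leaving one free means using 7 of 8, i.e. 87.5%. The sketch below computes the smallest whole-number "Use at most N% of the CPUs" value that leaves exactly one core idle. It assumes the client multiplies the CPU count by the percentage and rounds down (never going below one CPU); that rounding rule is an assumption here, which is why the sketch rounds the percentage up.

```python
def pct_to_leave_one_core_free(logical_cpus: int) -> int:
    """Smallest whole-number percentage that leaves exactly one logical CPU idle,
    ASSUMING the client computes floor(logical_cpus * pct / 100)."""
    if logical_cpus < 2:
        raise ValueError("need at least 2 logical CPUs to reserve one")
    target = logical_cpus - 1
    # Integer ceiling of target * 100 / logical_cpus, avoiding float rounding surprises.
    return (100 * target + logical_cpus - 1) // logical_cpus

def cpus_used(logical_cpus: int, pct: int) -> int:
    """CPU count the client would use under the same round-down assumption (minimum 1)."""
    return max(1, logical_cpus * pct // 100)
```

With 8 logical CPUs this gives 88%, which under the round-down assumption leaves 7 CPUs for BOINC tasks and one free to feed the GPU.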
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,523,845 RAC: 23,480 |
> I believe many set things to use at most some % of CPUs that leaves one core free to service the GPU work. I don't run GPU work, perhaps someone could reply with details on how to set up that arrangement.
I find it's best to reserve a core to support the GPU. If a GPU Task is running, it gets the CPU support it needs and doesn't impact the processing time of the CPU Tasks that are running. If there is no GPU work being done, then the CPU core/thread is free to do CPU work.
The app_config.xml file needs to go into the Seti project folder. If BOINC is installed on the C: drive, that is:
C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\app_config.xml
Make sure to use Notepad or similar to create or edit the file (NOT Word or WordPad).
<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
Grant Darwin NT |
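A malformed app_config.xml will not have the intended effect, so it can be worth checking that the file is well-formed after editing it. The Python sketch below parses the file with the standard library and reports the gpu_usage/cpu_usage pair for each app that has a gpu_versions section. It is a convenience check written for this thread, not a BOINC tool, and it does not verify that the element names are ones the client accepts.

```python
import xml.etree.ElementTree as ET

def gpu_app_settings(xml_text: str) -> dict:
    """Parse app_config.xml text and return {app name: (gpu_usage, cpu_usage)}.

    Raises ET.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError(f"expected <app_config> root, got <{root.tag}>")
    settings = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        gpu = app.find("gpu_versions")
        if name is None or gpu is None:
            continue  # this sketch only reports apps with a <gpu_versions> section
        gpu_usage = gpu.findtext("gpu_usage")
        cpu_usage = gpu.findtext("cpu_usage")
        if gpu_usage is None or cpu_usage is None:
            continue  # skip incomplete entries rather than guessing defaults
        settings[name] = (float(gpu_usage), float(cpu_usage))
    return settings
```

Reading the file with `open(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\app_config.xml", encoding="utf-8-sig").read()` (the `-sig` codec strips a byte-order mark if Notepad added one) and passing the text in should return one-CPU-per-GPU-task entries for both apps in Grant's example.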
Admin Project administrator Joined: 1 Jul 05 Posts: 4805 Credit: 0 RAC: 0 |
We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue. |
©2024 University of Washington
https://www.bakerlab.org