Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
EricM, Rosetta@home currently has so many new users that it's not keeping up with the demand for tasks. As for being paused mid-task while another BOINC project runs, that's normal if you have more than one BOINC project providing tasks. Tasks close to their deadlines get higher priority to run, and tasks for the other project catch up on run time later. |
SolidAir79 Send message Joined: 5 May 20 Posts: 4 Credit: 2,123,173 RAC: 0 |
Getting some errors on a windows machine all similar Stderr message? <core_client_version>7.16.5</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_06_25_30642_30047_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 2 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_06_25_30642_30047_ab_t000__robetta.zip -frag3 rb_06_25_30642_30047_ab_t000__robetta.200.3mers.index.gz -fragA rb_06_25_30642_30047_ab_t000__robetta.200.12mers.index.gz -fragB rb_06_25_30642_30047_ab_t000__robetta.200.3mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2765475 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> Regards Alan |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
SolidAir79, That looks like an error in one of the input files for the workunit. If so, you can't fix the problem, and all other users who get copies of that workunit or any other workunit using that input file will have it crash the same way. |
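To illustrate why a bad input can trip the "chi angle must be between -180 and 180" check in the log above, here is a minimal Python sketch. It is not Rosetta's code; `check_chi` is a hypothetical name. It only shows that a NaN value (printed as `-nan(ind)` by Windows builds) fails every range comparison, so no setting on the volunteer's machine can make such a value pass.

```python
import math


def check_chi(chi_degrees: float) -> float:
    """Hypothetical range check modelled on the error message in the log."""
    # Any comparison involving NaN is False, so a NaN can never be "in range".
    if math.isnan(chi_degrees) or not (-180.0 <= chi_degrees <= 180.0):
        raise ValueError(
            f"chi angle must be between -180 and 180: {chi_degrees}"
        )
    return chi_degrees


nan = float("nan")
in_range = -180.0 <= nan <= 180.0  # False: NaN compares false to everything
```

If the NaN comes from the workunit's input data, as suggested above, every copy of that workunit would hit the same check regardless of the host.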
SolidAir79 Send message Joined: 5 May 20 Posts: 4 Credit: 2,123,173 RAC: 0 |
Okay, thanks. Must have a bad batch! |
EHM-1 Send message Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
EricM, Hi Robert, and thanks for the info. I added the second project at your suggestion, thanks for that as well. But the Rosetta pause occurred before that, and has several other times in the past couple months. So I still wonder why it's not finishing the current task. Eric system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
EricM, Looks like you need to give more details about not finishing the current task. If it includes leaving a CPU core idle, that's a problem. If it's switching to running another task instead, that's normal. |
EHM-1 Send message Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
EricM, Below is a shot of the project properties in BOINC. I don't know of any way to see the status of the current work unit other than to let the screensaver start running and read it from the BOINC status messages that run before the project screensaver does. From the event log, here are the most recent Rosetta-related messages that contain anything other than "no tasks sent/project requested delay": 6/24/2020 7:16:42 PM | Rosetta@home | Result hbnet_surface_design3_0.7_SAVE_ALL_OUT_IGNORE_THE_REST_3yb2cw0d_949347_1_0 is no longer usable 6/24/2020 7:16:42 PM | Rosetta@home | Result miniprotein_relax2_COVID_SAVE_ALL_OUT_IGNORE_THE_REST_9db4ko0a_949448_2_0 is no longer usable 6/24/2020 7:16:42 PM | Rosetta@home | No tasks sent 6/24/2020 7:16:42 PM | Rosetta@home | Project requested delay of 31 seconds and the most recent Rosetta-related messages prior to the above: 6/23/2020 3:51:03 PM | Rosetta@home | Task hbnet_surface_design3_0.7_SAVE_ALL_OUT_IGNORE_THE_REST_3yb2cw0d_949347_1_0 is 1.74 days overdue; you may not get credit for it. Consider aborting it. 6/23/2020 3:51:03 PM | Rosetta@home | Task miniprotein_relax2_COVID_SAVE_ALL_OUT_IGNORE_THE_REST_9db4ko0a_949448_2_0 is 0.70 days overdue; you may not get credit for it. Consider aborting it. 6/23/2020 3:51:03 PM | Rosetta@home | URL https://boinc.bakerlab.org/rosetta/; Computer ID 3864355; resource share 100 6/23/2020 3:51:03 PM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8050566; resource share 100 system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM |
Skillz Send message Joined: 24 May 17 Posts: 3 Credit: 5,914,356 RAC: 58,122 |
I am trying to attach two new computers to the project, but they fail every time. When I attempt to attach, I get a "project failed to attach" message, and the logs claim it can't reach the project servers. I can visit the Rosetta@home web site using a browser on both computers I'm trying to attach, so they're not blocked by a firewall or anything. Those BOINC instances are attached to other projects and I can get work from those other projects. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
EricM wrote: "Below is a shot" Link is broken; it redirects to a login page. "I don't know of any way to see the status of the current work unit other than to let the screensaver start" Advanced view > Tasks tab |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
EricM, Boinc needs to reprogram the scheduler so the project weight works properly. In particular if you change the weighting, it takes days to actually do what you asked. For example, I changed from Universe 0, LHC 0, Rosetta 1 to Universe 1, LHC 5, Rosetta 25. I would expect to immediately see 1 Universe to every 5 LHC to every 25 Rosetta tasks running, but I didn't, not for 3 days. Boinc went utterly mental and ran almost exclusively LHC, presumably doing some weird lookback over the last week and seeing it hadn't done any. When the user changes the weighting, it should have immediate effect. |
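For reference, the long-run split implied by the new weights in the post above can be worked out with a few lines of Python. This is only a sketch of the arithmetic; `target_fractions` is a hypothetical helper, and it says nothing about how BOINC's scheduler actually gets there or how long it takes.

```python
def target_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Convert resource shares into the fraction of work each project
    would receive if the shares were honoured exactly."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: share / total for name, share in shares.items()}


# Weights quoted in the post: Universe 1, LHC 5, Rosetta 25.
new_shares = {"Universe": 1, "LHC": 5, "Rosetta": 25}
fractions = target_fractions(new_shares)

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

With these weights Rosetta's target is 25/31 of the work, LHC's is 5/31 and Universe's is 1/31.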
EHM-1 Send message Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
|
ProDigit Send message Joined: 6 Dec 18 Posts: 27 Credit: 2,718,346 RAC: 0 |
12 CPU WUs hogging up my PC, using only 1 cpu core. I will for the time being disconnect from this project until the issue is resolved. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
"12 CPU WUs hogging up my PC, using only 1 cpu core." What issue? I've got 6 computers running Rosetta - two of them with 24 cores each. All cores utilised as normal. What's happening on yours? Are there tasks that say running but doing no calculations? |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
EricM, I turned that nonsense off. Go into Boinc's properties and change the "switch between applications" to a huge number. I set mine to a year. I do not want stuff changing before it's finished. |
EHM-1 Send message Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
Thanks for the tip, Peter. I switched it from 120 minutes to 1,000 for now to see what happens, double what I've observed as the process time for a Rosetta task. And now I'm suspending my second project temporarily to see if Rosetta resumes the task when the screensaver kicks in. Eric system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
That setting will still annoy you every 17 hours. I think it means to change every 17 hours, not 17 hours since the last task started. Hence I changed mine to a year. And it doesn't apply if you: restart the machine; pause tasks to play a game, etc.; or have another project go into high priority panic mode due to a late task. I mainly changed mine because I run LHC, and their tasks don't checkpoint very well and can sometimes get corrupted or at least lose a lot of work. But also I detest having hundreds of half done work units - especially when I see one with 1 second (!) left to go, which it doesn't get round to doing for a whole day! Boinc programmers aren't right in the head. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
@EricM: It seems that BOINC is having trouble deciding how much work to download for your computer. As I understand it, that decision is based in part on what BOINC has seen the computer complete in the past. Does the machine have an irregular usage pattern? (Powered off frequently/irregularly? Variable amount of other work being done while BOINC is running?) The missed deadlines and high average turnaround time (2.81 days: only just within the 3-day deadline) may be contributing. Check all your Computing preferences. Post values/screenshots here, and we might spot something amiss. Regarding pauses: do you have any restrictions in your "Daily schedules" settings? To try to get some tasks, try increasing "Store at least N days of work". Do it in steps: add around 0.4 (slightly more than one 8-hour task time), save, wait a couple of minutes for BOINC to contact the server, and see if it downloads some tasks. If not, repeat. As soon as you get some tasks, reduce N again to maybe 0.3 (slightly less than one task) to avoid your machine getting flooded with work it cannot complete until BOINC learns better how long each task will take. Set "Store up to an additional" to 0, as at this stage the last thing you need is BOINC using poor estimates to opportunistically download even more work. |
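The 0.4 and 0.3 figures in the post above follow from expressing one 8-hour task in days. A quick Python check of that arithmetic, using the numbers from the post (the variable names are just for illustration):

```python
HOURS_PER_DAY = 24


def hours_to_days(hours: float) -> float:
    """Express a run time in hours as a fraction of a day."""
    return hours / HOURS_PER_DAY


task_days = hours_to_days(8)   # one 8-hour Rosetta task, about 0.33 days
step_up = 0.4                  # increment suggested while waiting for work
settle = 0.3                   # value suggested once tasks have arrived

print(f"one task = {task_days:.3f} days")
print(f"turnaround headroom = {3 - 2.81:.2f} days")  # 3-day deadline vs 2.81-day average
```

So each 0.4-day step asks for a little more than one task, and settling at 0.3 keeps the buffer just under one task.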
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
EricM, If you're using the Simple View, click on View near the top of the window, then Advanced View to show more information. When you want to go back to Simple View, click on View, then Simple View. In Advanced View, click on Tasks to see a list of all the tasks currently on your computer. Some will show as Running, some as Waiting to run (started but not currently running; waiting for its next turn for CPU time), and some as Ready to start. There are also a few less common conditions you don't see as often. Those in the Running condition will have time advancing in the Elapsed column, not always every second. They should have time decreasing in the Remaining column, but it can be increasing instead if the initial guess at how long it will run is sufficiently less than accurate. The Deadline column shows when the task must be finished and returned to avoid problems. For about one day before the deadline and for some time after the deadline, any task that finishes will upload its outputs and report the finish automatically. Any tasks finishing earlier than that may or may not wait. If you need to speed up an upload, click on Transfers, then some line for a file, then Retry now. This starts an attempt to upload all of the files going to the same BOINC project as the file you clicked on. The Status column shows whether the upload was blocked (usually temporarily). Generally, if your BOINC Manager contacts the project server for any reason, it will then try to do any uploads and reports that are waiting. If you need to speed up reporting a finished task, click on Projects, then the BOINC project the task is for, then Update. It should then try to report all finished tasks for that project, except any that are still waiting to finish their uploads. To see the main event log, click on Tools, then Event Log. The main event log should then appear on the screen until you click Close at the bottom corner. 
Your "no longer usable" messages indicate that the task is far enough past its deadline that another task from the same workunit has been sent to another user, and that user has sent back the upload files and reported the task as finished, so the server no longer needs anything from your task and will not give you any credit for it. Your "No tasks sent" message indicates that either there are no tasks available to send you, or the server has decided that your computer is not reliable enough to be worth sending any tasks for a while. Your "Project requested delay" message indicates how long your BOINC Manager should wait before trying again. This is to prevent overly frequent requests from blocking access to the server for other users. Your "overdue" messages indicate that the tasks are past their deadlines, enough that you are unlikely to get any credit for returning them. There is also a separate log file for each task. You might check if you have Task Manager installed. I often use it to show problems with too many tasks trying to run at once, or not having enough memory to keep all of the tasks running. |
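For anyone who wants to pick the overdue warnings out of a copied event log instead of reading it by eye, here is a rough Python sketch. It assumes the `date time | project | message` layout seen in the log lines quoted earlier in this thread and an English locale for the AM/PM marker; `parse_line` and `overdue_days` are hypothetical helper names, not part of BOINC.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    when: datetime
    project: str
    message: str


# Matches messages such as "Task <name> is 0.70 days overdue; ..."
OVERDUE_RE = re.compile(r"^Task (\S+) is ([\d.]+) days overdue")


def parse_line(line: str) -> LogEntry:
    """Split one pasted event log line into timestamp, project and message."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return LogEntry(when, project, message)


def overdue_days(entry: LogEntry) -> Optional[float]:
    """Return how many days overdue the task is, or None for other messages."""
    match = OVERDUE_RE.match(entry.message)
    return float(match.group(2)) if match else None


sample = (
    "6/23/2020 3:51:03 PM | Rosetta@home | Task "
    "miniprotein_relax2_COVID_SAVE_ALL_OUT_IGNORE_THE_REST_9db4ko0a_949448_2_0 "
    "is 0.70 days overdue; you may not get credit for it. Consider aborting it."
)
entry = parse_line(sample)
print(entry.project, overdue_days(entry))
```

Lines that are not overdue warnings, such as "No tasks sent", simply return `None` from `overdue_days`.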
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
[H]Skillz, Are you using this link when you try to attach? https://boinc.bakerlab.org/rosetta/ Note the https instead of the previous http. If not, delete what's currently in the Project URL box, and put this link there instead, before clicking Next. If this doesn't make it work, give us more details about what version of BOINC you are using under what version of what operating system (most users use the Windows operating system). When you enter a message, then click on Post Reply, you seldom need to enter it again. Try waiting about one minute for the server to show the message first. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
Peter Hucker, "[snip] Boinc needs to reprogram the scheduler so the project weight works properly. In particular if you change the weighting, it takes days to actually do what you asked. For example, I changed from" That would interfere with the way it recovers from times when one of the projects has no tasks available to send. Instead, it looks back over the last few weeks, and tries to get tasks from whichever project would move it toward the new weighting. |
©2024 University of Washington
https://www.bakerlab.org