Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Aravah Joined: 12 Apr 20 Posts: 6 Credit: 1,101,172 RAC: 0 |
Thanks for your feedback. For clarification, the two jobs are, as far as I can tell, unrelated but have very different credit scores; I did not mean to imply they were the first and second runs of the same task. I know some projects send out jobs to multiple computers; I did not know whether Rosetta was one of these. |
Aravah Joined: 12 Apr 20 Posts: 6 Credit: 1,101,172 RAC: 0 |
tx |
bormolino Joined: 16 May 13 Posts: 4 Credit: 160,977 RAC: 0 |
I'm still having issues with the graphics on Ubuntu 18.04. It shows "No shared mem". |
Falconet Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,804 |
I'm seeing some tasks with a large log containing these lines: AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. [ ERROR ]: Caught exception: File: ..\..\..\src\protocols\motif_grafting\movers\MotifGraftMover.cc:537 For this scaffold there are not suitable scaffold grafts within your constrains ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. https://boinc.bakerlab.org/rosetta/result.php?resultid=1263012644 https://boinc.bakerlab.org/rosetta/result.php?resultid=1263011946 In those examples, one had the number of decoys at the end and the other didn't. They are validating, but I sure hope this isn't wasted electricity. Tasks: epcam_breaker_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_2lt3jd5h_1009432_4_0 and pdl1_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_7es6gq8a_1009506_4_0 EDIT: On another PC: https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076666 https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076045 From what I can tell, it seems limited to epcam_breaker and pdl1_graft tasks. |
Detto Joined: 10 Apr 20 Posts: 2 Credit: 788,565 RAC: 0 |
For the third time since April, I only got 3 credits for a work unit: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1130847248 Any insights? |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
"For the 3rd time since April I only got 3 credits for a work unit:" Nope. The system completed a Task of exactly the same type without issue. Is there only 1 instance of BOINC running on that system? The difference between CPU time and Runtime indicates the system is doing a fair bit of work other than processing BOINC Tasks, but it's nowhere near as big a difference as on other systems that aren't having low Credit issues. <core_client_version>7.16.11</core_client_version> <![CDATA[ <stderr_txt> command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_struct_profile_layered_design_less_IVYW_wt1_091_c1__0.05_0018_8ffb15d87a6b0ee88cff77a7acba3bea_BJH8LOZG_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2711465 Using database: database_357d5d93529_n_methyl/minirosetta_database ====================================================== DONE :: 1 starting structures 28683.4 cpu seconds This process generated 133 decoys from 133 attempts ====================================================== BOINC :: WS_max 4.60468e+08 11:45:31 (22004): called boinc_finish(0) </stderr_txt> ]]> <core_client_version>7.16.6</core_client_version> <![CDATA[ <stderr_txt> command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_plus_struct_profile_091_c1_barrel6_3_c0851511e59b6__0.05_0009_8c3dcb16fc078c91ae0a41d4b95a66fc_M0Z4KTXS_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2841497 Using database: database_357d5d93529_n_methyl/minirosetta_database ====================================================== DONE :: 1 starting structures 28729.4 cpu seconds This process generated 130 decoys from 130 attempts ====================================================== BOINC :: WS_max 4.43654e+08 19:49:09 (1228): called boinc_finish(0) ====================================================== DONE :: 1 starting structures 28916.2 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== BOINC :: WS_max 2.25206e+08 19:58:00 (3178): called boinc_finish(0) </stderr_txt> ]]> It's the usual cause: the Task finished, yet continued to run and produced one more Decoy from another starting structure. That processing time was added to the earlier processing time, but that final Decoy wasn't added to the previous Decoys produced. As a result, Credit was granted based on that one Decoy and none of the previous work. Grant Darwin NT |
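To make Grant's explanation concrete, here is a minimal Python sketch that parses the two "DONE" summaries from the second stderr above. It assumes, as he describes, that the CPU time in the last summary is cumulative and that Credit is based only on the decoy count in that last summary. The parsing function and the "credit rule" are illustrative assumptions, not Rosetta@home's actual validator code.

```python
import re

# The two "DONE" summaries from the second stderr quoted above (one result).
STDERR = """
DONE :: 1 starting structures 28729.4 cpu seconds
This process generated 130 decoys from 130 attempts
DONE :: 1 starting structures 28916.2 cpu seconds
This process generated 1 decoys from 1 attempts
"""

def parse_summaries(text):
    """Return (cpu_seconds, decoys) for each DONE summary, in log order (hypothetical helper)."""
    cpu = [float(s) for s in re.findall(r"([\d.]+) cpu seconds", text)]
    decoys = [int(n) for n in re.findall(r"generated (\d+) decoys", text)]
    return list(zip(cpu, decoys))

summaries = parse_summaries(STDERR)

# The CPU time in the last summary already includes the earlier run
# (28729.4 s -> 28916.2 s), so it covers all of the work done.
cpu_reported = summaries[-1][0]

# Assumed credit rule: only the decoy count in the LAST summary is counted.
decoys_counted = summaries[-1][1]
decoys_actual = sum(d for _, d in summaries)

print(f"CPU time reported: {cpu_reported} s")
print(f"Decoys counted: {decoys_counted}, decoys actually produced: {decoys_actual}")
print(f"Fraction of the work credited: {decoys_counted / decoys_actual:.1%}")
```

Under that assumption, almost 8 hours of CPU time are paid as if the Task had produced a single Decoy. That is consistent with the 3-credit result reported above.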
Falconet Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,804 |
DELETED |
Falconet Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,804 |
"Validation error" No idea what it was. https://boinc.bakerlab.org/rosetta/result.php?resultid=1263056236 |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
"Validation error" The other task from the same workunit also failed. An error in one or more of the input files is therefore a likely cause, even though the stderr output of the two tasks said nothing very useful about what the error actually was. |
Falconet Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,804 |
Yes, but that one was an Unhandled Exception error at the start. A while ago I used to get some of those, caused by my machine. Mine ran for a lot longer with no apparent errors. Just thought I'd report it here. |
Falconet Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,804 |
"Validation error" Another one: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1131803173 Just "validation error"; nothing seems wrong in the log. |
sph Joined: 27 Mar 20 Posts: 7 Credit: 17,359,964 RAC: 0 |
Issue with 50% of all Rosetta tasks on this PC: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4975041 Tasks run for 50,000 seconds yet yield only 6.5 to 45 points. The other 50% of Rosetta tasks on this PC work as expected, and all my other PCs are fine. I removed Rosetta from this PC and ran other projects, which worked as expected. After I re-added Rosetta, tasks worked well for 4 days, then reverted to the failure pattern described above. The PC has no detected issues. As Rosetta is working well on my other PCs, the error evidently appears only under specific conditions. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
"Issue with 50% of all Rosetta tasks on this PC: [snip]" It looks as though most of the points were based on the number of decoys completed, and NOT on the amount of CPU time used. You might check whether this also holds for your other computers. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
"Issue with 50% of all Rosetta tasks on this PC." I just had a look at those WUs on my systems. Some WUs pay out considerably less Credit than others, but nowhere near as little as yours are getting. And some of those with low Credit have produced many more Decoys than some of those with much higher Credit. The only difference I can see is that I've processed a lot more of them (more cores & threads in use, and the default processing time). The benchmarks on that system are OK, and the system isn't losing time to non-crunching work, so I can't think of any particular reason for such a variation in Credit granted. (I do recall that someone had a host several months back with similar odd Credit payouts, but I can't remember how that issue was resolved.) The amount of Credit granted depends on the amount of work done, which is the number of Models completed. 2 WUs of the same type running on the same system for the same length of time may complete a similar number of Models, yet one may produce only 1 Decoy while the other produces hundreds. Both should still get similar amounts of Credit, because they did similar amounts of work (number of Models completed), even though the number of Decoys produced differs. Processing a Task for a longer period will result in more Credit for that Task, but the Credit per hour will still be on par with processing it for a much shorter period. The only way to get more Credit per hour is more cores & threads, and/or higher clock speed, and/or greater IPC (Instructions Per Clock). Grant Darwin NT |
sph Joined: 27 Mar 20 Posts: 7 Credit: 17,359,964 RAC: 0 |
"Issue with 50% of all Rosetta tasks on this PC." "Just had a look at those WUs on my systems, and there are some WUs that pay out considerably less Credit than others, but nowhere near as low as what yours are doing." Hi Grant, +1 on all the feedback. This is an older-generation PC that has been able to contribute at the level expected of a PC of its generation. The latest optimisation of the WUs seems to have introduced issues peculiar to this PC's configuration. It is a Linux VirtualBox VM on a Windows host, whereas an identical PC running Linux directly on the host is running fine. If the Admins cannot track down the issue, I may just format the host as Linux and be done with it. Some issues don't justify the time required to debug them. I'm just hoping others have seen similar issues. EDITS: fix typo and format |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
"Issue with 50% of all Rosetta tasks on this PC." There’s a problem somewhere that’s causing those tasks to get stuck without performing much useful work. Lines in the output like "BOINC:: CPU time: 50422.3s, 36000s + 14400s[2020-10- 4 18:25: 5:] :: BOINC" come from the watchdog ending the tasks 10 hours after their target 4-hour run time. It’s odd that they validate as successful under those circumstances. That machine is running the 32-bit Rosetta application, which I suspect doesn’t get much testing these days. Perhaps there’s a bug in the application itself, some compatibility issue with the OS environment, or even something strange going on with the virtualisation. Hard to say. |
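The arithmetic behind Brian's reading of that watchdog line can be checked with a small Python sketch. It assumes, as his post does, that 36000s is the watchdog's extra allowance (10 hours, the same value as -boinc::cpu_run_timeout in the command lines quoted earlier in this thread) and that 14400s is the 4-hour target run time. The parsing function is a hypothetical helper, not part of BOINC or Rosetta.

```python
import re

# Watchdog line as quoted in the post above.
LINE = "BOINC:: CPU time: 50422.3s, 36000s + 14400s[2020-10- 4 18:25: 5:] :: BOINC"

def watchdog_summary(line):
    """Return (cpu_seconds, limit_seconds, exceeded) for a watchdog CPU-time line.

    Assumption: the two numbers after the comma are the watchdog allowance
    and the target run time, and the task is stopped once CPU time exceeds
    their sum.
    """
    m = re.search(r"CPU time: ([\d.]+)s, (\d+)s \+ (\d+)s", line)
    if m is None:
        raise ValueError("not a watchdog CPU-time line")
    cpu = float(m.group(1))
    allowance, target = int(m.group(2)), int(m.group(3))
    limit = allowance + target
    return cpu, limit, cpu > limit

cpu, limit, exceeded = watchdog_summary(LINE)
print(f"CPU time: {cpu / 3600:.2f} h, limit: {limit / 3600:.0f} h, exceeded: {exceeded}")
```

The 50422.3 s of CPU time sits just past the 14-hour (50400 s) limit. That fits a task that ran its 4-hour target, failed to stop, and was then ended by the watchdog 10 hours later.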
sph Joined: 27 Mar 20 Posts: 7 Credit: 17,359,964 RAC: 0 |
"Issue with 50% of all Rosetta tasks on this PC." "There’s a problem somewhere that’s causing those tasks to get stuck without performing much useful work. The lines in the output like" Hi Brian, I saw the same message (and unusual error messages in other WUs) and the unexpected successful completion, hence my hunch that the error is in the app. I didn't think of the 32-bit angle; thanks for highlighting it. It may well be a contributing factor. It looks more and more like I'll have to format this machine, but I won't be able to schedule that for 2 to 3 weeks, so I will keep tinkering with it until then. |
sph Joined: 27 Mar 20 Posts: 7 Credit: 17,359,964 RAC: 0 |
"Issue with 50% of all Rosetta tasks on this PC." Further information on this issue: if I abort these tasks after 8 hours, credit is awarded at the level expected for the work completed. I can only assume that the aborted tasks would otherwise have received the low credit level, but given the current trend for this PC, that is a safe assumption. The credit is not awarded immediately, but it is awarded before the task is completed by another PC. |
Ross Parlette Joined: 10 Nov 05 Posts: 32 Credit: 2,165,044 RAC: 0 |
I've only had a handful of tasks for the last few days, only two today. Am I missing something? Ross |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
"I've only had a handful of tasks for the last few days, only two today. Am I missing something?" That's been normal for about a week. The server status page indicates that few tasks are ready to send, but many are in progress. In other words, the number of user requests for tasks greatly exceeds the number of tasks created. |
©2024 University of Washington
https://www.bakerlab.org