Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Grant (SSSF)

Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 99499 - Posted: 2 Nov 2020, 9:13:16 UTC - in response to Message 99484.  

I've only had a handful of tasks for the last few days, only two today. Am I missing something?
There hasn't been any new work available for around 5 days now*.
Every so often you might be lucky enough to pick up a resend when some other system misses its deadline & the Work Unit is re-issued.




*I know of one person that did actually get allocated a new Work Unit, but it errored out as it wasn't there to be downloaded.
Grant
Darwin NT
ID: 99499 · Rating: 0
Kissagogo27

Joined: 31 Mar 20
Posts: 86
Credit: 2,883,897
RAC: 2,627
Message 99500 - Posted: 2 Nov 2020, 10:31:53 UTC

Sometimes we can get some WUs, like the 10 I got yesterday, 1 Nov 2020, at 21:25:35 UTC.
ID: 99500 · Rating: 0
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 99547 - Posted: 3 Nov 2020, 19:37:53 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=1287276494

and

https://boinc.bakerlab.org/rosetta/result.php?resultid=1287276019

blew up, crashed BOINC, and made a mess of my system.

The first one gave "file name too long" and out-of-memory errors.
The second one gave a Status Access Violation, probably because of the first one.
ID: 99547 · Rating: 0
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 99596 - Posted: 7 Nov 2020, 0:29:17 UTC
Last modified: 7 Nov 2020, 0:29:37 UTC

Phantom tasks?
These two tasks are supposedly "In progress", but I cannot find them in Boinc Manager. Updating the project does nothing.
drhicks1_derroids_torricks_fd2_SAVE_ALL_OUT_IGNORE_THE_REST_4za6sf5o_1021338_3_0
rb_11_04_43158_42385__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1021346_50_0
These aren't even in the BOINC project folder.
ID: 99596 · Rating: 0
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99597 - Posted: 7 Nov 2020, 1:10:48 UTC - in response to Message 99596.  

Phantom tasks?
I’ve seen this happen recently, too. Don’t know what was going on. I assume they just timed out and the server resent them to a different host.
ID: 99597 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 99598 - Posted: 7 Nov 2020, 1:40:47 UTC - in response to Message 99596.  
Last modified: 7 Nov 2020, 1:42:53 UTC

Phantom tasks?
These two tasks are supposedly "In progress", but I cannot find them in Boinc Manager. Updating the project does nothing.
drhicks1_derroids_torricks_fd2_SAVE_ALL_OUT_IGNORE_THE_REST_4za6sf5o_1021338_3_0
rb_11_04_43158_42385__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1021346_50_0
These aren't even in the BOINC project folder.

This can happen when there are network issues: Rosetta allocated you the work during a Scheduler request, but for some reason your system didn't get the Scheduler reply, so you never downloaded the work. The Task list for your Rosetta account shows you have the work, but there is no indication of it on your system.
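
As a rough illustration of what a "ghost" is, the sketch below compares the task names the project web site lists as in progress with the names the local client actually holds. The two long task names are the ones from the post above; the third name and the local list are hypothetical, and in practice you would copy both lists by hand from the web site and from BOINC Manager.

```python
# Task names the project web site lists as "In progress".
server_in_progress = {
    "drhicks1_derroids_torricks_fd2_SAVE_ALL_OUT_IGNORE_THE_REST_4za6sf5o_1021338_3_0",
    "rb_11_04_43158_42385__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1021346_50_0",
    "example_task_that_did_arrive_0",  # hypothetical name, for illustration only
}

# Task names the local BOINC client actually holds (hypothetical).
local_tasks = {"example_task_that_did_arrive_0"}


def find_ghosts(server, local):
    """Return tasks the server thinks we have but the client never received."""
    return sorted(server - local)


ghosts = find_ghosts(server_in_progress, local_tasks)
for name in ghosts:
    print(name)
```

Anything this prints exists only on the server side and will sit there until its deadline passes.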

BOINC does support reissuing of missing Tasks, but it has to be enabled by the project. However, due to the significant Scheduler server overhead in doing such work, it is usually disabled by projects.


It is possible to manually recover such Tasks, but it involves a lot of mucking around, and excellent attention to timing is required. If you had hundreds of them, it'd be worth giving it a go. For just a couple, I wouldn't bother. They'll time out and be resent.

Ghost Task recovery procedure.
Grant
Darwin NT
ID: 99598 · Rating: 0
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 99599 - Posted: 7 Nov 2020, 1:53:10 UTC - in response to Message 99598.  
Last modified: 7 Nov 2020, 1:56:41 UTC

This can happen when there are network issues

Welcome to my daily life. That and power issues.
That explains it, I'll just not bother. It's only two tasks.
Thanks.
ID: 99599 · Rating: 0
RandyE

Joined: 22 Sep 10
Posts: 4
Credit: 2,973,365
RAC: 0
Message 99648 - Posted: 12 Nov 2020, 1:05:50 UTC
Last modified: 12 Nov 2020, 1:07:30 UTC

I'm running multiple GPUs (3 each) on 2 PCs. In the past, I was able to run Rosetta on those PCs with no issues: all 3 GPUs and 3 Rosetta tasks ran at the same time.
However, recently I was getting messages from Rosetta that I needed to detach from Rosetta@Home and reattach to the URL https://boinc.bakerlab.org/rosetta/. Was a prior version of Rosetta cross-connected somehow? I hadn't run Rosetta since early spring. I couldn't resolve this until I deleted all references to Rosetta in my BOINC data directory and re-added Rosetta in BOINC Manager. Now Rosetta runs multiple tasks while only 1 of my GPU tasks can run at a time.
I see many messages asking for control of Rosetta for similar resource issues. Sad that my PCs with 6 or 8 cores cannot use 3 for GPUs and the rest for Rosetta. Guess Rosetta is off the list for me if this can't be resolved. On the Rosetta page, I actually set the project resource percent usage "preference" down to .001. This shows in BOINC Manager, but seems to do nothing? I have one 4-core machine with 1 GPU that I am running Rosetta on, since there is no problem there.
Solution needed in application configuration! Tired of mucking around with preferences all over with no results!
ID: 99648 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 99649 - Posted: 12 Nov 2020, 2:21:31 UTC - in response to Message 99648.  
Last modified: 12 Nov 2020, 2:25:05 UTC

I've had somewhat similar problems with Folding@home using the GPU (only one) on my computer. I determined how many CPU cores Folding@home uses, then subtracted that number from the number of virtual cores BOINC is allowed to use. I subtracted one more so I can still do operations on the console.
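
The subtraction above can be turned into a value for BOINC's "use at most N% of the CPUs" preference. This is only a sketch: the function name and the example core counts are made up, and the percentage is rounded up on the assumption that the client truncates when it converts the percentage back into a whole number of cores.

```python
def boinc_cpu_percent(logical_cpus, other_app_cpus, reserved_for_user=1):
    """Percentage of CPUs to allow BOINC, leaving cores for other software.

    logical_cpus      -- virtual cores the machine reports
    other_app_cpus    -- cores used by non-BOINC work (e.g. a Folding@home GPU task)
    reserved_for_user -- cores kept free for interactive use
    """
    usable = logical_cpus - other_app_cpus - reserved_for_user
    if usable < 1:
        raise ValueError("nothing left for BOINC with these reservations")
    # Integer ceiling of 100 * usable / logical_cpus, so that a truncating
    # conversion back to a core count still yields `usable` cores.
    return (100 * usable + logical_cpus - 1) // logical_cpus


print(boinc_cpu_percent(8, 1))   # 8 threads, 1 for another app, 1 for the user
print(boinc_cpu_percent(6, 1))   # 6 threads, 1 for another app, 1 for the user
```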

Setting the preference low only lowers the percentage of CPU-only work for Rosetta@home, and only if CPU tasks from some other BOINC project are available. It has no effect on GPU tasks.

The https URL is due to Rosetta@home switching to a more secure method of file exchange. Some of the older versions of BOINC cannot handle this properly, so which version are you using on the computers with and without the problem?

Note that rather few people run BOINC with more than one GPU on the same computer, so it could be a problem seen only on computers where BOINC uses more than one GPU.
ID: 99649 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 99650 - Posted: 12 Nov 2020, 8:55:45 UTC - in response to Message 99648.  
Last modified: 12 Nov 2020, 8:57:07 UTC

I actually set the project resource percent usage "preference" down to .001.
What value are you referring to there?
The Resource share setting is a ratio, not a percentage. And it is a longer term setting (not a short term one) for working out the balance of work between projects.

If you want the work split evenly between projects, then just leave the Resource Share value for each project at the default value of 100.
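
To show why the value behaves as a ratio rather than a percentage, here is a small sketch that turns a set of Resource Share values into long-term fractions of work. The function name is made up, and the two-project setup (Rosetta@home plus Collatz) is just the combination discussed in this thread.

```python
def share_fractions(resource_shares):
    """Convert per-project Resource Share values into fractions of the total."""
    total = sum(resource_shares.values())
    return {name: share / total for name, share in resource_shares.items()}


# Both projects left at the default of 100: an even long-term split.
even = share_fractions({"Rosetta@home": 100, "Collatz": 100})

# Rosetta set to .001 against a default of 100: a ratio of 1 to 100,000.
tiny = share_fractions({"Rosetta@home": 0.001, "Collatz": 100})

print(even)
print(tiny)
```

These fractions are a long-term target, so they say nothing about which tasks happen to be running at any given moment.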



Solution needed in application configuration!
The solution lies in using app_config.xml to reserve a CPU core to support your GPU for the project that uses the GPU. Although, looking at the processing times for the Rosetta work you have returned, the systems are only slightly overcommitted.
Reserving 1 CPU core for 2 or even 3 GPUs should be good enough. If you choose to use some configuration settings to get the most from your GPUs, then it will be necessary to reserve a CPU core/thread per GPU.

From the Collatz forums
<app_config>
   <app>
      <name>collatz_sieve</name>
      <gpu_versions>
         <gpu_usage>1.0</gpu_usage>
         <cpu_usage>0.3</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
Change the cpu_usage value to 1 to reserve 1 CPU core/thread per GPU.
In BOINC Manager, choose Options, Read config files for it to take effect (make sure the file is in the project directory).
https://boinc.thesonntags.com/collatz/forum_thread.php?id=168
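
If you would rather generate the file than hand-edit it, a minimal sketch follows. It builds the same app_config.xml structure with cpu_usage set to 1.0; the function name is made up, the element names are the ones in the snippet above, and ET.indent needs Python 3.9 or later. Saving the result into the right project directory is left to you.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage, cpu_usage):
    """Return app_config.xml text reserving cpu_usage cores per GPU task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root, space="   ")  # pretty-print; Python 3.9+
    return ET.tostring(root, encoding="unicode")


# One GPU per task, one full CPU core/thread reserved per GPU task.
xml_text = build_app_config("collatz_sieve", 1.0, 1.0)
print(xml_text)
```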



Tired of mucking around with preferences all over with no results!
It's the mucking around with the preferences that is causing most of your issues. That, combined with the fact that you somehow created a new machine ID for your FX-8320E, means it is starting from scratch, & it would have taken several days to settle down as it worked out how much Rosetta & Collatz work it needed to do over a day to meet your Resource share settings. The fact that you've been changing things randomly means it will take even longer for things to settle down, in accordance with whatever new settings you have selected.
Grant
Darwin NT
ID: 99650 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 99651 - Posted: 12 Nov 2020, 9:03:47 UTC - in response to Message 99649.  

Note that rather few people run BOINC with more than one GPU on the same computer, so it could be a problem seen only on computers where BOINC uses more than one GPU.
At Seti there were many people running multi-GPU setups with no issues, as long as they reserved as many CPU cores as were needed by the application. For the stock application that was 1 CPU core per GPU Task running, particularly if they used optimised settings. For the Linux special application a single CPU core was able to handle multiple GPUs.
Grant
Darwin NT
ID: 99651 · Rating: 0
RandyE

Joined: 22 Sep 10
Posts: 4
Credit: 2,973,365
RAC: 0
Message 99665 - Posted: 13 Nov 2020, 7:16:15 UTC - in response to Message 99648.  

Update on this issue: the app_config.xml for the Collatz project did not solve the problem. I've found that reducing the % of CPUs used in the BAM preferences reduces the number of Rosetta tasks that run concurrently, from 5 at 90% to 2 at 40%. Unfortunately, that does not solve the GPU problem either. All 3 GPUs run Collatz tasks while Rosetta is suspended. When I click resume for Rosetta, Collatz processing pauses momentarily in the BAM display, then 2 of my GPUs go into wait mode and multiple Rosetta tasks start. I also set the resource preference on the Rosetta page back to 100, since trying to limit Rosetta resources that way doesn't work.

I suppose I might find a clue in the backup files of one of my PCs from before BAM and BOINC were reinstalled recently. Last week one PC was actually running 3 Collatz GPU tasks and 3 Rosetta tasks at the same time, before I cleaned out old files to clear the error about 2 connections to Rosetta. But when the Rosetta tasks completed, Rosetta went into pause and BAM told me the pause was at my request, though it wasn't. So clearly something was not right. Gonna keep trying to resolve this somehow. Thanks to the folks who gave me feedback on this.
ID: 99665 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 99666 - Posted: 13 Nov 2020, 9:54:43 UTC

What messages do you get in the Event log when some Tasks start running and others suspend?
There has recently been a batch of Rosetta Tasks where a single Task can use as much as 5GB of RAM. With your limited system RAM, one of those Tasks plus a couple of Tasks with normal RAM requirements would result in other Tasks pausing due to a lack of RAM.
Once it has completed, the other Tasks would start back up again. Once all the large RAM Tasks are done, things should run as they did previously (if all changes have been reverted to their original settings).
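
A toy model of that effect, for illustration only: it walks a list of Tasks in order and lets each one run only if its RAM requirement still fits. This is not how the BOINC client actually decides; the 8 GB of available RAM and the 1.5 GB "normal" Tasks are hypothetical figures, and only the 5 GB figure comes from the batch mentioned above.

```python
def tasks_that_fit(available_gb, task_requirements_gb):
    """Split Tasks into those that can run now and those that must wait for RAM."""
    running, waiting = [], []
    used = 0.0
    for need in task_requirements_gb:
        if used + need <= available_gb:
            running.append(need)
            used += need
        else:
            waiting.append(need)
    return running, waiting


# One 5 GB Task plus three hypothetical 1.5 GB Tasks, with 8 GB usable by BOINC.
running, waiting = tasks_that_fit(8.0, [5.0, 1.5, 1.5, 1.5])
print("running:", running)
print("waiting for memory:", waiting)
```

When the large Task finishes and frees its RAM, the waiting Task fits again, which matches the behaviour described above.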
Grant
Darwin NT
ID: 99666 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 99668 - Posted: 13 Nov 2020, 10:21:39 UTC

I wonder what happened with these two Tasks? Both marked as Invalid.


dt_201104_hallucinated_C3D_01_122C3D_01_122_r2_127_model_fd_chA_fragments_abinitio_SAVE_ALL_OUT_1020463_1271_0

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -beta -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip dt_201104_hallucinated_C3D_01_122C3D_01_122_r2_127_model_fd_chA_fragments_fold_data.zip -abinitio::rg_reweight 0.5 -out:file:silent default.out -silent_gz -mute all -in:file:native 00001.pdb -out:file:silent_struct_type binary -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1411607
Using database: database_357d5d93529_n_methylminirosetta_database
======================================================
DONE ::     1 starting structures    28802 cpu seconds
This process generated     98 decoys from      98 attempts
======================================================
BOINC :: WS_max 4.74567e+08
09:56:25 (1888): called boinc_finish(0)

</stderr_txt>
]]>


dt_201104_hallucinated_C3D_01_129C3D_01_129_r2_36_model_fd_chA_fragments_abinitio_SAVE_ALL_OUT_1021102_1270_0

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -beta -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip dt_201104_hallucinated_C3D_01_129C3D_01_129_r2_36_model_fd_chA_fragments_fold_data.zip -abinitio::rg_reweight 0.5 -out:file:silent default.out -silent_gz -mute all -in:file:native 00001.pdb -out:file:silent_struct_type binary -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3454571
Using database: database_357d5d93529_n_methylminirosetta_database
======================================================
DONE ::     1 starting structures  28653.6 cpu seconds
This process generated     95 decoys from      95 attempts
======================================================
BOINC :: WS_max 4.65773e+08
09:14:35 (7324): called boinc_finish(0)

</stderr_txt>
]]>

Grant
Darwin NT
ID: 99668 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 99673 - Posted: 13 Nov 2020, 22:20:45 UTC - in response to Message 99665.  

It may be time for you to add Ralph@home to one of your computers with 3 GPUs each to help debug this problem.
ID: 99673 · Rating: 0
RandyE

Joined: 22 Sep 10
Posts: 4
Credit: 2,973,365
RAC: 0
Message 99674 - Posted: 14 Nov 2020, 0:29:44 UTC - in response to Message 99648.  

Eureka! There's gold in the BOINC Client config parameter doc page. Problem solved! Again, thanks to all the folks who assisted me in digging this out!

Got it! Info follows. It's a shotgun approach, but it works. All parameters are documented at the BOINC site.
All 3 GPUs run Collatz while 3 Rosetta tasks run! This works on my 6-core i7 and 8-core AMD, both running Win 10.

URL for boinc configuration parameters: https://boinc.berkeley.edu/wiki/Client_configuration

** cc_config.xml in C:windowsprogramdataboinc. Use all GPUs for Collatz. Exclude NVIDIA GPUs for Rosetta. I run NVIDIA only. There are also arguments for other GPU types, and for picking a GPU by number, like 2 for the second one?

cc_config.xml ------------------------------------

<cc_config>
   <options>
      <use_all_gpus>1</use_all_gpus>
      <skip_cpu_benchmarks>1</skip_cpu_benchmarks>
      <exclude_gpu>
         <url>https://boinc.bakerlab.org/rosetta/</url>
         <type>NVIDIA</type>
      </exclude_gpu>
   </options>
</cc_config>


** This app_config in C:windowsprogramdataboincproject...collatz... directory. Says to use a GPU for each collatz_sieve task.
I personally don't want to try fractional values for <gpu_usage>, though you may be able to with 6GB or 8GB GPUs.
Supposed to allow multiple tasks per card that way?

app_config.xml in collatz project directory ---------------------

<app_config>
   <app>
      <name>collatz_sieve</name>
      <gpu_versions>
         <gpu_usage>1.0</gpu_usage>
         <cpu_usage>1.3</cpu_usage>
      </gpu_versions>
   </app>
</app_config>


** This app_config in C:windowsprogramdataboincproject...rosetta... directory. Says limit rosetta to N tasks, 3 here.

app_config.xml in rosetta project directory ---------------------

<app_config>
   <app>
      <name>rosetta</name>
   </app>
   <project_max_concurrent>3</project_max_concurrent>
</app_config>
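
Before asking BOINC to re-read config files, it can save some head-scratching to check that a hand-edited file is at least well-formed. The sketch below is a sanity check only, not the full BOINC schema: the function name is made up, and it looks only at the elements used in this thread.

```python
import xml.etree.ElementTree as ET


def check_app_config(xml_text):
    """Return a list of problems found in app_config.xml text (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "app_config":
        problems.append(f"root element is <{root.tag}>, expected <app_config>")
    # Usage values must be plain numbers.
    for tag in ("gpu_usage", "cpu_usage"):
        for element in root.iter(tag):
            try:
                float((element.text or "").strip())
            except ValueError:
                problems.append(f"<{tag}> is not a number: {element.text!r}")
    # The concurrency limit must be a whole number.
    for element in root.iter("project_max_concurrent"):
        if not (element.text or "").strip().isdigit():
            problems.append(
                f"<project_max_concurrent> is not a whole number: {element.text!r}"
            )
    return problems


good = """<app_config>
   <app>
      <name>rosetta</name>
   </app>
   <project_max_concurrent>3</project_max_concurrent>
</app_config>"""
bad = good.replace(">3<", ">three<")

print(check_app_config(good))
print(check_app_config(bad))
```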

Crunch those numbers!
ID: 99674 · Rating: 0
RandyE

Joined: 22 Sep 10
Posts: 4
Credit: 2,973,365
RAC: 0
Message 99675 - Posted: 14 Nov 2020, 0:32:55 UTC - in response to Message 99674.  

Hmmm, guess the editor here doesn't like the "\" character in pathname strings. Oh well, we can figure it out, right?
ID: 99675 · Rating: 0
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99676 - Posted: 14 Nov 2020, 1:56:24 UTC - in response to Message 99675.  

It seems to be a bug in the BOINC forum software, as backslashes in user input get interpreted as escape characters instead of themselves being escaped…

… multiple times…

… which means if you type enough in the input to overcome that, you’ll get one in the output!

Four backslashes in ⟶ one backslash out

    C:\Program Files\BOINC
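
A small sketch of the effect, under an assumption: "four in, one out" is consistent with the input passing through two unescaping passes, each of which drops a backslash and keeps the character after it. The actual forum code may well do something different; unescape_once is a stand-in written for this example.

```python
import re


def unescape_once(text):
    """Treat each backslash as an escape: drop it and keep the next character."""
    return re.sub(r"\\(.)", lambda match: match.group(1), text)


# What you would have to type: four backslashes per separator.
typed = r"C:\\\\Program Files\\\\BOINC"
shown = unescape_once(unescape_once(typed))
print(shown)

# A normally typed path loses its separators on the very first pass.
print(unescape_once(r"C:\ProgramData\BOINC"))
```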
ID: 99676 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 99677 - Posted: 14 Nov 2020, 2:36:23 UTC - in response to Message 99674.  

** This app_config in C:windowsprogramdataboincproject...rosetta... directory. Says limit rosetta to N tasks, 3 here.
Not necessary if you reserve a CPU core to support the GPU.
If you run out of GPU work, the CPU cores will pick up CPU work. If you get more GPU work, then those cores will go back to supporting the GPU.
Grant
Darwin NT
ID: 99677 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1671
Credit: 17,527,680
RAC: 23,122
Message 99685 - Posted: 15 Nov 2020, 4:51:46 UTC
Last modified: 15 Nov 2020, 4:53:16 UTC

Ah, we're back. The project was MIA for a while: no web site & uploads backing up.

Edit: although still no luck with Scheduler responses; it says it's down for maintenance.
Grant
Darwin NT
ID: 99685 · Rating: 0
©2024 University of Washington
https://www.bakerlab.org