Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
"I've only had a handful of tasks for the last few days, only two today. Am I missing something?"
There hasn't been any new work available for around 5 days now*. Every so often you might be lucky enough to pick up a resend when some other system misses its deadline and the Work Unit is re-issued.
*I know of one person that did actually get allocated a new Work Unit, but it errored out as it wasn't there to be downloaded.
Grant Darwin NT |
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,883,897 RAC: 2,627 |
Sometimes we can get some WUs, like yesterday: 10 on 1 Nov 2020, 21:25:35 UTC |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=1287276494 and https://boinc.bakerlab.org/rosetta/result.php?resultid=1287276019 blew up, crashed BOINC and made a mess of my system. The first one had "file name to long" and out-of-memory errors. The second one had a Status Access Violation, probably because of the first one. |
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
Phantom tasks? These two tasks are supposedly "In progress", but I cannot find them in BOINC Manager. Updating the project does nothing.
drhicks1_derroids_torricks_fd2_SAVE_ALL_OUT_IGNORE_THE_REST_4za6sf5o_1021338_3_0
rb_11_04_43158_42385__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1021346_50_0
These aren't even in the BOINC project folder. |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
"Phantom tasks?"
I’ve seen this happen recently, too. Don’t know what was going on. I assume they just timed out and the server resent them to a different host. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
"Phantom tasks?"
This can happen when there are network issues: Rosetta has allocated you the work during a Scheduler request, but for some reason your system didn't get that Scheduler reply, so you didn't download the work. The Task list for your Rosetta account shows you have the work, but there is no indication of it on your system. BOINC does support reissuing of missing Tasks, but it has to be enabled by the project. However, due to the significant Scheduler server overhead in doing such work, it is usually disabled by projects. It is possible to manually recover such Tasks, but it involves a lot of mucking around, with excellent attention to timing required. If you had hundreds of them, it'd be worth giving it a go. For just a couple, I wouldn't bother. They'll time out and be resent. Ghost Task recovery procedure.
Grant Darwin NT |
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
"This can happen when there are network issues"
Welcome to my daily life. That and power issues. That explains it; I'll just not bother. It's only two tasks. Thanks. |
RandyE Joined: 22 Sep 10 Posts: 4 Credit: 2,973,365 RAC: 0 |
Running multiple GPUs (3 on each) on 2 PCs. In the past, I was able to run Rosetta on those PCs with no issues: all 3 GPUs and 3 Rosetta tasks at a time. Recently, however, I was getting messages from Rosetta that I needed to detach from Rosetta@home and reattach to the URL https://boinc.bakerlab.org/rosetta/. Did a prior version of Rosetta cross-connect somehow? I hadn't run Rosetta since early spring. I couldn't resolve this until I deleted all references to Rosetta in my BOINC data directory and re-added Rosetta in BOINC Manager. Now Rosetta runs multiple tasks while only 1 of my GPU tasks can run at a time. I see many messages asking about control of Rosetta for similar resource issues. Sad that my PCs with 6 or 8 cores cannot use 3 for GPUs and the rest for Rosetta. Guess Rosetta is off the list for me if this can't be resolved. At the Rosetta page, I actually set the project resource percent usage "preference" down to .001. This shows in BOINC Manager, but seems to do nothing. I have one 4-core machine with 1 GPU that I am running Rosetta on, since there's no problem there. Solution needed in application configuration! Tired of mucking around with preferences all over with no results! |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
"Running multiple GPUS (3 on each) on 2 PCs. In the past, was able to run Rosetta on those PCs with no issues. Ran all 3 GPUs and 3 rosetta tasks at a time."
I've had somewhat similar problems with Folding@home using the GPU (only one) on my computer. I determined how many CPU cores Folding@home uses, then subtracted that number from the number of virtual cores BOINC is allowed to use, and subtracted one more so I can do operations on the console. Setting the preference low only lowers the percentage of CPU-only work for Rosetta@home, and only if CPU tasks from some other BOINC project are available. It has no effect on GPU tasks. The https URL is due to Rosetta@home switching to a more secure method of file exchange. Some of the older versions of BOINC cannot handle this properly, so which version are you using on the computers with and without the problem? Note that rather few people run BOINC using more than one GPU on the same computer, so it could be a problem seen only on computers where BOINC uses more than one GPU. |
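The core-count rule above (threads for BOINC = logical CPUs, minus the cores the other app uses, minus one for the console) is simple arithmetic. The sketch below also turns that count into a whole-number value for BOINC's "Use at most N% of the CPUs" preference. The helper name is hypothetical, and it assumes the client truncates N% × logical CPUs down to a whole number; treat it as an illustration, not BOINC's own code.

```python
def boinc_cpu_allowance(logical_cpus: int, other_app_cpus: int, keep_free: int = 1) -> tuple[int, int]:
    """Return (cpus_for_boinc, percent_setting) for a host shared with another app.

    Hypothetical helper. Assumes BOINC truncates percent * logical_cpus / 100
    down to a whole number of CPUs.
    """
    cpus_for_boinc = logical_cpus - other_app_cpus - keep_free
    if cpus_for_boinc < 1:
        raise ValueError("nothing left for BOINC after reserving CPUs")
    # Smallest whole percentage that yields at least cpus_for_boinc after truncation
    # (integer ceiling division avoids floating-point rounding surprises).
    percent = -(-100 * cpus_for_boinc // logical_cpus)
    return cpus_for_boinc, percent


# Example: 8 threads, Folding@home using 1, one kept free for the console.
print(boinc_cpu_allowance(8, 1))  # (6, 75)
```

Rounding the percentage up rather than down matters on hosts where the division is not exact: on 6 threads, 4 CPUs is 66.7%, and entering 66% would truncate to 3.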
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
"I actually set the project resource percent usage "preference" down to .001."
What value are you referring to there? The Resource share setting is a ratio, not a percentage. And it is a longer-term setting (not a short-term one) for working out the balance of work between projects. If you want the work split evenly between projects, then just leave the Resource share value for each project at the default value of 100.
"Solution needed in application configuration!"
The solution lies in using app_config.xml to reserve a CPU core to support your GPU for the project that uses the GPU. Although, looking at the processing times for the Rosetta work you have returned, the systems are only slightly overcommitted. Reserving 1 CPU core for 2 or even 3 GPUs should be good enough. If you choose to use some configuration settings to get the most from your GPUs, then it will be necessary to reserve a CPU core/thread per GPU. From the Collatz forums:
<app_config>
  <app>
    <name>collatz_sieve</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>0.3</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
Change the cpu_usage value to 1 for 1 CPU core/thread per GPU. Then use BOINC Manager, Options, Read config files for it to take effect (make sure it's in the project directory).
https://boinc.thesonntags.com/collatz/forum_thread.php?id=168
"Tired of mucking around with preferences all over with no results!"
It's the mucking around with the preferences that is causing most of your issues. That, combined with the fact that you somehow created a new machine ID for your FX-8320E, so that is starting from scratch, and it would have taken several days to settle down as it worked out how much Rosetta and Collatz work it needed to do over a day to meet your Resource share settings. The fact you've been changing things randomly means it will take even longer for things to settle down, in accordance with whatever new settings you have selected.
Grant Darwin NT |
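The point that Resource share is a ratio can be made concrete: each project's long-term target is its share divided by the sum of all shares on the host. A minimal sketch; the helper name is hypothetical, and it deliberately ignores the CPU/GPU split and the short-term scheduling BOINC layers on top of this target.

```python
def share_fractions(resource_shares: dict[str, float]) -> dict[str, float]:
    """Turn per-project Resource share values into long-term fractions of work."""
    total = sum(resource_shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {project: share / total for project, share in resource_shares.items()}


print(share_fractions({"Rosetta@home": 100, "Collatz": 100}))
# {'Rosetta@home': 0.5, 'Collatz': 0.5}
print(share_fractions({"Rosetta@home": 100, "Collatz": 300}))
# {'Rosetta@home': 0.25, 'Collatz': 0.75}
```

Because only the ratio matters, 1/1 and 100/100 give the same even split, which is why leaving every project at the default of 100 is enough.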
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
"Note that rather few people run with BOINC using more than one GPU on the same computer, so it could be a problem seen only only on computers where BOINC uses more than one GPU."
At Seti there were many people running multi-GPU setups with no issues, as long as they reserved as many CPU cores as were needed by the application. For the stock application that was 1 CPU core per GPU Task running, particularly so if they used optimised settings. For the Linux special application a single CPU core was able to handle multiple GPUs.
Grant Darwin NT |
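The "reserve as many CPU cores as the GPU application needs" rule can be sketched as arithmetic on the app_config cpu_usage value. The helper name is hypothetical, and rounding a fractional total reservation up to a whole thread is a conservative simplification of my own, not necessarily how the BOINC client counts it.

```python
import math


def threads_for_cpu_tasks(logical_cpus: int, gpu_tasks: int, cpu_per_gpu_task: float) -> int:
    """Threads left for CPU-only work after reserving CPU time for running GPU tasks.

    Simplification (an assumption, not BOINC's exact rule): a fractional
    total reservation is rounded up to a whole thread.
    """
    reserved = math.ceil(gpu_tasks * cpu_per_gpu_task)
    return max(0, logical_cpus - reserved)


print(threads_for_cpu_tasks(8, 3, 1.0))  # 5: one thread per GPU task
print(threads_for_cpu_tasks(8, 3, 0.3))  # 7: in this model, three light GPU tasks share one thread
print(threads_for_cpu_tasks(8, 0, 1.0))  # 8: no GPU work running, every thread is free for CPU work
```

The last line reflects the behaviour described later in the thread: the reservation only applies while GPU tasks are running, so the cores return to CPU work when GPU work runs out.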
RandyE Joined: 22 Sep 10 Posts: 4 Credit: 2,973,365 RAC: 0 |
Update on this issue. The app_config.xml for the Collatz project did not solve the problem. I've found that reducing the % of CPUs used in BAM preferences will reduce the number of Rosetta tasks that run concurrently, from 5 at 90% to 2 at 40%. Unfortunately, that does not solve the GPU problem either. All 3 GPUs run Collatz tasks with Rosetta suspended. When I click Resume for Rosetta, Collatz processing pauses momentarily in the BAM display, then 2 of my GPUs go into wait mode and multiple Rosetta tasks start. I also set the resource preference on the Rosetta page back to 100, since trying to limit Rosetta resources that way doesn't work. I suppose I might find a clue in the backup files of one of my PCs from before BAM and BOINC were reinstalled recently. Last week one PC was actually running 3 Collatz GPU tasks and 3 Rosetta tasks at the same time, before I cleaned out old files to clear the error about 2 connections to Rosetta. But when the Rosetta tasks completed, Rosetta went into pause and BAM told me the pause was at my request, though it wasn't. So clearly something was not right. Gonna keep trying to resolve this somehow. Thanks to the folks that gave me feedback on this. |
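The 5-at-90% and 2-at-40% counts reported above are what you would expect if the CPU-percentage preference is truncated to a whole number of CPUs. This sketch assumes that truncation (with a floor of one CPU) and a hypothetical 6-thread host; the host size is a guess for illustration, not something stated in the post.

```python
def usable_cpus(logical_cpus: int, percent: float) -> int:
    """CPUs BOINC may use under "Use at most N% of the CPUs".

    Assumption: percent * logical_cpus / 100 is truncated to a whole number,
    with a minimum of one CPU.
    """
    return max(1, int(logical_cpus * percent / 100))


for pct in (90, 40):
    print(pct, usable_cpus(6, pct))
# 90 5
# 40 2
```

This preference only caps how many CPUs the client uses overall; it does not by itself tell BOINC which project's tasks to run on them.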
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
What messages do you get in the Event log when Tasks start running, and others suspending? There has recently been a batch of Rosetta Tasks where a single Task can use as much as 5GB of RAM. With your limited system RAM, one of those Tasks with a couple of normal RAM requirement Tasks would result in other Tasks pausing due to a lack of RAM. Once completed, then the other Tasks would start back up again. Once all the large RAM Tasks are done, then things should run as they did previously (if all changes have been reverted to their original settings). Grant Darwin NT |
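Why one large-RAM Task can pause others can be shown with a simplified admission model: walk the queue in order and start each task whose working set still fits in the memory budget. This is an illustrative sketch, not the BOINC client's actual scheduler; ram_budget_bytes stands in for physical RAM times BOINC's memory-usage preference, and the task sizes are hypothetical.

```python
def admit_by_memory(tasks: list[tuple[str, int]], ram_budget_bytes: int) -> tuple[list[str], list[str]]:
    """Greedy, in-order admission: start each task whose working set still fits."""
    running: list[str] = []
    waiting: list[str] = []
    used = 0
    for name, working_set in tasks:
        if used + working_set <= ram_budget_bytes:
            running.append(name)
            used += working_set
        else:
            waiting.append(name)  # stays paused until enough memory is freed
    return running, waiting


GB = 1_000_000_000
queue = [("big_5GB_task", 5 * GB), ("normal_1", GB // 2), ("normal_2", GB // 2), ("normal_3", GB // 2)]
print(admit_by_memory(queue, 6 * GB))
# (['big_5GB_task', 'normal_1', 'normal_2'], ['normal_3'])
```

Once the 5 GB task finishes, its memory is released and the waiting task fits again, which matches the "things run as before once the large Tasks are done" expectation.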
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
I wonder what happened with these two Tasks? Both marked as Invalid.
dt_201104_hallucinated_C3D_01_122C3D_01_122_r2_127_model_fd_chA_fragments_abinitio_SAVE_ALL_OUT_1020463_1271_0
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -beta -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip dt_201104_hallucinated_C3D_01_122C3D_01_122_r2_127_model_fd_chA_fragments_fold_data.zip -abinitio::rg_reweight 0.5 -out:file:silent default.out -silent_gz -mute all -in:file:native 00001.pdb -out:file:silent_struct_type binary -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1411607
Using database: database_357d5d93529_n_methylminirosetta_database
======================================================
DONE :: 1 starting structures 28802 cpu seconds
This process generated 98 decoys from 98 attempts
======================================================
BOINC :: WS_max 4.74567e+08
09:56:25 (1888): called boinc_finish(0)
</stderr_txt>
]]>
dt_201104_hallucinated_C3D_01_129C3D_01_129_r2_36_model_fd_chA_fragments_abinitio_SAVE_ALL_OUT_1021102_1270_0
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -beta -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip dt_201104_hallucinated_C3D_01_129C3D_01_129_r2_36_model_fd_chA_fragments_fold_data.zip -abinitio::rg_reweight 0.5 -out:file:silent default.out -silent_gz -mute all -in:file:native 00001.pdb -out:file:silent_struct_type binary -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3454571
Using database: database_357d5d93529_n_methylminirosetta_database
======================================================
DONE :: 1 starting structures 28653.6 cpu seconds
This process generated 95 decoys from 95 attempts
======================================================
BOINC :: WS_max 4.65773e+08
09:14:35 (7324): called boinc_finish(0)
</stderr_txt>
]]>
Grant Darwin NT |
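Both logs end with boinc_finish(0), i.e. the application reported a normal exit, so the Invalid status presumably came from the project's validator rather than a crash on the host. When comparing many such logs, a small parser helps. This is a sketch: the regular expressions are assumptions based only on the two logs above, and other Rosetta versions or task types may word these lines differently.

```python
import re

# Patterns for the summary lines seen in the two stderr logs above (assumed formats).
_FIELDS = {
    "cpu_seconds": (r"DONE ::\s+\d+ starting structures\s+([\d.]+) cpu seconds", float),
    "decoys": (r"generated (\d+) decoys", int),
    "attempts": (r"from (\d+) attempts", int),
    "ws_max_bytes": (r"WS_max ([\d.e+]+)", float),
    "finish_status": (r"called boinc_finish\((-?\d+)\)", int),
}


def summarize_stderr(stderr: str) -> dict:
    """Extract a few summary fields from a Rosetta@home stderr_txt block."""
    summary = {}
    for key, (pattern, convert) in _FIELDS.items():
        match = re.search(pattern, stderr)
        summary[key] = convert(match.group(1)) if match else None
    return summary


log = """DONE :: 1 starting structures 28802 cpu seconds
This process generated 98 decoys from 98 attempts
BOINC :: WS_max 4.74567e+08
09:56:25 (1888): called boinc_finish(0)"""
s = summarize_stderr(log)
print(s["decoys"], s["finish_status"], round(s["cpu_seconds"] / s["decoys"], 1))
# 98 0 293.9
```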
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
"Update on this issue. The app_config.xml for Collatz project did not solve the problem. I've found that reducing the % of cpus used in BAM preferences will reduce the number of rosetta tasks that run concurrently, from 5 at 90% to 2 at 40%. Unfortunately, that does not solve the GPU problem either. All 3 GPUs run Collatz tasks with rosetta suspended. When I click resume for rosetta, Collatz processing pauses momentarily in BAM display, then 2 of my GPUs go into wait mode and multiple rosetta tasks start.. I also set resource preference on rosetta page back to 100, since trying to limit rosetta resources that way doesn't work. I suppose I might find a clue in the backup files of one of my PCs before BAM and BOINC were reinstalled recently. Last week one PC was actually running 3 collatz GPU tasks and 3 rosetta tascks at the same time before I cleaned out old files to clear the error for 2 connections to rosetta. But when the rosetta tasks completed, rosetta went into pause and BAM told me the pause was at my request, though it wasn't? So clearly something was not right. Gonna keep trying to resolve this somehow. Thanks to the folks that gave me feedback on this."
It may be time for you to add Ralph@home to one of your computers with 3 GPUs to help debug this problem. |
RandyE Joined: 22 Sep 10 Posts: 4 Credit: 2,973,365 RAC: 0 |
Eureka! There's gold in the BOINC client configuration documentation page. Problem solved! Again, thanks to all the folks who assisted me in digging this out!
Got it! Following info. Shotgun approach that works. All parameters are documented at the BOINC site. All 3 GPUs run Collatz while 3 Rosetta tasks run! This works on my 6-core i7 and 8-core AMD, both running Win 10.
URL for BOINC configuration parameters: https://boinc.berkeley.edu/wiki/Client_configuration
** cc_config.xml goes in the BOINC data directory (C:\ProgramData\BOINC by default). Use all GPUs for Collatz. Exclude NVIDIA GPUs for Rosetta. I run NVIDIA only. There are also arguments for other GPU types, and for a GPU by number, like 2 for the second one?
cc_config.xml ------------------------------------
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <skip_cpu_benchmarks>1</skip_cpu_benchmarks>
    <exclude_gpu>
      <url>https://boinc.bakerlab.org/rosetta/</url>
      <type>NVIDIA</type>
    </exclude_gpu>
  </options>
</cc_config>
** This app_config goes in the C:\ProgramData\BOINC\projects\...collatz... directory. It says to use a GPU for each collatz_sieve task. I personally don't want to try fractional values for <gpu_usage>, though you may be able to with 6GB or 8GB GPUs. That is supposed to allow multiple tasks per card?
app_config.xml in collatz project directory ---------------------
<app_config>
  <app>
    <name>collatz_sieve</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.3</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
** This app_config goes in the C:\ProgramData\BOINC\projects\...rosetta... directory. It says to limit Rosetta to N tasks, 3 here.
app_config.xml in rosetta project directory ---------------------
<app_config>
  <app>
    <name>rosetta</name>
  </app>
  <project_max_concurrent>3</project_max_concurrent>
</app_config>
Crunch those numbers! |
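Because these files are edited by hand (and, as this thread shows, can get mangled when pasted through the forum), it can help to confirm that an app_config.xml is at least well-formed before using Options, Read config files. This sketch uses Python's standard xml.etree.ElementTree and only reads the elements used in this thread; the function names are hypothetical, and it is not a full check of everything the BOINC client accepts.

```python
import xml.etree.ElementTree as ET


def _number(parent, tag):
    """Return the numeric text of parent/<tag>, or None if it is absent."""
    if parent is None:
        return None
    text = parent.findtext(tag)
    return float(text) if text and text.strip() else None


def describe_app_config(xml_text: str) -> dict:
    """Parse app_config.xml text and summarise the elements used in this thread.

    Checks well-formedness and reads a few values only; it does not validate
    against everything the BOINC client accepts.
    """
    root = ET.fromstring(xml_text)  # raises ET.ParseError if the XML is malformed
    if root.tag != "app_config":
        raise ValueError(f"expected <app_config>, found <{root.tag}>")
    apps = {}
    for app in root.findall("app"):
        gpu = app.find("gpu_versions")
        apps[app.findtext("name", "").strip()] = {
            "gpu_usage": _number(gpu, "gpu_usage"),
            "cpu_usage": _number(gpu, "cpu_usage"),
        }
    max_concurrent = root.findtext("project_max_concurrent")
    return {
        "project_max_concurrent": int(max_concurrent) if max_concurrent else None,
        "apps": apps,
    }


rosetta = """<app_config>
  <app>
    <name>rosetta</name>
  </app>
  <project_max_concurrent>3</project_max_concurrent>
</app_config>"""
print(describe_app_config(rosetta))
# {'project_max_concurrent': 3, 'apps': {'rosetta': {'gpu_usage': None, 'cpu_usage': None}}}
```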
RandyE Joined: 22 Sep 10 Posts: 4 Credit: 2,973,365 RAC: 0 |
Hmmm, guess the editor here doesn't like the "\" character in pathname strings. Oh well, we can figure it out, right? |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
It seems to be a bug in the BOINC forum software, as backslashes in user input get interpreted as escape characters instead of themselves being escaped… … multiple times… … which means if you type enough in the input to overcome that, you’ll get one in the output! Four backslashes in ⟶ one backslash out:
C:\Program Files\BOINC |
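The "four in, one out" observation fits a model in which the forum unescapes user text twice, and each pass drops a backslash while keeping the character after it. The sketch below is a hypothetical reconstruction of that behaviour, not the BOINC forum's actual code. It also reproduces how a path typed with single backslashes comes out with none, and how a quoted backslash comes out as empty quotes.

```python
def unescape_once(text: str) -> str:
    """One hypothetical unescaping pass: drop each backslash, keep the next character."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def forum_render(text: str, passes: int = 2) -> str:
    """Apply the hypothetical unescaping pass several times."""
    for _ in range(passes):
        text = unescape_once(text)
    return text


typed = r"C:\\\\Program Files\\\\BOINC"  # four backslashes per separator
print(forum_render(typed))                # C:\Program Files\BOINC
print(forum_render(r"C:\windows\programdata\boinc"))  # C:windowsprogramdataboinc
```

Each pass halves a run of doubled backslashes, so two passes turn four into one, while a single backslash disappears in the first pass.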
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
"This app_config in C:\ProgramData\BOINC\projects\...rosetta... directory. Says limit rosetta to N tasks, 3 here."
Not necessary if you reserve a CPU core to support the GPU. If you run out of GPU work, the CPU cores will pick up CPU work. If you get more GPU work, then those cores will go back to supporting the GPU.
Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,527,680 RAC: 23,122 |
Ah, we're back. The project was MIA for a while: no web site, and uploads backing up. Edit: still no luck with Scheduler responses, though; it says it's down for maintenance.
Grant Darwin NT |
©2024 University of Washington
https://www.bakerlab.org