Rosetta Beta 6.00

Jean-David Beyer
Joined: 2 Nov 05
Posts: 178
Credit: 5,759,842
RAC: 3,812
Message 108541 - Posted: 28 Aug 2023, 21:25:37 UTC - in response to Message 108540.  

I have been a participant in rosetta@home for 8 years, and only rarely do my allocated tasks fail due to computation errors. Yet lately, all but one of about 20 of my tasks on the beta 6.03 app with the 7hal prefix in the task name have ended with a 'computation error' message. Sometimes this happens within a few moments of starting, but much more frequently after running for many times the original estimated 'remaining time'.

When I first noticed the problem I got lots of failures (all the beta tasks I received), but they were mostly short ones. As time passed, more and more of the long ones failed as well. Still, every task failed.

Right now I have only Rosetta 4.20 tasks that work just fine.
ID: 108541 · Rating: 0

Jeff
Joined: 24 Jan 15
Posts: 4
Credit: 1,197,538
RAC: 1,094
Message 108542 - Posted: 28 Aug 2023, 22:10:57 UTC - in response to Message 108541.  

Thanks for that, Jean-David. I'll abort any 7hal tasks that download until I learn the problem has been sorted.
ID: 108542 · Rating: 0

mmonnin
Joined: 2 Jun 16
Posts: 54
Credit: 20,058,207
RAC: 230
Message 108543 - Posted: 28 Aug 2023, 23:24:38 UTC - in response to Message 108542.  
Last modified: 28 Aug 2023, 23:25:39 UTC

Just abort the ones that never checkpoint. They either fail immediately, never checkpoint and end in a compute error, or complete OK.

If mine were going to checkpoint, they did so within 11-12 min. I'm not sure whether this changes with the run time set in preferences. Mine were set for 12 hours.
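
The rule of thumb above can be sketched in Python. This is only a sketch under stated assumptions, not a tested tool: it assumes the task status has been captured as `key: value` text (for example from `boinccmd --get_tasks`), and the field names `name`, `current CPU time` and `CPU time at last checkpoint` are assumptions that should be checked against your own client's output. The 12-minute threshold is taken from the observation above and may differ with other run-time preferences.

```python
# Hypothetical helper: flag tasks that have run past a threshold without
# ever writing a checkpoint. The field names are assumptions and are not
# confirmed against any particular BOINC client version.

NO_CHECKPOINT_LIMIT_S = 12 * 60  # "checkpoint within 11-12 min" from the post


def parse_tasks(text):
    """Split 'key: value' text into one dict per task (a task starts at 'name:')."""
    tasks = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.strip().partition(":")
        key, value = key.strip(), value.strip()
        if key == "name":
            tasks.append({})
        if tasks:
            tasks[-1][key] = value
    return tasks


def never_checkpointed(tasks, limit_s=NO_CHECKPOINT_LIMIT_S):
    """Return names of tasks past limit_s of CPU time with no checkpoint yet."""
    flagged = []
    for task in tasks:
        cpu = float(task.get("current CPU time", 0))
        checkpoint = float(task.get("CPU time at last checkpoint", 0))
        if cpu > limit_s and checkpoint == 0:
            flagged.append(task["name"])
    return flagged


# Made-up sample data, for illustration only.
sample = """\
name: 7hal_example_a
current CPU time: 1500.0
CPU time at last checkpoint: 0.000000
name: 7hal_example_b
current CPU time: 1500.0
CPU time at last checkpoint: 690.5
"""
print(never_checkpointed(parse_tasks(sample)))
```

The script only reports candidates; whether to abort them is still a manual decision.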
ID: 108543 · Rating: 0

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1887
Credit: 8,446,885
RAC: 10,938
Message 108545 - Posted: 30 Aug 2023, 19:06:42 UTC

I'm now crunching some "Hb_zero_test_7hal" tasks.
Maybe they decided to test a new version of this family of WUs (7hal).
ID: 108545 · Rating: 0

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1887
Credit: 8,446,885
RAC: 10,938
Message 108547 - Posted: 1 Sep 2023, 6:53:29 UTC - in response to Message 108545.  

These WUs are all OK.
ID: 108547 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 39,124,557
RAC: 23,945
Message 108558 - Posted: 6 Sep 2023, 1:48:34 UTC - in response to Message 108547.  

I didn't notice those - interesting.
No sign of corrected 7hal tasks coming through yet, but I did grab some more rb tasks today
ID: 108558 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 39,124,557
RAC: 23,945
Message 108563 - Posted: 6 Sep 2023, 12:36:35 UTC - in response to Message 108558.  

I spoke 12hrs too soon.
A fair few Beta 6.03 tasks have come down this morning with the name 7mer_run_af2_hal
ID: 108563 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 39,124,557
RAC: 23,945
Message 108565 - Posted: 7 Sep 2023, 9:50:28 UTC - in response to Message 108563.  

24hrs later, a lot of Rosetta Beta 6.03 tasks have run, completed and been credited with no errors, and I'm now starting 8mer_run_af2_hal tasks as well, which have also started fine.
Looks like it was the tasks, not the app. Good news.
ID: 108565 · Rating: 0

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1887
Credit: 8,446,885
RAC: 10,938
Message 108581 - Posted: 12 Sep 2023, 8:42:44 UTC - in response to Message 108565.  

Maybe, in the future, we will see a new version of the app
ID: 108581 · Rating: 0

kotenok2000
Joined: 22 Feb 11
Posts: 237
Credit: 352,859
RAC: 1,472
Message 108601 - Posted: 20 Sep 2023, 0:44:55 UTC - in response to Message 108311.  

0 should mean zero.
-1 should mean no limit.
ID: 108601 · Rating: 0

Link
Joined: 4 May 07
Posts: 355
Credit: 382,349
RAC: 0
Message 108608 - Posted: 22 Sep 2023, 9:16:06 UTC - in response to Message 108601.  
Last modified: 22 Sep 2023, 9:22:16 UTC

I agree: either that, or dedicated tags in the app_config.xml file, one to disable the entire app and one to disable a specific app version:

<app_config>
   [<app>
      <name>Application_Name</name>
      <max_concurrent>1</max_concurrent>
      [<report_results_immediately/>]
      [<fraction_done_exact/>]
      [<disable_app/>]
      <gpu_versions>
         <gpu_usage>.5</gpu_usage>
         <cpu_usage>.4</cpu_usage>
      </gpu_versions>
   </app>]
   ...
   [<app_version>
      <app_name>Application_Name</app_name>
      [<plan_class>mt</plan_class>]
      [<avg_ncpus>x</avg_ncpus>]
      [<ngpus>x</ngpus>]
      [<cmdline>--nthreads 7</cmdline>]
      [<disable_app_version/>]
   </app_version>]
   ...
   [<project_max_concurrent>N</project_max_concurrent>]
   [<report_results_immediately/>]
</app_config>

This would also help in many other cases, for example to disable the CUDA app on not-so-ancient Nvidia cards on Moo!, or in general to disable inefficient or problematic app versions without the servers retesting them every now and then.
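
For comparison, here is a minimal Python sketch that writes out only the tags from the template that are not part of the proposal: `<app>`, `<name>` and `<max_concurrent>`. The `<disable_app/>` and `<disable_app_version/>` tags are the suggestion made in this post and are deliberately left out. The app name `rosetta_beta` is a placeholder assumption; use whatever short app name your client actually reports.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Build a minimal app_config tree limiting one app's concurrent tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return root


# "rosetta_beta" is a placeholder name, not confirmed in this thread.
config = build_app_config("rosetta_beta", 1)
ET.indent(config)  # pretty-print; requires Python 3.9 or later
print(ET.tostring(config, encoding="unicode"))
```

The printed XML would still have to be saved as app_config.xml in the project directory and re-read by the client before it has any effect.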
ID: 108608 · Rating: 0

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1887
Credit: 8,446,885
RAC: 10,938
Message 108613 - Posted: 28 Sep 2023, 9:13:11 UTC

On the applications page there is a new version, 6.04.
Bugfix? New functionality?
ID: 108613 · Rating: 0

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1887
Credit: 8,446,885
RAC: 10,938
Message 108682 - Posted: 13 Nov 2023, 15:03:30 UTC - in response to Message 108613.  

And, for Linux, a new version (6.05) at the end of October.
As usual, no info.
ID: 108682 · Rating: 0

JohnDK
Joined: 6 Apr 20
Posts: 33
Credit: 2,390,240
RAC: 23
Message 108683 - Posted: 13 Nov 2023, 20:41:17 UTC

Out of the 13 WUs I got, 11 of them had a computation error within 30 secs. The other 2 have so far run for 2 hours.

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.04_windows_x86_64.exe @07aaNewf_af2_7aa_hal_9.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1920894
Using database: database_0f7f01a1b07database

ERROR: Unable to find desired residue 'LEU:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'LEU' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT'
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980
BOINC:: Error reading and gzipping output datafile: default.out
19:41:27 (11480): called boinc_finish(1)

</stderr_txt>
]]>
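
For anyone trying to read the command line in that log, here is a rough Python sketch that splits it into flag/value pairs and converts the two run-time values to hours. It assumes the values of `-cpu_run_time` and `-boinc::cpu_run_timeout` are in seconds, which the log itself does not state, and `parse_flags` is a hypothetical helper written for this post, not part of Rosetta or BOINC. The command string is shortened from the one above.

```python
import shlex


def parse_flags(command):
    """Map '-flag value' pairs from a command line to a dict.

    Flags with no value (e.g. -constant_seed) map to None.
    """
    tokens = shlex.split(command)
    flags = {}
    i = 1  # skip the executable path
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and not nxt.startswith("-"):
                flags[tok] = nxt
                i += 2
                continue
            flags[tok] = None
        i += 1
    return flags


# Shortened copy of the command from the stderr output above.
command = (
    "projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.04_windows_x86_64.exe "
    "@07aaNewf_af2_7aa_hal_9.flags -nstruct 10000 -cpu_run_time 28800 "
    "-checkpoint_interval 120 -boinc::watchdog -boinc::cpu_run_timeout 36000 "
    "-constant_seed -jran 1920894"
)
flags = parse_flags(command)
run_hours = int(flags["-cpu_run_time"]) / 3600
timeout_hours = int(flags["-boinc::cpu_run_timeout"]) / 3600
print(run_hours, timeout_hours)
```

Under that assumption, 28800 and 36000 work out to 8 and 10 hours, so tasks that die within 30 seconds are failing long before either limit matters.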
ID: 108683 · Rating: 0

Bryn Mawr
Joined: 26 Dec 18
Posts: 378
Credit: 11,018,841
RAC: 12,987
Message 108684 - Posted: 13 Nov 2023, 21:22:42 UTC - in response to Message 108683.  

Yes, I've just come in to report the same error.

I'm running Ubuntu 22.04.3 LTS with BOINC 7.20.5; the tasks are Beta v6.05 with names starting 08aaNewf_af2_8aa_dif_8.

I've reset the project to pull down a new set of master files, but it's a hard error.
ID: 108684 · Rating: 0

Clint
Joined: 1 Oct 10
Posts: 4
Credit: 4,722,422
RAC: 10,664
Message 108691 - Posted: 14 Nov 2023, 15:44:59 UTC

38 of 41 jobs errored out last night, for many different reasons.

ERROR: Unable to find desired residue 'DVAL:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'DVAL' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT'
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980
BOINC:: Error reading and gzipping output datafile: default.out

ERROR: Error in simple_cycpep_predict app: The N-methylation position indices must be within the pose!
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2279
BOINC:: Error reading and gzipping output datafile: default.out

ERROR: Unable to find desired residue 'PHE:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'PHE' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT'
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980
BOINC:: Error reading and gzipping output datafile: default.out

ERROR: Unable to find desired residue 'DLEU:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'DLEU' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT'
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980
BOINC:: Error reading and gzipping output datafile: default.out

ERROR: Unable to find desired residue 'DPHE:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'DPHE' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT'
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980
BOINC:: Error reading and gzipping output datafile: default.out
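
To see how these failures group together, a small Python sketch along the following lines could tally them. It only pattern-matches the two line shapes shown above ("Unable to find desired residue ..." and "Exit from: ... line: ..."); `summarize` is a hypothetical helper, and the sample text is an abbreviated copy of the messages above.

```python
import re
from collections import Counter

# Patterns for the two kinds of lines seen in the stderr output above.
RESIDUE_RE = re.compile(
    r"ERROR: Unable to find desired residue '([^']+)' with variant '([^']+)'"
)
EXIT_RE = re.compile(r"ERROR:: Exit from: (\S+) line: (\d+)")


def summarize(stderr_text):
    """Count missing residue types and exit locations in stderr text."""
    residues = Counter(m.group(1) for m in RESIDUE_RE.finditer(stderr_text))
    exits = Counter(
        (m.group(1), int(m.group(2))) for m in EXIT_RE.finditer(stderr_text)
    )
    return residues, exits


# Abbreviated lines copied from the errors listed above.
sample = """\
ERROR: Unable to find desired residue 'DVAL:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'.
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980
ERROR: Error in simple_cycpep_predict app: The N-methylation position indices must be within the pose!
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2279
ERROR: Unable to find desired residue 'PHE:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'.
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980
"""
residues, exits = summarize(sample)
print(residues)
print(exits)
```

On the messages above, the residue names differ but the exit points collapse to two source locations, ResidueTypeSet.cc line 980 and SimpleCycpepPredictApplication.cc line 2279.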
ID: 108691 · Rating: 0

PMH_UK
Joined: 9 Aug 08
Posts: 12
Credit: 1,243,749
RAC: 4
Message 108692 - Posted: 14 Nov 2023, 16:57:15 UTC - in response to Message 108691.  

Had 3 re-sends, all errored with variants on "ERROR: Unable to find desired residue...", as did originals.
Paul.
ID: 108692 · Rating: 0

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1887
Credit: 8,446,885
RAC: 10,938
Message 108694 - Posted: 14 Nov 2023, 20:58:52 UTC - in response to Message 108691.  

I also have:
ERROR: Unable to find desired residue 'DALA:N_Methylation' with variant 'LOWER_TERMINUS_VARIANT'. Attempted to add target variant(s) to ResidueType using both ResidueType base name 'DALA' and base ResidueType. Was attempting to add new variant type 'LOWER_TERMINUS_VARIANT'
ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 980

ID: 108694 · Rating: 0

PMH_UK
Joined: 9 Aug 08
Posts: 12
Credit: 1,243,749
RAC: 4
Message 108695 - Posted: 14 Nov 2023, 23:34:53 UTC - in response to Message 108692.  

Now 9 of 9 re-sends with similar failure - "ERROR: Unable to find desired residue...", as did originals.
Paul.
ID: 108695 · Rating: 0

Jean-David Beyer
Joined: 2 Nov 05
Posts: 178
Credit: 5,759,842
RAC: 3,812
Message 108696 - Posted: 15 Nov 2023, 3:01:01 UTC - in response to Message 108682.  

I got a batch of 36. All but one failed after a second or two of CPU time; the remaining one ran to completion correctly.

Here is my machine:

Computer 5910575
CPU type 	GenuineIntel
Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16
Coprocessors 	---
Operating System 	Linux Red Hat Enterprise Linux
Red Hat Enterprise Linux 8.8 (Ootpa) [4.18.0-477.27.1.el8_8.x86_64|libc 2.28]
BOINC version 	7.20.2
Memory 	128086.02 MB
Cache 	16896 KB
Swap space 	15992 MB


This is the error message of one of the failures.

Stderr output

<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu @08aaNewf_af2_8aa_hal_3.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1874700
Using database: database_0f7f01a1b07/database

ERROR: Error in simple_cycpep_predict app: The N-methylation position indices must be within the pose!
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2279
BOINC:: Error reading and gzipping output datafile: default.out
18:12:37 (147984): called boinc_finish(1)

</stderr_txt>
]]>

ID: 108696 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org