Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
robertmiles Send message Joined: 16 Jun 08 Posts: 1231 Credit: 14,243,141 RAC: 3,647 |
"Set cache setting to 10 days and try and trash as many of them as possible :-), until it gets backed off." Not that many (compared to how many there are to get through). For at least some BOINC projects, if every computer that tries a workunit gives a computation error, BOINC decides that the workunit is defective and no longer counts any of the failures against the computers that tried it. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2109 Credit: 40,985,460 RAC: 19,620 |
I only run on one Linux and one Windows 7 atm and all the Windows error out within 20 seconds. All the Ubuntu ones were good until now. The 14 movingstubs units ran the requested 8 hours on my 3900X. I can see that a new team member successfully completed his on Linux, but all were 3 hours. W3670. CPU and OS related? I've reported this too. Your W7, my W10 and someone else's W11 all fail while Ubuntu and Linux all run ok. Hopefully it means something to them. Tasks haven't been withdrawn yet (that I've noticed) and I've had no direct reply yet either. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2109 Credit: 40,985,460 RAC: 19,620 |
"After getting all Computation Errors on all the WU's I set my PC to No New Tasks until you folks can figure out why Windows crashes with the "movingstub" Work units." I'm not 100% on the ball at the moment, Greg, but I am reporting things within 24 hours of seeing a post about them here. Specifically I mean the movingstub tasks' computation error, and now the update that they're running OK on Ubuntu/Linux but not on any version of Windows. I haven't got my head around what the specific issue is with rb tasks crashing out, but I do see people reporting problems with those too. If someone describes the rb issue to me in a way I can pass on, I'll follow up with that too. Is it much the same as the movingstub tasks, or are they crashing on Ubuntu as well? Edit: "Also in python the aagb-PHE stuff is buggy." Just spotted your mention of this too. I'm away from my one PC running Python tasks for another few days, so I'll try to confirm that issue as well. |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Well, it's time for bed, so I have undone my movingtargets silliness. Got 500 of them in the bin. Though I did have a long-running one: it lasted a full 30 seconds. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1664 Credit: 17,386,207 RAC: 24,321 |
Well, the project was down for a while there, but the movingstub crash and burns are still there since it came back online. Grant Darwin NT |
Tomcat雄猫 Send message Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
SO! MANY! ERRORS! First off are the known bad movingstub tasks that crash almost instantly. Then there are the broken rb_ tasks that don't crash as fast. rb_02_16_213031_208962_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_04_2906559_35_1 <core_client_version>7.16.20</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213031_208962_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213031_208962_ab_t000__robetta.zip -frag3 rb_02_16_213031_208962_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213031_208962_ab_t000__robetta.200.4mers.index.gz -fragB rb_02_16_213031_208962_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3477109 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_08_2906555_55_1 <core_client_version>7.16.20</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213027_208958_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213027_208958_ab_t000__robetta.zip -frag3 rb_02_16_213027_208958_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213027_208958_ab_t000__robetta.200.8mers.index.gz -fragB rb_02_16_213027_208958_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3485351 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> rb_02_16_213026_208957_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906558_22_1 <core_client_version>7.16.20</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213026_208957_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213026_208957_ab_t000__robetta.zip -frag3 rb_02_16_213026_208957_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213026_208957_ab_t000__robetta.200.9mers.index.gz -fragB rb_02_16_213026_208957_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3477932 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> Windows 11 |
6dj72cn8 Send message Joined: 18 Apr 06 Posts: 5 Credit: 207,684 RAC: 0 |
So it's going to take a while to clear them unless the project . . . puts in a flag with the Scheduler to only allocate them to LINUX systems. Linux or Mac. It's only Windows that's choking on them. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1664 Credit: 17,386,207 RAC: 24,321 |
And Mac OS is LINUX by a different name, or if it's M1 it's Android, by a different name. "So it's going to take a while to clear them unless the project . . . puts in a flag with the Scheduler to only allocate them to LINUX systems." Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1664 Credit: 17,386,207 RAC: 24,321 |
"SO! MANY! ERRORS!" And in the past, when Tasks have crashed out with this error, "chi angle must be between -180 and 180: -nan(ind)", you've still gotten Credit for the work done. For some reason, that's not happening with these. Grant Darwin NT |
robertmiles Send message Joined: 16 Jun 08 Posts: 1231 Credit: 14,243,141 RAC: 3,647 |
"SO! MANY! ERRORS!" "And in the past when Tasks have crashed out with this error" That probably depends on whether the verifier program on the server recognizes the error message as enough evidence to declare the workunit faulty. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1664 Credit: 17,386,207 RAC: 24,321 |
Hmm. "SO! MANY! ERRORS!" "And in the past when Tasks have crashed out with this error" Or in the past the reported Task has included a Result file along with the Stderr output, but in these cases it hasn't? Either way, these errors have been occurring for ages now; the applications should have been updated to handle the error and to treat it as the Task finishing early, not as an error after it has already done all that work. Grant Darwin NT |
Tomcat雄猫 Send message Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
Welp, 4 hours later, this one also crashed: rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906555_51_1 <core_client_version>7.16.20</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213027_208958_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213027_208958_ab_t000__robetta.zip -frag3 rb_02_16_213027_208958_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213027_208958_ab_t000__robetta.200.9mers.index.gz -fragB rb_02_16_213027_208958_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3484950 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> Same issue with all the other borked rb_ tasks. I sure hope they catch this and fix it soon. Wait, it's the weekend... This is fine, I'm fine, there's nothing infuriating about this at all... Nothing at all. |
6dj72cn8 Send message Joined: 18 Apr 06 Posts: 5 Credit: 207,684 RAC: 0 |
"And Mac OS is LINUX by a different name, or if it's M1 it's Android, by a different name." Yes, I know. A UNIX kernel. I bothered to mention it because I didn't want the Admins to limit the work units to specifically a LINUX OS after they had read this board. [That was a joke, in case you also feel the need to explain to me that they don't read here.] |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1664 Credit: 17,386,207 RAC: 24,321 |
OK, this LINUX system has popped out plenty of errors with those problem RB Tasks, and they have gotten credit. And here is a Windows system erroring out the same RB Tasks, and getting Credit for them as well. Grant Darwin NT |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2109 Credit: 40,985,460 RAC: 19,620 |
"I haven't got my head around what the specific issue is with rb tasks crashing out, but I do see people reporting problems with those too." I've finally found some RB tasks on this PC and, while they haven't finished, they're a few hours in without crashing yet. The problem ones you guys are posting are rb_02_14 or rb_02_16 and the ones I have are rb_02_18. If the project was shut down for a while and came back up shortly after, could it be a fix went in? I notice a half million drop in the queued jobs on the front page. Or I'm speaking too soon. I'll see in another 5 hours. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 1 |
"24 movingstubs were in my queue. Aborted them instantly." "Aborted Tasks count as errors." Yeah, I saw that, but I didn't waste any compute time on them. Mental thing... or more of a blank blank thing really. I just saw the updated comments, so I'll let those tasks back in. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1989 Credit: 9,460,369 RAC: 12,264 |
Still "movingstub" WUs downloading today. Yesterday I tried to contact the Rosetta@Home, RosettaCommons and IPD Twitter accounts. Today I got an answer from the Rosetta@Home account: "We'll look into this. Thanks for flagging!" Fingers crossed!! |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Still "movingstub" wus download today. Thanks for alerting them to it. But on what other project are they not monitoring their own work units for failures? And on what other project are they not monitoring the forums for problems? I sometimes wonder if they are serious about this at all. |
computezrmle Send message Joined: 9 Dec 11 Posts: 63 Credit: 9,680,103 RAC: 0 |
"I sometimes wonder if they are serious about this at all." +1 Even correcting errors without posting a short official comment like "issue xyz is now fixed" is very impolite behaviour. |
mrchips Send message Joined: 11 Nov 09 Posts: 10 Credit: 14,456,453 RAC: 27,043 |
Name: movingstub_gzm1_minimize_3CL_AVLstub_0194_21_extract_B_SAVE_ALL_OUT_2908392_402_0 Outcome: Computation error. Have been getting these errors for 3 days; no tasks finish, and then I get "this computer has finished a daily quota of 1 tasks". |
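
Two things reported in the posts above can be shown in a small Python sketch: failures clustering in particular batches (`movingstub`, `rb_02_16`), and the `chi angle must be between -180 and 180: -nan(ind)` message. This is only an illustration under stated assumptions. `batch_of`, `tally_batches` and `chi_in_range` are hypothetical helpers, not part of BOINC or Rosetta. The grouping rule is a guess based on how posters refer to the batches, and the range check is not Rosetta's actual code. The task names are copied from the posts.

```python
import re
from collections import Counter

# Task names copied verbatim from posts in this thread.
TASK_NAMES = [
    "rb_02_16_213031_208962_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_04_2906559_35_1",
    "rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_08_2906555_55_1",
    "rb_02_16_213026_208957_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906558_22_1",
    "movingstub_gzm1_minimize_3CL_AVLstub_0194_21_extract_B_SAVE_ALL_OUT_2908392_402_0",
]


def batch_of(task_name: str) -> str:
    """Hypothetical grouping rule (an assumption, not a BOINC convention):
    'rb_NN_NN' for rb_ tasks, otherwise the text before the first underscore."""
    match = re.match(r"(rb_\d{2}_\d{2})_", task_name)
    if match:
        return match.group(1)
    return task_name.split("_", 1)[0]


def tally_batches(task_names):
    """Count how many failed tasks fall into each batch."""
    return Counter(batch_of(name) for name in task_names)


def chi_in_range(chi: float) -> bool:
    """Illustrative range check (not Rosetta's code). NaN compares false
    against every number, so a NaN angle can never pass this test."""
    return -180.0 <= chi <= 180.0


if __name__ == "__main__":
    for batch, count in tally_batches(TASK_NAMES).items():
        print(batch, count)
    print("NaN passes range check:", chi_in_range(float("nan")))
```

A tally like this only shows which batches the reported failures came from. It says nothing about why the angle became NaN in the first place, which is something only the project team can determine from the crash logs.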
©2024 University of Washington
https://www.bakerlab.org