Minirosetta v1.47 bug thread

| Author | Message | 
|---|---|
| robertmiles Joined: 16 Jun 08 Posts: 1251 Credit: 14,421,737 RAC: 0 | "I'm seeing problems when attempting to show graphics on workunits with names such as cs_noe* on Mac OS X 10.4.11. It seems like several other people are seeing similar problems." I'm seeing somewhat similar problems under Windows Vista SP1. `12/21/2008 7:18:31 AM\|rosetta@home\|Resuming task cs_noe_fullw_nolin_homo_bench_cs_noe_abrelax_cs_ccr19_olange_5604_39348_0 using minirosetta version 147` Moving the mouse had no particular effect, but the graphics window stayed blank and shutting it down gave some error messages before it finally worked. I normally let minirosetta run without graphics. |
| Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 | Hi all! I'm back online, sadly only to find more errors. We'll be back to debugging after the holidays. Quick comments on the major issues reported above: - The graphics problems with cs_noe_* jobs: this is very strange. We have NOT updated the graphics app, so these jobs must be doing something funny that the graphics app doesn't like. I'll ask the person submitting these to try running the graphics app locally to see if we can reproduce this error. - The normal_relax_rlb[dn]_* jobs validator error: I thought I had fixed this, so this must be something else. Yes, the validator will reject the WU if it has produced more than some number of decoys (around 128 or so per hour). This points to some other problem: evidently it's racing through decoys and not doing anything with them, thereby producing thousands of results. How that can happen on a sporadic basis (< 1/1000 WUs, it seems) is puzzling me. I'll have to look into that one. - Virus scanners: ahem, not really a bug. We have no control over what virus scanners "recognise" in it as malware or a virus. They won't tell us either; they have been wholly unhelpful in this matter. The only solution I see right now is to set exceptions in your virus scanner to ignore apps coming from ralph.bakerlab.org and boinc.bakerlab.org. Has anyone seen any new lockfile problems? Or are these finally a thing of the past? Mike http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
| svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 | Task 215936807; Workunit 194706499; Name 1dsvA_ZNMP_ABRELAX_tetraR_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1dsvA-_5479_5614_1; crashed on Mac OS X 10.4.11 after 4 secs (thankfully). `<core_client_version>6.2.18</core_client_version> <` |
| Ian_D Joined: 21 Sep 05 Posts: 55 Credit: 4,216,173 RAC: 0 | CPU type: GenuineIntel Intel(R) Pentium(R) 4 CPU 2.60GHz [Family 15 Model 2 Stepping 9]. Number of CPUs: 2. Operating System: Linux 2.6.24-22-generic. process exited with code 193 (0xc1, -63) Stack trace (22 frames): [0x8b979b7] [0x8bc20b0] [0xb7f03420] [0x83c53bc] [0x84356a0] [0x83c4fa3] [0x83ba6f8] [0x85c2f4e] [0x80cf524] [0x80de98f] [0x83376f7] [0x8337100] [0x8243364] [0x82a246c] [0x818e15a] [0x819bae3] [0x819b3aa] [0x8127771] [0x8129a1a] [0x804b9c8] [0x8c1dbac] [0x8048111] https://boinc.bakerlab.org/rosetta/result.php?resultid=215801702 process exited with code 193 (0xc1, -63) SIGSEGV: segmentation violation Stack trace (20 frames): [0x8b979b7] [0x8bc20b0] [0xb7fa5420] [0x83c4fa3] [0x83ba6f8] [0x85c2f4e] [0x80cf1ff] [0x80de98f] [0x83376f7] [0x8337100] [0x8243364] [0x82a246c] [0x818e15a] [0x819bae3] [0x819b3aa] [0x8127771] [0x8129a1a] [0x804b9c8] [0x8c1dbac] [0x8048111] https://boinc.bakerlab.org/rosetta/result.php?resultid=215414530 process exited with code 193 (0xc1, -63) SIGSEGV: segmentation violation Stack trace (23 frames): [0x8b979b7] [0x8bc20b0] [0xb7f48420] [0x8ace23a] [0x84348d3] [0x8ace5f6] [0x8acd739] [0x83b1c55] [0x862a631] [0x83f65af] [0x80cece6] [0x80de98f] [0x82c37e4] [0x82b897a] [0x82c16c1] [0x818d6ee] [0x819bae3] [0x819b3aa] [0x8127771] [0x8129a1a] [0x804b9c8] [0x8c1dbac] [0x8048111] https://boinc.bakerlab.org/rosetta/result.php?resultid=215035006 What's going on with the Rosetta Linux app? Sometimes it works, sometimes it's duff? Machine NOT overclocked in the slightest. Cheers |
| Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 1 | Two more that wasted my CPU time crashing halfway. Edit: more of the same type of task errored out. https://boinc.bakerlab.org/rosetta/result.php?resultid=215554911 t071_1_RDC_NMR_NESG_5480_119941_0 state Compute error Exit status -1073741819 (0xc0000005) CPU time 9361.141 https://boinc.bakerlab.org/rosetta/result.php?resultid=215583938 t072_1_RDC_NMR_NESG_5481_100236_0 state Compute error Exit status -1073741819 (0xc0000005) CPU time 4056.126 I am aborting the remaining t071 and t072 tasks due to 4 errors in 5-6 hours. Wasting my time with that junk. Another note: these 2 tasks did not respond to a suspend command, in the sense that the time to completion continued to count even though the actual running time had stopped and the status showed as suspended. Hope the t073 tasks are better. |
| Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 1 | I think you guys should recheck the code (or whatever) of the t071 and t072 tasks, as I see someone before me had one of this series of tasks and ran into a compute error of the same nature as what I reported. I aborted that task since I am not interested in wasting my CPU time on a task with a compute error bug. |
| robertmiles Joined: 16 Jun 08 Posts: 1251 Credit: 14,421,737 RAC: 0 | "I'm seeing problems when attempting to show graphics on workunits with names such as cs_noe* on Mac OS X 10.4.11. It seems like several other people are seeing similar problems." Another workunit with graphics problems: `12/21/2008 11:27:13 AM\|rosetta@home\|Resuming task cs_noe_fullw_nolin_homo_bench_cs_noe_abrelax_cs_flua_olange_5605_35210_0 using minirosetta version 147` The previous one seemed to complete successfully despite the graphics problem. |
| robertmiles Joined: 16 Jun 08 Posts: 1251 Credit: 14,421,737 RAC: 0 | After a 1 week hiatus I downloaded v1.47 and 4 tasks. The first task showed a completion time of 12 hours which corresponds to my chosen runtime. The other 3 tasks, all _rlbd_ tasks, showed completion times of only 1 hour. What's up with that? It suggests that the staff provided an estimated task runtime of something like 45 minutes instead of the customary 8 hours. What effect will that have on users who have chosen default workunit times over 6 hours? Is this 6 hours per decoy or 6 hours for the whole workunit? If it only aborts one decoy, will the other decoys still continue, with credit for the decoys that completed successfully both before and after this aborted decoy? |
| Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 | "What effect will that have on users who have chosen default workunit times over 6 hours? Is this 6 hours per decoy or 6 hours for the whole workunit? If it only aborts one decoy, will the other decoys still continue, with credit for the decoys that completed successfully both before and after this aborted decoy?" Yes, he's talking about per model. If any models that run that long are cut off, it would help assure a more consistent runtime in line with each person's stated preference. Not perfect, but better than having some specific models haul off and run for 12 hours. So, yes, if time remains for the task, another model may begin. I won't comment on credit, because it's not my decision, and so far as I know no specific decision has been made yet. But the project has always maintained that even "failures" provide information valuable to advancing the project. At present, the model would run for (sometimes) as much as 12 hours or more, and you'd get the same credit average as those that are running models with the more average runtime under 3 hrs, so if nothing else, just cutting it off at 6 hours (or whatever length is deemed appropriate) is preventing you from running for more than that, for essentially zero credit. So, this approach limits your credit loss, if nothing else. Rosetta Moderator: Mod.Sense |
| ramostol Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 | My 1.47 cc2_1_8_mammoth-tasks have all crashed on Ralph; now my 1.47 cc2_1_8_native-tasks are crashing on Rosetta. I believe I am not allowed access to rah_queue_ops ;-) so I cannot check your observation. However, my Ralph mammoth-failures flourish, the ultimate example: cc2_1_8_mammoth_fa_cst_hb_t369__IGNORE_THE_REST_1S3QA_7_6585_1_0 That said, I seem to have reconciled with Rosetta by rebooting the computer in question. Why this was suddenly necessary on a computer with no new program installations, no new configurations, no system upgrades, no separate computing on the side, and successfully computing 1.47-tasks 24 hours earlier, I am unable to explain. Even the subsequently installed BOINC 6.5 works like a charm. So I am loaded with tasks for a peaceful Christmas session and hope for the best until reporting time next weekend. |
| Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 1 | Come on guys, you say this stuff is tested and OK and then it bombs on a Windows machine. Can someone tell me if this is a program error or an error caused by too high an OC speed? Since not all the tasks I get error out, it would seem more a case of a bad program than of the OC speed. See below for a series of tasks that died part of the way through. https://boinc.bakerlab.org/rosetta/result.php?resultid=215716365 cc2_1_8_native_cen_cst_hb_t311__IGNORE_THE_REST_2B5AA_7_5843_16_0 Outcome Client error Client state Compute error Exit status -1073741819 (0xc0000005) CPU time 4999.172 stderr out `<core_client_version>6.4.5</core_client_version> <` |
| Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 1 | This makes 10 tasks in a day's time that have died with the 0xc error. COME ON! This ran to within 10 minutes of completion and died. Geez! Then you insult me with no credit granted for a 99% completed task. https://boinc.bakerlab.org/rosetta/result.php?resultid=216155882 1g47A_BOINC_MPZN_vanilla_abrelax_5901_6856_0 Workunit 196996323 Outcome Client error Client state Compute error Exit status -1073741819 (0xc0000005) CPU time 13796 stderr out `<core_client_version>6.4.5</core_client_version> <` |
| Ian_D Joined: 21 Sep 05 Posts: 55 Credit: 4,216,173 RAC: 0 | wuid=196939593 `<core_client_version>6.2.15</core_client_version> <![CDATA[ <message> process got signal 8 </message> <stderr_txt> # cpu_run_time_pref: 7200 ********************************************************************** Rosetta is going too long. Watchdog is ending the run! CPU time: 26914 seconds. Greater than 3X preferred time: 7200 seconds ********************************************************************** called boinc_finish </stderr_txt> ]]>` |
| Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 1 | Your vanilla task died at 2 hrs and 23 mins. This makes about 12 failures now in 2 days. https://boinc.bakerlab.org/rosetta/result.php?resultid=216178144 1g47A_BOINC_MPZN_vanilla_abrelax_5901_7554_0 Client state Compute error Exit status -1073741819 (0xc0000005) CPU time 8912.25 stderr out `<core_client_version>6.4.5</core_client_version> <` |
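
Several reports in this thread quote the same exit status in more than one notation: 193 (0xc1, -63) on Linux and -1073741819 (0xc0000005) on Windows. For anyone comparing results, here is a minimal Python sketch of how those notations relate. It assumes the values in parentheses are simply the hex and signed views of the same number; the function names are made up for illustration and are not part of BOINC or Rosetta. The single lookup entry is the standard Windows NTSTATUS code for an access violation, which fits the crashes described above but says nothing about their cause.

```python
# Hypothetical helpers for reading the exit statuses quoted in this thread.
# None of these names come from BOINC or Rosetta; they only redo the arithmetic.

# Windows NTSTATUS values, keyed by their unsigned 32-bit form.
# Only the code seen in this thread is listed.
WINDOWS_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def as_unsigned32(status: int) -> int:
    """View a (possibly negative) 32-bit status as an unsigned value."""
    return status & 0xFFFFFFFF


def as_signed8(code: int) -> int:
    """View the low byte of an exit code as a signed 8-bit value."""
    low = code & 0xFF
    return low - 256 if low >= 128 else low


def describe_windows_status(status: int) -> str:
    """Format a Windows exit status the way the result pages show it."""
    unsigned = as_unsigned32(status)
    name = WINDOWS_STATUS.get(unsigned, "unknown")
    return f"{status} (0x{unsigned:08x}) {name}"


def describe_exit_code(code: int) -> str:
    """Format a small exit code as decimal, hex and signed 8-bit."""
    return f"{code} (0x{code & 0xFF:x}, {as_signed8(code)})"


if __name__ == "__main__":
    print(describe_windows_status(-1073741819))
    print(describe_exit_code(193))
```

Run as a script, it prints the Windows status with its symbolic name and the Linux code in the same three-way form the stderr output uses, so a quick look is enough to see that everyone reporting 0xc0000005 is hitting the same class of error.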