Rosetta 100% computation errors within 20 secs since Apr/06/21

Questions and Answers : Windows : Rosetta 100% computation errors within 20 secs since Apr/06/21

Neo (ClaudioP)

Joined: 15 Mar 20
Posts: 5
Credit: 17,402,221
RAC: 28,181
Message 101115 - Posted: 7 Apr 2021, 10:05:12 UTC

Hi, since yesterday I've been getting 100% computation errors on my two hosts. Everything worked fine before, with these errors showing up only very rarely. No hardware or software changes have been made, and computing preferences are all defaults. Both hosts run Win10 Pro with the latest patches. One host runs 8 Rosetta instances (100% CPU) out of 12 available threads and has 7.5GB of free RAM out of 12GB total; its SSD has 125GB available and the machine is basically idle, running Rosetta tasks only. The other host runs 10 Rosetta instances (100% CPU) out of 16 threads, with 26GB of free RAM and 579GB of free SSD storage. This host is also basically idle, with very light activity sometimes. I noticed that before terminating with a computation error, some tasks show "waiting for memory" for 1-2 secs, which looks very strange to me. Any ideas? Anything to check? Thanks.
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1681
Credit: 17,854,150
RAC: 22,647
Message 101117 - Posted: 7 Apr 2021, 10:30:03 UTC - in response to Message 101115.  

> I noticed that before terminating with a computation error, some tasks show "waiting for memory" for 1-2 secs, which looks very strange to me.
For a week or more there have been issues with Tasks that are incorrectly configured & ask for way more disk space & memory than they actually need to run.
That's what the "Waiting for memory" message is about.



> Any ideas?
All of the errors are due to a batch of faulty Work Units being released.
So far I have 6 that are running, and 571 have errored out in a matter of seconds.

As it seems nothing is being done by the Project to sort things out, it'll just be a case of waiting for all of these Tasks to error out, then error out again, before they are declared dead & never return. Then we wait for more work to be released, and hopefully it will be possible to process the next batch without getting nothing but errors.
Grant
Darwin NT
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 101118 - Posted: 7 Apr 2021, 10:36:49 UTC - in response to Message 101115.  

It’s not just you. Lots of people are reporting errors like this. It looks like the current batch of work units is bad. There’s nothing we can do other than keep letting them fail until somebody at Rosetta fixes or removes them, or switch to another project in the meantime.

There’s some additional discussion in the thread in the Number crunching forum.
Neo (ClaudioP)

Joined: 15 Mar 20
Posts: 5
Credit: 17,402,221
RAC: 28,181
Message 101120 - Posted: 7 Apr 2021, 12:44:53 UTC - in response to Message 101118.  

OK, thanks. I suspected something like that, but I wanted to be sure before blaming Rosetta. I will leave BOINC Manager running and wait for good tasks.




©2024 University of Washington
https://www.bakerlab.org