Questions and Answers : Windows : Rosetta 100% computation errors within 20 secs since Apr/06/21
Author | Message |
---|---|
Neo (ClaudioP) Joined: 15 Mar 20 Posts: 5 Credit: 17,411,122 RAC: 20,986 |
Hi, since yesterday I've been getting 100% computation errors on both of my hosts. Everything worked fine before, with these errors appearing only very rarely. No hardware or software changes have been made, computing preferences are all at their defaults, and both hosts run Win10 Pro with the latest patches. One host runs 8 Rosetta instances (100% CPU) out of 12 available threads and has 7.5 GB of free RAM out of 12 GB total; its SSD has 125 GB available, and the machine is basically idle, running only Rosetta tasks. The other host runs 10 Rosetta instances (100% CPU) out of 16 threads, with 26 GB of free RAM and 579 GB of free SSD storage. This host is also basically idle, with occasional light activity. I noticed that before terminating with a computation error, some tasks show "waiting for memory" for 1-2 secs, which looks very strange to me. Any ideas? Anything I should check? Thanks. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1683 Credit: 17,919,030 RAC: 22,447 |
Quote: "I noticed that before terminating with computation error some tasks show 'waiting for memory' for 1-2 secs, which looks very strange to me." There have been issues for a week or more with Tasks being incorrectly configured and asking for way more disk space and memory than they actually need to run. That's what the "Waiting for memory" message is about. Quote: "Any idea regarding?" All of the errors are due to a batch of faulty Work Units being released. So far I have 6 that are running, and 571 that have errored out in a matter of seconds. As it seems nothing is being done by the Project to sort things out, it'll just be a case of waiting for all of these Tasks to error out, then error out again, before they are declared dead, never to return. Then we wait for more work to be released, and hopefully the next batch can actually be processed instead of producing nothing but errors. Grant Darwin NT |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
It’s not just you. Lots of people are reporting errors like this. It looks like the current batch of work units is bad. There’s nothing we can do other than keep letting them fail until somebody at Rosetta fixes or removes them, or switch to another project in the meantime. There’s some additional discussion in a thread in the Number crunching forum. |
Neo (ClaudioP) Joined: 15 Mar 20 Posts: 5 Credit: 17,411,122 RAC: 20,986 |
OK, thanks. I suspected something like that, but I wanted to be sure before blaming Rosetta. I'll leave BOINC Manager running and wait for good tasks. |
©2024 University of Washington
https://www.bakerlab.org