Valid WUs not meeting deadline are flagged incorrectly

Michael H.W. Weber Joined: 18 Sep 05 Posts: 18 Credit: 6,870,562 RAC: 447	Message 93953 - Posted: 9 Apr 2020, 9:32:08 UTC

When checking my returned tasks today, I found two that were flagged as "Berechnungsfehler", meaning "computation invalid". That is not actually the case. Looking into the details, both tasks were validated against an Apple Darwin machine that had itself produced a faulty result. So how can something be validated against something faulty?

Comparing the sent date, the deadline and the return time, it quickly became apparent that my tasks had simply been returned to the server too late. That is no wonder, by the way, when tasks requiring 1.5 days to complete are given a deadline of only three days on machines that do not run 24/7, and when newly downloaded tasks get a deadline that is SHORTER than that of tasks already running (which makes BOINC switch tasks).

You should re-validate these tasks and make sure that future error classification works correctly. I actually suspect a general validation issue between Windows 10 x64 Intel machines and the Apple machines.

Michael.

P.S.: Another two of my tasks will foreseeably miss the deadline on this same Alienware laptop of mine. I will let them run, and you should consider implementing a grace period of a few days for tasks NOT returned within the deadline. Many distributed computing projects have implemented this, including our own (Yoyo@home, RNA World).

President of Rechenkraft.net e.V. http://www.rechenkraft.net - The world's first and largest distributed computing association. We make those things possible that supercomputers don't.
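To put rough numbers on the deadline complaint above, here is a minimal sketch of the arithmetic. The function name and the `share` parameter are made up for illustration and are not part of BOINC; it only assumes that a task needs a fixed amount of compute time and that the machine is on for the same number of hours each day.

```python
def min_daily_uptime_hours(compute_days, deadline_days, share=1.0):
    """Hours per day the machine must crunch for a task to finish in time.

    compute_days  -- compute time the task needs, in days (e.g. 1.5)
    deadline_days -- days between sending and deadline (e.g. 3)
    share         -- hypothetical fraction of crunching time this task
                     gets (below 1.0 if BOINC switches to other tasks)
    """
    if deadline_days <= 0 or not 0 < share <= 1:
        raise ValueError("deadline_days must be > 0 and share in (0, 1]")
    return 24.0 * compute_days / (deadline_days * share)


# 1.5 days of work with a 3-day deadline, task never preempted
print(min_daily_uptime_hours(1.5, 3.0))       # 12.0
# Same task, but it only gets half of the crunching time
print(min_daily_uptime_hours(1.5, 3.0, 0.5))  # 24.0
```

So under these assumptions a machine has to crunch at least 12 hours every day to make the deadline, and once a task is preempted half the time, nothing short of 24/7 operation is enough.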

CIA Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0	Message 93971 - Posted: 9 Apr 2020, 13:05:24 UTC - in response to Message 93953.

Rosetta 4.12 didn't work on older macOS machines, and the two WUs you gave as examples show exactly what happened when we tried to run them before 4.15 came out: they failed within seconds. Note the run times on both Darwin failures you listed.

I can't speak to your second problem, but if a work unit fails twice (for whatever reason), it is marked invalid. If enough of them come back invalid, the researchers know they might have an issue with something.

Dayle Joined: 6 Jan 14 Posts: 17 Credit: 949,560 RAC: 0	Message 93978 - Posted: 9 Apr 2020, 15:10:21 UTC - in response to Message 93971. Last modified: 9 Apr 2020, 15:11:54 UTC

I've started getting validation errors too. I had none a few days ago; now four units have failed. All were set to run for 24 hours, although most were invalidated in about half that time.

https://boinc.bakerlab.org/rosetta/result.php?resultid=1143200166
https://boinc.bakerlab.org/rosetta/result.php?resultid=1143200182
https://boinc.bakerlab.org/rosetta/result.php?resultid=1143200238
https://boinc.bakerlab.org/rosetta/result.php?resultid=1142301309

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 94001 - Posted: 9 Apr 2020, 20:22:57 UTC - in response to Message 93978.

Several of these show 600 completed models. That nice round number suggests it was set up as the maximum for the WU. Having a maximum keeps the output files from getting too large. That would explain why they didn't run the full 24 hours: they completed the maximum number of models before then, and ended.

I'm not sure why there were validation errors, though. I see no indication of any problem in the rest of the output.

Rosetta Moderator: Mod.Sense
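The "whichever limit is hit first" behaviour guessed at above can be sketched as follows. This is only an illustration of the idea, not Rosetta's actual code: the function name, its parameters and the 72 seconds per model are all assumptions, chosen so that 600 models take about half of a 24-hour target.

```python
def run_work_unit(seconds_per_model, target_runtime_s, max_models):
    """Simulate a task that stops at a model cap or a runtime target.

    All names and values are hypothetical; the real application
    decides when to stop in its own way.
    """
    elapsed = 0.0
    models = 0
    # Keep producing models until either limit is reached.
    while models < max_models and elapsed < target_runtime_s:
        elapsed += seconds_per_model
        models += 1
    reason = "model cap" if models >= max_models else "target runtime"
    return models, elapsed, reason


DAY = 24 * 3600
# Fast models: the 600-model cap ends the task after 12 hours.
print(run_work_unit(72.0, DAY, 600))   # (600, 43200.0, 'model cap')
# Slow models: the 24-hour target ends the task first.
print(run_work_unit(200.0, DAY, 600))  # (432, 86400.0, 'target runtime')
```

If that is what happened here, a short run time on its own would not indicate a fault, which fits with nothing else in the output looking wrong.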