Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
"It seems to me that the upload problem has been solved. At least, all my stuck WU's have been uploaded now." Yes, same here. Glad whatever the problem was has been resolved. |
amgthis Send message Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
All fixed! At least all the backed up stuff is now cleared out. Atta boy Rosetta team, we knew you could do it! Have a great week. Cheers! /M |
J. Ritchie Morrow Send message Joined: 4 Nov 05 Posts: 5 Credit: 341,049 RAC: 0 |
I keep getting the message that 'Task XX exited with zero status but no finished file. If this happens repeatedly you may need to reset the project.' I have reset the project but continue to get the error. Is this an issue on my end or the project's end? Thanks! |
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
"I keep getting the message that 'Task XX exited with zero status but no finished file. If this happens repeatedly you may need to reset the project.' I have reset the project but continue to get the error. Is this an issue on my end or the project's end? Thanks!" Copied and pasted from an earlier answer: On Rosetta this is usually solved by increasing the "use at most xxx% of CPU time" setting to 100. You may then want to reduce the "on multiprocessors, use at most xxx% of the processors" setting to something less than it is currently set to. Most people find this handles the temperature regulation concerns (that the CPU throttling was designed to address) perfectly. Another possible cause is virus scanners; most folks exclude BOINC from those scans or set them to run only when BOINC isn't active. An explanation and more possible causes can be found here: BOINC FAQ Service. Please know that this only becomes a fatal error when it occurs 100 times to a particular task; at that point BOINC assumes the task will never be able to finish and gives up on it, ending it as a client error. If you see this message only occasionally it is safe to ignore it. Best, Snags |
Batschlach Send message Joined: 7 May 17 Posts: 3 Credit: 307,527 RAC: 0 |
Hey, I've received some work units which couldn't be finished due to a compute error. Interestingly, the second person calculating the same WU also got a compute error: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=825566938 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=825557307 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=825537629 (still pending). Is this common behaviour? What has happened there? Best regards |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Moved the details from Batschlach. These are Android WUs. Rosetta Moderator: Mod.Sense |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Hey, This was a bad batch that a researcher accidentally sent out. |
Batschlach Send message Joined: 7 May 17 Posts: 3 Credit: 307,527 RAC: 0 |
"This was a bad batch that a researcher accidentally sent out." Oh, I see. Thanks for your answer. And thanks for moving my post into the right thread @Mod.Sense! |
Skillz Send message Joined: 24 May 17 Posts: 3 Credit: 5,914,356 RAC: 58,122 |
Why am I having such problems getting work units? I have over 250 cores that can be crunching but I only have, at the time of this post, 59 slots filled. This is the only project I am running so those other cores are sitting idle. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
"Why am I having such problems getting work units?" The Server Status page is a sea of red: clearly there are problems of some sort. |
mmonnin Send message Joined: 2 Jun 16 Posts: 58 Credit: 23,329,284 RAC: 53,966 |
The only one that needs to be up all the time is the scheduler, which I've seen up; all the ones below it have been up and down. Not everything needs to be running 100% of the time for the project to function. Set a longer queue. |
xii5ku Send message Joined: 29 Nov 16 Posts: 22 Credit: 13,815,783 RAC: 622 |
"Why am I having such problems getting work units?" For a dual- or quad-socket machine, a "Target CPU run time" setting below 4 hours is not sustainable, IME. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,107,172 RAC: 21,100 |
"Why am I having such problems getting work units?" Correct. Runtime <cannot> be the minimum 1 hour, especially when you have 250 cores (which is great btw). Default run-time is 8 hours, for which you'll do 8 times the work and receive 8 times the credit, but only use 1/8th of the bandwidth: better for you and the project. It's also likely to reduce the occasions you have unused cores, which answers your question. BUT! You shouldn't change directly from 1hr to 8hrs, otherwise your tasks will miss deadlines. Change up to 2hrs first, and wait until your buffer stockpile is reduced and the client starts asking for more tasks. Then 3hrs, same process. Then 4hrs etc. until you get to a practical level you're happy with, ideally the default 8hrs. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1993 Credit: 9,520,400 RAC: 11,365 |
"The Server Status page is a sea of red: clearly there are problems of some sort." Still.... :-( |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
"The Server Status page is a sea of red: clearly there are problems of some sort." Sorry for the errors in the status page. I'll take a look. Everything is running as normal so you can ignore the page for now. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,107,172 RAC: 21,100 |
2 long-running tasks with a long time since the last checkpoint: b21_ncst_0601.282._relax_SAVE_ALL_OUT_486708_29_0 (Last checkpoint: 7:51:46, CPU Time: 11:56:21); b22_1_0603.18._relax_SAVE_ALL_OUT_486983_40_1 (Last checkpoint: 2:54:51, CPU Time: 8:23:20). Both have a default 8 hour runtime and I'm anticipating the watchdog being the only thing that stops them running. I've got 2 more b21 tasks in my queue. Should I abort them? Thinking I will. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,107,172 RAC: 21,100 |
The 1st one has just completed and given full credit (and more) for the extra runtime. Maybe I should just let them complete after all. Let me see what the other one does. "2 long-running tasks with a long time since the last checkpoint:" |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,107,172 RAC: 21,100 |
2nd one not so generous, but both acknowledged the full runtime and validated properly. "The 1st one has just completed and given full credit (and more) for the extra runtime. Maybe I should just let them complete after all. Let me see what the other one does." |
boinc127 Send message Joined: 23 Jan 12 Posts: 3 Credit: 281,019 RAC: 0 |
I've got a b22 task that is wrapping up, but it is crawling to completion at 99.399%. It's slowly creeping at 0.01% a minute or so; it's in fast relax on model 292, step 7205. I may just abort that task as well... |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1993 Credit: 9,520,400 RAC: 11,365 |
920412254 Unhandled Exception Detected... |
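
Sid Celery's advice earlier in the thread (raise the target run time one hour at a time toward the 8-hour default, and expect bandwidth to fall in proportion) can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, nothing in it reads or changes BOINC or Rosetta@home settings, and it assumes transfers scale with the number of tasks fetched, as the post implies.

```python
from fractions import Fraction


def runtime_steps(current_hours: int, target_hours: int = 8) -> list[int]:
    """Hypothetical helper: the run-time settings to step through, one hour at a time.

    After each step, wait until the local task buffer has drained and the
    client starts requesting work again before moving to the next value.
    """
    if current_hours >= target_hours:
        return []
    return list(range(current_hours + 1, target_hours + 1))


def relative_transfers(runtime_hours: int, baseline_hours: int = 1) -> Fraction:
    """Hypothetical helper: task transfers per core-hour relative to the baseline.

    Assumes one download and one upload per task, so a task that runs N times
    longer needs 1/N as many transfers for the same amount of computing.
    """
    return Fraction(baseline_hours, runtime_hours)


if __name__ == "__main__":
    print(runtime_steps(1))       # [2, 3, 4, 5, 6, 7, 8]
    print(relative_transfers(8))  # 1/8
```

Starting from the 1-hour minimum, the steps are 2, 3, 4 and so on up to 8 hours, and at 8 hours the transfer count is one eighth of what it is at 1 hour, which matches the figures in the post.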
©2024 University of Washington
https://www.bakerlab.org