Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
As should be quite apparent to the truly active participants of this project, communicating with active participants by the project is a VERY low priority in the Rosetta scheme of things. That means that issues perhaps viewed as insignificant by the project folks (or perhaps issues that they are simply not aware of) only get passing response. I believe it is an informed choice made by the project to not allocate time and resources to the 'care and feeding' of the active user community. Users get to prioritize as well. As a long-time participant (going back over ten years), there have been times when I maintained daily completed work generating 30 to 40 thousand credits. These days, it is more like 5 thousand credits as I shifted MY priorities to the WorldGrid project. We all make choices. And another blank day for stats |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,107,773 RAC: 19,731 |
As should be quite apparent to the truly active participants of this project, communicating with active participants by the project is a VERY low priority in the Rosetta scheme of things. Aside from that, I've wondered whether the project is subject to the communication restrictions instructed from above. Though that kind of subject may be more appropriately discussed in Café Rosetta |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,488,239 RAC: 11,585 |
I think it's probably more likely that there's no funding for someone dedicated to the role, and everyone else has other priorities so it falls to no-one. There could obviously be lots more compute power available here but it might be that there is sufficient as-is so it works for getting the science done, regardless of how frustrating it is for users. I just found my computer sat idle with the 24 hour back-off bug. I'll add a second project, but because the server code here is so old I don't believe I can add a project as a backup - only as a low % so I'll do that. Maybe it would be useful if we maintained a sticky thread (Mod.Sense!) where we list the priorities from our point of view, so the team can see what we think needs fixing. I.e. under URGENT, we'd have the 24hr bug, or make work, and then under the next heading (Less urgent?) we'd have the server upgrade, maybe with a link to the discussion of it. If anything breaks, stick it at the top. Might that help? D |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
But I don't quite understand why our moderator can't just email someone at UW when exceptional problems arise. Aren't they on speaking terms? If they don't need more work done, so be it. But they should tell us I believe. |
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
That raises an interesting question. Maybe a driving reason for the project paying essentially NO attention to the care and feeding of its most active participants is an internal decision that they already have too much work to process internally. That is, by actively ignoring the user community they are hoping to reduce the number of work units processed. I can confirm that approach has worked just fine for me.... But I don't quite understand why our moderator can't just email someone at UW when exceptional problems arise. Aren't they on speaking terms? |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
But I don't quite understand why our moderator can't just email someone at UW when exceptional problems arise. Aren't they on speaking terms? Actually, over the weekend, I did send an EMail to DK and Dr. Baker, with links to these msg boards, summarizing some suggested "todos" that would eliminate some of the annoyances I believe can be easily addressed. I cited:
Rosetta Moderator: Mod.Sense |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Actually, over the weekend, I did send an EMail to DK and Dr. Baker, with links to these msg boards, summarizing some suggested "todos" that would eliminate some of the annoyances I believe can be easily addressed. Thanks very much. It will be interesting to see their response. I sometimes think like BarryAZ that it is just an indirect way of managing their workload. It works for a while, but I don't think the long-term prospects are good. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
In response to Mod.Sense's feedback/recommendations, I increased the short 2 day deadline to 3 days. I don't think I can increase the deadline longer for these high priority jobs. The deadlines are short for an important reason since there are time constraints for these jobs (weekly CAMEO benchmarks for Robetta). I also increased the standard deadline from 5-7 days to 2 weeks which should help. If anyone knows how to change the 24 hour backoff, please chime in. I'm not sure if it's server or client logic and configurable. Also, if anyone knows how to fix the "Android" alert, please let us know. I'll of course look into this also. The last issue point I think can be coded into our application and I'll put it on the list of things to do for the next update. Also, if you are not getting work, it is most likely because there isn't any to issue at the time. Our demand comes in waves as projects progress. However, our public structure prediction server, Robetta, usually provides continual work. It was down for a few days last week for updates though. |
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
David, thanks much for jumping in here -- the air had been getting rather thin. My sense is that the 24 hour backoff is a server specific function -- as in the multiple other projects I work with there is a progressive backoff typically starting at either 5 minutes or 1 hour and progressing with multiple non-responsiveness up to as much as 3 to 5 hours and then recycling to a 1 hour back off. It is only with Rosetta that I have seen the 24 hour backoff. As to the android no work report -- again that is likely a project specific configuration. Other projects provide a 'no work for your applications' message but with Rosetta it seems specific to android work -- I would think that could be configured out. I don't do code though... so it's all speculative. The other issue -- for which your post is seriously appreciated, is the sense of the active user community being a bit 'unloved' by a lack of periodic responses from folks such as yourself. I'm sure you have more work than time, but even a weekly "we're here and watching" message might reduce that sense. Thanks again for your message. In response to Mod.Sense's feedback/recommendations, I increased the short 2 day deadline to 3 days. I don't think I can increase the deadline longer for these high priority jobs. The deadlines are short for an important reason since there are time constraints for these jobs (weekly CAMEO benchmarks for Robetta). I also increased the standard deadline from 5-7 days to 2 weeks which should help. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I found where the relevant parameters are set in the scheduling code. I'm open to suggestions and feedback for more preferable values as long as it doesn't cause too much load on our servers.

// various delay params.
// Any of these could be moved into SCHED_CONFIG, if projects need control.
#define DELAY_MISSING_KEY 3600 // account key missing or invalid
#define DELAY_UNACCEPTABLE_OS 3600*24 // Darwin 5.x or 6.x (E@h only)
#define DELAY_BAD_CLIENT_VERSION 3600*24 // client version < config.min_core_client_version
#define DELAY_NO_WORK_SKIP 0 // no work, config.nowork_skip is set
// Rely on the client's exponential backoff in this case
#define DELAY_PLATFORM_UNSUPPORTED 3600*24 // platform not in our DB
#define DELAY_DISK_SPACE 3600 // too little disk space or prefs (locality scheduling)
#define DELAY_DELETE_FILE 3600*4 // wait for client to delete a file (locality scheduling)
#define DELAY_ANONYMOUS 3600*4 // anonymous platform client doesn't have version
#define DELAY_NO_WORK_TEMP 0 // client asked for work but we didn't send any,
// because of a reason that could be fixed by user
// (e.g. prefs, or run BOINC more)
// Rely on the client's exponential backoff in this case
#define DELAY_NO_WORK_PERM 3600*24 // client asked for work but we didn't send any,
// because of a reason not easily changed
// (like wrong kind of computer)
#define DELAY_NO_WORK_CACHE 0 // client asked for work but we didn't send any,
// because user had too many results in cache.
// Rely on client's exponential backoff
#define DELAY_MAX (2*86400) // maximum delay request |
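Editor's sketch: to make the units in the parameters above concrete, the following C fragment copies a few of the quoted values (all in seconds) and shows one way a requested delay could be combined and capped. The function name `combine_delay` and the "keep the largest delay, then cap at `DELAY_MAX`" rule are assumptions made for illustration; the thread does not show how the scheduler actually applies these constants.

```c
#include <assert.h>

/* Values copied from the parameters quoted in the post (seconds). */
#define DELAY_MISSING_KEY   3600          /* 1 hour */
#define DELAY_NO_WORK_TEMP  0             /* 0 = leave it to the client's backoff */
#define DELAY_NO_WORK_PERM  (3600 * 24)   /* 24 hours */
#define DELAY_MAX           (2 * 86400)   /* 48 hours */

/*
 * Hypothetical helper (not taken from the scheduler source): keep the
 * larger of the delay already requested and a new one, and never exceed
 * DELAY_MAX.
 */
int combine_delay(int current, int requested) {
    assert(current >= 0 && requested >= 0);
    int delay = requested > current ? requested : current;
    return delay > DELAY_MAX ? DELAY_MAX : delay;
}
```

Under these assumptions, a reply that hits `DELAY_NO_WORK_PERM` asks the client to stay away for 86400 seconds, which matches the 24 hour backoff discussed in this thread.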
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I believe this one is the behavior people are seeing elsewhere and expecting:
#define DELAY_NO_WORK_SKIP 0 // no work, config.nowork_skip is set
// Rely on the client's exponential backoff in this case
Rosetta Moderator: Mod.Sense |
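Editor's sketch: the "client's exponential backoff" mentioned in the comment can be pictured roughly as below. This is a simplified, deterministic model and not the BOINC client's code. The 5 minute starting point and 5 hour ceiling are hypothetical values borrowed from the ranges BarryAZ describes for other projects, any randomization a real client may apply is omitted, and the rule that the client waits for the larger of the server's delay and its own backoff is an assumption.

```c
#include <assert.h>

#define BACKOFF_MIN_SECONDS 300          /* hypothetical: 5 minutes */
#define BACKOFF_MAX_SECONDS (5 * 3600)   /* hypothetical: 5 hours */

/* Double the wait after each consecutive request that returned no work. */
int client_backoff_seconds(int failed_requests) {
    assert(failed_requests >= 1);
    long delay = BACKOFF_MIN_SECONDS;
    for (int i = 1; i < failed_requests; i++) {
        delay *= 2;
        if (delay >= BACKOFF_MAX_SECONDS) {
            return BACKOFF_MAX_SECONDS;
        }
    }
    return (int)delay;
}

/*
 * Assumed rule: wait for whichever is longer, the server-requested delay
 * or the client's own backoff. A server delay of 0 therefore leaves the
 * choice entirely to the client.
 */
int next_request_delay(int server_delay, int failed_requests) {
    int backoff = client_backoff_seconds(failed_requests);
    return server_delay > backoff ? server_delay : backoff;
}
```

In this model a server delay of 0 lets the wait grow gradually from minutes to hours, whereas a server delay of 86400 seconds dominates from the first empty reply.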
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,107,773 RAC: 19,731 |
I found where the relevant parameters are set in the scheduling code. I'm open to suggestions and feedback for more preferable values as long as it doesn't cause too much load on our servers. I'd suggest 1 hour as a reasonable compromise between server load and our buffer sizes |
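Editor's sketch: the trade-off between server load and buffer size can be put into rough numbers. The model below is deliberately crude and rests on two assumptions: a host with no work polls once per backoff interval, and its local buffer drains in real time. The function names are hypothetical.

```c
#include <assert.h>

#define SECONDS_PER_DAY 86400

/* Scheduler requests per day from one host that keeps finding no work. */
int requests_per_day(int delay_seconds) {
    assert(delay_seconds > 0);
    return SECONDS_PER_DAY / delay_seconds;
}

/* Time a host sits without project work if its buffer runs out before
 * the backoff expires. */
int idle_seconds(int buffer_seconds, int backoff_seconds) {
    assert(buffer_seconds >= 0 && backoff_seconds >= 0);
    int gap = backoff_seconds - buffer_seconds;
    return gap > 0 ? gap : 0;
}
```

For a hypothetical half-day buffer (43200 seconds), a 24 hour backoff can leave the host without project work for up to 12 hours, while a 1 hour backoff leaves no gap. The cost on the server side is 24 requests per day from such a host instead of 1.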
Erich56 Send message Joined: 11 Jan 16 Posts: 35 Credit: 1,437,503 RAC: 0 |
For several days, I've been back to Rosetta with one of my PCs, and have crunched 14 tasks since then. Today, when BOINC was trying to download the next task, I got the notice "06.02.2017 14:49:08 | rosetta@home | Rosetta Mini for Android is not available for your type of computer." How come? I am not trying to crunch Rosetta Mini for Android on my Windows PC. |
Erich56 Send message Joined: 11 Jan 16 Posts: 35 Credit: 1,437,503 RAC: 0 |
"06.02.2017 14:49:08 | rosetta@home | Rosetta Mini for Android is not available for your type of computer." Just now, after a while, a new task was downloaded :-) |
Erich56 Send message Joined: 11 Jan 16 Posts: 35 Credit: 1,437,503 RAC: 0 |
Unfortunately, now again the BOINC messenger shows the message "6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer" when trying to download a new task on my Windows system. Why so? What's going wrong? |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1993 Credit: 9,520,400 RAC: 11,365 |
"6.02.2017 17:38:42 | rosetta@home | Rosetta Mini for Android is not available for your type of Computer" A long time ago, in a galaxy far.... |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Unfortunately, now again the BOINC messenger shows the message Erich, This is a known problem. Rosetta Mini for Android problem Read earlier in this thread. They are working to find and fix it. But it has gotten better for me the last day or two, whether that means anything. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Unfortunately, now again the BOINC messenger shows the message I think this alert only occurs when there are no non-android work units available. It's not a serious issue. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Unfortunately, now again the BOINC messenger shows the message We know it is not a serious issue, but EVERYONE that encounters this message immediately feels things are not running properly (except, I suppose, an Android user). That is why the request was made to improve the wording of the message. Rosetta Moderator: Mod.Sense |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,107,773 RAC: 19,731 |
Unfortunately, now again the BOINC messenger shows the message But the 24hr backoff that results directly from it <is> a serious issue for <users> if not for the Rosetta project itself. Our buffers run out and we either run nothing or many tasks get downloaded from backup projects if we have one set. I can't even believe you said that tbh. It came up on 3 of my devices today and 1 of my team-members - all coming up with the 24hr backoff message. 2 of those 4 are attended, 2 aren't. If the unattended ones are unlucky they'll re-poll after 24hrs and maybe find there aren't tasks again, which'll mean they get another 24hr backoff and run out of Rosetta work. And when I get to them later in the week I'll spend a few days forcing a heap of non-preferred projects' tasks to run in order to clear them down so there's space to get Rosetta tasks back into their buffer. Then when I get back here a few days later I may find the same here and do the same. You may think it's not serious. I think it's a circus that's been driving me crazy for the last few months without a break. So if you could see your way clear to changing that back-off to 1 hour instead of 24 hours (sounds like two minutes work to me) - because it hasn't happened yet - I'd kind of appreciate it, if that's not too much to ask. And if you could avoid saying all this manual intervention I'm having to do week after week after week after week "isn't a serious issue" ever again in your entire lifetime, that would be kind of neat too. No rush, obviously... |
©2024 University of Washington
https://www.bakerlab.org